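Understand and count tokens

Gemini and other generative AI models process input and output at a granularity called a token.

This guide explains how to get the context windows of specific models and how to count tokens for use cases such as text input, chat, multimodal input, and system instructions and tools.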

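About tokens

Tokens can be single characters like z or whole words like cat. Long words are broken up into several tokens. The set of all tokens used by the model is called the vocabulary, and the process of splitting text into tokens is called tokenization.

For Gemini models, a token is equivalent to about 4 characters. 100 tokens are about 60 to 80 English words.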
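
As a rough illustration of the 4-characters-per-token rule of thumb, the sketch below estimates a token count from text length alone. The name estimate_tokens and the choice to round up are assumptions made for this example. It is a heuristic, not the Gemini tokenizer, so call count_tokens when you need the real number.

```python
import math

# Rule of thumb from this guide: one token is roughly 4 characters.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate based only on character count."""
    if not text:
        return 0
    # Round up so that a short, non-empty string counts as at least one token.
    return math.ceil(len(text) / CHARS_PER_TOKEN)


prompt = "The quick brown fox jumps over the lazy dog."
print(estimate_tokens(prompt))  # prints 11 (44 characters / 4)
```

An estimate like this is only useful for quick sizing. Actual counts depend on the model's vocabulary and can differ noticeably, especially for non-English text.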

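When billing is enabled, the cost of a call to the Gemini API is determined in part by the number of input and output tokens, so knowing how to count tokens can be helpful.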
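
To make the link between token counts and cost concrete, here is a minimal sketch of a cost estimate. The per-million-token prices and the name estimate_cost are hypothetical placeholders chosen for this example, not real Gemini API pricing. Check the current pricing page for actual rates.

```python
from decimal import Decimal

# Hypothetical prices in USD per 1 million tokens. These numbers are
# placeholders for illustration only, not real Gemini API pricing.
INPUT_PRICE_PER_MILLION = Decimal("0.50")
OUTPUT_PRICE_PER_MILLION = Decimal("1.50")


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one call from its input and output token counts."""
    million = Decimal(1_000_000)
    input_cost = Decimal(input_tokens) / million * INPUT_PRICE_PER_MILLION
    output_cost = Decimal(output_tokens) / million * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# 200,000 input tokens and 100,000 output tokens at the placeholder prices.
print(estimate_cost(input_tokens=200_000, output_tokens=100_000))
```

Decimal is used instead of float so that small per-token amounts do not pick up binary rounding noise when they are summed over many calls.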

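Context windows

The models available through the Gemini API have context windows that are measured in tokens. The context window defines how much input you can provide and how much output the model can generate. You can determine the size of the context window by using the API or by looking in the models documentation.

In the following example, you can see that the gemini-1.0-pro-001 model has an input limit of about 30,000 tokens and an output limit of about 2,000 tokens, which means a context window of about 32,000 tokens.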

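# Assumes the API key is already configured, for example with genai.configure(api_key=...).
import google.generativeai as genai

model_info = genai.get_model("models/gemini-1.0-pro-001")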

# Returns the "context window" for the model,
# which is the combined input and output token limits.
print(f"{model_info.input_token_limit=}")
print(f"{model_info.output_token_limit=}")
# ( input_token_limit=30720, output_token_limit=2048 )
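
The arithmetic behind the context window can be sketched without calling the API. The TokenLimits class below is a hypothetical helper written for this example. It uses the two limits printed above and checks whether a prompt of a given size fits within the input limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenLimits:
    """Hypothetical container for the two limits returned by get_model."""

    input_token_limit: int
    output_token_limit: int

    @property
    def context_window(self) -> int:
        # The context window is the combined input and output token limits.
        return self.input_token_limit + self.output_token_limit

    def fits(self, prompt_tokens: int) -> bool:
        # A prompt fits when it does not exceed the input limit.
        return prompt_tokens <= self.input_token_limit


limits = TokenLimits(input_token_limit=30720, output_token_limit=2048)
print(limits.context_window)  # 32768
print(limits.fits(25_000))    # True
print(limits.fits(31_000))    # False
```

In practice you would pass the value returned by count_tokens as prompt_tokens, so an oversized request can be caught before it is sent.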

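As another example, if you instead requested the token limits for a model like gemini-1.5-flash-001, you would see that it has a 2M context window.

Count tokens

All input to and output from the Gemini API is tokenized, including text, image files, and other non-text modalities.

You can count tokens in two ways, both used in the examples that follow: call count_tokens with the input of a request to get the input token count before sending it, or read the usage_metadata attribute on the response after calling generate_content to get the input, output, and total token counts.

Count text tokens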

model = genai.GenerativeModel("models/gemini-1.5-flash")

prompt = "The quick brown fox jumps over the lazy dog."

# Call `count_tokens` to get the input token count (`total_tokens`).
print("total_tokens: ", model.count_tokens(prompt))
# ( total_tokens: 10 )

response = model.generate_content(prompt)

# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 11, candidates_token_count: 73, total_token_count: 84 )

Count multi-turn (chat) tokens

model = genai.GenerativeModel("models/gemini-1.5-flash")

chat = model.start_chat(
    history=[
        {"role": "user", "parts": "Hi my name is Bob"},
        {"role": "model", "parts": "Hi Bob!"},
    ]
)
# Call `count_tokens` to get the input token count (`total_tokens`).
print(model.count_tokens(chat.history))
# ( total_tokens: 10 )

response = chat.send_message(
    "In one sentence, explain how a computer works to a young child."
)

# On the response for `send_message`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 25, candidates_token_count: 21, total_token_count: 46 )

from google.generativeai.types.content_types import to_contents

# You can call `count_tokens` on the combined history and content of the next turn.
print(model.count_tokens(chat.history + to_contents("What is the meaning of life?")))
# ( total_tokens: 56 )
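
Because each chat turn sends the accumulated history again, it can help to keep a running total of usage across turns. The sketch below shows one way to do that in plain Python. UsageTotals is a hypothetical helper, and the second turn's numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Hypothetical running total of token usage across chat turns."""

    prompt_tokens: int = 0
    candidates_tokens: int = 0

    def add(self, prompt_token_count: int, candidates_token_count: int) -> None:
        # Add the counts reported for a single turn.
        self.prompt_tokens += prompt_token_count
        self.candidates_tokens += candidates_token_count

    @property
    def total_tokens(self) -> int:
        # Combined input and output tokens over all recorded turns.
        return self.prompt_tokens + self.candidates_tokens


totals = UsageTotals()
# First turn: the counts shown in the example above.
totals.add(prompt_token_count=25, candidates_token_count=21)
# Second turn: made-up counts, for illustration only.
totals.add(prompt_token_count=56, candidates_token_count=40)
print(totals.total_tokens)  # 142
```

With a real response, the two arguments would come from response.usage_metadata.prompt_token_count and response.usage_metadata.candidates_token_count.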

Count multimodal tokens

All input to the Gemini API is tokenized, including text, image files, and other non-text modalities. Note the following high-level key points about how multimodal input is tokenized:

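Images are considered to be a fixed size, so they consume a fixed number of tokens (currently 258 tokens), regardless of their display or file size.
Video and audio files are converted to tokens at the following fixed rates: video at 263 tokens per second and audio at 32 tokens per second.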
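
The fixed rates above make it possible to estimate multimodal token usage before uploading anything. The following is a minimal sketch under stated assumptions: estimate_media_tokens is a hypothetical helper, fractional seconds are rounded up, and each modality is counted independently. The API's count_tokens result remains the authoritative number.

```python
import math

# Fixed rates quoted in this guide (current values, subject to change).
TOKENS_PER_IMAGE = 258
VIDEO_TOKENS_PER_SECOND = 263
AUDIO_TOKENS_PER_SECOND = 32


def estimate_media_tokens(
    images: int = 0, video_seconds: float = 0.0, audio_seconds: float = 0.0
) -> int:
    """Estimate tokens for media input using the fixed per-item and per-second rates."""
    return (
        images * TOKENS_PER_IMAGE
        + math.ceil(video_seconds * VIDEO_TOKENS_PER_SECOND)
        + math.ceil(audio_seconds * AUDIO_TOKENS_PER_SECOND)
    )


print(estimate_media_tokens(images=2))          # 516
print(estimate_media_tokens(video_seconds=60))  # 15780
print(estimate_media_tokens(audio_seconds=60))  # 1920
```

Any text sent alongside the media adds its own tokens on top of this estimate.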

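Image files

During processing, the Gemini API considers images to be a fixed size, so they consume a fixed number of tokens (currently 258 tokens), regardless of their display or file size.

Example that uses an uploaded image from the File API: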

model = genai.GenerativeModel("models/gemini-1.5-flash")

prompt = "Tell me about this image"
your_image_file = genai.upload_file(path="image.jpg")

# Call `count_tokens` to get the input token count
# of the combined text and file (`total_tokens`).
# An image's display or file size does not affect its token count.
# Optionally, you can call `count_tokens` for the text and file separately.
print(model.count_tokens([prompt, your_image_file]))
# ( total_tokens: 263 )

response = model.generate_content([prompt, your_image_file])
response.text
# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 264, candidates_token_count: 80, total_token_count: 345 )

Example that provides the image as inline data:

import PIL.Image

model = genai.GenerativeModel("models/gemini-1.5-flash")

prompt = "Tell me about this image"
your_image_file = PIL.Image.open("image.jpg")

# Call `count_tokens` to get the input token count
# of the combined text and file (`total_tokens`).
# An image's display or file size does not affect its token count.
# Optionally, you can call `count_tokens` for the text and file separately.
print(model.count_tokens([prompt, your_image_file]))
# ( total_tokens: 263 )

response = model.generate_content([prompt, your_image_file])

# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 264, candidates_token_count: 80, total_token_count: 345 )

Video or audio files

Audio and video are each converted to tokens at the following fixed rates:

Video: 263 tokens per second
Audio: 32 tokens per second

import time

model = genai.GenerativeModel("models/gemini-1.5-flash")

prompt = "Tell me about this video"
# `media` is assumed to be a pathlib.Path for the directory that contains the video.
your_file = genai.upload_file(path=media / "Big_Buck_Bunny.mp4")

# Videos need to be processed before you can use them.
while your_file.state.name == "PROCESSING":
    print("processing video...")
    time.sleep(5)
    your_file = genai.get_file(your_file.name)

# Call `count_tokens` to get the input token count
# of the combined text and video/audio file (`total_tokens`).
# A video or audio file is converted to tokens at a fixed rate of tokens per second.
# Optionally, you can call `count_tokens` for the text and file separately.
print(model.count_tokens([prompt, your_file]))
# ( total_tokens: 300 )

response = model.generate_content([prompt, your_file])

# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 301, candidates_token_count: 60, total_token_count: 361 )

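System instructions and tools

System instructions and tools also count toward the total token count for the input.

If you use system instructions, the total_tokens count increases to reflect the addition of system_instruction.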

model = genai.GenerativeModel(model_name="gemini-1.5-flash")

prompt = "The quick brown fox jumps over the lazy dog."

print(model.count_tokens(prompt))
# ( total_tokens: 10 )

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash", system_instruction="You are a cat. Your name is Neko."
)

# The total token count includes everything sent to the `generate_content` request.
# When you use system instructions, the total token count increases.
print(model.count_tokens(prompt))
# ( total_tokens: 21 )

If you use function calling, the total_tokens count increases to reflect the addition of tools.

model = genai.GenerativeModel(model_name="gemini-1.5-flash")

prompt = "I have 57 cats, each owns 44 mittens, how many mittens is that in total?"

print(model.count_tokens(prompt))
# ( total_tokens: 22 )

def add(a: float, b: float):
    """returns a + b."""
    return a + b

def subtract(a: float, b: float):
    """returns a - b."""
    return a - b

def multiply(a: float, b: float):
    """returns a * b."""
    return a * b

def divide(a: float, b: float):
    """returns a / b."""
    return a / b

model = genai.GenerativeModel(
    "models/gemini-1.5-flash-001", tools=[add, subtract, multiply, divide]
)

# The total token count includes everything sent to the `generate_content` request.
# When you use tools (like function calling), the total token count increases.
print(model.count_tokens(prompt))
# ( total_tokens: 206 )
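
The overhead that a system instruction or a set of tools adds is the difference between the two count_tokens results. The sketch below works through that arithmetic with the numbers from the examples above. overhead_tokens is a hypothetical helper, and the request volume is a made-up figure for illustration.

```python
def overhead_tokens(base_count: int, count_with_extras: int) -> int:
    """Return the tokens added by a system instruction or tool declarations."""
    return count_with_extras - base_count


# Counts taken from the examples above.
system_overhead = overhead_tokens(base_count=10, count_with_extras=21)
tools_overhead = overhead_tokens(base_count=22, count_with_extras=206)
print(system_overhead)  # 11
print(tools_overhead)   # 184

# The overhead is paid on every request, so it scales with volume.
requests = 1_000  # hypothetical number of requests
print(tools_overhead * requests)  # 184000
```

Since this overhead is included in the input of each request, trimming unused tools or shortening a long system instruction reduces the token count of every call.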