Understand and count tokens

Gemini and other generative AI models process input and output at a granularity called a token.

This guide explains how to get the context window of a specific model, as well as how to count tokens for use cases like text input, chat, multimodal input, and system instructions and tools.

About tokens

Tokens can be single characters like z or whole words like cat. Long words are broken up into several tokens. The set of all tokens used by a model is called the vocabulary, and the process of splitting text into tokens is called tokenization.

For Gemini models, a token is equivalent to about 4 characters. 100 tokens is equal to about 60-80 English words.
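
These ratios are only rules of thumb, but they are handy for quick estimates before you make an API call. A minimal sketch, assuming nothing beyond the approximation stated above (about 4 characters per token); the actual tokenizer can return slightly different counts:

```python
def estimate_tokens(text: str) -> int:
    """Roughly estimate a Gemini token count using the
    ~4 characters per token rule of thumb."""
    return max(1, round(len(text) / 4))

# 44 characters -> an estimate of about 11 tokens
print(estimate_tokens("The quick brown fox jumps over the lazy dog."))
```

For exact numbers, use count_tokens as shown in the sections below.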

When billing is enabled, the cost of a call to the Gemini API is determined in part by the number of input and output tokens, so knowing how to count tokens can be helpful.
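
As a sketch of how token counts feed into cost: multiply the input and output token counts by the model's per-token prices. The prices below are placeholder values for illustration only, not real rates; check the official pricing page for the model you use.

```python
# Placeholder prices in USD per 1 million tokens (illustrative only).
INPUT_PRICE_PER_1M = 0.35
OUTPUT_PRICE_PER_1M = 1.05

def estimate_cost(prompt_tokens: int, candidate_tokens: int) -> float:
    """Estimate the cost of one request from its token counts,
    e.g. the values reported in `response.usage_metadata`."""
    return (prompt_tokens * INPUT_PRICE_PER_1M
            + candidate_tokens * OUTPUT_PRICE_PER_1M) / 1_000_000

print(f"${estimate_cost(prompt_tokens=11, candidate_tokens=73):.6f}")
```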


Context windows

The models available through the Gemini API have context windows that are measured in tokens. The context window defines how much input you can provide and how much output the model can generate. You can determine the size of the context window using the API, or by looking in the models documentation.

In the following example, you can see that the gemini-1.0-pro-001 model has an input limit of about 30K tokens and an output limit of about 2K tokens, which means a context window of about 32K tokens.

model_info = genai.get_model("models/gemini-1.0-pro-001")

# Returns the "context window" for the model,
# which is the combined input and output token limits.
print(f"{model_info.input_token_limit=}")
print(f"{model_info.output_token_limit=}")
# ( input_token_limit=30720, output_token_limit=2048 )

As another example, if you instead requested the gemini-1.5-flash-001 model, you'd see that it has a context window of roughly 1 million tokens.

Count tokens

All input to and output from the Gemini API is tokenized, including text, image files, and other non-text modalities.

You can count tokens in the following ways:

  • Call count_tokens with the input of the request. This returns the total number of tokens in the input only, and you can make this call before sending the input to the model to check the size of your requests.

  • Use the usage_metadata attribute on the response object after a call to generate_content. This returns the combined token count (total_token_count) as well as separate input and output counts (prompt_token_count and candidates_token_count).

Count text tokens

import google.generativeai as genai

model = genai.GenerativeModel("models/gemini-1.5-flash")

prompt = "The quick brown fox jumps over the lazy dog."

# Call `count_tokens` to get the input token count (`total_tokens`).
print("total_tokens: ", model.count_tokens(prompt))
# ( total_tokens: 10 )

response = model.generate_content(prompt)

# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 11, candidates_token_count: 73, total_token_count: 84 )

Count multi-turn (chat) tokens

model = genai.GenerativeModel("models/gemini-1.5-flash")

chat = model.start_chat(
    history=[
        {"role": "user", "parts": "Hi my name is Bob"},
        {"role": "model", "parts": "Hi Bob!"},
    ]
)
# Call `count_tokens` to get the input token count (`total_tokens`).
print(model.count_tokens(chat.history))
# ( total_tokens: 10 )

response = chat.send_message(
    "In one sentence, explain how a computer works to a young child."
)

# On the response for `send_message`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 25, candidates_token_count: 21, total_token_count: 46 )

from google.generativeai.types.content_types import to_contents

# You can call `count_tokens` on the combined history and content of the next turn.
print(model.count_tokens(chat.history + to_contents("What is the meaning of life?")))
# ( total_tokens: 56 )

Count multimodal tokens

All input to the Gemini API is tokenized, including text, image files, and other non-text modalities. Note the following high-level key points about tokenization of multimodal input during processing by the Gemini API:

  • Images are considered to be a fixed size, so they consume a fixed number of tokens (currently 258 tokens), regardless of their display or file size.

  • Video and audio files are converted to tokens at the following fixed rates: video at 263 tokens per second and audio at 32 tokens per second.
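
Those fixed rates make the media portion of a prompt easy to estimate by hand. A minimal sketch of the arithmetic, using only the per-image and per-second rates listed above (the 10-second duration is just an example value):

```python
VIDEO_TOKENS_PER_SECOND = 263
AUDIO_TOKENS_PER_SECOND = 32
IMAGE_TOKENS = 258  # fixed per image, regardless of display or file size

def estimate_media_tokens(video_seconds: float = 0,
                          audio_seconds: float = 0,
                          num_images: int = 0) -> int:
    """Estimate the token cost of the media parts of a prompt."""
    return (round(video_seconds * VIDEO_TOKENS_PER_SECOND)
            + round(audio_seconds * AUDIO_TOKENS_PER_SECOND)
            + num_images * IMAGE_TOKENS)

# A 10-second video plus one image:
print(estimate_media_tokens(video_seconds=10, num_images=1))  # 2888
```

Any text in the prompt is tokenized on top of this, as shown in the count_tokens examples below.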

Image files

During processing by the Gemini API, images are considered to be a fixed size, so they consume a fixed number of tokens (currently 258 tokens), regardless of their display or file size.

Example that uses an image uploaded via the File API:

model = genai.GenerativeModel("models/gemini-1.5-flash")

prompt = "Tell me about this image"
your_image_file = genai.upload_file(path="image.jpg")

# Call `count_tokens` to get the input token count
# of the combined text and file (`total_tokens`).
# An image's display or file size does not affect its token count.
# Optionally, you can call `count_tokens` for the text and file separately.
print(model.count_tokens([prompt, your_image_file]))
# ( total_tokens: 263 )

response = model.generate_content([prompt, your_image_file])
response.text
# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 264, candidates_token_count: 80, total_token_count: 345 )

Example that provides the image as inline data:

import PIL.Image

model = genai.GenerativeModel("models/gemini-1.5-flash")

prompt = "Tell me about this image"
your_image_file = PIL.Image.open("image.jpg")

# Call `count_tokens` to get the input token count
# of the combined text and file (`total_tokens`).
# An image's display or file size does not affect its token count.
# Optionally, you can call `count_tokens` for the text and file separately.
print(model.count_tokens([prompt, your_image_file]))
# ( total_tokens: 263 )

response = model.generate_content([prompt, your_image_file])

# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 264, candidates_token_count: 80, total_token_count: 345 )

Video or audio files

Audio and video are each converted to tokens at the following fixed rates:

  • Video: 263 tokens per second
  • Audio: 32 tokens per second

import time

model = genai.GenerativeModel("models/gemini-1.5-flash")

prompt = "Tell me about this video"
# `media` is assumed to be a pathlib.Path pointing at your media directory.
your_file = genai.upload_file(path=media / "Big_Buck_Bunny.mp4")

# Videos need to be processed before you can use them.
while your_file.state.name == "PROCESSING":
    print("processing video...")
    time.sleep(5)
    your_file = genai.get_file(your_file.name)

# Call `count_tokens` to get the input token count
# of the combined text and video/audio file (`total_tokens`).
# A video or audio file is converted to tokens at a fixed rate of tokens per second.
# Optionally, you can call `count_tokens` for the text and file separately.
print(model.count_tokens([prompt, your_file]))
# ( total_tokens: 300 )

response = model.generate_content([prompt, your_file])

# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 301, candidates_token_count: 60, total_token_count: 361 )

System instructions and tools

System instructions and tools also count toward the total token count for the input.

If you use system instructions, the total_tokens count increases to reflect the addition of the system_instruction.

model = genai.GenerativeModel(model_name="gemini-1.5-flash")

prompt = "The quick brown fox jumps over the lazy dog."

print(model.count_tokens(prompt))
# total_tokens: 10

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash", system_instruction="You are a cat. Your name is Neko."
)

# The total token count includes everything sent to the `generate_content` request.
# When you use system instructions, the total token count increases.
print(model.count_tokens(prompt))
# ( total_tokens: 21 )

If you use function calling, the total_tokens count increases to reflect the addition of tools.

model = genai.GenerativeModel(model_name="gemini-1.5-flash")

prompt = "I have 57 cats, each owns 44 mittens, how many mittens is that in total?"

print(model.count_tokens(prompt))
# ( total_tokens: 22 )

def add(a: float, b: float):
    """returns a + b."""
    return a + b

def subtract(a: float, b: float):
    """returns a - b."""
    return a - b

def multiply(a: float, b: float):
    """returns a * b."""
    return a * b

def divide(a: float, b: float):
    """returns a / b."""
    return a / b

model = genai.GenerativeModel(
    "models/gemini-1.5-flash-001", tools=[add, subtract, multiply, divide]
)

# The total token count includes everything sent to the `generate_content` request.
# When you use tools (like function calling), the total token count increases.
print(model.count_tokens(prompt))
# ( total_tokens: 206 )