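Understand and count tokens

Gemini and other generative AI models process input and output at a granularity called a token.

This guide explains how to get the context window of a specific model and how to count tokens for use cases such as text input, chat, multimodal input, and system instructions and tools.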

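About tokens

Tokens can be single characters (for example, z) or whole words (for example, cat). Long words are broken up into several tokens. The set of all tokens used by the model is called the vocabulary, and the process of splitting text into tokens is called tokenization.

For Gemini models, a token is equivalent to about 4 characters of English text. 100 tokens are equal to about 60 to 80 English words.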
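The rule of thumb above can be turned into a quick offline estimate. The following is a minimal sketch only: `estimate_tokens` and `estimate_word_range` are hypothetical helpers, not part of the Gemini SDK, and the real count always comes from the model's tokenizer.

```python
import math

# Rule of thumb from this guide: one token is roughly 4 English characters.
# This is an approximation; actual counts come from the model's tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text (hypothetical helper)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_word_range(tokens: int) -> tuple[int, int]:
    """Rough English word range, using 100 tokens ~ 60 to 80 words."""
    return (tokens * 60 // 100, tokens * 80 // 100)


prompt = "The quick brown fox jumps over the lazy dog."
print(estimate_tokens(prompt))   # 44 characters / 4 -> 11
print(estimate_word_range(100))  # (60, 80)
```

Use an estimate like this only for rough planning. For exact numbers, count tokens through the API.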

When billing is enabled, the cost of a call to the Gemini API is determined in part by the number of input and output tokens, so knowing how to count tokens can be helpful.
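To illustrate how token counts feed into cost, here is a small sketch. The function `estimate_cost` is a hypothetical helper, and the per-million-token prices are placeholders chosen for easy arithmetic, not actual Gemini API rates.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the token-based part of a call's cost (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Placeholder prices, NOT real rates: 1.00 per million input tokens
# and 4.00 per million output tokens.
cost = estimate_cost(2_000_000, 500_000, 1.00, 4.00)
print(f"{cost:.2f}")  # 4.00
```

Input and output tokens are priced separately in this sketch, which is why the two counts are kept apart rather than summed first.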

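Context windows

The models available through the Gemini API have context windows that are measured in tokens. The context window defines how much input you can provide and how much output the model can generate. You can determine the size of the context window by using the API or by looking in the models documentation.

In the following example, you can see that the gemini-1.0-pro-001 model has an input limit of about 30,000 tokens and an output limit of about 2,000 tokens, which means a context window of about 32,000 tokens.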

import google.generativeai as genai

model_info = genai.get_model("models/gemini-1.0-pro-001")

# Returns the "context window" for the model,
# which is the combined input and output token limits.
print(f"{model_info.input_token_limit=}")
print(f"{model_info.output_token_limit=}")
# ( input_token_limit=30720, output_token_limit=2048 )
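The two limits printed above add up to the context window. As a minimal sketch, the snippet below hardcodes those values so it runs without an API call. `fits_input_limit` is a hypothetical helper, not an SDK function.

```python
# Limits reported for gemini-1.0-pro-001 in the example above.
INPUT_TOKEN_LIMIT = 30720
OUTPUT_TOKEN_LIMIT = 2048

# The context window is the combined input and output token limits.
context_window = INPUT_TOKEN_LIMIT + OUTPUT_TOKEN_LIMIT
print(context_window)  # 32768


def fits_input_limit(prompt_tokens: int, input_limit: int = INPUT_TOKEN_LIMIT) -> bool:
    """Return True if a prompt of this size fits within the input limit."""
    return prompt_tokens <= input_limit


print(fits_input_limit(11))      # True
print(fits_input_limit(40_000))  # False
```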

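As another example, if you request the token limits for a model such as gemini-1.5-flash-001, you would see that it has a context window of about 1 million tokens.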

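Count tokens

All input to and output from the Gemini API is tokenized, including text, image files, and other non-text modalities.

You can count tokens in two ways: call count_tokens with the input of a request to get the input token count before you send it, or read usage_metadata on the response to get the input and output token counts after the call.

Count text tokens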

model = genai.GenerativeModel("models/gemini-1.5-flash")

prompt = "The quick brown fox jumps over the lazy dog."

# Call `count_tokens` to get the input token count (`total_tokens`).
print("total_tokens: ", model.count_tokens(prompt))
# ( total_tokens: 10 )

response = model.generate_content(prompt)

# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 11, candidates_token_count: 73, total_token_count: 84 )

Count multi-turn (chat) tokens

model = genai.GenerativeModel("models/gemini-1.5-flash")

chat = model.start_chat(
    history=[
        {"role": "user", "parts": "Hi my name is Bob"},
        {"role": "model", "parts": "Hi Bob!"},
    ]
)
# Call `count_tokens` to get the input token count (`total_tokens`).
print(model.count_tokens(chat.history))
# ( total_tokens: 10 )

response = chat.send_message(
    "In one sentence, explain how a computer works to a young child."
)

# On the response for `send_message`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 25, candidates_token_count: 21, total_token_count: 46 )

from google.generativeai.types.content_types import to_contents

# You can call `count_tokens` on the combined history and content of the next turn.
print(model.count_tokens(chat.history + to_contents("What is the meaning of life?")))
# ( total_tokens: 56 )
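In the chat example above, the prompt token count grows from turn to turn because each request includes the accumulated history. The following sketch tallies usage across turns. `TurnUsage` and `summarize` are hypothetical names that only mirror two of the `usage_metadata` fields, and the output count for the second turn is an invented illustrative value.

```python
from dataclasses import dataclass


@dataclass
class TurnUsage:
    """Hypothetical record of the usage_metadata counts for one chat turn."""
    prompt_token_count: int
    candidates_token_count: int


def summarize(turns: list[TurnUsage]) -> dict[str, int]:
    """Sum input and output tokens over all turns of a conversation."""
    total_in = sum(t.prompt_token_count for t in turns)
    total_out = sum(t.candidates_token_count for t in turns)
    return {"input": total_in, "output": total_out, "total": total_in + total_out}


turns = [
    TurnUsage(25, 21),  # first turn, values from the example above
    TurnUsage(56, 30),  # 56 is the count_tokens result above; 30 is illustrative
]
print(summarize(turns))  # {'input': 81, 'output': 51, 'total': 132}
```

A running total like this can help you decide when a long conversation is approaching the model's input limit.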

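Count multimodal tokens

All input to the Gemini API is tokenized, including text, image files, and other non-text modalities. Note the following high-level key points about how the Gemini API tokenizes multimodal input during processing:

Images are considered to be a fixed size, so they consume a fixed number of tokens (currently 258 tokens), regardless of their display or file size.
Video and audio files are converted to tokens at the following fixed rates: video at 263 tokens per second and audio at 32 tokens per second.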
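The fixed rates above make it possible to estimate the token cost of media before uploading anything. This is a rough sketch under the assumption that the rates listed in this guide still apply. `estimate_media_tokens` is a hypothetical helper, and it does not include the tokens of any accompanying text prompt.

```python
import math

# Fixed rates stated in this guide (described as current, so they may change).
IMAGE_TOKENS = 258
VIDEO_TOKENS_PER_SECOND = 263
AUDIO_TOKENS_PER_SECOND = 32


def estimate_media_tokens(
    images: int = 0, video_seconds: float = 0, audio_seconds: float = 0
) -> int:
    """Estimate tokens consumed by media inputs (hypothetical helper)."""
    return (
        images * IMAGE_TOKENS
        + math.ceil(video_seconds * VIDEO_TOKENS_PER_SECOND)
        + math.ceil(audio_seconds * AUDIO_TOKENS_PER_SECOND)
    )


# Two images, a 10-second video, and a 60-second audio clip:
# 2 * 258 + 10 * 263 + 60 * 32 = 516 + 2630 + 1920
print(estimate_media_tokens(images=2, video_seconds=10, audio_seconds=60))  # 5066
```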

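Image files

During processing, the Gemini API considers images to be a fixed size, so they consume a fixed number of tokens (currently 258 tokens), regardless of their display or file size.

Here is an example that uses an uploaded image from the File API: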

model = genai.GenerativeModel("models/gemini-1.5-flash")

prompt = "Tell me about this image"
your_image_file = genai.upload_file(path="image.jpg")

# Call `count_tokens` to get the input token count
# of the combined text and file (`total_tokens`).
# An image's display or file size does not affect its token count.
# Optionally, you can call `count_tokens` for the text and file separately.
print(model.count_tokens([prompt, your_image_file]))
# ( total_tokens: 263 )

response = model.generate_content([prompt, your_image_file])
response.text
# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 264, candidates_token_count: 80, total_token_count: 345 )

Here is an example that provides the image as inline data:

import PIL.Image

model = genai.GenerativeModel("models/gemini-1.5-flash")

prompt = "Tell me about this image"
your_image_file = PIL.Image.open("image.jpg")

# Call `count_tokens` to get the input token count
# of the combined text and file (`total_tokens`).
# An image's display or file size does not affect its token count.
# Optionally, you can call `count_tokens` for the text and file separately.
print(model.count_tokens([prompt, your_image_file]))
# ( total_tokens: 263 )

response = model.generate_content([prompt, your_image_file])

# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 264, candidates_token_count: 80, total_token_count: 345 )

Video or audio files

Audio and video are each converted to tokens at the following fixed rates:

Video: 263 tokens per second
Audio: 32 tokens per second

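import pathlib
import time

# Assumption: the sample video is in a local "media" directory; adjust the path as needed.
media = pathlib.Path("media")

model = genai.GenerativeModel("models/gemini-1.5-flash")

prompt = "Tell me about this video"
your_file = genai.upload_file(path=media / "Big_Buck_Bunny.mp4")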

# Videos need to be processed before you can use them.
while your_file.state.name == "PROCESSING":
    print("processing video...")
    time.sleep(5)
    your_file = genai.get_file(your_file.name)

# Call `count_tokens` to get the input token count
# of the combined text and video/audio file (`total_tokens`).
# A video or audio file is converted to tokens at a fixed rate of tokens per second.
# Optionally, you can call `count_tokens` for the text and file separately.
print(model.count_tokens([prompt, your_file]))
# ( total_tokens: 300 )

response = model.generate_content([prompt, your_file])

# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 301, candidates_token_count: 60, total_token_count: 361 )
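Working backwards from the fixed rates, you can also estimate how many seconds of media fit within a given token budget. This is a minimal sketch: `max_media_seconds` is a hypothetical helper, and the one-million-token budget is an assumed round number, not the limit of any particular model.

```python
# Fixed rates stated in this guide.
VIDEO_TOKENS_PER_SECOND = 263
AUDIO_TOKENS_PER_SECOND = 32


def max_media_seconds(token_budget: int, tokens_per_second: int) -> int:
    """Whole seconds of media that fit within a token budget."""
    return token_budget // tokens_per_second


budget = 1_000_000  # assumed budget for illustration only
print(max_media_seconds(budget, VIDEO_TOKENS_PER_SECOND))  # 3802
print(max_media_seconds(budget, AUDIO_TOKENS_PER_SECOND))  # 31250
```

Remember to leave room in the budget for the text prompt and any other inputs sent in the same request.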

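System instructions and tools

System instructions and tools also count toward the total token count for the input.

If you use system instructions, the total_tokens count increases to reflect the addition of system_instruction.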

model = genai.GenerativeModel(model_name="gemini-1.5-flash")

prompt = "The quick brown fox jumps over the lazy dog."

print(model.count_tokens(prompt))
# ( total_tokens: 10 )

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash", system_instruction="You are a cat. Your name is Neko."
)

# The total token count includes everything sent to the `generate_content` request.
# When you use system instructions, the total token count increases.
print(model.count_tokens(prompt))
# ( total_tokens: 21 )

If you use function calling, the total_tokens count increases to reflect the addition of tools.

model = genai.GenerativeModel(model_name="gemini-1.5-flash")

prompt = "I have 57 cats, each owns 44 mittens, how many mittens is that in total?"

print(model.count_tokens(prompt))
# ( total_tokens: 22 )

def add(a: float, b: float):
    """returns a + b."""
    return a + b

def subtract(a: float, b: float):
    """returns a - b."""
    return a - b

def multiply(a: float, b: float):
    """returns a * b."""
    return a * b

def divide(a: float, b: float):
    """returns a / b."""
    return a / b

model = genai.GenerativeModel(
    "models/gemini-1.5-flash-001", tools=[add, subtract, multiply, divide]
)

# The total token count includes everything sent to the `generate_content` request.
# When you use tools (like function calling), the total token count increases.
print(model.count_tokens(prompt))
# ( total_tokens: 206 )
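Comparing the counts with and without the extras shows how many tokens the system instruction or the tool declarations add to every request. The sketch below uses the numbers from the two examples above. `token_overhead` is a hypothetical helper, not an SDK function.

```python
def token_overhead(baseline_tokens: int, tokens_with_extras: int) -> int:
    """Tokens added by system instructions or tools (hypothetical helper)."""
    if tokens_with_extras < baseline_tokens:
        raise ValueError("count with extras should not be below the baseline")
    return tokens_with_extras - baseline_tokens


# System instruction example: 10 tokens without, 21 tokens with.
print(token_overhead(10, 21))   # 11

# Function calling example: 22 tokens without, 206 tokens with four tools.
print(token_overhead(22, 206))  # 184
```

Because this overhead is sent with each request, it is worth measuring once when you design a prompt that uses long system instructions or many tools.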