Understand and count tokens

Gemini and other generative AI models process input and output at a granularity called a token.

This guide explains how to get the context windows of specific models, and how to count tokens for use cases such as text input, chat, multimodal input, and system instructions and tools.

About tokens

Tokens can be single characters like z or whole words like cat. Long words are broken up into several tokens. The set of all tokens used by the model is called the vocabulary, and the process of splitting text into tokens is called tokenization.

For Gemini models, a token is equivalent to about 4 characters. 100 tokens are about 60-80 English words.
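The rule of thumb above can be turned into a quick offline estimate. The sketch below is only a rough heuristic and not the Gemini tokenizer: the helper name estimate_tokens is hypothetical, and the 4-characters-per-token constant is simply the approximation stated in this section. For exact numbers, use count_tokens as shown later in this guide.

```python
import math

# Assumption: roughly 4 characters per token, per the rule of thumb in this guide.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for `text` (hypothetical helper, not the real tokenizer)."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN)


prompt = "The quick brown fox jumps over the lazy dog."
# The prompt has 44 characters, so the heuristic gives ceil(44 / 4).
print(estimate_tokens(prompt))  # 11
```

The estimate is close to, but not the same as, what the API reports for real text, so treat it as a budgeting aid rather than a measurement.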

When billing is enabled, the cost of a call to the Gemini API is determined in part by the number of input and output tokens, so knowing how to count tokens can be helpful.
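To see how token counts feed into cost, here is a minimal sketch. The two prices are placeholders invented for illustration and are not real Gemini pricing, and estimate_cost is a hypothetical helper; substitute the rates from the current pricing page.

```python
from decimal import Decimal

# HYPOTHETICAL prices in USD per 1 million tokens; NOT real Gemini pricing.
INPUT_PRICE_PER_MILLION = Decimal("0.10")
OUTPUT_PRICE_PER_MILLION = Decimal("0.40")


def estimate_cost(prompt_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one call from its token counts (hypothetical helper)."""
    million = Decimal(1_000_000)
    input_cost = Decimal(prompt_tokens) * INPUT_PRICE_PER_MILLION / million
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_MILLION / million
    return input_cost + output_cost


# Token counts of the kind reported in a response's usage metadata.
print(estimate_cost(prompt_tokens=11, output_tokens=73))
```

Decimal is used instead of float so that very small per-call amounts add up without binary rounding noise.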

Context windows

The models available through the Gemini API have context windows that are measured in tokens. The context window defines how much input you can provide and how much output the model can generate. You can determine the size of the context window by using the API or by looking in the models documentation.

In the following example, you can see that the gemini-1.0-pro-001 model has an input limit of about 30,000 tokens and an output limit of about 2,000 tokens, which gives a context window of about 32,000 tokens.

import google.generativeai as genai

# Assumes the API key has already been set up, for example with genai.configure(api_key=...).
model_info = genai.get_model("models/gemini-1.0-pro-001")

# Returns the "context window" for the model,
# which is the combined input and output token limits.
print(f"{model_info.input_token_limit=}")
print(f"{model_info.output_token_limit=}")
# ( input_token_limit=30720, output_token_limit=2048 )

As another example, if you instead requested the token limits for a model like gemini-1.5-flash-001, you would see a context window of about 1 million tokens.
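Once the two limits are known, checking whether a prompt fits is simple arithmetic. The sketch below hard-codes the gemini-1.0-pro-001 limits printed in the example above; the TokenLimits class and its method names are hypothetical and not part of the SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenLimits:
    """Token limits for one model (hypothetical container for the two values printed above)."""

    input_token_limit: int
    output_token_limit: int

    @property
    def context_window(self) -> int:
        # The context window is the combined input and output limits.
        return self.input_token_limit + self.output_token_limit

    def fits(self, prompt_tokens: int) -> bool:
        # A prompt fits when it does not exceed the input limit.
        return prompt_tokens <= self.input_token_limit


limits = TokenLimits(input_token_limit=30720, output_token_limit=2048)
print(limits.context_window)  # 32768
print(limits.fits(25_000))    # True
print(limits.fits(31_000))    # False
```

In practice the prompt_tokens argument would come from a count_tokens call made before sending the request.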

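Count tokens

All input to and output from the Gemini API is tokenized, including text, image files, and other non-text modalities.

You can count tokens in two ways: call count_tokens on the input before you send a request, or read usage_metadata on the response after the request completes.

Count text tokens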

model = genai.GenerativeModel("models/gemini-1.5-flash")

prompt = "The quick brown fox jumps over the lazy dog."

# Call `count_tokens` to get the input token count (`total_tokens`).
print("total_tokens: ", model.count_tokens(prompt))
# ( total_tokens: 10 )

response = model.generate_content(prompt)

# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 11, candidates_token_count: 73, total_token_count: 84 )

Count multi-turn (chat) tokens

model = genai.GenerativeModel("models/gemini-1.5-flash")

chat = model.start_chat(
    history=[
        {"role": "user", "parts": "Hi my name is Bob"},
        {"role": "model", "parts": "Hi Bob!"},
    ]
)
# Call `count_tokens` to get the input token count (`total_tokens`).
print(model.count_tokens(chat.history))
# ( total_tokens: 10 )

response = chat.send_message(
    "In one sentence, explain how a computer works to a young child."
)

# On the response for `send_message`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 25, candidates_token_count: 21, total_token_count: 46 )

from google.generativeai.types.content_types import to_contents

# You can call `count_tokens` on the combined history and content of the next turn.
print(model.count_tokens(chat.history + to_contents("What is the meaning of life?")))
# ( total_tokens: 56 )
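The counts in the chat example grow because every new message is sent together with the conversation so far. The sketch below simulates that growth offline. The function name prompt_tokens_per_turn is hypothetical, and the per-message token counts are illustrative values chosen to line up with the totals printed above (10 tokens of history, then 25 and 56 input tokens).

```python
def prompt_tokens_per_turn(turns: list[tuple[int, int]]) -> list[int]:
    """Return the input token count of each turn.

    `turns` holds (user_tokens, model_tokens) pairs in conversation order.
    Each turn's input is the full history so far plus the new user message.
    """
    history_tokens = 0
    prompt_counts = []
    for user_tokens, model_tokens in turns:
        prompt_counts.append(history_tokens + user_tokens)
        # Both the user message and the model reply become part of the history.
        history_tokens += user_tokens + model_tokens
    return prompt_counts


# Illustrative per-message counts, not measured values.
turns = [(6, 4), (15, 21), (10, 12)]
print(prompt_tokens_per_turn(turns))  # [6, 25, 56]
```

Long conversations therefore get more expensive per turn even when each individual message is short.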

Count multimodal tokens

All input to the Gemini API is tokenized, including text, image files, and other non-text modalities. Note the following high-level points about how multimodal input is tokenized during processing by the Gemini API:

Images are treated as a fixed size, so they consume a fixed number of tokens (currently 258 tokens), regardless of their display or file size.
Video and audio files are converted to tokens at the following fixed rates: video at 263 tokens per second and audio at 32 tokens per second.
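The fixed rates above make it possible to estimate media token usage before uploading anything. The following is a minimal sketch under two assumptions: the rates are the ones stated in this guide and may change, and fractional results are rounded up. The helper estimate_media_tokens is hypothetical, and count_tokens remains the authoritative source.

```python
import math

# Fixed rates stated in this guide ("currently", so they may change).
TOKENS_PER_IMAGE = 258
VIDEO_TOKENS_PER_SECOND = 263
AUDIO_TOKENS_PER_SECOND = 32


def estimate_media_tokens(
    images: int = 0, video_seconds: float = 0.0, audio_seconds: float = 0.0
) -> int:
    """Estimate tokens for media inputs (hypothetical helper; rounds up by assumption)."""
    total = (
        images * TOKENS_PER_IMAGE
        + video_seconds * VIDEO_TOKENS_PER_SECOND
        + audio_seconds * AUDIO_TOKENS_PER_SECOND
    )
    return math.ceil(total)


print(estimate_media_tokens(images=2))          # 516
print(estimate_media_tokens(video_seconds=60))  # 15780
print(estimate_media_tokens(audio_seconds=60))  # 1920
```

Text that accompanies the media, such as the prompt, adds its own tokens on top of these figures.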

Image files

During processing, the Gemini API treats images as a fixed size, so they consume a fixed number of tokens (currently 258 tokens), regardless of their display or file size.

Example that uses an uploaded image from the File API:

model = genai.GenerativeModel("models/gemini-1.5-flash")

prompt = "Tell me about this image"
your_image_file = genai.upload_file(path="image.jpg")

# Call `count_tokens` to get the input token count
# of the combined text and file (`total_tokens`).
# An image's display or file size does not affect its token count.
# Optionally, you can call `count_tokens` for the text and file separately.
print(model.count_tokens([prompt, your_image_file]))
# ( total_tokens: 263 )

response = model.generate_content([prompt, your_image_file])
response.text
# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 264, candidates_token_count: 80, total_token_count: 345 )

Example that provides the image as inline data:

import PIL.Image

model = genai.GenerativeModel("models/gemini-1.5-flash")

prompt = "Tell me about this image"
your_image_file = PIL.Image.open("image.jpg")

# Call `count_tokens` to get the input token count
# of the combined text and file (`total_tokens`).
# An image's display or file size does not affect its token count.
# Optionally, you can call `count_tokens` for the text and file separately.
print(model.count_tokens([prompt, your_image_file]))
# ( total_tokens: 263 )

response = model.generate_content([prompt, your_image_file])

# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 264, candidates_token_count: 80, total_token_count: 345 )

Video or audio files

Audio and video are each converted to tokens at the following fixed rates:

Video: 263 tokens per second
Audio: 32 tokens per second
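Read in the other direction, these rates tell you how much footage or audio fits into a given token budget. The sketch below assumes a budget of 1,000,000 tokens reserved for the media file alone; that budget and the helper name max_seconds are illustrative assumptions, not values from the API.

```python
# Fixed per-second rates stated in this guide.
VIDEO_TOKENS_PER_SECOND = 263
AUDIO_TOKENS_PER_SECOND = 32


def max_seconds(token_budget: int, tokens_per_second: int) -> int:
    """Return the longest whole number of seconds that fits in `token_budget` (hypothetical helper)."""
    if token_budget < 0 or tokens_per_second <= 0:
        raise ValueError("token_budget must be >= 0 and tokens_per_second must be > 0")
    return token_budget // tokens_per_second


# Assumption: 1,000,000 tokens are available for the media file alone.
budget = 1_000_000
print(max_seconds(budget, VIDEO_TOKENS_PER_SECOND))  # 3802 seconds, about 63 minutes
print(max_seconds(budget, AUDIO_TOKENS_PER_SECOND))  # 31250 seconds, about 8.7 hours
```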

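import pathlib
import time

model = genai.GenerativeModel("models/gemini-1.5-flash")

# Assumption: `media` is the folder that holds the sample video; adjust the path as needed.
media = pathlib.Path("media")

prompt = "Tell me about this video"
your_file = genai.upload_file(path=media / "Big_Buck_Bunny.mp4")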

# Videos need to be processed before you can use them.
while your_file.state.name == "PROCESSING":
    print("processing video...")
    time.sleep(5)
    your_file = genai.get_file(your_file.name)

# Call `count_tokens` to get the input token count
# of the combined text and video/audio file (`total_tokens`).
# A video or audio file is converted to tokens at a fixed rate of tokens per second.
# Optionally, you can call `count_tokens` for the text and file separately.
print(model.count_tokens([prompt, your_file]))
# ( total_tokens: 300 )

response = model.generate_content([prompt, your_file])

# On the response for `generate_content`, use `usage_metadata`
# to get separate input and output token counts
# (`prompt_token_count` and `candidates_token_count`, respectively),
# as well as the combined token count (`total_token_count`).
print(response.usage_metadata)
# ( prompt_token_count: 301, candidates_token_count: 60, total_token_count: 361 )

System instructions and tools

System instructions and tools also count towards the total token count for the input.

If you use system instructions, the total_tokens count increases to reflect the addition of system_instruction.

model = genai.GenerativeModel(model_name="gemini-1.5-flash")

prompt = "The quick brown fox jumps over the lazy dog."

print(model.count_tokens(prompt))
# ( total_tokens: 10 )

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash", system_instruction="You are a cat. Your name is Neko."
)

# The total token count includes everything sent to the `generate_content` request.
# When you use system instructions, the total token count increases.
print(model.count_tokens(prompt))
# ( total_tokens: 21 )

If you use function calling, the total_tokens count increases to reflect the addition of tools.

model = genai.GenerativeModel(model_name="gemini-1.5-flash")

prompt = "I have 57 cats, each owns 44 mittens, how many mittens is that in total?"

print(model.count_tokens(prompt))
# ( total_tokens: 22 )

def add(a: float, b: float):
    """returns a + b."""
    return a + b

def subtract(a: float, b: float):
    """returns a - b."""
    return a - b

def multiply(a: float, b: float):
    """returns a * b."""
    return a * b

def divide(a: float, b: float):
    """returns a / b."""
    return a / b

model = genai.GenerativeModel(
    "models/gemini-1.5-flash-001", tools=[add, subtract, multiply, divide]
)

# The total token count includes everything sent to the `generate_content` request.
# When you use tools (like function calling), the total token count increases.
print(model.count_tokens(prompt))
# ( total_tokens: 206 )
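Comparing the counts with and without the extra configuration shows how many tokens the configuration itself costs on every request. The sketch below uses the sample counts printed in this section; the helper name overhead is hypothetical.

```python
def overhead(baseline_tokens: int, augmented_tokens: int) -> int:
    """Return the tokens added on top of the bare prompt (hypothetical helper)."""
    return augmented_tokens - baseline_tokens


# Sample counts from the examples in this section.
print(overhead(10, 21))   # 11 tokens added by the system instruction
print(overhead(22, 206))  # 184 tokens added by the four tool declarations
```

Because this overhead is sent with every call, long system instructions or large tool sets are worth measuring once with count_tokens before they are used at scale.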