Instructivo: Comienza a usar la API de Gemini


Ver en la IA de Google Ejecutar en Google Colab Ver el código fuente en GitHub

En esta guía de inicio rápido, se muestra cómo usar el SDK de Python para la API de Gemini, que te da acceso a los modelos grandes de lenguaje de Gemini de Google. En esta guía de inicio rápido, Aprenderás a hacer lo siguiente:

  1. Configurar tu entorno de desarrollo y el acceso a la API para usar Gemini
  2. Generar respuestas de texto a partir de entradas de texto
  3. Genera respuestas de texto a partir de entradas multimodales (imágenes y texto).
  4. Usar Gemini para conversaciones de varios turnos (chat)
  5. Usa incorporaciones para modelos de lenguaje grandes.

Requisitos previos

Puedes ejecutar esta guía de inicio rápido en Google Colab, que ejecuta este notebook directamente en el navegador y no requiere configuración del entorno.

Como alternativa, para completar esta guía de inicio rápido localmente, asegúrate de que tus recursos entorno de desarrollo web cumple con los siguientes requisitos:

  • Python 3.9 y versiones posteriores
  • Una instalación de jupyter para ejecutar el notebook

Configuración

Instala el SDK de Python

El SDK de Python para la API de Gemini está incluido en la google-generativeai. Instala la dependencia con pip:

pip install -q -U google-generativeai

Importa paquetes

Importa los paquetes necesarios.

import pathlib
import textwrap

import google.generativeai as genai

from IPython.display import display
from IPython.display import Markdown

def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))
# Used to securely store your API key
from google.colab import userdata

Cómo configurar tu clave de API

Para poder usar la API de Gemini, primero debes obtener una clave de API. Si Si aún no tienes una, crea una clave con un clic en Google AI Studio.

Obtén una clave de API.

En Colab, agrega la clave al administrador de Secrets en la "Notebook" que aparece a continuación. en el panel izquierdo. Asígnale el nombre GOOGLE_API_KEY.

Una vez que tengas la clave de API, pásala al SDK. Puedes hacerlo de dos maneras:

  • Coloca la clave en la variable de entorno GOOGLE_API_KEY (el SDK recogerlos automáticamente a partir de allí).
  • Pasa la clave a genai.configure(api_key=...)
# Or use `os.getenv('GOOGLE_API_KEY')` to fetch an environment variable.
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')

genai.configure(api_key=GOOGLE_API_KEY)

Enumera modelos

Ya está todo listo para que llames a la API de Gemini. Usa list_models para ver los recursos disponibles Modelos de Gemini:

  • gemini-1.5-flash: Nuestro modelo multimodal más rápido
  • gemini-1.5-pro: nuestro modelo multimodal más inteligente y capaz
for m in genai.list_models():
  if 'generateContent' in m.supported_generation_methods:
    print(m.name)

Genera texto a partir de entradas de texto

Para instrucciones de solo texto, usa un modelo de Gemini 1.5 o el modelo de Gemini 1.0 Pro:

model = genai.GenerativeModel('gemini-1.5-flash')

El método generate_content puede controlar una gran variedad de casos de uso, como chat de varios turnos y entrada multimodal, según el modelo subyacente admite. Los modelos disponibles solo admiten imágenes y texto como entrada, y texto como salida.

En el caso más simple, puedes pasar una cadena de instrucción al GenerativeModel.generate_content método:

%%time
response = model.generate_content("What is the meaning of life?")
CPU times: user 110 ms, sys: 12.3 ms, total: 123 ms
Wall time: 8.25 s

En casos simples, solo necesitas el descriptor de acceso response.text. Para mostrar texto de Markdown con formato, usa la función to_markdown:

to_markdown(response.text)
The query of life's purpose has perplexed people across centuries, cultures, and continents. While there is no universally recognized response, many ideas have been put forth, and the response is frequently dependent on individual ideas, beliefs, and life experiences.

1.  **Happiness and Well-being:** Many individuals believe that the goal of life is to attain personal happiness and well-being. This might entail locating pursuits that provide joy, establishing significant connections, caring for one's physical and mental health, and pursuing personal goals and interests.

2.  **Meaningful Contribution:** Some believe that the purpose of life is to make a meaningful contribution to the world. This might entail pursuing a profession that benefits others, engaging in volunteer or charitable activities, generating art or literature, or inventing.

3.  **Self-realization and Personal Growth:** The pursuit of self-realization and personal development is another common goal in life. This might entail learning new skills, pushing one's boundaries, confronting personal obstacles, and evolving as a person.

4.  **Ethical and Moral Behavior:** Some believe that the goal of life is to act ethically and morally. This might entail adhering to one's moral principles, doing the right thing even when it is difficult, and attempting to make the world a better place.

5.  **Spiritual Fulfillment:** For some, the purpose of life is connected to spiritual or religious beliefs. This might entail seeking a connection with a higher power, practicing religious rituals, or following spiritual teachings.

6.  **Experiencing Life to the Fullest:** Some individuals believe that the goal of life is to experience all that it has to offer. This might entail traveling, trying new things, taking risks, and embracing new encounters.

7.  **Legacy and Impact:** Others believe that the purpose of life is to leave a lasting legacy and impact on the world. This might entail accomplishing something noteworthy, being remembered for one's contributions, or inspiring and motivating others.

8.  **Finding Balance and Harmony:** For some, the purpose of life is to find balance and harmony in all aspects of their lives. This might entail juggling personal, professional, and social obligations, seeking inner peace and contentment, and living a life that is in accordance with one's values and beliefs.

Ultimately, the meaning of life is a personal journey, and different individuals may discover their own unique purpose through their experiences, reflections, and interactions with the world around them.

Si la API no pudo devolver un resultado, usa GenerateContentResponse.prompt_feedback para ver si se bloqueó por cuestiones de seguridad relacionadas con el mensaje.

response.prompt_feedback
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}

Gemini puede generar múltiples respuestas posibles para una sola instrucción. Estos respuestas posibles se denominan candidates, y puedes revisarlas para seleccionar la más adecuada como respuesta.

Consulta los candidatos de respuesta con GenerateContentResponse.candidates

response.candidates
[
  content {
    parts {
      text: "The query of life\'s purpose has perplexed people across centuries, cultures, and continents. While there is no universally recognized response, many ideas have been put forth, and the response is frequently dependent on individual ideas, beliefs, and life experiences.\n\n1. **Happiness and Well-being:** Many individuals believe that the goal of life is to attain personal happiness and well-being. This might entail locating pursuits that provide joy, establishing significant connections, caring for one\'s physical and mental health, and pursuing personal goals and interests.\n\n2. **Meaningful Contribution:** Some believe that the purpose of life is to make a meaningful contribution to the world. This might entail pursuing a profession that benefits others, engaging in volunteer or charitable activities, generating art or literature, or inventing.\n\n3. **Self-realization and Personal Growth:** The pursuit of self-realization and personal development is another common goal in life. This might entail learning new skills, pushing one\'s boundaries, confronting personal obstacles, and evolving as a person.\n\n4. **Ethical and Moral Behavior:** Some believe that the goal of life is to act ethically and morally. This might entail adhering to one\'s moral principles, doing the right thing even when it is difficult, and attempting to make the world a better place.\n\n5. **Spiritual Fulfillment:** For some, the purpose of life is connected to spiritual or religious beliefs. This might entail seeking a connection with a higher power, practicing religious rituals, or following spiritual teachings.\n\n6. **Experiencing Life to the Fullest:** Some individuals believe that the goal of life is to experience all that it has to offer. This might entail traveling, trying new things, taking risks, and embracing new encounters.\n\n7. **Legacy and Impact:** Others believe that the purpose of life is to leave a lasting legacy and impact on the world. This might entail accomplishing something noteworthy, being remembered for one\'s contributions, or inspiring and motivating others.\n\n8. **Finding Balance and Harmony:** For some, the purpose of life is to find balance and harmony in all aspects of their lives. This might entail juggling personal, professional, and social obligations, seeking inner peace and contentment, and living a life that is in accordance with one\'s values and beliefs.\n\nUltimately, the meaning of life is a personal journey, and different individuals may discover their own unique purpose through their experiences, reflections, and interactions with the world around them."
    }
    role: "model"
  }
  finish_reason: STOP
  index: 0
  safety_ratings {
    category: HARM_CATEGORY_SEXUALLY_EXPLICIT
    probability: NEGLIGIBLE
  }
  safety_ratings {
    category: HARM_CATEGORY_HATE_SPEECH
    probability: NEGLIGIBLE
  }
  safety_ratings {
    category: HARM_CATEGORY_HARASSMENT
    probability: NEGLIGIBLE
  }
  safety_ratings {
    category: HARM_CATEGORY_DANGEROUS_CONTENT
    probability: NEGLIGIBLE
  }
]

De forma predeterminada, el modelo devuelve una respuesta después de completar toda la generación. el proceso de administración de recursos. También puedes transmitir la respuesta a medida que se genera, y la el modelo devolverá fragmentos de la respuesta en cuanto se generen.

Para transmitir respuestas, usa GenerativeModel.generate_content(..., stream=True).

%%time
response = model.generate_content("What is the meaning of life?", stream=True)
CPU times: user 102 ms, sys: 25.1 ms, total: 128 ms
Wall time: 7.94 s
for chunk in response:
  print(chunk.text)
  print("_"*80)
The query of life's purpose has perplexed people across centuries, cultures, and
________________________________________________________________________________
 continents. While there is no universally recognized response, many ideas have been put forth, and the response is frequently dependent on individual ideas, beliefs, and life experiences
________________________________________________________________________________
.

1.  **Happiness and Well-being:** Many individuals believe that the goal of life is to attain personal happiness and well-being. This might entail locating pursuits that provide joy, establishing significant connections, caring for one's physical and mental health, and pursuing personal goals and aspirations.

2.  **Meaning
________________________________________________________________________________
ful Contribution:** Some believe that the purpose of life is to make a meaningful contribution to the world. This might entail pursuing a profession that benefits others, engaging in volunteer or charitable activities, generating art or literature, or inventing.

3.  **Self-realization and Personal Growth:** The pursuit of self-realization and personal development is another common goal in life. This might entail learning new skills, exploring one's interests and abilities, overcoming obstacles, and becoming the best version of oneself.

4.  **Connection and Relationships:** For many individuals, the purpose of life is found in their relationships with others. This might entail building
________________________________________________________________________________
 strong bonds with family and friends, fostering a sense of community, and contributing to the well-being of those around them.

5.  **Spiritual Fulfillment:** For those with religious or spiritual beliefs, the purpose of life may be centered on seeking spiritual fulfillment or enlightenment. This might entail following religious teachings, engaging in spiritual practices, or seeking a deeper understanding of the divine.

6.  **Experiencing the Journey:** Some believe that the purpose of life is simply to experience the journey itself, with all its joys and sorrows. This perspective emphasizes embracing the present moment, appreciating life's experiences, and finding meaning in the act of living itself.

7.  **Legacy and Impact:** For others, the goal of life is to leave a lasting legacy or impact on the world. This might entail making a significant contribution to a particular field, leaving a positive mark on future generations, or creating something that will be remembered and cherished long after one's lifetime.

Ultimately, the meaning of life is a personal and subjective question, and there is no single, universally accepted answer. It is about discovering what brings you fulfillment, purpose, and meaning in your own life, and living in accordance with those values.
________________________________________________________________________________

Cuando transmites, algunos atributos de respuesta no están disponibles hasta que hayas iterado en todos los bloques de respuestas. Esto se demuestra a continuación:

response = model.generate_content("What is the meaning of life?", stream=True)

El atributo prompt_feedback funciona:

response.prompt_feedback
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}

Sin embargo, los atributos como text no hacen lo siguiente:

try:
  response.text
except Exception as e:
  print(f'{type(e).__name__}: {e}')
IncompleteIterationError: Please let the response complete iteration before accessing the final accumulated
attributes (or call `response.resolve()`)

Genera texto a partir de entradas de imagen y texto

Gemini proporciona varios modelos que pueden controlar entradas multimodales (Gemini 1.5). modelos) para que puedas ingresar texto e imágenes. Asegúrate de revisar el requisitos de imágenes para las instrucciones.

Cuando la entrada de la instrucción incluya tanto texto como imágenes, usa Gemini 1.5 con Método GenerativeModel.generate_content para generar salida de texto:

Incluyamos una imagen:

curl -o image.jpg https://t0.gstatic.com/licensed-image?q=tbn:ANd9GcQ_Kevbk21QBRy-PgB4kQpS79brbmmEG7m3VOTShAn4PecDU5H5UxrJxE3Dw1JiaG17V88QIol19-3TM2wCHw
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  405k  100  405k    0     0  6982k      0 --:--:-- --:--:-- --:--:-- 7106k
import PIL.Image

img = PIL.Image.open('image.jpg')
img

png

Usa un modelo de Gemini 1.5 y pasa la imagen al modelo con generate_content.

model = genai.GenerativeModel('gemini-1.5-flash')
response = model.generate_content(img)

to_markdown(response.text)
Chicken Teriyaki Meal Prep Bowls with brown rice, roasted broccoli and bell peppers.

Para incluir imágenes y texto en una instrucción, pasa una lista que contenga las cadenas e imágenes:

response = model.generate_content(["Write a short, engaging blog post based on this picture. It should include a description of the meal in the photo and talk about my journey meal prepping.", img], stream=True)
response.resolve()
to_markdown(response.text)
Meal prepping is a great way to save time and money, and it can also help you to eat healthier. This meal is a great example of a healthy and delicious meal that can be easily prepped ahead of time.

This meal features brown rice, roasted vegetables, and chicken teriyaki. The brown rice is a whole grain that is high in fiber and nutrients. The roasted vegetables are a great way to get your daily dose of vitamins and minerals. And the chicken teriyaki is a lean protein source that is also packed with flavor.

This meal is easy to prepare ahead of time. Simply cook the brown rice, roast the vegetables, and cook the chicken teriyaki. Then, divide the meal into individual containers and store them in the refrigerator. When you're ready to eat, simply grab a container and heat it up.

This meal is a great option for busy people who are looking for a healthy and delicious way to eat. It's also a great meal for those who are trying to lose weight or maintain a healthy weight.

If you're looking for a healthy and delicious meal that can be easily prepped ahead of time, this meal is a great option. Give it a try today!

Conversaciones de chat

Gemini te permite tener conversaciones de formato libre en múltiples turnos. El La clase ChatSession simplifica el proceso, ya que administra el estado de la conversación, así que, a diferencia de generate_content, no tienes que almacenar la del historial de conversaciones en una lista.

Inicializa el chat:

model = genai.GenerativeModel('gemini-1.5-flash')
chat = model.start_chat(history=[])
chat
<google.generativeai.generative_models.ChatSession at 0x7b7b68250100>

El ChatSession.send_message método muestra el mismo tipo GenerateContentResponse que GenerativeModel.generate_content También se adjunta tu mensaje y la respuesta al historial de chat:

response = chat.send_message("In one sentence, explain how a computer works to a young child.")
to_markdown(response.text)
A computer is like a very smart machine that can understand and follow our instructions, help us with our work, and even play games with us!
chat.history
[
  parts {
    text: "In one sentence, explain how a computer works to a young child."
  }
  role: "user",
  parts {
    text: "A computer is like a very smart machine that can understand and follow our instructions, help us with our work, and even play games with us!"
  }
  role: "model"
]

Puedes seguir enviando mensajes para continuar la conversación. Usa el stream=True para transmitir el chat:

response = chat.send_message("Okay, how about a more detailed explanation to a high schooler?", stream=True)

for chunk in response:
  print(chunk.text)
  print("_"*80)
A computer works by following instructions, called a program, which tells it what to
________________________________________________________________________________
 do. These instructions are written in a special language that the computer can understand, and they are stored in the computer's memory. The computer's processor
________________________________________________________________________________
, or CPU, reads the instructions from memory and carries them out, performing calculations and making decisions based on the program's logic. The results of these calculations and decisions are then displayed on the computer's screen or stored in memory for later use.

To give you a simple analogy, imagine a computer as a
________________________________________________________________________________
 chef following a recipe. The recipe is like the program, and the chef's actions are like the instructions the computer follows. The chef reads the recipe (the program) and performs actions like gathering ingredients (fetching data from memory), mixing them together (performing calculations), and cooking them (processing data). The final dish (the output) is then presented on a plate (the computer screen).

In summary, a computer works by executing a series of instructions, stored in its memory, to perform calculations, make decisions, and display or store the results.
________________________________________________________________________________

Los objetos glm.Content contienen una lista de objetos glm.Part que cada uno contiene un texto (cadena) o un inline_data (glm.Blob), donde un BLOB contiene objetos binarios datos y un mime_type. El historial de chat está disponible como una lista de glm.Content objetos en ChatSession.history:

for message in chat.history:
  display(to_markdown(f'**{message.role}**: {message.parts[0].text}'))
**user**: In one sentence, explain how a computer works to a young child.

**model**: A computer is like a very smart machine that can understand and follow our instructions, help us with our work, and even play games with us!

**user**: Okay, how about a more detailed explanation to a high schooler?

**model**: A computer works by following instructions, called a program, which tells it what to do. These instructions are written in a special language that the computer can understand, and they are stored in the computer's memory. The computer's processor, or CPU, reads the instructions from memory and carries them out, performing calculations and making decisions based on the program's logic. The results of these calculations and decisions are then displayed on the computer's screen or stored in memory for later use.

To give you a simple analogy, imagine a computer as a chef following a recipe. The recipe is like the program, and the chef's actions are like the instructions the computer follows. The chef reads the recipe (the program) and performs actions like gathering ingredients (fetching data from memory), mixing them together (performing calculations), and cooking them (processing data). The final dish (the output) is then presented on a plate (the computer screen).

In summary, a computer works by executing a series of instructions, stored in its memory, to perform calculations, make decisions, and display or store the results.

Contar tokens

Los modelos grandes de lenguaje tienen una ventana de contexto y la longitud del contexto suele ser medirse en términos de la cantidad de tokens. Con la API de Gemini, puedes hacer lo siguiente: Determina la cantidad de tokens por cualquier objeto genai.protos.Content. En la caso más simple, puedes pasar una cadena de consulta al GenerativeModel.count_tokens de la siguiente manera:

model.count_tokens("What is the meaning of life?")
total_tokens: 7

De manera similar, puedes verificar token_count para tu ChatSession:

model.count_tokens(chat.history)
total_tokens: 501

Usa incorporaciones

Incorporación es una técnica utilizada para representar información como una lista de números de punto flotante en un array. Con Gemini, puedes representar texto (palabras, oraciones y bloques) del texto) en un formato vectorizado, lo que facilita la comparación y el contraste de las incorporaciones. Por ejemplo, dos textos que comparten un tema similar o la opinión debe tener incorporaciones similares, que pueden identificarse técnicas de comparación matemática, como la similitud coseno. Para obtener más información y por qué deberías usar incorporaciones, consulta el artículo Incorporaciones .

Usa el método embed_content para generar incorporaciones. El método controla incorporación para las siguientes tareas (task_type):

Tipo de tarea Descripción
RETRIEVAL_QUERY Especifica que el texto dado es una consulta en un parámetro de configuración de búsqueda/recuperación.
RETRIEVAL_DOCUMENT Especifica que el texto dado de un documento en un parámetro de configuración de búsqueda y recuperación. Para usar este tipo de tarea, se requiere una title.
SEMANTIC_SIMILARITY Especifica que el texto dado se usará para la similitud textual semántica (STS).
CLASIFICACIÓN Especifica que las incorporaciones se usarán para la clasificación.
Agrupamiento en clústeres Especifica que las incorporaciones se usarán para el agrupamiento en clústeres.

A continuación, se genera una incorporación de una sola cadena para la recuperación de documentos:

result = genai.embed_content(
    model="models/embedding-001",
    content="What is the meaning of life?",
    task_type="retrieval_document",
    title="Embedding of single string")

# 1 input > 1 vector output
print(str(result['embedding'])[:50], '... TRIMMED]')
[-0.003216741, -0.013358698, -0.017649598, -0.0091 ... TRIMMED]

Para controlar lotes de cadenas, pasa una lista de cadenas en content:

result = genai.embed_content(
    model="models/embedding-001",
    content=[
      'What is the meaning of life?',
      'How much wood would a woodchuck chuck?',
      'How does the brain work?'],
    task_type="retrieval_document",
    title="Embedding of list of strings")

# A list of inputs > A list of vectors output
for v in result['embedding']:
  print(str(v)[:50], '... TRIMMED ...')
[0.0040260437, 0.004124458, -0.014209415, -0.00183 ... TRIMMED ...
[-0.004049845, -0.0075574904, -0.0073463684, -0.03 ... TRIMMED ...
[0.025310587, -0.0080734305, -0.029902633, 0.01160 ... TRIMMED ...

Aunque la función genai.embed_content acepta cadenas o listas de cadenas, se construyen en torno al tipo genai.protos.Content (como GenerativeModel.generate_content). Los objetos glm.Content son las unidades de conversación principales de la API.

Mientras que el objeto genai.protos.Content es multimodal, la embed_content solo admite incorporaciones de texto. Este diseño le brinda a la API la posibilidad de expandirse a incorporaciones multimodales.

response.candidates[0].content
parts {
  text: "A computer works by following instructions, called a program, which tells it what to do. These instructions are written in a special language that the computer can understand, and they are stored in the computer\'s memory. The computer\'s processor, or CPU, reads the instructions from memory and carries them out, performing calculations and making decisions based on the program\'s logic. The results of these calculations and decisions are then displayed on the computer\'s screen or stored in memory for later use.\n\nTo give you a simple analogy, imagine a computer as a chef following a recipe. The recipe is like the program, and the chef\'s actions are like the instructions the computer follows. The chef reads the recipe (the program) and performs actions like gathering ingredients (fetching data from memory), mixing them together (performing calculations), and cooking them (processing data). The final dish (the output) is then presented on a plate (the computer screen).\n\nIn summary, a computer works by executing a series of instructions, stored in its memory, to perform calculations, make decisions, and display or store the results."
}
role: "model"
result = genai.embed_content(
    model = 'models/embedding-001',
    content = response.candidates[0].content)

# 1 input > 1 vector output
print(str(result['embedding'])[:50], '... TRIMMED ...')
[-0.013921871, -0.03504407, -0.0051786783, 0.03113 ... TRIMMED ...

De manera similar, el historial de chat contiene una lista de objetos genai.protos.Content, que puedes pasar directamente a la función embed_content:

chat.history
[
  parts {
    text: "In one sentence, explain how a computer works to a young child."
  }
  role: "user",
  parts {
    text: "A computer is like a very smart machine that can understand and follow our instructions, help us with our work, and even play games with us!"
  }
  role: "model",
  parts {
    text: "Okay, how about a more detailed explanation to a high schooler?"
  }
  role: "user",
  parts {
    text: "A computer works by following instructions, called a program, which tells it what to do. These instructions are written in a special language that the computer can understand, and they are stored in the computer\'s memory. The computer\'s processor, or CPU, reads the instructions from memory and carries them out, performing calculations and making decisions based on the program\'s logic. The results of these calculations and decisions are then displayed on the computer\'s screen or stored in memory for later use.\n\nTo give you a simple analogy, imagine a computer as a chef following a recipe. The recipe is like the program, and the chef\'s actions are like the instructions the computer follows. The chef reads the recipe (the program) and performs actions like gathering ingredients (fetching data from memory), mixing them together (performing calculations), and cooking them (processing data). The final dish (the output) is then presented on a plate (the computer screen).\n\nIn summary, a computer works by executing a series of instructions, stored in its memory, to perform calculations, make decisions, and display or store the results."
  }
  role: "model"
]
result = genai.embed_content(
    model = 'models/embedding-001',
    content = chat.history)

# 1 input > 1 vector output
for i,v in enumerate(result['embedding']):
  print(str(v)[:50], '... TRIMMED...')
[-0.014632266, -0.042202696, -0.015757175, 0.01548 ... TRIMMED...
[-0.010979066, -0.024494737, 0.0092659835, 0.00803 ... TRIMMED...
[-0.010055617, -0.07208932, -0.00011750793, -0.023 ... TRIMMED...
[-0.013921871, -0.03504407, -0.0051786783, 0.03113 ... TRIMMED...

Casos de uso avanzados

En las siguientes secciones, se analizan casos de uso avanzados y detalles de nivel inferior de la SDK de Python para la API de Gemini.

Configuración de seguridad

El argumento safety_settings te permite configurar lo que el modelo bloquea y permite en las instrucciones y las respuestas. La configuración de seguridad bloquea contenido de forma predeterminada con una probabilidad media o alta de que el contenido no sea seguro en todos dimensiones. Más información sobre Seguridad Configuración.

Ingresa una instrucción cuestionable y ejecuta el modelo con la configuración de seguridad predeterminada. y no mostrará candidatos:

response = model.generate_content('[Questionable prompt here]')
response.candidates
[
  content {
    parts {
      text: "I\'m sorry, but this prompt involves a sensitive topic and I\'m not allowed to generate responses that are potentially harmful or inappropriate."
    }
    role: "model"
  }
  finish_reason: STOP
  index: 0
  safety_ratings {
    category: HARM_CATEGORY_SEXUALLY_EXPLICIT
    probability: NEGLIGIBLE
  }
  safety_ratings {
    category: HARM_CATEGORY_HATE_SPEECH
    probability: NEGLIGIBLE
  }
  safety_ratings {
    category: HARM_CATEGORY_HARASSMENT
    probability: NEGLIGIBLE
  }
  safety_ratings {
    category: HARM_CATEGORY_DANGEROUS_CONTENT
    probability: NEGLIGIBLE
  }
]

El prompt_feedback te indicará qué filtro de seguridad bloqueó el mensaje:

response.prompt_feedback
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}

Ahora proporciona la misma instrucción al modelo con los nuevos parámetros de configuración de seguridad y es posible que recibas una respuesta.

response = model.generate_content('[Questionable prompt here]',
                                  safety_settings={'HARASSMENT':'block_none'})
response.text

Además, ten en cuenta que cada candidato tiene su propio safety_ratings, en caso de que el mensaje pero las respuestas individuales no aprueban las verificaciones de seguridad.

Codificar mensajes

Las secciones anteriores se basaban en el SDK para facilitarte el envío de instrucciones. a la API. En esta sección, se ofrece un equivalente completo a las versiones anteriores ejemplo, para que puedas comprender mejor los detalles de bajo nivel sobre cómo El SDK codifica los mensajes.

El SDK intenta convertir tu mensaje en un objeto genai.protos.Content. que incluye una lista de objetos genai.protos.Part, cada uno con lo siguiente:

  1. un text (cadena)
  2. inline_data (genai.protos.Blob), donde un BLOB contiene un objeto binario data y un objeto mime_type.
  3. y otros tipos de datos.

También puedes pasar cualquiera de estas clases como un diccionario equivalente.

Por lo tanto, el equivalente completamente escrito del ejemplo anterior es el siguiente:

model = genai.GenerativeModel('gemini-1.5-flash')
response = model.generate_content(
    genai.protos.Content(
        parts = [
            genai.protos.Part(text="Write a short, engaging blog post based on this picture."),
            genai.protos.Part(
                inline_data=genai.protos.Blob(
                    mime_type='image/jpeg',
                    data=pathlib.Path('image.jpg').read_bytes()
                )
            ),
        ],
    ),
    stream=True)
response.resolve()

to_markdown(response.text[:100] + "... [TRIMMED] ...")
Meal prepping is a great way to save time and money, and it can also help you to eat healthier. By ... [TRIMMED] ...

Conversaciones de varios turnos

Si bien la clase genai.ChatSession que se mostró antes puede controlar muchos casos de uso, hace algunas suposiciones. Si tu caso de uso no encaja en este chat implementación, recuerda que genai.ChatSession es solo un wrapper. alrededor GenerativeModel.generate_content Además de solicitudes individuales, puede controlar conversaciones de varios turnos.

Los mensajes individuales son objetos genai.protos.Content o compatibles de diccionarios, como se ve en secciones anteriores. Como diccionario, el mensaje requiere las claves role y parts. El elemento role de una conversación puede ser el user, que proporciona los mensajes, o model, que proporciona las respuestas.

Pasa una lista de objetos genai.protos.Content y se tratará como chat de varios turnos:

model = genai.GenerativeModel('gemini-1.5-flash')

messages = [
    {'role':'user',
     'parts': ["Briefly explain how a computer works to a young child."]}
]
response = model.generate_content(messages)

to_markdown(response.text)
Imagine a computer as a really smart friend who can help you with many things. Just like you have a brain to think and learn, a computer has a brain too, called a processor. It's like the boss of the computer, telling it what to do.

Inside the computer, there's a special place called memory, which is like a big storage box. It remembers all the things you tell it to do, like opening games or playing videos.

When you press buttons on the keyboard or click things on the screen with the mouse, you're sending messages to the computer. These messages travel through special wires, called cables, to the processor.

The processor reads the messages and tells the computer what to do. It can open programs, show you pictures, or even play music for you.

All the things you see on the screen are created by the graphics card, which is like a magic artist inside the computer. It takes the processor's instructions and turns them into colorful pictures and videos.

To save your favorite games, videos, or pictures, the computer uses a special storage space called a hard drive. It's like a giant library where the computer can keep all your precious things safe.

And when you want to connect to the internet to play games with friends or watch funny videos, the computer uses something called a network card to send and receive messages through the internet cables or Wi-Fi signals.

So, just like your brain helps you learn and play, the computer's processor, memory, graphics card, hard drive, and network card all work together to make your computer a super-smart friend that can help you do amazing things!

Para continuar la conversación, agrega la respuesta y otro mensaje.

messages.append({'role':'model',
                 'parts':[response.text]})

messages.append({'role':'user',
                 'parts':["Okay, how about a more detailed explanation to a high school student?"]})

response = model.generate_content(messages)

to_markdown(response.text)
At its core, a computer is a machine that can be programmed to carry out a set of instructions. It consists of several essential components that work together to process, store, and display information:

**1. Processor (CPU):**
   -   The brain of the computer.
   -   Executes instructions and performs calculations.
   -   Speed measured in gigahertz (GHz).
   -   More GHz generally means faster processing.

**2. Memory (RAM):**
   -   Temporary storage for data being processed.
   -   Holds instructions and data while the program is running.
   -   Measured in gigabytes (GB).
   -   More GB of RAM allows for more programs to run simultaneously.

**3. Storage (HDD/SSD):**
   -   Permanent storage for data.
   -   Stores operating system, programs, and user files.
   -   Measured in gigabytes (GB) or terabytes (TB).
   -   Hard disk drives (HDDs) are traditional, slower, and cheaper.
   -   Solid-state drives (SSDs) are newer, faster, and more expensive.

**4. Graphics Card (GPU):**
   -   Processes and displays images.
   -   Essential for gaming, video editing, and other graphics-intensive tasks.
   -   Measured in video RAM (VRAM) and clock speed.

**5. Motherboard:**
   -   Connects all the components.
   -   Provides power and communication pathways.

**6. Input/Output (I/O) Devices:**
   -   Allow the user to interact with the computer.
   -   Examples: keyboard, mouse, monitor, printer.

**7. Operating System (OS):**
   -   Software that manages the computer's resources.
   -   Provides a user interface and basic functionality.
   -   Examples: Windows, macOS, Linux.

When you run a program on your computer, the following happens:

1.  The program instructions are loaded from storage into memory.
2.  The processor reads the instructions from memory and executes them one by one.
3.  If the instruction involves calculations, the processor performs them using its arithmetic logic unit (ALU).
4.  If the instruction involves data, the processor reads or writes to memory.
5.  The results of the calculations or data manipulation are stored in memory.
6.  If the program needs to display something on the screen, it sends the necessary data to the graphics card.
7.  The graphics card processes the data and sends it to the monitor, which displays it.

This process continues until the program has completed its task or the user terminates it.

Configuración de generación

El argumento generation_config te permite modificar los parámetros de generación. Cada instrucción que envías al modelo incluye valores de parámetros que controlan cómo el modelo genera respuestas.

model = genai.GenerativeModel('gemini-1.5-flash')
response = model.generate_content(
    'Tell me a story about a magic backpack.',
    generation_config=genai.types.GenerationConfig(
        # Only one candidate for now.
        candidate_count=1,
        stop_sequences=['x'],
        max_output_tokens=20,
        temperature=1.0)
)
text = response.text

if response.candidates[0].finish_reason.name == "MAX_TOKENS":
    text += '...'

to_markdown(text)
Once upon a time, in a small town nestled amidst lush green hills, lived a young girl named...

¿Qué sigue?

  • El diseño de instrucciones es el proceso de crear instrucciones que provocan el interés deseado de los modelos de lenguaje. Escribir instrucciones bien estructuradas es una esencial para garantizar respuestas precisas y de alta calidad de un idioma un modelo de responsabilidad compartida. Más información sobre las prácticas recomendadas para la instrucción escritura.
  • Gemini ofrece diversas variaciones de modelos para satisfacer las necesidades de los distintos usos como tipos de entrada y complejidad, implementaciones para chat u otros las tareas de lenguaje de diálogo y las restricciones de tamaño. Obtén más información sobre los Modelos de Gemini.