This tutorial demonstrates how to access the Gemini API for your Dart or Flutter application using the Google AI Dart SDK. You can use this SDK if you don't want to work directly with REST APIs for accessing Gemini models in your app.
In this tutorial, you'll learn how to do the following:
- Set up your project, including your API key
- Generate text from text-only input
- Generate text from text-and-image input (multimodal)
- Build multi-turn conversations (chat)
- Use streaming for faster interactions
In addition, this tutorial contains sections about advanced use cases (like embeddings and counting tokens) as well as options for controlling content generation.
Prerequisites
This tutorial assumes you're familiar with building applications with Dart.
To complete this tutorial, make sure that your development environment meets the following requirements:
- Dart 3.2.0+
Set up your project
Before calling the Gemini API, you need to set up your project, which includes setting up your API key, adding the SDK to your pub dependencies, and initializing the model.
Set up your API key
To use the Gemini API, you'll need an API key. If you don't already have one, create a key in Google AI Studio.
Secure your API key
Keep your API key secure. We strongly recommend that you do not include the API key directly in your code, or check files that contain the key into version control systems. Instead, you should use a secrets store for your API key.
All the snippets in this tutorial assume that you're accessing your API key as a process environment variable. If you're developing a Flutter app, you can use `String.fromEnvironment` and pass `--dart-define=API_KEY=$API_KEY` to `flutter build` or `flutter run` to compile with the API key, since the process environment will be different when running the app.
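As a minimal sketch of the Flutter variant described above (the define name `API_KEY` is whatever you pass to `--dart-define`):

```dart
// Flutter / compile-time variant: reads the value supplied at build time with
// `--dart-define=API_KEY=$API_KEY`. Unlike Platform.environment, this lookup
// is resolved at compile time, so it also works on targets without a process
// environment (such as web and mobile).
const apiKey = String.fromEnvironment('API_KEY');

void main() {
  if (apiKey.isEmpty) {
    // No key was provided at build time; fromEnvironment falls back to ''.
    print('Missing API key: pass --dart-define=API_KEY=\$API_KEY');
    return;
  }
  // Safe to construct the model here, e.g.:
  // final model = GenerativeModel(model: 'gemini-1.5-flash', apiKey: apiKey);
}
```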
Install the SDK package
To use the Gemini API in your own application, you need to add the `google_generative_ai` package to your Dart or Flutter app:

Dart:

```shell
dart pub add google_generative_ai
```

Flutter:

```shell
flutter pub add google_generative_ai
```
Initialize the generative model
Before you can make any API calls, you need to import and initialize the generative model.
```dart
import 'dart:io';

import 'package:google_generative_ai/google_generative_ai.dart';

void main() async {
  // Access your API key as an environment variable (see "Set up your API key" above)
  final apiKey = Platform.environment['API_KEY'];
  if (apiKey == null) {
    print('No \$API_KEY environment variable');
    exit(1);
  }

  // The Gemini 1.5 models are versatile and work with most use cases
  final model = GenerativeModel(model: 'gemini-1.5-flash', apiKey: apiKey);
}
```
When specifying a model, use one that's specific to your use case (for example, `gemini-1.5-flash` supports multimodal input). Within this guide, the instructions for each implementation list the recommended model for each use case.
Implement common use cases
Now that your project is set up, you can explore using the Gemini API to implement different use cases:
- Generate text from text-only input
- Generate text from text-and-image input (multimodal)
- Build multi-turn conversations (chat)
- Use streaming for faster interactions
In the advanced use cases section, you can find information about the Gemini API and embeddings.
Generate text from text-only input
When the prompt input includes only text, use a Gemini 1.5 model or the Gemini 1.0 Pro model with `generateContent` to generate text output:
```dart
import 'dart:io';

import 'package:google_generative_ai/google_generative_ai.dart';

void main() async {
  // Access your API key as an environment variable (see "Set up your API key" above)
  final apiKey = Platform.environment['API_KEY'];
  if (apiKey == null) {
    print('No \$API_KEY environment variable');
    exit(1);
  }

  // The Gemini 1.5 models are versatile and work with both text-only and multimodal prompts
  final model = GenerativeModel(model: 'gemini-1.5-flash', apiKey: apiKey);

  final content = [Content.text('Write a story about a magic backpack.')];
  final response = await model.generateContent(content);
  print(response.text);
}
```
Generate text from text-and-image input (multimodal)
Gemini provides various models that can handle multimodal input (Gemini 1.5 models) so that you can input both text and images. Make sure to review the image requirements for prompts.
When the prompt input includes both text and images, use a Gemini 1.5 model with the `generateContent` method to generate text output:
```dart
import 'dart:io';

import 'package:google_generative_ai/google_generative_ai.dart';

void main() async {
  // Access your API key as an environment variable (see "Set up your API key" above)
  final apiKey = Platform.environment['API_KEY'];
  if (apiKey == null) {
    print('No \$API_KEY environment variable');
    exit(1);
  }

  // The Gemini 1.5 models are versatile and work with both text-only and multimodal prompts
  final model = GenerativeModel(model: 'gemini-1.5-flash', apiKey: apiKey);

  // Read both images concurrently
  final (firstImage, secondImage) = await (
    File('image0.jpg').readAsBytes(),
    File('image1.jpg').readAsBytes()
  ).wait;
  final prompt = TextPart("What's different between these pictures?");
  final imageParts = [
    DataPart('image/jpeg', firstImage),
    DataPart('image/jpeg', secondImage),
  ];
  final response = await model.generateContent([
    Content.multi([prompt, ...imageParts])
  ]);
  print(response.text);
}
```
Build multi-turn conversations (chat)
Using Gemini, you can build freeform conversations across multiple turns. The SDK simplifies the process by managing the state of the conversation, so unlike with `generateContent`, you don't have to store the conversation history yourself.

To build a multi-turn conversation (like chat), use a Gemini 1.5 model or the Gemini 1.0 Pro model, and initialize the chat by calling `startChat()`. Then use `sendMessage()` to send a new user message, which will also append the message and the response to the chat history.

There are two possible options for the `role` associated with the content in a conversation:

- `user`: the role which provides the prompts. This value is the default for `sendMessage` calls, and the function will throw an exception if a different role is passed.
- `model`: the role which provides the responses. This role can be used when calling `startChat()` with existing `history`.
```dart
import 'dart:io';

import 'package:google_generative_ai/google_generative_ai.dart';

Future<void> main() async {
  // Access your API key as an environment variable (see "Set up your API key" above)
  final apiKey = Platform.environment['API_KEY'];
  if (apiKey == null) {
    print('No \$API_KEY environment variable');
    exit(1);
  }

  // The Gemini 1.5 models are versatile and work with multi-turn conversations (like chat)
  final model = GenerativeModel(
      model: 'gemini-1.5-flash',
      apiKey: apiKey,
      generationConfig: GenerationConfig(maxOutputTokens: 100));

  // Initialize the chat with some prior history
  final chat = model.startChat(history: [
    Content.text('Hello, I have 2 dogs in my house.'),
    Content.model([TextPart('Great to meet you. What would you like to know?')])
  ]);
  var content = Content.text('How many paws are in my house?');
  var response = await chat.sendMessage(content);
  print(response.text);
}
```
Use streaming for faster interactions
By default, the model returns a response after completing the entire generation process. You can achieve faster interactions by not waiting for the entire result; instead, use streaming to handle partial results.

The following example shows how to implement streaming with the `generateContentStream` method to generate text from a text-and-image input prompt:
```dart
// ...
final response = model.generateContentStream([
  Content.multi([prompt, ...imageParts])
]);
await for (final chunk in response) {
  print(chunk.text);
}
// ...
```
You can use a similar approach for text-only input and chat use cases:

```dart
// Use streaming with text-only input
final response = model.generateContentStream(content);
```

```dart
// Use streaming with multi-turn conversations (like chat)
final response = chat.sendMessageStream(content);
```
Implement advanced use cases
The common use cases described in the previous section of this tutorial help you become comfortable with using the Gemini API. This section describes some use cases that might be considered more advanced.
Function calling
Function calling makes it easier for you to get structured data outputs from generative models. You can then use these outputs to call other APIs and return the relevant response data to the model. In other words, function calling helps you connect generative models to external systems so that the generated content includes the most up-to-date and accurate information. Learn more in the function calling tutorial.
Use embeddings
Embedding is a technique used to represent information as a list of floating point numbers in an array. With Gemini, you can represent text (words, sentences, and blocks of text) in a vectorized form, making it easier to compare and contrast embeddings. For example, two texts that share a similar subject matter or sentiment should have similar embeddings, which can be identified through mathematical comparison techniques such as cosine similarity.
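To make the comparison idea concrete, here is a small self-contained sketch of cosine similarity over toy vectors (the three-element vectors are illustrative stand-ins for real embedding values, which have far more dimensions):

```dart
import 'dart:math';

// Cosine similarity between two vectors: 1.0 means the same direction
// (very similar content), values near 0.0 mean unrelated content.
double cosineSimilarity(List<double> a, List<double> b) {
  assert(a.length == b.length);
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (sqrt(normA) * sqrt(normB));
}

void main() {
  // Hypothetical embedding values for three texts.
  final catEmbedding = [0.9, 0.1, 0.2];
  final kittenEmbedding = [0.85, 0.15, 0.25];
  final carEmbedding = [0.1, 0.9, 0.3];

  print(cosineSimilarity(catEmbedding, kittenEmbedding)); // close to 1.0
  print(cosineSimilarity(catEmbedding, carEmbedding)); // noticeably lower
}
```

In practice you would call `embedContent` (shown below) to obtain the vectors and then compare them with a function like this.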
Use the `embedding-001` model with the `embedContent` method (or the `batchEmbedContents` method) to generate embeddings. The following example generates an embedding for a single string:
```dart
final model = GenerativeModel(model: 'embedding-001', apiKey: apiKey);
final content = Content.text('The quick brown fox jumps over the lazy dog.');
final result = await model.embedContent(content);
print(result.embedding.values);
```
Count tokens
When using long prompts, it might be useful to count tokens before sending any content to the model. The following examples show how to use `countTokens()` for various use cases:
```dart
// For text-only input
final tokenCount = await model.countTokens([Content.text(prompt)]);
print('Token count: ${tokenCount.totalTokens}');
```

```dart
// For text-and-image input (multimodal)
final tokenCount = await model.countTokens([
  Content.multi([prompt, ...imageParts])
]);
print('Token count: ${tokenCount.totalTokens}');
```

```dart
// For multi-turn conversations (like chat)
final prompt = Content.text(message);
final allContent = [...chat.history, prompt];
final tokenCount = await model.countTokens(allContent);
print('Token count: ${tokenCount.totalTokens}');
```
Options to control content generation
You can control content generation by configuring model parameters and by using safety settings.
Note that passing `generationConfig` or `safetySettings` to a model request method (like `generateContent`) will fully override the configuration object with the same name that was passed to the `GenerativeModel` constructor.
Configure model parameters
Every prompt you send to the model includes parameter values that control how the model generates a response. The model can generate different results for different parameter values. Learn more about Model parameters. The configuration is maintained for the lifetime of your model instance.
```dart
final generationConfig = GenerationConfig(
  stopSequences: ["red"],
  maxOutputTokens: 200,
  temperature: 0.9,
  topP: 0.1,
  topK: 16,
);
final model = GenerativeModel(
  // The Gemini 1.5 models are versatile and work with most use cases
  model: 'gemini-1.5-flash',
  apiKey: apiKey,
  generationConfig: generationConfig,
);
```
Use safety settings
You can use safety settings to adjust the likelihood of getting responses that may be considered harmful. By default, safety settings block content with medium and/or high probability of being unsafe content across all dimensions. Learn more about Safety settings.
Here's how to set one safety setting:
```dart
final safetySettings = [
  SafetySetting(HarmCategory.harassment, HarmBlockThreshold.high)
];
final model = GenerativeModel(
  // The Gemini 1.5 models are versatile and work with most use cases
  model: 'gemini-1.5-flash',
  apiKey: apiKey,
  safetySettings: safetySettings,
);
```
You can also set more than one safety setting:
```dart
final safetySettings = [
  SafetySetting(HarmCategory.harassment, HarmBlockThreshold.high),
  SafetySetting(HarmCategory.hateSpeech, HarmBlockThreshold.high),
];
```
What's next
Prompt design is the process of creating prompts that elicit the desired response from language models. Writing well structured prompts is an essential part of ensuring accurate, high quality responses from a language model. Learn about best practices for prompt writing.
Gemini offers several model variations to meet the needs of different use cases, such as input types and complexity, implementations for chat or other dialog language tasks, and size constraints. Learn about the available Gemini models.