The Gemini API supports multimodal prompting: you can include text, image, and audio data in your prompts. For small images, you can point the Gemini model directly to a local file when providing a prompt. For larger images, videos (sequences of image frames), and audio, upload the files with the File API before including them in prompts.
The File API lets you store up to 20GB of files per project, with each file not exceeding 2GB in size. Files are stored for 48 hours and can be accessed with your API key for generation within that time period. It is available at no cost in all regions where the Gemini API is available.
For information on valid file formats (MIME types) and supported models, see Supported file formats.
This guide shows how to use the File API to upload a media file and include it in a GenerateContent call to the Gemini API. For more information, see the code samples.
Setup
Before you use the File API, you need to install the Gemini API SDK package and configure an API key. This section describes how to complete these setup steps.
Install the Python SDK and import packages
The Python SDK for the Gemini API is contained in the google-generativeai package. Install the dependency using pip.
pip install -q -U google-generativeai
Import the necessary packages.
import google.generativeai as genai
from IPython.display import Markdown
Set up your API key
The File API uses API keys for authentication and access. Uploaded files are associated with the project linked to the API key. Unlike with other Gemini APIs that use API keys, here your API key also grants access to data you've uploaded to the File API, so take extra care to keep it secure. For more on keeping your keys secure, see Best practices for using API keys.
Store your API key in a Colab Secret named GOOGLE_API_KEY. If you don't already have an API key, or are unfamiliar with Colab Secrets, refer to the Authentication quickstart.
from google.colab import userdata
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)
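Outside Colab, google.colab.userdata is not available. The following is a minimal sketch of one alternative, assuming you have exported the key as an environment variable named GOOGLE_API_KEY. The helper name load_api_key is hypothetical and not part of the SDK.

```python
import os


def load_api_key(env=None, name="GOOGLE_API_KEY"):
    """Read the API key from an environment mapping (hypothetical helper).

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first.")
    return key


# Possible usage with the SDK configured as in this guide:
# genai.configure(api_key=load_api_key())
```

Failing loudly when the variable is missing or empty avoids sending requests with an empty key.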
Upload a file to the File API
The File API lets you upload a variety of multimodal MIME types, including images and audio formats. The File API handles inputs that can be used to generate content with model.generateContent or model.streamGenerateContent.
The File API accepts files under 2GB in size and can store up to 20GB of files per project. Files last for 2 days and cannot be downloaded from the API.
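The limits above can be checked locally before you attempt an upload. This is a rough sketch, not an SDK feature: the function names are hypothetical, and treating "GB" as 1024**3 bytes is an assumption, since this guide does not say whether the limits use binary or decimal units.

```python
import os

# Limits quoted in this guide. Interpreting GB as 1024**3 bytes is an assumption.
MAX_FILE_BYTES = 2 * 1024**3
MAX_PROJECT_BYTES = 20 * 1024**3


def fits_file_api(file_size, already_stored=0):
    """Return True if a file of `file_size` bytes stays within both limits.

    `already_stored` is the number of bytes you believe the project holds;
    you have to track that number yourself.
    """
    if file_size > MAX_FILE_BYTES:
        return False
    return already_stored + file_size <= MAX_PROJECT_BYTES


def local_file_fits(path, already_stored=0):
    """Apply the same check to a file on disk."""
    return fits_file_api(os.path.getsize(path), already_stored)
```

For example, local_file_fits("image.jpg") could be called just before genai.upload_file.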
First, you will prepare a sample image to upload to the API.
To upload your own file, see the Appendix section.
curl -o image.jpg https://storage.googleapis.com/generativeai-downloads/images/jetpack.jpg
Next, you'll upload that file to the File API.
sample_file = genai.upload_file(path="image.jpg",
                                display_name="Sample drawing")
print(f"Uploaded file '{sample_file.display_name}' as: {sample_file.uri}")
Uploaded file 'Sample drawing' as: https://generativelanguage.googleapis.com/v1beta/files/ui00j5zfuqe0
The response shows that the File API stored the specified display_name for the uploaded file and a uri to reference the file in Gemini API calls. Use the response to track how uploaded files are mapped to URIs.
Depending on your use cases, you could store the URIs in structures such as a dict or a database.
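As an illustration of the dict approach, here is a minimal in-memory sketch. The registry layout and the helper names remember and live_uris are hypothetical. The 48-hour retention comes from this guide, and the expiry time is estimated locally rather than read from the API.

```python
from datetime import datetime, timedelta, timezone

# Retention period stated in this guide.
RETENTION = timedelta(hours=48)

# Hypothetical in-memory registry: file name -> metadata.
registry = {}


def remember(name, uri, display_name, uploaded_at=None):
    """Record an uploaded file so its URI can be looked up later."""
    uploaded_at = uploaded_at or datetime.now(timezone.utc)
    registry[name] = {
        "uri": uri,
        "display_name": display_name,
        "expires_at": uploaded_at + RETENTION,  # local estimate only
    }


def live_uris(now=None):
    """Return the URIs of files that should not have expired yet."""
    now = now or datetime.now(timezone.utc)
    return [rec["uri"] for rec in registry.values() if rec["expires_at"] > now]
```

After an upload, you might call remember(sample_file.name, sample_file.uri, sample_file.display_name). Keying on the name rather than the display name avoids collisions between files that share a display name.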
Get file
After uploading the file, you can verify that the API successfully received it by calling files.get.
This call returns the metadata of a file that was uploaded to the File API and is associated with the Cloud project your API key belongs to. Only the name (and by extension, the uri) is unique. Use the displayName to identify files only if you manage uniqueness yourself.
file = genai.get_file(name=sample_file.name)
print(f"Retrieved file '{file.display_name}' as: {file.uri}")
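Because display names are not guaranteed to be unique, it can help to detect collisions before relying on them. The sketch below works on any objects that expose name and display_name attributes. The helper name and the stand-in file names are made up for illustration.

```python
from collections import defaultdict
from types import SimpleNamespace


def duplicate_display_names(files):
    """Map each display name used more than once to the file names sharing it."""
    by_display_name = defaultdict(list)
    for f in files:
        by_display_name[f.display_name].append(f.name)
    return {dn: names for dn, names in by_display_name.items() if len(names) > 1}


# Stand-in objects with made-up names; real entries would come from the SDK.
files = [
    SimpleNamespace(name="files/aaa", display_name="Sample drawing"),
    SimpleNamespace(name="files/bbb", display_name="Sample drawing"),
    SimpleNamespace(name="files/ccc", display_name="Gemini Logo"),
]
print(duplicate_display_names(files))
```

With the SDK, passing the result of genai.list_files() should work in the same way, assuming each listed file exposes name and display_name as sample_file does.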
Generate content
After uploading the file, you can make GenerateContent requests that reference the File API URI. In this example, you create a prompt that starts with text followed by the uploaded image.
# Set the model to Gemini 1.5 Pro.
model = genai.GenerativeModel(model_name="models/gemini-1.5-pro-latest")
response = model.generate_content(["Describe the image with a creative description.", sample_file])
Markdown(">" + response.text)
Delete files
Files are automatically deleted after 2 days. You can also manually delete them using files.delete().
genai.delete_file(sample_file.name)
print(f'Deleted {sample_file.display_name}.')
Supported file formats
Gemini models support prompting with multiple file formats. This section explains considerations in using general media formats for prompting, specifically image, audio, and video files. You can use media files for prompting only with specific model versions, as shown in the following table.
Model | Images | Audio | Video |
---|---|---|---|
Gemini 1.5 Pro (release 008 and later) | ✔ (3600 max image files) | ✔ | ✔ |
Gemini Pro Vision | ✔ (16 max image files) | | |
Image formats
You can use image data for prompting with the gemini-pro-vision and gemini-1.5-pro models. When you use images for prompting, they are subject to the following limitations and requirements:
- Images must be in one of the following image data MIME types:
- PNG - image/png
- JPEG - image/jpeg
- WEBP - image/webp
- HEIC - image/heic
- HEIF - image/heif
- Maximum of 16 individual images for gemini-pro-vision and 3600 images for gemini-1.5-pro
- No specific limits to the number of pixels in an image; however, larger images are scaled down to fit a maximum resolution of 3072 x 3072 while preserving their original aspect ratio.
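To get a feel for the downscaling rule in the last item, the following sketch estimates the dimensions an image would have after being fit into 3072 x 3072 with its aspect ratio preserved. The rounding behavior is an assumption, so the service's actual output may differ by a pixel.

```python
def scaled_size(width, height, max_side=3072):
    """Estimate the size after fitting an image into max_side x max_side.

    Images already within the bounds are returned unchanged. Rounding to the
    nearest pixel is an assumption, not documented behavior.
    """
    if width <= max_side and height <= max_side:
        return width, height
    scale = min(max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(scaled_size(6144, 3072))
print(scaled_size(1000, 500))
```

If you resize large images yourself before uploading, an estimate like this shows how much detail is lost either way.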
Audio formats
You can use audio data for prompting with the gemini-1.5-pro model. When you use audio for prompting, it is subject to the following limitations and requirements:
- Audio data is supported in the following common audio format MIME types:
- WAV - audio/wav
- MP3 - audio/mp3
- AIFF - audio/aiff
- AAC - audio/aac
- OGG Vorbis - audio/ogg
- FLAC - audio/flac
- The maximum supported length of audio data in a single prompt is 9.5 hours.
- Audio files are resampled down to a 16 Kbps data resolution, and multiple channels of audio are combined into a single channel.
- There is no specific limit to the number of audio files in a single prompt; however, the total combined length of all audio files in a single prompt cannot exceed 9.5 hours.
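The audio requirements above can be turned into a simple pre-flight check. This sketch is illustrative only: the extension-to-MIME mapping is an assumption built from the list above, and the durations are expected to come from whatever audio tooling you already use.

```python
from pathlib import Path

# MIME types listed in this guide; the file extensions are assumed conventions.
AUDIO_MIME_TYPES = {
    ".wav": "audio/wav",
    ".mp3": "audio/mp3",
    ".aiff": "audio/aiff",
    ".aac": "audio/aac",
    ".ogg": "audio/ogg",
    ".flac": "audio/flac",
}

# 9.5 hours, the per-prompt limit stated in this guide, in seconds.
MAX_AUDIO_SECONDS = 9.5 * 3600


def audio_mime_type(filename):
    """Return the listed MIME type for a filename, or None if it is not listed."""
    return AUDIO_MIME_TYPES.get(Path(filename).suffix.lower())


def within_audio_budget(durations_seconds):
    """Return True if the combined length of all clips is at most 9.5 hours."""
    return sum(durations_seconds) <= MAX_AUDIO_SECONDS
```

The budget applies to the combined length of all clips, so one prompt holding nine one-hour clips passes while ten one-hour clips do not.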
Video formats
You can use video data for prompting with the gemini-1.5-pro model. However, video file formats are not supported as direct inputs by the Gemini API. You can use video data as prompt input by breaking down the video into a series of still frame images and a separate audio file. This approach lets you manage the amount of data, and the level of detail provided by the video, by choosing how many frames per second are included in your prompt from the video file.
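Choosing a frame rate is a trade-off between detail and the number of images in the prompt. The sketch below only plans the sampling and does not extract frames, which you would do with a separate video tool. The helper names are hypothetical, and the 3600 default is the gemini-1.5-pro image limit quoted earlier in this guide.

```python
def frame_timestamps(duration_seconds, fps):
    """Return the timestamps (in seconds) of frames sampled at `fps`."""
    count = int(duration_seconds * fps)
    return [i / fps for i in range(count)]


def max_fps_for_budget(duration_seconds, max_frames=3600):
    """Return the highest sampling rate that keeps the frame count within max_frames."""
    return max_frames / duration_seconds


# A 2-hour video can be sampled at most once every 2 seconds.
print(max_fps_for_budget(2 * 3600))
print(len(frame_timestamps(2 * 3600, 0.5)))
```

Sampling below the maximum rate leaves room for other images in the same prompt.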
Appendix: Uploading files to Colab
This notebook uses the File API with files that were downloaded from the internet. If you're running this in Colab and want to use your own files, you first need to upload them to the Colab instance.
First, click Files on the left sidebar, then click the Upload button.
Next, you'll upload that file to the File API. In the form for the code cell below, enter the filename for the file you uploaded and provide an appropriate display name for the file, then run the cell.
my_filename = "gemini_logo.png" # @param {type:"string"}
my_file_display_name = "Gemini Logo" # @param {type:"string"}
my_file = genai.upload_file(path=my_filename,
                            display_name=my_file_display_name)
print(f"Uploaded file '{my_file.display_name}' as: {my_file.uri}")