Quickstart: Speech to text with the Azure OpenAI Whisper model
This quickstart explains how to use the Azure OpenAI Whisper model for speech to text conversion. The Whisper model can transcribe human speech in numerous languages, and it can also translate other languages into English.
The file size limit for the Whisper model is 25 MB. If you need to transcribe a file larger than 25 MB, you can use the Azure AI Speech batch transcription API.
Note
The OpenAI Whisper model is currently in Limited Access Public Preview.
Prerequisites
- An Azure subscription. You can create one for free.
- An Azure OpenAI resource with a Whisper model deployed in a supported region. For more information, see Create a resource and deploy a model with Azure OpenAI.
Set up
Retrieve key and endpoint
To successfully make a call against Azure OpenAI, you need an endpoint and a key.
Variable name | Value |
---|---|
AZURE_OPENAI_ENDPOINT |
This value can be found in the Keys & Endpoint section when examining your resource from the Azure portal. Alternatively, you can find the value in the Azure OpenAI Studio > Playground > Code View. An example endpoint is: https://aoai-docs.openai.azure.com/ . |
AZURE_OPENAI_API_KEY |
This value can be found in the Keys & Endpoint section when examining your resource from the Azure portal. You can use either KEY1 or KEY2 . |
Go to your resource in the Azure portal. The Endpoint and Keys can be found in the Resource Management section. Copy your endpoint and access key as you'll need both for authenticating your API calls. You can use either KEY1
or KEY2
. Always having two keys allows you to securely rotate and regenerate keys without causing a service disruption.
Environment variables
Create and assign persistent environment variables for your key and endpoint.
Important
If you use an API key, store it securely somewhere else, such as in Azure Key Vault. Don't include the API key directly in your code, and never post it publicly.
For more information about AI services security, see Authenticate requests to Azure AI services.
setx AZURE_OPENAI_API_KEY "REPLACE_WITH_YOUR_KEY_VALUE_HERE"
setx AZURE_OPENAI_ENDPOINT "REPLACE_WITH_YOUR_ENDPOINT_HERE"
REST API
In a bash shell, run the following command. You need to replace YourDeploymentName
with the deployment name you chose when you deployed the Whisper model. The deployment name isn't necessarily the same as the model name. Entering the model name results in an error unless you chose a deployment name that is identical to the underlying model name.
curl $AZURE_OPENAI_ENDPOINT/openai/deployments/YourDeploymentName/audio/transcriptions?api-version=2024-02-01 \
-H "api-key: $AZURE_OPENAI_API_KEY" \
-H "Content-Type: multipart/form-data" \
-F file="@./wikipediaOcelot.wav"
The first line of the preceding command with an example endpoint would appear as follows:
curl https://aoai-docs.openai.azure.com/openai/deployments/{YourDeploymentName}/audio/transcriptions?api-version=2024-02-01 \
You can get sample audio files, such as wikipediaOcelot.wav, from the Azure AI Speech SDK repository at GitHub.
Important
For production, store and access your credentials using a secure method, such as Azure Key Vault. For more information about credential security, see Azure AI services security.
Output
{"text":"The ocelot, Lepardus paradalis, is a small wild cat native to the southwestern United States, Mexico, and Central and South America. This medium-sized cat is characterized by solid black spots and streaks on its coat, round ears, and white neck and undersides. It weighs between 8 and 15.5 kilograms, 18 and 34 pounds, and reaches 40 to 50 centimeters 16 to 20 inches at the shoulders. It was first described by Carl Linnaeus in 1758. Two subspecies are recognized, L. p. paradalis and L. p. mitis. Typically active during twilight and at night, the ocelot tends to be solitary and territorial. It is efficient at climbing, leaping, and swimming. It preys on small terrestrial mammals such as armadillo, opossum, and lagomorphs."}
PowerShell
Run the following command. You need to replace YourDeploymentName
with the deployment name you chose when you deployed the Whisper model. The deployment name isn't necessarily the same as the model name. Entering the model name results in an error unless you chose a deployment name that is identical to the underlying model name.
# Azure OpenAI metadata variables
$openai = @{
api_key = $Env:AZURE_OPENAI_API_KEY
api_base = $Env:AZURE_OPENAI_ENDPOINT # your endpoint should look like the following https://YOUR_RESOURCE_NAME.openai.azure.com/
api_version = '2024-02-01' # this may change in the future
name = 'YourDeploymentName' #This will correspond to the custom name you chose for your deployment when you deployed a model.
}
# Header for authentication
$headers = [ordered]@{
'api-key' = $openai.api_key
}
$form = @{ file = get-item -path './wikipediaOcelot.wav' }
# Send a completion call to generate an answer
$url = "$($openai.api_base)/openai/deployments/$($openai.name)/audio/transcriptions?api-version=$($openai.api_version)"
$response = Invoke-RestMethod -Uri $url -Headers $headers -Form $form -Method Post -ContentType 'multipart/form-data'
return $response.text
You can get sample audio files, such as wikipediaOcelot.wav, from the Azure AI Speech SDK repository at GitHub.
Important
For production, store and access your credentials using a secure method, such as The PowerShell Secret Management with Azure Key Vault. For more information about credential security, see Azure AI services security.
Output
The ocelot, Lepardus paradalis, is a small wild cat native to the southwestern United States, Mexico, and Central and South America. This medium-sized cat is characterized by solid black spots and streaks on its coat, round ears, and white neck and undersides. It weighs between 8 and 15.5 kilograms, 18 and 34 pounds, and reaches 40 to 50 centimeters 16 to 20 inches at the shoulders. It was first described by Carl Linnaeus in 1758. Two subspecies are recognized, L. p. paradalis and L. p. mitis. Typically active during twilight and at night, the ocelot tends to be solitary and territorial. It is efficient at climbing, leaping, and swimming. It preys on small terrestrial mammals such as armadillo, opossum, and lagomorphs.
Python
Prerequisites
- Python 3.8 or later
- The following Python library: os
Set up
Install the OpenAI Python client library with:
pip install openai
Create a new Python file called quickstart.py. Then open it up in your preferred editor or IDE.
Replace the contents of quickstart.py with the following code. Modify the code to add your deployment name:
import os
from openai import AzureOpenAI
client = AzureOpenAI(
api_key=os.getenv("AZURE_OPENAI_API_KEY"),
api_version="2024-02-01",
azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
)
deployment_id = "YOUR-DEPLOYMENT-NAME-HERE" #This will correspond to the custom name you chose for your deployment when you deployed a model."
audio_test_file = "./wikipediaOcelot.wav"
result = client.audio.transcriptions.create(
file=open(audio_test_file, "rb"),
model=deployment_id
)
print(result)
Run the application using the python
command on your quickstart file:
python quickstart.py
You can get sample audio files, such as wikipediaOcelot.wav, from the Azure AI Speech SDK repository at GitHub.
Important
For production, store and access your credentials using a secure method, such as Azure Key Vault. For more information about credential security, see Azure AI services security.
Output
{"text":"The ocelot, Lepardus paradalis, is a small wild cat native to the southwestern United States, Mexico, and Central and South America. This medium-sized cat is characterized by solid black spots and streaks on its coat, round ears, and white neck and undersides. It weighs between 8 and 15.5 kilograms, 18 and 34 pounds, and reaches 40 to 50 centimeters 16 to 20 inches at the shoulders. It was first described by Carl Linnaeus in 1758. Two subspecies are recognized, L. p. paradalis and L. p. mitis. Typically active during twilight and at night, the ocelot tends to be solitary and territorial. It is efficient at climbing, leaping, and swimming. It preys on small terrestrial mammals such as armadillo, opossum, and lagomorphs."}
Clean up resources
If you want to clean up and remove an Azure OpenAI resource, you can delete the resource. Before deleting the resource, you must first delete any deployed models.
Next steps
- To learn how to convert audio data to text in batches, see Create a batch transcription.
- For more examples, check out the Azure OpenAI Samples GitHub repository.