How to use OpenAI’s new Structured Outputs API (with code)

Aug 8, 2024

OpenAI has recently released a game-changing feature for devs looking to build more reliable systems.

The new model, gpt-4o-2024-08-06, with Structured Outputs scores a perfect 100% on OpenAI’s structured extraction evaluation. In comparison, gpt-4-0613 scores less than 40%. Source: OpenAI’s blog post

This new feature ensures that the model’s output will exactly match the JSON Schemas provided by developers, making it easier to build powerful assistants and extract structured data.

⚠️ If you want to use this technique with GPT-4o or other LLMs to extract clean structured data from any PDF, Word doc, or website, check out this open source extractor tool

How does it work?

Under the hood, OpenAI uses a technique called constrained sampling or constrained decoding. Instead of allowing the model to select any token from the vocabulary, it constrains the output to only tokens that are valid according to the supplied schema. This is done dynamically, so the model can still generate flexible and diverse responses while adhering to the specified structure.

Concretely, the supplied JSON Schema is converted into a context-free grammar (CFG). After each token is generated, the grammar and the tokens produced so far determine which tokens are valid next, and every other token is masked out before sampling. This is why the output always adheres to the specified schema.
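To make that concrete, here is a toy sketch of the masking idea. It is not OpenAI’s implementation: the vocabulary, the scores, and the fake_scores function are all made up for illustration, and the grammar is a small state machine rather than a full CFG. The point is only to show how restricting each step to the grammar’s allowed tokens forces valid output, even when the "model" would rather say something else.

```python
import json

# Toy vocabulary: each "token" is a short string (hypothetical).
VOCAB = ['{', '}', '"age"', ':', '1', '2', '7', 'hello']
DIGITS = ['1', '2', '7']

# Grammar as a state machine: state -> {allowed token: next state}.
# It accepts strings such as {"age":27}.
GRAMMAR = {
    "start": {'{': "key"},
    "key": {'"age"': "colon"},
    "colon": {':': "first_digit"},
    "first_digit": {d: "more_digits" for d in DIGITS},
    "more_digits": {**{d: "more_digits" for d in DIGITS}, '}': "done"},
}


def fake_scores(tokens_so_far):
    """Stand-in for model logits; a real model computes these from context."""
    scores = {tok: 0.0 for tok in VOCAB}
    scores['hello'] = 5.0  # the unconstrained favorite, never valid here
    scores['2'] = 3.0
    scores['7'] = 2.0
    n = len(tokens_so_far)
    if n == 4:
        scores['7'] = 4.0  # second digit: prefer '7'
    if n >= 5:
        scores['}'] = 4.5  # then prefer closing the object
    return scores


def constrained_decode(max_steps: int = 20) -> str:
    """Greedy decoding where only grammar-valid tokens can be chosen."""
    state, tokens = "start", []
    while state != "done" and len(tokens) < max_steps:
        scores = fake_scores(tokens)
        allowed = GRAMMAR[state]  # tokens valid in the current state
        best = max(allowed, key=lambda tok: scores[tok])  # mask, then pick
        tokens.append(best)
        state = allowed[best]
    return "".join(tokens)


# Without the mask, greedy decoding would pick the invalid token.
print(max(VOCAB, key=fake_scores([]).get))  # hello
result = constrained_decode()
print(result)  # {"age":27}
print(json.loads(result))  # {'age': 27}
```

Notice that the scores themselves are never changed for valid tokens, so the model still chooses freely among whatever the grammar allows at each step.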

How to use it

I’m more of a “learn-by-example” man, so I’ll give the simplest possible example I could come up with.

In this example, we define a Person class using Pydantic, which has two fields: name and age.

from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int
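If you’re wondering what "matching the schema" means for a class like this, here is a rough, standard-library-only sketch. PERSON_SCHEMA is a hand-written approximation of the JSON Schema that corresponds to Person (the schema Pydantic generates also carries titles and may differ in detail), and conforms is a hypothetical helper that only understands flat objects like this one. Strict mode requires every property to be listed as required and extra keys to be disallowed.

```python
import json

# Hand-written approximation of the strict JSON Schema for Person.
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
    "additionalProperties": False,
}

# Map the JSON Schema type names used above to Python types.
_JSON_TYPES = {"string": str, "integer": int}


def conforms(raw_json: str, schema: dict) -> bool:
    """Check a JSON string against a flat object schema (illustrative only)."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    props = schema["properties"]
    # All required keys must be present and no unknown keys are allowed.
    if not set(schema["required"]) <= set(data) or not set(data) <= set(props):
        return False
    for key, value in data.items():
        expected = _JSON_TYPES[props[key]["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return False
    return True


print(conforms('{"name": "John", "age": 30}', PERSON_SCHEMA))    # True
print(conforms('{"name": "John", "age": "30"}', PERSON_SCHEMA))  # False
print(conforms('{"name": "John"}', PERSON_SCHEMA))               # False
```

With Structured Outputs you shouldn’t need a check like this for the shape of the data, since the decoding step itself rules out the two failing cases.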

We then create an OpenAI client and use the client.beta.chat.completions.parse method to send our request. The messages parameter includes a system message instructing the model to extract the names and ages, and a user message with the text we want to extract data from. The tools parameter wraps our Person class with pydantic_function_tool, which turns it into a function tool whose arguments must follow that class’s schema.

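import openai  # needed for openai.pydantic_function_tool below
from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()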

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "system",
            "content": "Extract the names and ages of the people mentioned in the following text."
        },
        {
            "role": "user",
            "content": "John is 30 years old and his sister Alice is 25."
        }
    ],
    tools=[
        openai.pydantic_function_tool(Person)
    ]
)

print(completion.choices[0].message.tool_calls[0].function.parsed_arguments)

completion.choices[0].message.tool_calls[0].function.parsed_arguments holds the arguments returned by the model, already parsed and validated against your Pydantic class. Because the tool was built with pydantic_function_tool(Person), the SDK returns a Person instance rather than a plain dictionary, so the name and age fields are available as attributes.

Here’s an example of what the printed result might look like:

name='John' age=30

Now, let’s say you want to use the parsed arguments in your code instead of just printing them. Since the value is already a Person object, you can read its fields directly (and call model_dump() on it if you need a dictionary):

# parsed_arguments is already a validated Person instance
person = completion.choices[0].message.tool_calls[0].function.parsed_arguments
print(person.name)  # Output: John
print(person.age)   # Output: 30

That’s pretty neat!

What models can I use?

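Structured Outputs with function calling is supported on every model that supports function calling, which includes gpt-4-0613, gpt-3.5-turbo-0613, and later models. Structured Outputs with response formats (the response_format parameter) is available on gpt-4o-mini and gpt-4o-2024-08-06.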
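The example earlier used function calling, so here is a minimal sketch of the response-format variant for comparison. The response_format payload follows the json_schema shape from OpenAI’s documentation; extract_person and StubClient are hypothetical names of mine, and the stub exists only so the sketch runs without an API key. In real use you would pass an OpenAI() client instead.

```python
import json
from types import SimpleNamespace

# Strict mode requires every property to be listed in "required" and
# "additionalProperties" to be false.
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "name": "person",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
            "required": ["name", "age"],
            "additionalProperties": False,
        },
    },
}


def extract_person(client, text: str) -> dict:
    """Ask the model for one person and decode the JSON it returns."""
    completion = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[
            {"role": "system", "content": "Extract the person's name and age."},
            {"role": "user", "content": text},
        ],
        response_format=RESPONSE_FORMAT,
    )
    # With response_format, the JSON arrives as a string in message.content.
    return json.loads(completion.choices[0].message.content)


class StubClient:
    """Offline stand-in for an OpenAI client, used only to run the sketch."""

    def __init__(self):
        # Expose self as client.chat.completions so .create() resolves here.
        self.chat = SimpleNamespace(completions=self)

    def create(self, **kwargs):
        message = SimpleNamespace(content='{"name": "John", "age": 30}')
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


print(extract_person(StubClient(), "John is 30 years old."))
# {'name': 'John', 'age': 30}
```

The main difference from the function-calling version is where the data lands: in message.content as a JSON string, rather than in a tool call’s arguments.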

Limitations and Restrictions

There are a few limitations to keep in mind when using Structured Outputs:

Structured Outputs only supports a subset of JSON Schema, as detailed in OpenAI’s documentation.
The first API response with a new schema will incur additional latency due to the preprocessing of the schema. Subsequent responses will be faster with no latency penalty.
The model may refuse unsafe requests or stop generating before completing the schema if it reaches max_tokens or another stop condition.
Structured Outputs doesn’t prevent all kinds of model mistakes within the values of the JSON object.
Structured Outputs is not compatible with parallel function calls.
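A few of these limitations can be guarded against in code. The sketch below is one way to do it, under some assumptions: first_parsed_arguments, plausible_age, and stub_completion are hypothetical helpers of mine, and the stub objects only imitate the refusal, finish_reason, and tool_calls fields of a chat completion so the example runs offline. Depending on your SDK version, the parse method may already raise an error on truncated output before a check like this is reached.

```python
from types import SimpleNamespace


def first_parsed_arguments(completion):
    """Return the first tool call's parsed arguments, or raise with a reason."""
    choice = completion.choices[0]
    message = choice.message
    # A safety refusal is reported in message.refusal instead of a tool call.
    if getattr(message, "refusal", None):
        raise ValueError(f"model refused: {message.refusal}")
    # finish_reason == "length" means max_tokens cut the output short.
    if choice.finish_reason == "length":
        raise ValueError("output truncated before the schema was complete")
    if not message.tool_calls:
        raise ValueError("model returned no tool call")
    return message.tool_calls[0].function.parsed_arguments


def plausible_age(age: int) -> bool:
    """Value-level check: the schema guarantees an int, not a sensible one."""
    return 0 <= age <= 130


def stub_completion(parsed=None, refusal=None, finish_reason="tool_calls"):
    """Build a stand-in completion object (hypothetical, for offline use)."""
    calls = []
    if parsed is not None:
        function = SimpleNamespace(parsed_arguments=parsed)
        calls.append(SimpleNamespace(function=function))
    message = SimpleNamespace(refusal=refusal, tool_calls=calls)
    choice = SimpleNamespace(message=message, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


person = first_parsed_arguments(
    stub_completion(parsed=SimpleNamespace(name="John", age=30))
)
print(person.name, plausible_age(person.age))  # John True
```

The age range here is arbitrary. The idea is simply that anything the schema can’t express (ranges, cross-field consistency, whether the value actually appears in the source text) still needs its own check.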


Written by Emmett McFarlane