
feat: add OpenAI's new structured output API #180

Merged

Conversation

@monotykamary (Contributor) commented Sep 21, 2024

What does this PR do?

  • Adds a json_schema field to ChatOpenAI and enhances response format handling
  • Cleans up a handful of tests for `mix test --include live_open_ai`

Summary:

This pull request introduces a new json_schema field to the ChatOpenAI module, enhancing its response format handling with support for JSON Schema-based structured output.

Changes:

  1. Field Definition:

    • Added field :json_schema, :map, default: nil to the ChatOpenAI struct.
    • Updated the list of valid struct keys to include :json_schema.
  2. Response Format Handling:

    • Modified the set_response_format function to account for the combination of json_response and json_schema:
      • If json_response is true and json_schema is not nil, the response format is set to include json_schema.
      • If json_response is true but json_schema is nil, the response format defaults to json_object.
      • If json_response is false, the response format is set to text.

Code Details:

  1. Field Addition:

    field :json_schema, :map, default: nil
  2. Response Format Modification:

    # When a schema is provided, use OpenAI's structured output format.
    defp set_response_format(%ChatOpenAI{json_response: true, json_schema: json_schema})
         when not is_nil(json_schema) do
      %{
        "type" => "json_schema",
        "json_schema" => json_schema
      }
    end
    
    # JSON mode without a schema.
    defp set_response_format(%ChatOpenAI{json_response: true}) do
      %{"type" => "json_object"}
    end
    
    # Plain text responses.
    defp set_response_format(%ChatOpenAI{json_response: false}) do
      %{"type" => "text"}
    end
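
For illustration, here is a minimal sketch of how the new field might be used. The schema map below (its name, strict flag, and property contents) is hypothetical example data in OpenAI's structured output wrapper format, not something defined in this PR:

    alias LangChain.ChatModels.ChatOpenAI

    # Hypothetical example schema, wrapped in OpenAI's structured output format.
    schema = %{
      "name" => "user_profile",
      "strict" => true,
      "schema" => %{
        "type" => "object",
        "properties" => %{
          "name" => %{"type" => "string"},
          "age" => %{"type" => "integer"}
        },
        "required" => ["name", "age"],
        "additionalProperties" => false
      }
    }

    # With json_response: true and a non-nil json_schema, set_response_format/1
    # produces %{"type" => "json_schema", "json_schema" => schema}.
    {:ok, chat} =
      ChatOpenAI.new(%{
        model: "gpt-4o-2024-08-06",
        json_response: true,
        json_schema: schema
      })

The resulting model struct can then be used with an LLMChain as usual.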

@brainlid (Owner)

Thanks @monotykamary for the PR!

From what I see, it looks like your primary goal here is to attach a json_schema to the OpenAI module. However, the schema is not sent to OpenAI; it is intended to be optionally used by the developer later. It's basically a "hold on to this because it's relevant and I may need it later." The ultimate goal seems to be to link the schema to the request so it can be used later to validate generated responses for schema compliance.

Is that right? Or did I miss something?

If that's right, then I'm inclined to have that live on the LLMChain so other models benefit as well. Or optionally in a MessageProcessor for schema compliance.

Oh, and thanks for fixing those bad tests! 😊 How did those get in there? 😅

@monotykamary (Contributor, Author)

It's essentially to implement structured outputs from OpenAI, which is more like json_object, but we specify the json schema inside the response format instead of inside the prompt: https://platform.openai.com/docs/guides/structured-outputs/examples

> However, the schema is not sent to OpenAI

Wait, how? 🤔 We are using this exact code to structure our outputs for a scraper we have.

@ksanderer

OAI Structured Output requires the response_format field. The model will use the schema during the inference phase, sampling generated tokens only from those the schema allows.

https://platform.openai.com/docs/api-reference/chat/create#chat-create-response_format
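
Concretely, the map built by set_response_format/1 above is what gets sent as that response_format parameter. With a schema attached, the value has roughly this shape (the name and schema contents here are illustrative):

    # Illustrative response_format value, per the linked API reference.
    %{
      "type" => "json_schema",
      "json_schema" => %{
        "name" => "extraction_result",
        "strict" => true,
        "schema" => %{
          "type" => "object",
          "properties" => %{"title" => %{"type" => "string"}},
          "required" => ["title"],
          "additionalProperties" => false
        }
      }
    }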

@brainlid merged commit 72f93a6 into brainlid:main Sep 23, 2024
@brainlid (Owner)

Awesome! Thanks for walking me through that and thank you for the contribution!

❤️💛💙💜

brainlid added a commit that referenced this pull request Oct 28, 2024
* 'main' of github.com:brainlid/langchain:
  Add AWS Bedrock support to ChatAnthropic (#154)
  Handle functions with no parameters for Google AI (#183)
  Handle missing token usage fields for Google AI (#184)
  Handle empty text parts from GoogleAI responses (#181)
  Support system instructions for Google AI (#182)
  feat: add OpenAI's new structured output API (#180)
  Support strict mode for tools (#173)
  Do not duplicate tool call parameters if they are identical (#174)
  🐛 cast tool_calls arguments correctly inside message_deltas (#175)
@lud-wj commented Dec 13, 2024

Do you know if this is going to be released anytime soon?

@brainlid (Owner) commented Dec 14, 2024

Hi @lud-wj, I keep intending to make a release soon. It will probably be a new RC because there are still a couple of changes I want to make. That, and I'm procrastinating writing the CHANGELOG. 😆

@lud-wj commented Dec 14, 2024

Alright, thank you!

@vkryukov (Contributor)

> That, and I'm procrastinating writing the CHANGELOG. 😆

Don't forget that this is a package for working with LLMs! Here is something to get you started: I asked Claude to write a summary changelog for `git log v0.3.0-rc.0..HEAD --pretty="%s"` in the format of CHANGELOG.md. I have no idea whether it's accurate or not, but it should at least give you a starting point :).

## v0.3.0 (Unreleased)

**Major Features:**
* Added AWS Bedrock support for Anthropic Claude integration
* Added support for OpenAI's new structured output API
* Added support for fallback handling when LLM services are overloaded
* Improved tool/function handling across providers:
  - Strict mode for tools
  - Better support for tool calls with parameters
  - Improved error handling for function execution
  - Support for functions with no parameters in Google AI

**Improvements:**
* Enhanced Google AI Support:
  - Added safety settings configuration
  - Added system instructions support
  - Fixed handling of empty text parts and token usage
  - Better handling of finish reasons
* Improved streaming reliability:
  - Fixed streaming issues with Azure OpenAI Service
  - Fixed OpenAI stream decode issues
  - Fixed Ollama streaming response
* Better error handling and recovery:
  - Improved handling of "Too many requests" from AWS Bedrock
  - Better handling of overloaded service responses
  - Enhanced function execution failure responses

**Breaking Changes:**
* Changed return value structure of `LLMChain.run/2`
* Added "processed_content" to ToolResult struct

**Documentation:**
* Improved documentation for all LLM providers
* Added examples for tool_choice in OpenAI and Anthropic
* Updated configuration documentation for API keys
* Added new examples and notebooks for image content chats

@brainlid (Owner)

Ha! You rock @vkryukov! I used your approach as a starting point and I'm prepping an RC.1 right now.

@brainlid (Owner)

Published the new RC release

@lud-wj commented Dec 17, 2024

You can use git cliff to generate nice changelogs, since you are already using commit prefixes like `feat:`, `fix:`, etc.
