It's been observed that while `gpt-4o-2024-05-13` is quite reliable at forced tool calling under Kurt, newer snapshots of `gpt-4o` are not reliable at all: they often respond with natural language even when tool calling is supposed to be forced.

My hypothesis is that OpenAI somehow weakened the existing forced tool calling when they added constrained token sampling (a stronger feature), which is available only in the newer snapshots.

Unfortunately, the set of JSON Schemas allowed for constrained token sampling is smaller than the set allowed for tools on `gpt-4o-2024-05-13`, so this is in one sense a regression, while in another sense a leap forward (constrained token sampling is a stronger guarantee).
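To make the schema gap concrete: OpenAI's constrained-sampling mode requires (among other restrictions) that every property be listed in `required` and that `additionalProperties` be explicitly `false`, with optionality emulated via a `null` union type. The helper below is a hypothetical sketch (not part of Kurt's API) checking just those two constraints:

```typescript
// Hypothetical helper illustrating two documented constraints of OpenAI's
// constrained token sampling mode: every property must appear in `required`,
// and `additionalProperties` must be `false`. Sketch only, not Kurt's API.
type SchemaObject = {
  type: "object"
  properties: Record<string, unknown>
  required?: string[]
  additionalProperties?: boolean
}

function meetsStrictBasics(schema: SchemaObject): boolean {
  const keys = Object.keys(schema.properties)
  const required = new Set(schema.required ?? [])
  return (
    schema.additionalProperties === false &&
    keys.every((key) => required.has(key))
  )
}

// Accepted by ordinary tool calling on gpt-4o-2024-05-13, but rejected by
// constrained sampling (optional property, open additionalProperties):
const looseSchema: SchemaObject = {
  type: "object",
  properties: { name: { type: "string" }, nickname: { type: "string" } },
  required: ["name"],
}

// The constrained-sampling-compatible equivalent, with optionality
// expressed as a nullable required property:
const strictSchema: SchemaObject = {
  type: "object",
  properties: { name: { type: "string" }, nickname: { type: ["string", "null"] } },
  required: ["name", "nickname"],
  additionalProperties: false,
}

console.log(meetsStrictBasics(looseSchema)) // false
console.log(meetsStrictBasics(strictSchema)) // true
```

Any schema Kurt accepts today that relies on truly optional properties would need a rewrite like the one above before it could opt into the stronger guarantee.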
We need to find a suitable approach for dealing with this. There are three parts to this solution, probably:

- make the new constrained token sampling mode available as a new option in `KurtSamplingOptions`, to let applications opt into this stronger guarantee, while accepting the resulting JSON Schema limitations
- fiddle with other new API options to try to make the current forced tool calling mode more reliable on newer snapshots (ideally, at least as reliable as it is/was on the older `gpt-4o-2024-05-13` snapshot)
- update the set of known models to include the newer snapshots, making it easier to distinguish these in testing
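For the first part, a sketch of what the opt-in might look like. The option name and adapter shape here are assumptions for discussion, not Kurt's actual API; the idea is simply a `KurtSamplingOptions` flag that maps onto OpenAI's `strict: true` on the function definition:

```typescript
// Sketch only: the flag name and adapter shape are hypothetical.
interface KurtSamplingOptionsSketch {
  // When true, request constrained token sampling (OpenAI `strict: true`),
  // accepting the narrower set of supported JSON Schemas.
  forceSchemaConstrainedTokens?: boolean
}

// Build the function-tool entry an OpenAI adapter would send, setting
// `strict` only when the application opted in.
function toOpenAITool(
  name: string,
  parameters: object,
  options: KurtSamplingOptionsSketch
): { type: "function"; function: { name: string; parameters: object; strict?: boolean } } {
  return {
    type: "function",
    function: {
      name,
      parameters,
      ...(options.forceSchemaConstrainedTokens ? { strict: true } : {}),
    },
  }
}

const tool = toOpenAITool(
  "structured_data",
  { type: "object", properties: {}, required: [], additionalProperties: false },
  { forceSchemaConstrainedTokens: true }
)
console.log(tool.function.strict) // true
```

Keeping the flag off by default preserves current behavior for applications whose schemas fall outside the constrained-sampling subset.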
This also highlights the need for a capability eval suite, as described in #28, which would have caught this problem faster and could be used to make conclusive empirical statements about this kind of regression.
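The kind of measurement such a suite would make can be sketched in a few lines. The model call below is stubbed; a real harness would drive each `gpt-4o` snapshot through Kurt and compare the resulting reliability numbers:

```typescript
// Minimal sketch of a forced-tool-calling reliability check, in the spirit
// of the capability eval suite proposed in #28. The model is stubbed here.
type ModelResponse = { kind: "tool_call" } | { kind: "text" }
type ModelFn = () => ModelResponse

// Run `trials` forced-tool-call requests and report the fraction that
// actually came back as tool calls.
function measureForcedToolCallReliability(model: ModelFn, trials: number): number {
  let toolCalls = 0
  for (let i = 0; i < trials; i++) {
    if (model().kind === "tool_call") toolCalls++
  }
  return toolCalls / trials
}

// Stub standing in for a perfectly compliant snapshot:
const alwaysCompliant: ModelFn = () => ({ kind: "tool_call" })

// Stub standing in for a snapshot that sometimes answers in prose:
let calls = 0
const flaky: ModelFn = () =>
  calls++ % 4 === 0 ? { kind: "text" } : { kind: "tool_call" }

console.log(measureForcedToolCallReliability(alwaysCompliant, 20)) // 1
console.log(measureForcedToolCallReliability(flaky, 20)) // 0.75
```

With enough trials per snapshot, a harness like this would turn "not at all reliable" into a concrete per-snapshot number.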
This maps to the new-ish `strict: true` feature of OpenAI which
enables constrained token sampling, but has certain caveats that
make it undesirable to turn on by default.
See issue #61 for more info.