JSON mode #65

Closed
scblaze opened this issue Dec 3, 2024 · 5 comments

@scblaze

scblaze commented Dec 3, 2024

It would be helpful if we could constrain the generated output to valid JSON.

A `promptJson()` function might make sense as the API. When used, it would constrain model generation to valid JSON (and potentially also parse the result into an object).

```javascript
session.promptJson("List five cities and their populations, respond in jsdoc type format: {name: string, population: number}[]");
```

This would primarily help by preventing the model from adding extra descriptive text, such as:

Sure, your five cities are:

[{ "name": "New York...

Possibly in the future, a schema could be supplied that would further constrain output to match that schema (edit: requested in #35). Regardless of that, a general restriction to valid JSON would be very helpful.
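For illustration, the proposed `promptJson()` can be roughly approximated in user land today. This is a minimal sketch, assuming a session object whose `prompt(input)` method resolves to a string; `promptJson` as a standalone helper and the `fakeSession` stub (with placeholder data) are hypothetical. Unlike a real JSON mode, it cannot constrain generation; it can only parse the reply afterwards and reject it if parsing fails.

```javascript
// Hypothetical user-land approximation of the proposed promptJson().
// Assumes session.prompt(input) resolves to a string. A real JSON mode would
// constrain decoding; this sketch can only parse the reply or reject it.
async function promptJson(session, input) {
  const reply = await session.prompt(input);
  try {
    return JSON.parse(reply);
  } catch (err) {
    throw new SyntaxError(`Model reply was not valid JSON: ${err.message}`);
  }
}

// Hypothetical stub standing in for a real model session (placeholder data).
const fakeSession = {
  async prompt(_input) {
    return '[{"name": "City A", "population": 100}]';
  },
};

promptJson(fakeSession, "List five cities and their populations").then((cities) => {
  console.log(cities[0].name, cities[0].population);
});
```

With a chatty reply such as `Sure, your five cities are: [...]`, this helper simply rejects instead of recovering, which is the failure mode a built-in JSON mode is meant to prevent.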

@christianliebel

Related: #7

@domenic
Collaborator

domenic commented Dec 3, 2024

Yep, it's a duplicate of #7 in fact :)

Duplicate of #35

domenic closed this as completed Dec 3, 2024
@scblaze
Author

scblaze commented Dec 4, 2024

@domenic this is related to #35, but the suggestion here is more basic than what that issue proposes. This issue requests constraining output generation to any valid JSON, without constraining it to a specific schema.

The suggestion proposed in #35 would be very useful and, if implemented, would make this request obsolete. However, if #35 is not implemented, a JSON mode as suggested here could be a useful incremental step.

@tomayac
Contributor

tomayac commented Dec 4, 2024

Without the guarantees of a schema, any valid JSON won't help you with the parsing.

"Give me synonyms for the word 'happy'. Respond in valid JSON."

It could then respond with an array, an object, a combination thereof,… If you can't tell what to expect, the validity of the JSON alone won't help you much.
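To make the point concrete: `JSON.parse` only guarantees syntactically valid JSON, so the caller still needs its own shape check. A minimal sketch for the synonyms example, assuming the caller expects a `string[]`; `isStringArray` and `parseSynonyms` are hypothetical helpers, not part of any API.

```javascript
// Hypothetical shape check: valid JSON is not the same as the expected shape.
function isStringArray(value) {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

function parseSynonyms(reply) {
  const value = JSON.parse(reply); // throws SyntaxError on invalid JSON
  if (!isStringArray(value)) {
    throw new TypeError("Valid JSON, but not the expected string[] shape");
  }
  return value;
}

// Each reply below is valid JSON; only the first matches string[].
const replies = ['["joyful", "cheerful"]', '{"synonyms": ["joyful"]}', '"joyful, cheerful"'];
for (const reply of replies) {
  try {
    console.log("ok:", parseSynonyms(reply));
  } catch (err) {
    console.log("rejected:", err.message);
  }
}
```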

@scblaze
Author

scblaze commented Dec 4, 2024

You wouldn't write:

Give me synonyms for the word 'happy'. Respond in valid JSON.

You would instead write something like:

Give me synonyms for the word 'happy'. Respond in valid JSON with JSDOC type format `string[]`

Or

Give me synonyms for the word 'happy'. Respond in valid JSON with an array of the synonyms like:

["synonym 1", "synonym 2"]

Or various other combinations of instructions and examples.

#35 would let you force a response in the correct format, but models on their own are pretty good at following instructions if you provide them.

One common failure mode of these chat models, though, is that they like to add extra descriptive text. For example, they might reply:

Here are your synonyms:

["content"]

Or

````text
```javascript
["content"]
```
````

Dealing with these can be quite difficult. A JSON mode would address this common failure mode, even though it can't guarantee that you get back the schema you want.

To be clear, I would prefer something like #35. But if that isn't implemented, a JSON mode would be very useful.
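As an illustration of why these replies are hard to deal with, here is the kind of heuristic a caller might fall back on without a JSON mode. It is a hypothetical helper, not part of any API: it tries the whole reply, then the body of a markdown code fence, then the span from the first `[` or `{` to the last `]` or `}`. It is brittle (prose that itself contains brackets can defeat it), which is part of the argument for constraining generation instead.

```javascript
// Hypothetical, brittle fallback for chatty replies. Candidates, in order:
// the whole reply, the body of a ``` fence, and the span from the first
// [ or { to the last ] or }.
function extractJson(reply) {
  const candidates = [reply.trim()];
  const fence = reply.match(/```[a-zA-Z]*\s*\n([\s\S]*?)\n?```/);
  if (fence) candidates.push(fence[1].trim());
  const start = reply.search(/[[{]/);
  const end = Math.max(reply.lastIndexOf("]"), reply.lastIndexOf("}"));
  if (start !== -1 && end > start) candidates.push(reply.slice(start, end + 1));
  for (const text of candidates) {
    try {
      return JSON.parse(text);
    } catch {
      // Not valid JSON; try the next candidate.
    }
  }
  throw new SyntaxError("No parseable JSON found in reply");
}

console.log(extractJson('Here are your synonyms:\n\n["content"]'));
```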

4 participants