json-schema-to-grammar improvements (+ added to server) #5978

Merged: 106 commits, Mar 21, 2024

@ochafik (Collaborator) commented Mar 10, 2024

Improved JSON schema → GBNF grammar conversion support

  • JSON schema features newly supported:
    • required, allOf, anyOf
    • tuples (incl. common legacy items syntax)
    • $ref (in same schema, or over https if --allow-fetch is set), allowing recursive types
    • additionalProperties
    • pattern (most features, no greediness modifiers nor lookaheads; dot . can be made to match line breaks with --dotall flag)
    • date, time, date-time, uuid string formats
  • Ported to C++ (and updated JS version):
    • Added to server: the API now handles "response_format": {"type": "json_object", "schema": ...} on its own (see examples below), bridging an important gap between the C++ server and llama-cpp-python
    • Added tests to ensure the Python, JavaScript and C++ versions are synced. Their only differences are:
      • The C++ version has no option to resolve remote $ref (deemed too risky in server)
      • The Python version supports promoting a {type: "string", pattern: "..."} as a raw pattern (used by examples/regex-to-grammar.py, see example below)
  • Fixes
    • Prevent malformed arrays such as [, 1] (a leading comma before the first item)
    • Respect original property order (except for required properties guaranteed to appear before optionals, and --prop-order flag)
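
To make the array and tuple items above concrete, here is a minimal Python sketch of the rule shapes involved. The helper names `array_rule` and `tuple_rule` are hypothetical and are not the converter's actual functions; the emitted strings are modelled on the `e-additional-value` and `e-additional-value-item` rules shown in the TypeScript example in this description.

```python
def array_rule(item: str) -> str:
    """GBNF body for a homogeneous array whose elements match rule `item`."""
    # The comma only ever appears between two items, so `[, 1]` cannot be derived.
    return f'"[" space ( {item} ( "," space {item} )* )? "]" space'


def tuple_rule(items: list[str]) -> str:
    """GBNF body for a fixed-length tuple such as [string, number]."""
    return '"[" space ' + ' "," space '.join(items) + ' "]" space'


# Hypothetical rule names, for illustration only.
print("pairs ::=", array_rule("pair"))
print("pair ::=", tuple_rule(["string", "number"]))
```

Because the separator is attached to the second and later items rather than to every item, an empty array and a one-element array both remain valid while a leading comma does not.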

As a result, it can now consume the JSON schemas produced by Pydantic (used by nearly all Python LLM frameworks), typescript-json-schema and its more recent fork ts-json-schema-generator, along with some more advanced schemas (tsconfig.json is the toughest I tested).
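
Schemas from these generators rely heavily on same-document `$ref`s, including self-references for recursive types. The following is only a sketch of how such a reference can be followed; `resolve_local_ref` is a hypothetical helper, the schema is hand-written (loosely modelled on a recursive model, not actual generator output), and JSON Pointer escapes (`~0`, `~1`) are ignored.

```python
def resolve_local_ref(root: dict, ref: str) -> dict:
    """Follow a same-document reference such as '#/$defs/QAPair'."""
    if not ref.startswith("#/"):
        # Remote refs (e.g. https URLs) are out of scope for this sketch.
        raise ValueError(f"only local refs are handled here: {ref}")
    node = root
    for part in ref[2:].split("/"):
        node = node[part]
    return node


# Hand-written schema for illustration; field names borrowed from the example below.
schema = {
    "$defs": {
        "QAPair": {
            "type": "object",
            "properties": {"question": {"type": "string"}},
            "required": ["question"],
        },
        "PyramidalSummary": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "question_answers": {"type": "array", "items": {"$ref": "#/$defs/QAPair"}},
                "sub_sections": {"type": "array", "items": {"$ref": "#/$defs/PyramidalSummary"}},
            },
            "required": ["title", "question_answers", "sub_sections"],
        },
    },
    "$ref": "#/$defs/PyramidalSummary",
}

top = resolve_local_ref(schema, schema["$ref"])
nested = resolve_local_ref(schema, top["properties"]["sub_sections"]["items"]["$ref"])
print(nested is top)  # True: the recursive ref points back at the same definition
```

Since a recursive reference resolves to the same definition, a converter presumably needs to emit one named grammar rule per referenced definition and refer to it by name, rather than inlining the definition at every use.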

Hopefully this PR hasn't grown too big, happy to send it in chunks if needed.

Examples

(outputs below are with Nous-Hermes-2-Mixtruct-v0.1-8x7B-DPO-DARE_TIES-Q6_K)

  • tsconfig.json

    ./main --grammar-file \
      <( python examples/json-schema-to-grammar.py https://json.schemastore.org/tsconfig.json ) \
      -p "Write a tsconfig.json for a simple project with strict types incremental compiler/build options:"
    Show output
    {
      "compilerOptions": {
        "module": "esnext",
        "target": "es2017",
        "sourceMap": true,
        "outDir": "./dist",
        "strict": true,
        "moduleResolution": "node",
        "esModuleInterop": true,
        "skipLibCheck": true,
        "forceConsistentCasingInFileNames": true,
        "lib": [
          "dom.iterable",
          "esnext"
        ],
        "declaration": false,
        "resolveJsonModule": true,
        "isolatedModules": true,
        "noEmit": false,
        "jsx": "react",
        "incremental": true
      },
      "include": [
        "./src/**/*.ts",
        "./src/**/*.tsx"
      ],
      "exclude": [
        "node_modules",
        "dist"
      ]
    }
  • Pydantic w/ recursive types

    pip install pydantic
    
    echo "
    from pydantic import BaseModel
    from typing import Optional, Union, Tuple
    import json
    
    class QAPair(BaseModel):
      question: str
      concise_answer: str
      justification: str
    
    class PyramidalSummary(BaseModel):
      title: str
      summary: str
      question_answers: list[QAPair]
      sub_sections: list['PyramidalSummary']
    
    if __name__ == '__main__':
      print(json.dumps(PyramidalSummary.model_json_schema()))
    
    " | python - | tee qa-schema.json | python examples/json-schema-to-grammar.py - > qa-schema.gbnf
    
    ./main --grammar-file qa-schema.gbnf --log-disable --no-display-prompt -p "
      You are a highly efficient corporate document summarizer.
      Create a pyramidal summary of an imaginary internal document about our company processes
      (starting high-level, going down to each sub sections).
      Keep questions short, and answers even shorter (trivia / quizz style).
      Here is the schema of the output: $( cat qa-schema.json ).
    " | tee out.json
    Show output
    // cat out.json | jq
    {
      "title": "Company Processes Summary",
      "summary": "Our company has a strong focus on employee development, sustainability and innovation.",
      "question_answers": [
        {
          "question": "What is the main focus of our company?",
          "concise_answer": "Employee development, sustainability and innovation",
          "justification": "Mentioned in internal document."
        },
        {
          "question": "How does the company support employee development?",
          "concise_answer": "Regular training and mentorship programs.",
          "justification": "Document states 'We provide regular training and mentorship programs for employees to continually develop their skills.'"
        }
      ],
      "sub_sections": [
        {
          "title": "Employee Development Processes",
          "summary": "Our company provides regular training and mentorship programs for employees to grow professionally.",
          "question_answers": [
            {
              "question": "What methods does the company use to foster employee development?",
              "concise_answer": "Training workshops, online courses and mentorship.",
              "justification": "Stated in internal document."
            }
          ],
          "sub_sections": [
            {
              "title": "Training Workshops",
              "summary": "We organize regular training workshops to enhance employees' skills and knowledge.",
              "question_answers": [
                {
                  "question": "What is the purpose of training workshops?",
                  "concise_answer": "To improve employees' skills and knowledge.",
                  "justification": "Said so in internal document."
                }
              ],
              "sub_sections": []
            },
            {
              "title": "Online Courses",
              "summary": "We offer a variety of online courses for employees to learn at their own pace.",
              "question_answers": [
                {
                  "question": "What is the advantage of offering online courses?",
                  "concise_answer": "Flexibility and convenience.",
                  "justification": "Can be inferred from document."
                }
              ],
              "sub_sections": []
            },
            {
              "title": "Mentorship",
              "summary": "We pair employees with experienced mentors to guide them in their career paths.",
              "question_answers": [
                {
                  "question": "Why do we have a mentorship program?",
                  "concise_answer": "To facilitate career development and growth.",
                  "justification": "Explicitly stated in document."
                }
              ],
              "sub_sections": []
            }
          ]
        },
        {
          "title": "Sustainability Initiatives",
          "summary": "Our company is committed to promoting sustainability through various green initiatives.",
          "question_answers": [
            {
              "question": "What are some examples of our sustainability initiatives?",
              "concise_answer": "Recycling programs, energy-efficient lighting and eco-friendly office supplies.",
              "justification": "Mentioned in internal document."
            }
          ],
          "sub_sections": []
        },
        {
          "title": "Recycling Programs",
          "summary": "We have implemented recycling programs to minimize waste and promote sustainability.",
          "question_answers": [
            {
              "question": "What types of materials do we recycle?",
              "concise_answer": "Paper, plastic, glass and aluminum.",
              "justification": "Specified in internal document."
            }
          ],
          "sub_sections": []
        },
        {
          "title": "Energy-Efficient Lighting",
          "summary": "We use energy-efficient lighting solutions to reduce our carbon footprint and save on energy costs.",
          "question_answers": [
            {
              "question": "What kind of energy-efficient lighting do we use?",
              "concise_answer": "LED lights.",
              "justification": "Detailed in internal document."
            }
          ],
          "sub_sections": []
        },
        {
          "title": "Eco-Friendly Office Supplies",
          "summary": "We prioritize the use of eco-friendly office supplies to support our sustainability goals.",
          "question_answers": [
            {
              "question": "What are some examples of eco-friendly office supplies we use?",
              "concise_answer": "Recycled paper, non-toxic ink cartridges and reusable pens.",
              "justification": "Listed in internal document."
            }
          ],
          "sub_sections": []
        }
      ]
    }
  • Regular expressions

    # Note that anything optional seems to require careful prompting.
    ./main --grammar "$( python examples/regex-to-grammar.py '^(\([0-9]{1,3}\))?[0-9]{3}-[0-9]{4}$' )" \
      -p "What is my phone number? (make sure to include the full prefix, wrapped in parentheses):"
    Show output
    (970)512-4628
    
  • TypeScript types

    # Notice optional 'e' accepts additional properties returning arrays of [string, number] tuples
    ./examples/ts-type-to-grammar.sh "{a: string, b: string, c?: number|string, d?: string, e?: {[i: string]: [string,number][]}}"
    Show output
    a-kv ::= "\"a\"" space ":" space string
    b-kv ::= "\"b\"" space ":" space string
    c ::= number | string
    c-kv ::= "\"c\"" space ":" space c
    c-rest ::= ( "," space d-kv )? d-rest
    d-kv ::= "\"d\"" space ":" space string
    d-rest ::= ( "," space e-kv )?
    e ::= "{" space  (e-additional-kvs )? "}" space
    e-additional-kv ::= string ":" space e-additional-value
    e-additional-kvs ::= e-additional-kv ( "," space e-additional-kv )*
    e-additional-value ::= "[" space ( e-additional-value-item ( "," space e-additional-value-item )* )? "]" space
    e-additional-value-item ::= "[" space string "," space number "]" space
    e-kv ::= "\"e\"" space ":" space e
    number ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? space
    root ::= "{" space a-kv "," space b-kv ( "," space ( c-kv c-rest | d-kv d-rest | e-kv ) )? "}" space
    space ::= " "?
    string ::=  "\"" (
            [^"\\] |
            "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
          )* "\"" space
  • TypeScript type including Date:

    # TypeScript allows for a compact description, useful to save tokens when putting the schema in the prompt itself:
    ( export TYPE="{name: string, description: string, start: Date, end?: Date}[]" ; \
        ./main --grammar-file <( ./examples/ts-type-to-grammar.sh "$TYPE" ) \
            --log-disable --no-display-prompt -n 2048 -p "
              List the 10 most noteworthy events or periods of the past 3 centuries, in JSON format '$TYPE'.
              Do specify an end date if the event was not an instant:
            " \
            | tee out.json )
    Show output
    // cat out.json | jq
    [
      {
        "name": "The Enlightenment",
        "description": "A cultural movement in Europe from roughly 1685 to 1815 that emphasized reason and individualism rather than tradition.",
        "start": "1685-01-01T00:00:00Z",
        "end": "1815-01-01T00:00:00Z"
      },
      {
        "name": "The Industrial Revolution",
        "description": "A period of major industrialization and innovation, starting in England around 1760 and lasting until the early 20th century.",
        "start": "1760-01-01T00:00:00Z",
        "end": "1920-01-01T00:00:00Z"
      },
      {
        "name": "The French Revolution",
        "description": "A period of radical social and political upheaval in France, starting in 1789 and lasting until 1799.",
        "start": "1789-01-01T00:00:00Z",
        "end": "1799-01-01T00:00:00Z"
      },
      {
        "name": "The American Civil War",
        "description": "A conflict between the Northern and Southern states of America, starting in 1861 and lasting until 1865.",
        "start": "1861-01-01T00:00:00Z",
        "end": "1865-01-01T00:00:00Z"
      },
      {
        "name": "The World Wars",
        "description": "Two major global conflicts in the 20th century, with World War I lasting from 1914 until 1918 and World War II from 1939 until 1945.",
        "start": "1914-01-01T00:00:00Z",
        "end": "1945-01-01T00:00:00Z"
      },
      {
        "name": "The Cold War",
        "description": "A period of geopolitical tension between the Western and Eastern blocs, starting in 1947 and lasting until 1991.",
        "start": "1947-01-01T00:00:00Z",
        "end": "1991-01-01T00:00:00Z"
      },
      {
        "name": "The Space Age",
        "description": "A period of space exploration and development, starting in the mid 20th century and continuing to this day.",
        "start": "1946-01-01T00:00:00Z",
        "end": "2100-01-01T00:00:00Z"
      }
    ]
  • Other special formats:

    ./main \
      --grammar "$( \
        echo '{"prefixItems": [{ "format": "date" }, { "format": "uuid" }, { "format": "time" }, { "format": "date-time" }]}' | \
        python examples/json-schema-to-grammar.py - )" \
      -p "A very important date:"
    Show output
    [ "1967-05-29", "19680913-1968-0915-1968-091431140000", "19:00:00-19:01" ,"2008-07-30T11:16:29.154Z"] 
  • JSON schema in server API (using instructor without llama-cpp-python):

    pip install instructor openai pydantic
    ./server -m some-model.gguf &
    
    python -c '
    import instructor
    from typing import List
    from openai import OpenAI
    from pydantic import BaseModel
    
    client = instructor.patch(
        OpenAI(base_url="http://localhost:8080", api_key="123"),
        mode=instructor.Mode.JSON_SCHEMA)
    
    class UserDetail(BaseModel):
        name: str
        age: int
    
    print(client.chat.completions.create(
        model="whatever",
        messages=[{
            "role": "user",
            "content": "Extract `Jason is 30 years old`",
        }],
        response_model=UserDetail))
    '
    Show output
    name='Jason' age=30
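
For readers who prefer not to depend on instructor or the openai package, the same request can be sketched with the Python standard library. This is a hedged illustration: `build_chat_request` and `post_json` are hypothetical helpers, the hand-written schema stands in for what `UserDetail` would produce, and the URL in the final comment (including the `/v1/chat/completions` path) is an assumption about the local server setup. Only the `response_format` shape comes from this PR description.

```python
import json
import urllib.request


def build_chat_request(schema: dict, user_message: str) -> dict:
    """Build a chat completion payload that asks for schema-constrained JSON."""
    return {
        "model": "whatever",
        "messages": [{"role": "user", "content": user_message}],
        # Shape taken from the PR description: the server converts `schema` to a grammar.
        "response_format": {"type": "json_object", "schema": schema},
    }


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON payload and decode the JSON reply (needs a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hand-written stand-in for the UserDetail schema.
user_detail_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

payload = build_chat_request(user_detail_schema, "Extract `Jason is 30 years old`")
print(json.dumps(payload, indent=2))

# With a server running (URL and path are assumptions):
# reply = post_json("http://localhost:8080/v1/chat/completions", payload)
```

Keeping the payload construction separate from the network call makes it easy to inspect exactly what is sent before pointing it at a live server.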

TODOs (before undrafting this PR)

  • Convert to .mjs again and test server + chat.mjs (had to implement a custom regexp parser instead of using Python's builtin one, to allow porting to JS)
  • Test patterns w/ unicode and more syntactic features
  • Document kv generation strategy (added longer example below)
  • Support string format "date"
  • Support string format "date-time"
  • Test new OAI json format server behaviour
  • Support mix of regular, optional and additional properties
  • Fix sanitizer tests
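
As a rough illustration of the kv generation strategy mentioned in the TODOs, the sketch below rebuilds the `root`, `c-rest` and `d-rest` rules from the TypeScript example above. `object_rules` is a hypothetical function written for this note, not the converter's code; it ignores additional properties, and the branch for objects without required properties is a guess.

```python
def object_rules(required: list[str], optional: list[str]) -> dict[str, str]:
    """Return GBNF rule bodies for an object with required and optional keys."""
    rules: dict[str, str] = {}
    # Required properties always come first, in order, separated by commas.
    body = ' "," space '.join(f"{name}-kv" for name in required)
    if optional:
        last = len(optional) - 1
        alternatives = []
        for i, name in enumerate(optional):
            if i < last:
                nxt = optional[i + 1]
                # After `name`, the next optional key may follow, then whatever may follow it.
                rest = f'( "," space {nxt}-kv )?'
                if i + 1 < last:
                    rest += f" {nxt}-rest"
                rules[f"{name}-rest"] = rest
                alternatives.append(f"{name}-kv {name}-rest")
            else:
                alternatives.append(f"{name}-kv")
        choice = " | ".join(alternatives)
        if required:
            body += f' ( "," space ( {choice} ) )?'
        else:
            # Assumption: with no required keys, the whole optional chain is optional.
            body = f"( {choice} )?"
    rules["root"] = f'"{{" space {body} "}}" space'
    return rules


for name, rule in sorted(object_rules(["a", "b"], ["c", "d", "e"]).items()):
    print(f"{name} ::= {rule}")
```

Each `-rest` rule only ever offers the next optional key, so the first optional key emitted is a free choice while the remaining ones must keep their declared order. That keeps the number of rules linear in the number of optional properties and means a comma is only emitted when a key actually follows.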

Possible followups:

ochafik added 30 commits March 1, 2024 14:11
@ggerganov (Owner) left a comment

Squash merge when you are ready

I haven't tested the implementation, but it seems you've put enough effort and there are unit tests, so it should be good 👍

Comment on lines 32 to 35
std::cerr << "#" << std::endl;
std::cerr << "# Test '" << name.c_str() << "' failed." << std::endl;
std::cerr << "#" << std::endl;
std::cerr << schema.c_str() << std::endl;
@ggerganov (Owner):

For consistency with the rest of the codebase:

std::cerr -> fprintf(stderr,
std::cout -> fprintf(stdout,

@ochafik (Collaborator, Author):

Done

return result;
}

static std::string replacePattern(const std::string& input, const std::regex& regex, const std::function<std::string(const std::smatch &)>& replacement) {
@ggerganov (Owner):

For consistency, put spaces before and after `&`; see the rest of the usages in the codebase:

Suggested change
- static std::string replacePattern(const std::string& input, const std::regex& regex, const std::function<std::string(const std::smatch &)>& replacement) {
+ static std::string replacePattern(const std::string & input, const std::regex & regex, const std::function<std::string(const std::smatch &)> & replacement) {

@ochafik (Collaborator, Author):

Done

@ochafik ochafik merged commit 5b7b0ac into ggerganov:master Mar 21, 2024
54 of 56 checks passed
ochafik added a commit to ochafik/llama.cpp that referenced this pull request Mar 21, 2024
ggerganov pushed a commit that referenced this pull request Mar 22, 2024
* json: only attempt python & node schema conversion tests if their bins are present

Tests introduced in #5978
disabled in #6198

* json: orange warnings when tests skipped

* json: ensure py/js schema conv tested on ubuntu-focal-make

* json: print env vars in test
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* json: fix arrays (disallow `[,1]`)

* json: support tuple types (`[number, string]`)

* json: support additionalProperties (`{[k: string]: [string,number][]}`)

* json: support required / optional properties

* json: add support for pattern

* json: resolve $ref (and support https schema urls)

* json: fix $ref resolution

* join: support union types (mostly for nullable types I think)

* json: support allOf + nested anyOf

* json: support any (`{}` or `{type: object}`)

* json: fix merge

* json: temp fix for escapes

* json: spaces in output and unrestricted output spaces

* json: add typings

* json:fix typo

* Create ts-type-to-grammar.sh

* json: fix _format_literal (json.dumps already escapes quotes)

* json: merge lit sequences and handle negatives

{"type": "string", "pattern": "^({\"question\": \"[^\"]+\", \"response\": \"[^\"]+\"}\\n)+$"}

* json: handle pattern repetitions

* Update json-schema-to-grammar.mjs

* Create regex-to-grammar.py

* json: extract repeated regexp patterns to subrule

* Update json-schema-to-grammar.py

* Update json-schema-to-grammar.py

* Update json-schema-to-grammar.py

* json: handle schema from pydantic Optional fields

* Update json-schema-to-grammar.py

* Update json-schema-to-grammar.py

* Update ts-type-to-grammar.sh

* Update ts-type-to-grammar.sh

* json: simplify nullable fields handling

* json: accept duplicate identical rules

* json: revert space to 1 at most

* json: reuse regexp pattern subrules

* json: handle uuid string format

* json: fix literal escapes

* json: add --allow-fetch

* json: simplify range escapes

* json: support negative ranges in patterns

* Delete commit.txt

* json: custom regex parser, adds dot support & JS-portable

* json: rm trailing spaces

* Update json-schema-to-grammar.mjs

* json: updated server & chat `( cd examples/server && ./deps.sh )`

* json: port fixes from mjs to python

* Update ts-type-to-grammar.sh

* json: support prefixItems alongside array items

* json: add date format + fix uuid

* json: add date, time, date-time formats

* json: preserve order of props from TS defs

* json: port schema converter to C++, wire in ./server

* json: nits

* Update json-schema-to-grammar.cpp

* Update json-schema-to-grammar.cpp

* Update json-schema-to-grammar.cpp

* json: fix mjs implementation + align outputs

* Update json-schema-to-grammar.mjs.hpp

* json: test C++, JS & Python versions

* json: nits + regen deps

* json: cleanup test

* json: revert from c++17 to 11

* json: nit fixes

* json: dirty include for test

* json: fix zig build

* json: pass static command to std::system in tests (fixed temp files)

* json: fix top-level $refs

* json: don't use c++20 designated initializers

* nit

* json: basic support for reserved names `{number:{number:{root:number}}}`

* Revamp test cmake to allow args (WORKING_DIRECTORY needed for JSON test)

* json: re-ran server deps.sh

* json: simplify test

* json: support mix of additional props & required/optional

* json: add tests for some expected failures

* json: fix type=const in c++, add failure expectations for non-str const&enum

* json: test (& simplify output of) empty schema

* json: check parsing in test + fix value & string refs

* json: add server tests for OAI JSON response_format

* json: test/fix top-level anyOf

* json: improve grammar parsing failures

* json: test/fix additional props corner cases

* json: fix string patterns (was missing quotes)

* json: ws nit

* json: fix json handling in server when there's no response_format

* json: catch schema conversion errors in server

* json: don't complain about unknown format type in server if unset

* json: cleaner build of test

* json: create examples/json-schema-pydantic-example.py

* json: fix date pattern

* json: move json.hpp & json-schema-to-grammar.{cpp,h} to common

* json: indent 4 spaces

* json: fix naming of top-level c++ function (+ drop unused one)

* json: avoid using namespace std

* json: fix zig build

* Update server.feature

* json: iostream -> fprintf

* json: space before & refs for consistency

* json: nits
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 3, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 3, 2024
tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 17, 2024
tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 17, 2024
tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 18, 2024
4 participants