Roadmap #566

Open
7 of 12 tasks
Stranger6667 opened this issue Oct 4, 2024 · 2 comments

@Stranger6667
Owner

Stranger6667 commented Oct 4, 2024

This is a somewhat detailed roadmap for the development of this crate, as I'd like to be transparent about what changes users can anticipate. I will update it roughly once a month with all the main milestones.

The current version is 0.26.0. This plan is not set in stone, and other features could be implemented outside of these milestones (e.g. regex config support, or moving some dependencies behind features).

❤️ If you'd like to help the development, consider sponsoring me (like Sentry does)

Full JSON Schema support - DONE 🎉

Missing non-optional test cases:

  • 6 failing unevaluatedItems tests ($ref, $recursiveRef & $dynamicRef support)
  • vocabulary
  • 2 failing unevaluatedProperties tests ($ref ordering issue)
  • 1 failing $ref test in Draft 2019-09

Some optional tests will be resolved along with the ones above; however, some bignum tests for Draft 4 depend on parsing numbers with arbitrary precision, which I plan to address later.

UPDATE 2024-10-12: Most of the unevaluatedItems keyword logic is implemented; only reference support is left.
UPDATE 2024-10-20: Vocabulary support is done, and only 3 required tests are still failing. These are bugs in the current implementation, so I consider support for Drafts 2019-09 & 2020-12 done and will fix those cases later on.
UPDATE 2024-10-24: Finished the last few remaining cases.

Better errors

The main issue with errors is that validate returns an iterator; I'd rather have validate + iter_errors, where the former returns Result<(), ValidationError>.

As an end goal, errors should:

  • Support customization of error messages
  • Be easily usable without lifetime issues

I'd also consider separating schema errors from instance errors.

It would be nice to rework the error iterator so that it is bound to the validator instance (as it uses its sub-validators). There are also some performance issues I'd like to fix, such as collecting errors into vectors during validation.

UPDATE 2024-10-26: The old validate has been replaced with validate + iter_errors.
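To make the validate + iter_errors split concrete, here is a minimal sketch on a toy integer validator. Validator, Keyword and this ValidationError are hypothetical stand-ins, not the crate's actual types; the point is only the shape: errors own their data (so no lifetimes leak into them), iter_errors lazily borrows the validator's sub-validators, and validate stops at the first error instead of collecting everything into a vector.

```rust
use std::fmt;

// Hypothetical error type: it owns its message, so it has no lifetime
// parameter and can outlive both the validator and the instance.
#[derive(Debug, Clone, PartialEq)]
pub struct ValidationError {
    pub message: String,
}

impl fmt::Display for ValidationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.message)
    }
}

impl std::error::Error for ValidationError {}

// Hypothetical keyword checks standing in for real sub-validators.
enum Keyword {
    Minimum(i64),
    Maximum(i64),
}

impl Keyword {
    fn check(&self, value: i64) -> Option<ValidationError> {
        match *self {
            Keyword::Minimum(min) if value < min => Some(ValidationError {
                message: format!("{value} is less than the minimum of {min}"),
            }),
            Keyword::Maximum(max) if value > max => Some(ValidationError {
                message: format!("{value} is greater than the maximum of {max}"),
            }),
            _ => None,
        }
    }
}

pub struct Validator {
    keywords: Vec<Keyword>,
}

impl Validator {
    pub fn new(min: i64, max: i64) -> Self {
        Validator { keywords: vec![Keyword::Minimum(min), Keyword::Maximum(max)] }
    }

    // Lazily yields every error. The iterator borrows the validator's
    // sub-validators for 'a; nothing is collected into a vector.
    pub fn iter_errors<'a>(&'a self, value: i64) -> impl Iterator<Item = ValidationError> + 'a {
        self.keywords.iter().filter_map(move |keyword| keyword.check(value))
    }

    // Returns only the first error, if any.
    pub fn validate(&self, value: i64) -> Result<(), ValidationError> {
        match self.iter_errors(value).next() {
            Some(error) => Err(error),
            None => Ok(()),
        }
    }
}
```

For example, Validator::new(1, 10).validate(0) returns an error whose message is "0 is less than the minimum of 1".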

Non-blocking resolving

Currently, external references are resolved during the build phase, which makes non-blocking resolution relatively straightforward to implement (I'd also like to retrieve resources in batches rather than sequentially).

Along with this change, I'd like to expose custom resolvers to Python bindings + hopefully with async too, but not 100% sure about this.
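A rough sketch of batched retrieval, assuming a hypothetical Retrieve trait and an in-memory stand-in for real HTTP or filesystem access. It uses std::thread::scope to fetch one batch of references concurrently; an async runtime would be the natural choice for truly non-blocking resolving, so treat the threads as a placeholder for that design.

```rust
use std::collections::HashMap;
use std::thread;

// Hypothetical retriever: a real one would perform HTTP or filesystem I/O.
pub trait Retrieve: Sync {
    fn retrieve(&self, uri: &str) -> Result<String, String>;
}

// In-memory stand-in used for illustration only.
pub struct InMemory(pub HashMap<String, String>);

impl Retrieve for InMemory {
    fn retrieve(&self, uri: &str) -> Result<String, String> {
        self.0.get(uri).cloned().ok_or_else(|| format!("unknown URI: {uri}"))
    }
}

// Retrieves one batch of external references concurrently instead of one
// after another. Results are returned in the same order as `uris`.
pub fn retrieve_batch<R: Retrieve>(retriever: &R, uris: &[&str]) -> Vec<Result<String, String>> {
    thread::scope(|scope| {
        // Spawn every request first, so they are in flight at the same time...
        let handles: Vec<_> = uris
            .iter()
            .map(|uri| scope.spawn(move || retriever.retrieve(uri)))
            .collect();
        // ...then wait for all of them.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("retriever thread panicked"))
            .collect()
    })
}
```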

Output formats

Adopt the naming from the "next" draft and implement a hierarchical output style. At this point, I want to separate all annotation storage from the main graph, as it adds overhead to is_valid and validate.
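One way to picture the hierarchical style is as a tree of output units, from which a flat error list can still be derived. The struct below is a hypothetical sketch; its field names are my reading of the renamed keywords proposed for the "next" draft (evaluationPath, schemaLocation, instanceLocation, details) and should be treated as assumptions. Since the tree is only built when output is requested, is_valid and validate would not pay for it.

```rust
// Hypothetical node of a hierarchical output tree. Field names loosely follow
// the renamed output keywords discussed for the "next" draft; they are
// assumptions, not the crate's actual types.
#[derive(Debug, Clone)]
pub struct OutputUnit {
    pub valid: bool,
    pub evaluation_path: String,
    pub schema_location: String,
    pub instance_location: String,
    pub error: Option<String>,
    // Nested units for subschemas (the hierarchical part).
    pub details: Vec<OutputUnit>,
}

impl OutputUnit {
    // Flattens the tree into (evaluation path, message) pairs, roughly what a
    // flat, list-style output would report.
    pub fn flatten_errors(&self) -> Vec<(&str, &str)> {
        let mut out = Vec::new();
        self.collect(&mut out);
        out
    }

    fn collect<'a>(&'a self, out: &mut Vec<(&'a str, &'a str)>) {
        if let Some(message) = &self.error {
            out.push((self.evaluation_path.as_str(), message.as_str()));
        }
        for child in &self.details {
            child.collect(out);
        }
    }
}
```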

Rework Python bindings

  • ABI3 wheels
  • CI cleanup
  • PyPy support
  • Python 3.13 (it could happen earlier too)
  • Custom retrievers
  • Output formats

I have wanted to implement generic JSON input for a long time, and I'll try to work on it at this stage. In the Python bindings, data (de)serialization overhead sometimes accounts for up to 80% of the time spent building a validator or validating; generic input would greatly speed everything up.
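A minimal sketch of what generic input could look like: validators are written against a hypothetical JsonLike trait instead of a concrete value type, so any representation that implements it (in principle, a wrapper around Python objects) could be validated without first being converted. JsonLike and the toy Value enum are illustrative assumptions, not an existing API.

```rust
use std::collections::BTreeMap;

// Hypothetical read-only view over any JSON-like document.
pub trait JsonLike {
    fn as_str(&self) -> Option<&str>;
    fn as_f64(&self) -> Option<f64>;
    fn get(&self, key: &str) -> Option<&Self>;
}

// A toy owned JSON tree standing in for something like serde_json::Value.
pub enum Value {
    Number(f64),
    String(String),
    Object(BTreeMap<String, Value>),
}

impl JsonLike for Value {
    fn as_str(&self) -> Option<&str> {
        match self {
            Value::String(s) => Some(s.as_str()),
            _ => None,
        }
    }
    fn as_f64(&self) -> Option<f64> {
        match self {
            Value::Number(n) => Some(*n),
            _ => None,
        }
    }
    fn get(&self, key: &str) -> Option<&Self> {
        match self {
            Value::Object(map) => map.get(key),
            _ => None,
        }
    }
}

// A second, unrelated representation: a bare string slice is a JSON string.
impl JsonLike for str {
    fn as_str(&self) -> Option<&str> {
        Some(self)
    }
    fn as_f64(&self) -> Option<f64> {
        None
    }
    fn get(&self, _key: &str) -> Option<&Self> {
        None
    }
}

// `maxLength` written once and usable with any JsonLike input.
// JSON Schema counts characters (code points), not bytes.
pub fn max_length<J: JsonLike + ?Sized>(instance: &J, limit: usize) -> bool {
    match instance.as_str() {
        Some(s) => s.chars().count() <= limit,
        None => true, // maxLength ignores non-strings
    }
}
```

Note that maxLength counts characters rather than bytes, so "héllo" has length 5 even though it takes 6 bytes in UTF-8.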

Arbitrary precision also belongs here. I'd like to put it behind a flag so that it is disabled by default but enabled in the Python bindings.
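The bignum problem is easy to reproduce: an f64 cannot represent 2^53 + 1, so a float-only number type rounds 9007199254740993 to 9007199254740992, and a maximum check between the two silently passes. The sketch below contrasts that with an exact integer comparison. i128 is only a stand-in for a real arbitrary-precision type, and both functions are hypothetical; the roadmap does not say how the flag would switch between representations.

```rust
// `maximum` check using f64, as a float-only number type would do it.
pub fn exceeds_maximum_f64(instance: &str, maximum: &str) -> bool {
    let instance: f64 = instance.parse().expect("number literal");
    let maximum: f64 = maximum.parse().expect("number literal");
    instance > maximum
}

// The same check on integers using i128. This is only a stand-in: i128 still
// tops out at about 1.7e38, whereas an arbitrary-precision type would not.
// Returns None for literals that are not integers in i128 range.
pub fn exceeds_maximum_i128(instance: &str, maximum: &str) -> Option<bool> {
    let instance: i128 = instance.parse().ok()?;
    let maximum: i128 = maximum.parse().ok()?;
    Some(instance > maximum)
}
```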

WASM

It should "just work", except perhaps for filesystem resolving; network retrieval could work via web_sys (I've tried this in css-inline).

  • Run tests on WASM as a part of CI
  • Demo website with WASM

Performance, performance, performance

I have lots of ideas here. My main use case is speeding up the generation of instances that match a schema (in hypothesis-jsonschema & schemathesis), so I am going to focus on is_valid performance, since the exact errors don't matter in this case.
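A small sketch of why is_valid is the cheaper path: it can stop at the first failing check and never formats a message, whereas collecting errors has to visit every check and allocate a string per failure. Minimum, is_valid and collect_errors are hypothetical; the Cell counter exists only to make the number of evaluations visible.

```rust
use std::cell::Cell;

// Hypothetical `minimum`-style check that counts how many times it runs.
pub struct Minimum {
    limit: i64,
    evaluations: Cell<usize>,
}

impl Minimum {
    pub fn new(limit: i64) -> Self {
        Minimum { limit, evaluations: Cell::new(0) }
    }

    // Fast path: a plain bool, no message is formatted.
    fn passes(&self, value: i64) -> bool {
        self.evaluations.set(self.evaluations.get() + 1);
        value >= self.limit
    }

    // Slow path: allocates a message for every failure.
    fn failure_message(&self, value: i64) -> Option<String> {
        self.evaluations.set(self.evaluations.get() + 1);
        (value < self.limit).then(|| format!("{value} is less than the minimum of {}", self.limit))
    }
}

// Stops at the first failing check (`Iterator::all` short-circuits).
pub fn is_valid(checks: &[Minimum], value: i64) -> bool {
    checks.iter().all(|check| check.passes(value))
}

// Visits every check and collects every message.
pub fn collect_errors(checks: &[Minimum], value: i64) -> Vec<String> {
    checks.iter().filter_map(|check| check.failure_message(value)).collect()
}

// Total number of check evaluations so far, across all checks.
pub fn evaluations(checks: &[Minimum]) -> usize {
    checks.iter().map(|check| check.evaluations.get()).sum()
}
```

With checks for minimums 10, 20 and 30 and the value 5, is_valid evaluates a single check, while collect_errors evaluates all three and returns three messages.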

@jpmckinney
Contributor

jpmckinney commented Dec 22, 2024

> Along with this change, I'd like to expose custom resolvers to Python bindings + hopefully with async too, but not 100% sure about this.

I presently use python-jsonschema, where I use custom resolvers for a few reasons:

  1. The default resolver follows file:// URLs. If an application dereferences and renders a user-provided schema, this can be used to read any JSON file on the filesystem to which the web process has access. I therefore use a custom resolver that only resolves HTTP and HTTPS URLs (this is vulnerable to server-side request forgery, but I'm less worried about that).
  2. I work with JSON Schemas that users are allowed to "patch" using JSON Merge Patch (RFC 7386). I use python-jsonschema's registry to make references to the default schema resolve to the patched schema.

It would be really nice to be able to control (1).
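As a sketch of the restriction in (1): a wrapper that forwards only http and https URIs to an inner retriever and rejects everything else, including file://. The Retrieve trait and the HttpOnly wrapper are hypothetical, not the jsonschema_rs or python-jsonschema API. As noted above, this does nothing about server-side request forgery.

```rust
// Hypothetical retriever interface; the real trait may look different.
pub trait Retrieve {
    fn retrieve(&self, uri: &str) -> Result<String, String>;
}

// Hypothetical wrapper that forwards only http(s) URIs to the inner
// retriever, so `file://` references cannot read local files.
pub struct HttpOnly<R>(pub R);

impl<R: Retrieve> Retrieve for HttpOnly<R> {
    fn retrieve(&self, uri: &str) -> Result<String, String> {
        // URI schemes are case-insensitive, so compare in lowercase.
        let scheme = uri
            .split_once(':')
            .map(|(scheme, _)| scheme.to_ascii_lowercase());
        match scheme.as_deref() {
            Some("http") | Some("https") => self.0.retrieve(uri),
            _ => Err(format!("refusing to retrieve {uri}: only http(s) is allowed")),
        }
    }
}
```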

For (2), would a workaround be to provide only dereferenced schemas to jsonschema_rs?
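For reference, the patching step in (2) is the JSON Merge Patch algorithm, which is short enough to sketch over a minimal, hypothetical Json enum (a real program would more likely use serde_json::Value): an object patch is merged member by member, a null member deletes the corresponding key, and any non-object patch replaces the target.

```rust
use std::collections::BTreeMap;

// Minimal JSON tree used for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

// JSON Merge Patch: applies `patch` to `target` and returns the result.
pub fn merge_patch(target: Json, patch: &Json) -> Json {
    let Json::Object(patch_members) = patch else {
        // A non-object patch replaces the target wholesale.
        return patch.clone();
    };
    // A non-object target is discarded and treated as an empty object.
    let mut members = match target {
        Json::Object(members) => members,
        _ => BTreeMap::new(),
    };
    for (name, value) in patch_members {
        if matches!(value, Json::Null) {
            // null deletes the member (a no-op if it is absent).
            members.remove(name);
        } else {
            // Missing members are patched as if they were not objects.
            let current = members.remove(name).unwrap_or(Json::Null);
            members.insert(name.clone(), merge_patch(current, value));
        }
    }
    Json::Object(members)
}
```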

@Stranger6667
Owner Author

Hey @jpmckinney

Thank you for bringing this up!

  1. It's good to know about this exact use case. I think exposing resolvers should not be an issue; it could be implemented similarly to how custom format validators work right now.
  2. I'm not 100% sure about a workaround (an example would help). However, as this project essentially follows the same registry-based design for references, I think we can expose the same functionality in the Python bindings, which should be enough for your use case. This feature also seems closely related to "Validating individual definitions" #432 and "Support validating Open API sub-schemas" #452.
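A minimal sketch of the registry idea behind (2): documents are stored by absolute URI, so registering the patched schema under the default schema's URI makes every $ref to that URI resolve to the patched version. Registry, register and resolve are hypothetical names, and documents are kept as raw strings to stay dependency-free.

```rust
use std::collections::HashMap;

// Hypothetical registry: absolute URI -> schema document (raw JSON text here).
#[derive(Default)]
pub struct Registry {
    resources: HashMap<String, String>,
}

impl Registry {
    // Registers the document for `uri`, replacing any earlier one.
    pub fn register(&mut self, uri: &str, document: &str) {
        self.resources.insert(uri.to_string(), document.to_string());
    }

    // Resolves a `$ref` by dropping any `#fragment` and looking up the rest.
    pub fn resolve(&self, reference: &str) -> Option<&str> {
        let base = reference.split_once('#').map_or(reference, |(base, _)| base);
        self.resources.get(base).map(String::as_str)
    }
}
```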
