[Question] Plans for this library #455

Closed
DerRidda opened this issue Feb 26, 2024 · 6 comments
Labels
Type: Question Further information is requested

Comments

@DerRidda
Contributor

Hello @Stranger6667,

First of all, thank you so much for creating this wonderful library, especially with Python bindings. It's not only much faster than the typical choices in the Python ecosystem, but also more compliant.
Since we moved to it, one of our most expensive processing steps has turned into one of the cheaper ones.

Which brings me to my question: What are your plans for further maintenance and development for jsonschema-rs?

I raise this question because I see that no commits have been made to master in quite a while and open PRs have not been engaged with. As somebody who uses the library in production, it is my due diligence to check.

Don't misunderstand me: I fully understand and subscribe to the principle that FOSS is supplied as is, with no implied guarantees of anything, and I am fully okay with that reality.

If you don't intend to, or simply lack resources for further maintenance, maybe we can find a model of community ownership for the future or could justify your personal time investment via some sponsoring?

Again, thanks for creating jsonschema-rs in the first place, no matter what the answer to this question will be. :)

@Stranger6667
Owner

Hey @DerRidda

Thank you for bringing this up and for your kind words!
For the last couple of months, I have been thinking about the next steps for this library and wanted to share my thoughts eventually with everybody interested in using the library.

First and foremost, I want to continue working on this library and bring it up to speed with the most recent JSON Schema drafts.

In the last two years, I've started three complete rewrites, none of which got through. There are two main reasons for this - complexity & scope.

Besides the inherent complexity of newer JSON Schema drafts (specifically in $ref handling), there is complexity coming from the design of the library internals I am aiming for. There are fundamental flaws in my original design (see #373) that I want to address, and implementing a better design is hard. However, I think at this point I have a vision of what it should look like to make this library more flexible, compliant, and performant.

I plan to publish the latest rewrite (a separate repo), continue all the development there for simplicity, and move the code here when there is feature parity. At the same time, I want to address some of the issues opened in this repo (e.g. Python 3.12 support which should not be hard) and publish new versions.

A somewhat longer-term vision is to build more components relevant to other projects I am involved in (JSON Schema canonicalization for hypothesis-jsonschema, faster JSON Schema validation in schemathesis, etc.) and reuse them. I have been postponing them for a long time, but it feels like it is time to get back to them.

> If you don't intend to, or simply lack resources for further maintenance, maybe we can find a model of community ownership for the future or could justify your personal time investment via some sponsoring?

Involving the community more will help and I want to have more people contributing & maintaining the library! I also think that the library will benefit if all the development happens in a more public manner (as opposed to my semi-private work).

Re: sponsoring - it is something I appreciate a lot and would be happy if somebody is willing to contribute this way :)

In conclusion, my immediate next step is to write a high-level plan in the form of a blog post / pinned issue and continue with the implementation.

Let me know if you have any questions, I'd be happy to discuss them :)

@Stranger6667 Stranger6667 pinned this issue Feb 26, 2024
@DerRidda
Contributor Author

Thanks for the quick answer. Reading this leaves me optimistic and I am looking forward to reading the more detailed plan.

For our use case the only issue that is somewhat time critical is the Python 3.12 version support you mentioned. But even there we are not blocked from anything meaningful right now, as 3.12 was not the most spectacular release. This is more about a look into the future with 3.13 and beyond, which will most likely be much more interesting releases.

@Stranger6667 Stranger6667 added the Type: Question Further information is requested label May 10, 2024
@Stranger6667
Owner

Stranger6667 commented Sep 20, 2024

As it has been around half a year since the start of the discussion, it will be helpful to post an update!

API

The library continues to move toward its 1.0 API, and the recent 0.20.0 release brought a few changes in naming & ergonomics. I plan to make further changes to the error-related types, and the API will likely have validate, which returns the first error, plus iter_errors, which iterates over all of them. The Python bindings already have this, btw.
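To illustrate the difference between the two entry points, here is a minimal sketch in plain Python. It is a toy model, not the jsonschema-rs implementation: `ToyValidator` and its list of checks are hypothetical names made up for this example, and only the `validate` / `iter_errors` split mirrors what is described above.

```python
from typing import Any, Callable, Iterator, List, Tuple

# Hypothetical toy checks: (error message, predicate) pairs.
# This is NOT how jsonschema-rs represents validators internally.
Check = Tuple[str, Callable[[Any], bool]]


class ToyValidator:
    def __init__(self, checks: List[Check]) -> None:
        self.checks = checks

    def iter_errors(self, instance: Any) -> Iterator[str]:
        """Lazily yield a message for every failing check."""
        for message, predicate in self.checks:
            if not predicate(instance):
                yield message

    def validate(self, instance: Any) -> None:
        """Raise on the first error only; later checks are never run."""
        first = next(self.iter_errors(instance), None)
        if first is not None:
            raise ValueError(first)


validator = ToyValidator([
    ("expected an integer", lambda v: isinstance(v, int)),
    ("expected a value >= 10", lambda v: not isinstance(v, int) or v >= 10),
    ("expected an even value", lambda v: not isinstance(v, int) or v % 2 == 0),
])
```

With this shape, `validator.validate(7)` stops at the first failing check, while `list(validator.iter_errors(7))` collects every failure. Callers who only need a pass/fail answer avoid the cost of gathering all errors.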

Custom keywords are supported in the Rust crate and I plan to add support on the Python level too, similar to custom formats.

The recent release got significantly updated documentation, which I hope will simplify the onboarding & answer the common usage questions & help with migration from earlier versions.

Spec compliance

With the recent release, drafts 4, 6, and 7 are 100% supported (this is not yet reflected in Bowtie, though), and I am working towards supporting anchors & dynamic refs in more recent drafts. Implementing these should bring compatibility to ~98-99%, leaving only unevaluatedItems. Then I plan to improve the optional part of the spec, including ECMAScript regex support and some formats.

At some point before 1.0 I am going to address all the output format styles but likely will take them from the next version of JSON Schema (they have simplified naming & fewer options there).

Async

Reference resolution will happen before validation, and references to remote resources will be processed in batches, which opens up an opportunity to resolve them all concurrently. There will be async versions of build / validator_for / etc., and this will affect the API for reference resolvers (as they will need to work with batches).

UPDATE: Resolving of external references now happens just once.
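The batching idea can be sketched with the standard library's asyncio. Everything here is an assumption for illustration: `FAKE_REMOTE`, `retrieve`, and `resolve_batch` are hypothetical names, the "remote" documents live in memory, and this is not the resolver API of jsonschema-rs.

```python
import asyncio
from typing import Dict, Set

# Hypothetical in-memory "remote" documents standing in for network resources.
FAKE_REMOTE: Dict[str, dict] = {
    "https://example.com/a.json": {"type": "integer"},
    "https://example.com/b.json": {"type": "string"},
}


async def retrieve(uri: str) -> dict:
    """Pretend to fetch one remote schema (a real resolver would do I/O here)."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return FAKE_REMOTE[uri]


async def resolve_batch(uris: Set[str]) -> Dict[str, dict]:
    """Resolve a whole batch of references concurrently, once, before validation."""
    ordered = sorted(uris)
    documents = await asyncio.gather(*(retrieve(uri) for uri in ordered))
    return dict(zip(ordered, documents))


resolved = asyncio.run(resolve_batch(set(FAKE_REMOTE)))
```

Because the full set of URIs is known before validation starts, the fetches can overlap instead of running one after another, and validation itself never needs to wait on I/O.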

Performance

I still plan to overhaul the internals to squeeze more performance, but only after getting 100% spec compliance. However, there will be intermediate steps:

  • DONE: Resolving references upfront will remove the need for locks during validation. I did some local experiments a few years ago, and the performance improvements were moderate, in the range of 5-8%.
  • More validator groups. Right now, if a schema has a few popular keywords at once, they can be grouped into a single validator node. This works only for some keywords applicable to the object type. I want to expand it to numbers & strings, e.g. minimum & maximum & type often go together, and mashing them into a single node will make the graph more compact and likely faster. Instead of doing it completely manually, I plan to implement codegen for this purpose.
  • Explore enum-based dispatch. I am not hopeful for this one, but worth checking.
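
As a rough illustration of the "validator groups" idea from the list above, the sketch below contrasts one node per keyword with a single fused node for type + minimum + maximum. It is a toy model in plain Python under stated assumptions: `separate_nodes`, `fused_node`, and `is_number` are hypothetical helpers, and the real library builds its validation graph in Rust in a different way.

```python
from typing import Any, Callable, Dict, List

Node = Callable[[Any], bool]


def is_number(value: Any) -> bool:
    """JSON numbers only; Python's bool is an int subclass, so exclude it."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def separate_nodes(schema: Dict[str, Any]) -> List[Node]:
    """One node per keyword: each node repeats the type test on its own."""
    nodes: List[Node] = []
    if schema.get("type") == "number":
        nodes.append(is_number)
    if "minimum" in schema:
        low = schema["minimum"]
        nodes.append(lambda v: not is_number(v) or v >= low)
    if "maximum" in schema:
        high = schema["maximum"]
        nodes.append(lambda v: not is_number(v) or v <= high)
    return nodes


def fused_node(schema: Dict[str, Any]) -> Node:
    """A single node for the common type + minimum + maximum combination."""
    low, high = schema["minimum"], schema["maximum"]
    return lambda v: is_number(v) and low <= v <= high


schema = {"type": "number", "minimum": 1, "maximum": 10}
nodes = separate_nodes(schema)
fused = fused_node(schema)
```

Both forms accept and reject the same instances for this schema, but the fused node performs one type test and one call instead of three, which is the kind of saving a more compact graph is meant to give.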

What else?

In 0.21, I want to add support for custom reference resolving to the Python bindings, switch to ABI3 wheels, simplify CI, and support PyPy.

@jqnatividad
Contributor

Just wanted to confirm that 0.21 has been a dream! Thanks for your hard work on this crate.

For my application of CSV validation, I can do 761,035 rows/second for a non-trivial JSON Schema!

@Stranger6667
Owner

@jqnatividad Delighted to hear it! Thank you so much for your kind words :) I appreciate it a lot!

@Stranger6667 Stranger6667 unpinned this issue Oct 4, 2024
@Stranger6667
Owner

Thank you all! Closing this in favor of #566, which will be periodically updated.


3 participants