Execution model: Register-based Virtual Machine #641

Open
Stranger6667 opened this issue Nov 22, 2024 · 1 comment

@Stranger6667
Owner

Stranger6667 commented Nov 22, 2024

Summary

This proposal suggests developing a register-based virtual machine (VM) for JSON Schema validation. The approach builds on existing optimizations, such as super instructions (currently implemented as multiple keywords in a single node), by introducing a bytecode-based execution model. This should improve runtime performance, simplify reference handling, and make further features easier to add, such as Rust code generation for statically defined schemas and efficient storage of precompiled validators (as bytecode).

Motivation

The current implementation uses super instructions to optimize validation by combining multiple JSON Schema keywords into single operations, minimizing jumps and execution overhead. While effective, this approach is bound by the limitations of direct interpretation. A register-based VM executing bytecode could significantly improve performance by:

  1. Reducing runtime overhead through bytecode interpretation, which also offers better spatial locality.
  2. Simplifying lazy validation by requiring only an instruction pointer and state to denote the current position within the input value, enabling pausing and resuming validation seamlessly.
  3. Supporting precompilation of schemas into efficient Rust code via procedural macros.
  4. Allowing schemas to be serialized into bytecode, stored, and reloaded for reuse, enabling VM execution directly from a slice in static memory.
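
To make point 2 concrete, here is a minimal sketch of resumable validation. Everything in it is an assumption made for illustration: the `Op`, `State`, and `Step` types, the `resume` function, and the idea of a step budget are hypothetical, and the input is simplified to a slice of integers rather than a real JSON value. It only shows that an instruction pointer plus an input position is enough to pause and continue.

```rust
/// Hypothetical instructions for validating every item of an integer array.
#[derive(Debug, Clone, Copy)]
pub enum Op {
    /// Fail unless the current item is >= the given bound.
    CheckMin(i64),
    /// Fail unless the current item is <= the given bound.
    CheckMax(i64),
    /// Move to the next item and restart the program from instruction 0.
    NextItem,
}

/// Everything needed to resume: the position in the program and in the input.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub struct State {
    pub ip: usize,
    pub pos: usize,
}

#[derive(Debug, PartialEq)]
pub enum Step {
    /// The budget ran out; pass the state back to `resume` to continue.
    Paused(State),
    Valid,
    /// Validation failed at the given input position.
    Invalid { pos: usize },
}

/// Execute at most `budget` instructions, starting from `state`.
pub fn resume(program: &[Op], input: &[i64], mut state: State, mut budget: usize) -> Step {
    loop {
        // All items consumed: the input is valid.
        if state.pos >= input.len() {
            return Step::Valid;
        }
        if budget == 0 {
            return Step::Paused(state);
        }
        budget -= 1;
        let item = input[state.pos];
        // Running past the last instruction ends validation.
        let op = match program.get(state.ip) {
            Some(op) => *op,
            None => return Step::Valid,
        };
        match op {
            Op::CheckMin(min) if item < min => return Step::Invalid { pos: state.pos },
            Op::CheckMax(max) if item > max => return Step::Invalid { pos: state.pos },
            Op::CheckMin(_) | Op::CheckMax(_) => state.ip += 1,
            Op::NextItem => {
                state.pos += 1;
                state.ip = 0;
            }
        }
    }
}
```

A caller would start from `State::default()` and keep calling `resume` with the returned state for as long as it gets `Step::Paused`. Because `State` is two integers, it is cheap to store between calls.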

Implementation Overview

  1. Bytecode Generation:
    • Extend the current schema compilation pipeline to produce a compact bytecode representation.
    • Optimize instructions to leverage existing super instruction techniques while utilizing registers to minimize memory overhead.
  2. VM Architecture:
    • Design a register-based VM that interprets the generated bytecode.
    • Ensure efficient use of static memory for schema execution by allowing the VM to operate on instruction slices.
    • Support lazy validation with a minimal state comprising an instruction pointer and a position marker within the input value, allowing validation to be paused and resumed as needed.
  3. Rust Code Generation:
    • Provide an optional feature for generating Rust code from bytecode, similar to what Python's fastjsonschema does.
    • Leverage procedural macros to transform static schemas into compile-time Rust code, avoiding runtime schema compilation overhead.
  4. Serialization and Deserialization:
    • Implement bytecode serialization into a compact, storable format.
    • Allow schemas to be deserialized and executed directly without requiring recompilation.
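
As a rough illustration of steps 1, 2, and 4 together, the sketch below encodes a tiny register-based program into bytes and interprets it straight from a byte slice. The opcodes, the byte layout, the `Instr` enum, and the `encode` and `run` functions are all hypothetical and not part of any existing API. The input is reduced to a single integer to keep the example short.

```rust
// Hypothetical opcodes; the proposal does not define an instruction set.
const LOAD_INPUT: u8 = 0; // LOAD_INPUT dst
const LOAD_CONST: u8 = 1; // LOAD_CONST dst, imm (i64, little-endian)
const ASSERT_GE: u8 = 2; // ASSERT_GE lhs, rhs
const ASSERT_LE: u8 = 3; // ASSERT_LE lhs, rhs
const HALT: u8 = 4;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Instr {
    LoadInput { dst: u8 },
    LoadConst { dst: u8, value: i64 },
    AssertGe { lhs: u8, rhs: u8 },
    AssertLe { lhs: u8, rhs: u8 },
    Halt,
}

/// Serialize instructions into a flat byte buffer.
pub fn encode(program: &[Instr]) -> Vec<u8> {
    let mut out = Vec::new();
    for instr in program {
        match *instr {
            Instr::LoadInput { dst } => out.extend_from_slice(&[LOAD_INPUT, dst]),
            Instr::LoadConst { dst, value } => {
                out.extend_from_slice(&[LOAD_CONST, dst]);
                out.extend_from_slice(&value.to_le_bytes());
            }
            Instr::AssertGe { lhs, rhs } => out.extend_from_slice(&[ASSERT_GE, lhs, rhs]),
            Instr::AssertLe { lhs, rhs } => out.extend_from_slice(&[ASSERT_LE, lhs, rhs]),
            Instr::Halt => out.push(HALT),
        }
    }
    out
}

/// A precompiled `minimum: 0` check stored in static memory.
pub static MIN_ZERO: [u8; 16] = [
    LOAD_INPUT, 0, // r0 = input
    LOAD_CONST, 1, 0, 0, 0, 0, 0, 0, 0, 0, // r1 = 0
    ASSERT_GE, 0, 1, // require r0 >= r1
    HALT,
];

/// Interpret bytecode directly from a slice, without decoding it up front.
/// Returns `None` if the bytecode is malformed.
pub fn run(code: &[u8], input: i64) -> Option<bool> {
    let mut regs = [0i64; 4];
    let mut ip = 0usize;
    loop {
        let op = *code.get(ip)?;
        match op {
            LOAD_INPUT => {
                let dst = *code.get(ip + 1)? as usize;
                *regs.get_mut(dst)? = input;
                ip += 2;
            }
            LOAD_CONST => {
                let dst = *code.get(ip + 1)? as usize;
                let mut imm = [0u8; 8];
                imm.copy_from_slice(code.get(ip + 2..ip + 10)?);
                *regs.get_mut(dst)? = i64::from_le_bytes(imm);
                ip += 10;
            }
            ASSERT_GE | ASSERT_LE => {
                let lhs = *regs.get(*code.get(ip + 1)? as usize)?;
                let rhs = *regs.get(*code.get(ip + 2)? as usize)?;
                let ok = if op == ASSERT_GE { lhs >= rhs } else { lhs <= rhs };
                if !ok {
                    return Some(false);
                }
                ip += 3;
            }
            HALT => return Some(true),
            _ => return None, // unknown opcode
        }
    }
}
```

Here `run(&MIN_ZERO, 5)` accepts the input and `run(&MIN_ZERO, -1)` rejects it. The same bytes could equally come from `encode`, a file, or `include_bytes!`. A real format would also need versioning and a way to address nested JSON values, both of which this sketch leaves out.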

Advantages

• Performance: Faster execution through reduced interpretation overhead and precompiled Rust code.
• Lazy Validation: The bytecode design inherently supports lazy validation, simplifying the state management required for pause and resume operations.
• Flexibility: Serialized bytecode enables schema portability and reuse in environments with limited runtime resources.
• Static Optimizations: Procedural macros for static schemas eliminate runtime compilation entirely, further improving efficiency.
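
For the static case, code generation could look roughly like the sketch below, which turns a list of checks into the source of a specialized validator function. The `Check` enum and the `generate` function are hypothetical and stand in for decoded bytecode instructions. A real procedural macro would more likely emit a token stream than a string, so treat this only as an illustration of the output shape.

```rust
/// Hypothetical checks, standing in for decoded bytecode instructions.
#[derive(Debug, Clone, Copy)]
pub enum Check {
    Minimum(i64),
    Maximum(i64),
}

/// Emit Rust source for a validator function specialized to `checks`.
pub fn generate(name: &str, checks: &[Check]) -> String {
    let mut body = String::new();
    for check in checks {
        // Each check becomes one early-return guard in the generated function.
        let line = match check {
            Check::Minimum(min) => format!("    if value < {} {{ return false; }}\n", min),
            Check::Maximum(max) => format!("    if value > {} {{ return false; }}\n", max),
        };
        body.push_str(&line);
    }
    format!("pub fn {}(value: i64) -> bool {{\n{}    true\n}}\n", name, body)
}
```

Calling `generate("is_valid", &[Check::Minimum(0), Check::Maximum(10)])` yields a function with two guards and no interpreter loop, which is where the runtime savings for static schemas would come from.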

@jacobzim-stl

It would be great if the bytecode format were externally accessible, so that you could bring your own interpreter to implement custom functionality, e.g. finding all JSON nodes that match a certain subschema.

It would be amazing if the interpreter were built with a public API (even if unstable) that could be reused.
