This proposal suggests developing a register-based virtual machine (VM) for JSON Schema validation. The approach builds on existing optimizations, such as super instructions (now implemented as multiple keywords in a single node), by introducing a bytecode-based execution model. This should improve runtime performance, simplify reference handling, and make further features easier to add, such as Rust code generation for statically defined schemas and efficient storage of precompiled validators (as bytecode).
Motivation
The current implementation uses super instructions to optimize validation by combining multiple JSON Schema keywords into single operations, minimizing jumps and execution overhead. While effective, this approach is bound by the limitations of direct interpretation. A register-based VM executing bytecode could significantly improve performance by:
• Reducing runtime overhead through bytecode interpretation, which also has better spatial locality.
• Simplifying lazy validation: only an instruction pointer and a marker for the current position within the input value are needed as state, so validation can be paused and resumed seamlessly.
• Supporting precompilation of schemas into efficient Rust code via procedural macros.
• Allowing schemas to be serialized into bytecode, stored, and reloaded for reuse, so the VM can execute directly from a slice in static memory.
Implementation Overview
Bytecode Generation:
• Extend the current schema compilation pipeline to produce a compact bytecode representation.
• Optimize instructions to leverage existing super instruction techniques while utilizing registers to minimize memory overhead.
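As a rough illustration of the two points above, here is a minimal sketch of how a compilation step might emit register-addressed instructions and fuse several keywords into one super instruction. Everything in it is hypothetical and not part of the existing crate: the `IntegerSchema` struct, the `Instr` enum, and the `compile` function are invented for this example and cover only `minimum` and `maximum` on integers.

```rust
/// Hypothetical, heavily simplified schema covering two keywords only.
pub struct IntegerSchema {
    pub minimum: Option<i64>,
    pub maximum: Option<i64>,
}

/// Hypothetical instruction set. `reg` names the register holding the
/// value under validation.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Instr {
    /// Fail unless the value in `reg` is an integer.
    IsInteger { reg: u8 },
    /// Fail if the value in `reg` is below `limit`.
    Minimum { reg: u8, limit: i64 },
    /// Fail if the value in `reg` is above `limit`.
    Maximum { reg: u8, limit: i64 },
    /// Super instruction: type check plus both bounds in one dispatch.
    IntegerInRange { reg: u8, min: i64, max: i64 },
}

/// Lower a schema to bytecode, fusing keywords when both bounds exist.
pub fn compile(schema: &IntegerSchema, reg: u8) -> Vec<Instr> {
    match (schema.minimum, schema.maximum) {
        (Some(min), Some(max)) => vec![Instr::IntegerInRange { reg, min, max }],
        (min, max) => {
            let mut code = vec![Instr::IsInteger { reg }];
            if let Some(limit) = min {
                code.push(Instr::Minimum { reg, limit });
            }
            if let Some(limit) = max {
                code.push(Instr::Maximum { reg, limit });
            }
            code
        }
    }
}
```

With both bounds present the compiler emits a single fused instruction. Otherwise it falls back to one instruction per keyword.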
VM Architecture:
• Design a register-based VM that interprets the generated bytecode.
• Ensure efficient use of static memory for schema execution by allowing the VM to operate on instruction slices.
• Support lazy validation with a minimal state comprising an instruction pointer and a position marker within the input value, allowing validation to be paused and resumed as needed.
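The architecture described above could look roughly like the following sketch. It is an assumption-laden toy, not the proposed design: the `Value`, `Instr`, `Status`, and `Vm` types and the `PROGRAM` constant are all invented here, the input is restricted to a flat list of items, and the "pause" mechanism is a simple instruction budget. It does show the two key ideas: the VM borrows its instructions from a slice (here a `static`), and the resumable state is just an instruction pointer, a position in the input, and the registers.

```rust
/// Hypothetical input value type (a stand-in for a real JSON value).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Str(String),
}

/// Hypothetical instruction set for validating a flat array of items.
#[derive(Debug, Clone, Copy)]
pub enum Instr {
    /// Load the next item into register `dst`, or jump to `on_end`
    /// when the input is exhausted.
    LoadNext { dst: usize, on_end: usize },
    /// Fail unless register `reg` holds an integer in `min..=max`.
    IntInRange { reg: usize, min: i64, max: i64 },
    Jump { target: usize },
    Halt,
}

#[derive(Debug, PartialEq)]
pub enum Status {
    Paused,
    Valid,
    Invalid { index: usize },
}

/// Bytecode for "every item is an integer between 0 and 10",
/// stored in static memory.
pub static PROGRAM: [Instr; 4] = [
    Instr::LoadNext { dst: 0, on_end: 3 },
    Instr::IntInRange { reg: 0, min: 0, max: 10 },
    Instr::Jump { target: 0 },
    Instr::Halt,
];

pub struct Vm<'a> {
    code: &'a [Instr],
    items: &'a [Value],
    regs: [Option<&'a Value>; 4],
    ip: usize,  // instruction pointer
    pos: usize, // position within the input
}

impl<'a> Vm<'a> {
    pub fn new(code: &'a [Instr], items: &'a [Value]) -> Self {
        Vm { code, items, regs: [None; 4], ip: 0, pos: 0 }
    }

    /// Execute at most `budget` instructions, then pause. Calling
    /// `run` again resumes from the saved `ip` and `pos`.
    pub fn run(&mut self, budget: usize) -> Status {
        for _ in 0..budget {
            match self.code[self.ip] {
                Instr::LoadNext { dst, on_end } => match self.items.get(self.pos) {
                    Some(item) => {
                        self.regs[dst] = Some(item);
                        self.pos += 1;
                        self.ip += 1;
                    }
                    None => self.ip = on_end,
                },
                Instr::IntInRange { reg, min, max } => {
                    let ok = matches!(
                        self.regs[reg],
                        Some(Value::Int(n)) if (min..=max).contains(n)
                    );
                    if !ok {
                        // `pos` already points past the failing item.
                        return Status::Invalid { index: self.pos.saturating_sub(1) };
                    }
                    self.ip += 1;
                }
                Instr::Jump { target } => self.ip = target,
                Instr::Halt => return Status::Valid,
            }
        }
        Status::Paused
    }
}
```

A caller would construct the VM with `Vm::new(&PROGRAM, &items)` and call `run` repeatedly with whatever budget suits it. A real design would more likely pause at item or error boundaries than on an instruction count, which is used here only to keep the sketch short.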
Rust Code Generation:
• Provide an optional feature for generating Rust code from bytecode, similar to what Python's fastjsonschema does.
• Leverage procedural macros to transform static schemas into compile-time Rust code, avoiding runtime schema compilation overhead.
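To make the code generation idea more concrete, the sketch below turns a tiny bytecode sequence into Rust source text. It is purely illustrative: the two-variant `Instr` enum and the `emit_rust` function are hypothetical, the generated function only handles `i64` inputs, and a real procedural macro would most likely build a token stream rather than a string.

```rust
/// Hypothetical instruction set with two numeric checks only.
#[derive(Debug, Clone, Copy)]
pub enum Instr {
    Minimum { limit: i64 },
    Maximum { limit: i64 },
}

/// Emit the source of a standalone validation function, one early
/// return per instruction, so no interpreter loop remains at runtime.
pub fn emit_rust(name: &str, code: &[Instr]) -> String {
    let mut src = format!("pub fn {}(value: i64) -> bool {{\n", name);
    for instr in code {
        let failing_condition = match *instr {
            Instr::Minimum { limit } => format!("value < {}", limit),
            Instr::Maximum { limit } => format!("value > {}", limit),
        };
        src.push_str(&format!("    if {} {{ return false; }}\n", failing_condition));
    }
    src.push_str("    true\n}\n");
    src
}
```

For a `Minimum` of 0 followed by a `Maximum` of 10, this yields a function with two straight-line comparisons that the Rust compiler can optimize like any hand-written code.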
Serialization and Deserialization:
• Implement bytecode serialization into a compact, storable format.
• Allow schemas to be deserialized and executed directly without requiring recompilation.
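One possible shape for such a format is sketched below: each instruction becomes an opcode byte followed by fixed-width little-endian operands. The opcode numbers, the `Instr` enum, and the `encode` and `decode` functions are assumptions made up for this example. A real format would also need a version header and probably a constant pool for strings and patterns.

```rust
/// Hypothetical instruction set used only for this encoding example.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Instr {
    Halt,
    IsInteger { reg: u8 },
    Minimum { reg: u8, limit: i64 },
}

/// Serialize instructions as an opcode byte plus operands.
pub fn encode(code: &[Instr]) -> Vec<u8> {
    let mut out = Vec::new();
    for instr in code {
        match *instr {
            Instr::Halt => out.push(0x00),
            Instr::IsInteger { reg } => out.extend_from_slice(&[0x01, reg]),
            Instr::Minimum { reg, limit } => {
                out.extend_from_slice(&[0x02, reg]);
                out.extend_from_slice(&limit.to_le_bytes());
            }
        }
    }
    out
}

/// Parse bytes back into instructions. Returns `None` on an unknown
/// opcode or truncated operands.
pub fn decode(mut bytes: &[u8]) -> Option<Vec<Instr>> {
    let mut code = Vec::new();
    while let Some((&opcode, rest)) = bytes.split_first() {
        let (instr, rest) = match opcode {
            0x00 => (Instr::Halt, rest),
            0x01 => {
                let (&reg, rest) = rest.split_first()?;
                (Instr::IsInteger { reg }, rest)
            }
            0x02 => {
                let (&reg, rest) = rest.split_first()?;
                if rest.len() < 8 {
                    return None;
                }
                let (raw, rest) = rest.split_at(8);
                let mut buf = [0u8; 8];
                buf.copy_from_slice(raw);
                (Instr::Minimum { reg, limit: i64::from_le_bytes(buf) }, rest)
            }
            _ => return None,
        };
        code.push(instr);
        bytes = rest;
    }
    Some(code)
}
```

This sketch decodes into a `Vec<Instr>` for clarity. Executing directly from the stored bytes, as the proposal suggests, would mean having the VM dispatch on the opcode bytes in place instead of materializing the enum.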
Advantages
• Performance: Faster execution through reduced interpretation overhead and precompiled Rust code.
• Lazy Validation: The bytecode design inherently supports lazy validation, simplifying the state management required for pause and resume operations.
• Flexibility: Serialized bytecode enables schema portability and reuse in environments with limited runtime resources.
• Static Optimizations: Procedural macros for static schemas eliminate runtime compilation entirely, further improving efficiency.
It would be great if the bytecode format were externally accessible, so that you could bring your own interpreter to implement custom functionality, e.g. finding all JSON nodes that match a certain subschema.
It would be amazing if the interpreter were built with a public API (even if unstable) that could be reused.