Execution model: Register-based Virtual Machine #641

Stranger6667 · 2024-11-22T10:10:18Z

Summary

This proposal suggests developing a register-based virtual machine (VM) for JSON Schema validation. The approach builds on existing optimizations, such as super instructions (currently implemented as multiple keywords handled in a single node), by introducing a bytecode-based execution model. This should improve runtime performance, simplify reference handling, and make further features easier to add, such as Rust code generation for statically defined schemas and efficient storage of precompiled validators as bytecode.

Motivation

The current implementation uses super instructions to optimize validation by combining multiple JSON Schema keywords into single operations, minimizing jumps and execution overhead. While effective, this approach is bound by the limitations of direct interpretation. A register-based VM executing bytecode could significantly improve performance by:

• Reducing runtime overhead through bytecode interpretation, which also offers better spatial locality.
• Simplifying lazy validation: only an instruction pointer and a marker for the current position within the input value are needed as state, so validation can be paused and resumed seamlessly.
• Supporting precompilation of schemas into efficient Rust code via procedural macros.
• Allowing schemas to be serialized into bytecode, stored, and reloaded for reuse, so the VM can execute directly from a slice in static memory.
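
To illustrate the lazy-validation point, here is a minimal sketch of a pausable interpreter loop. Everything in it is hypothetical and invented for this example: the `Op` instruction set, the `State` and `Outcome` types, and the `run` and `items_in_range` functions. The input is simplified to a slice of integers instead of a JSON value. It is only meant to show that an instruction pointer plus a position in the input is enough to stop and later continue.

```rust
/// Hypothetical instruction set, invented for this sketch.
#[derive(Debug, Clone, Copy)]
enum Op {
    /// Jump to `target` when every input item has been visited.
    JumpIfEnd { target: usize },
    /// Reject unless the current item is >= the bound.
    Minimum(i64),
    /// Reject unless the current item is <= the bound.
    Maximum(i64),
    /// Move to the next input item and jump to `target`.
    Advance { target: usize },
    /// Stop and report success.
    Accept,
}

/// The whole resumable state: where we are in the program and in the input.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct State {
    ip: usize,
    pos: usize,
}

#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Valid,
    Invalid { pos: usize },
    /// The instruction budget ran out; call `run` again with the same state.
    Paused,
}

/// Executes at most `budget` instructions, then pauses.
/// Assumes a well-formed program that ends in `Accept`.
fn run(program: &[Op], input: &[i64], state: &mut State, budget: usize) -> Outcome {
    for _ in 0..budget {
        match program[state.ip] {
            Op::JumpIfEnd { target } => {
                state.ip = if state.pos >= input.len() { target } else { state.ip + 1 };
            }
            Op::Minimum(bound) => match input.get(state.pos) {
                Some(&item) if item >= bound => state.ip += 1,
                _ => return Outcome::Invalid { pos: state.pos },
            },
            Op::Maximum(bound) => match input.get(state.pos) {
                Some(&item) if item <= bound => state.ip += 1,
                _ => return Outcome::Invalid { pos: state.pos },
            },
            Op::Advance { target } => {
                state.pos += 1;
                state.ip = target;
            }
            Op::Accept => return Outcome::Valid,
        }
    }
    Outcome::Paused
}

/// Program checking that every item lies in `min..=max`.
fn items_in_range(min: i64, max: i64) -> [Op; 5] {
    [
        Op::JumpIfEnd { target: 4 },
        Op::Minimum(min),
        Op::Maximum(max),
        Op::Advance { target: 0 },
        Op::Accept,
    ]
}
```

Calling `run` with a small budget returns `Outcome::Paused` and leaves `state` pointing at the next instruction; a later call with the same `state` picks up from there. A real design would also need to save any live registers, which this sketch does not have.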

Implementation Overview

Bytecode Generation:
• Extend the current schema compilation pipeline to produce a compact bytecode representation.
• Optimize instructions to leverage existing super instruction techniques while utilizing registers to minimize memory overhead.
VM Architecture:
• Design a register-based VM that interprets the generated bytecode.
• Ensure efficient use of static memory for schema execution by allowing the VM to operate on instruction slices.
• Support lazy validation with a minimal state comprising an instruction pointer and a position marker within the input value, allowing validation to be paused and resumed as needed.
Rust Code Generation:
• Provide an optional feature for generating Rust code from bytecode, similar to what the Python library fastjsonschema does.
• Leverage procedural macros to transform static schemas into compile-time Rust code, avoiding runtime schema compilation overhead.
Serialization and Deserialization:
• Implement bytecode serialization into a compact, storable format.
• Allow schemas to be deserialized and executed directly without requiring recompilation.
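
The pieces above can be sketched together as a toy register VM whose program is serialized to bytes and then executed straight from a byte slice. The opcodes, their byte layout, the four-register file, and the `encode`, `execute`, and `range_program` functions are all assumptions made up for this example; nothing here reflects an actual bytecode format. The input is a single integer instead of a JSON value to keep the sketch short.

```rust
// Hypothetical opcodes and operand layouts, invented for this sketch.
const LOAD_INPUT: u8 = 0; // LOAD_INPUT dst
const LOAD_CONST: u8 = 1; // LOAD_CONST dst, imm (i64, little-endian)
const LESS_THAN: u8 = 2; // LESS_THAN dst, a, b   => dst = (a < b)
const FAIL_IF: u8 = 3; // FAIL_IF src           => reject when src != 0
const ACCEPT: u8 = 4; // ACCEPT

#[derive(Debug, Clone, Copy)]
enum Instr {
    LoadInput { dst: u8 },
    LoadConst { dst: u8, value: i64 },
    LessThan { dst: u8, a: u8, b: u8 },
    FailIf { src: u8 },
    Accept,
}

/// Serializes instructions into a flat byte vector.
fn encode(program: &[Instr]) -> Vec<u8> {
    let mut out = Vec::new();
    for instr in program {
        match *instr {
            Instr::LoadInput { dst } => out.extend_from_slice(&[LOAD_INPUT, dst]),
            Instr::LoadConst { dst, value } => {
                out.extend_from_slice(&[LOAD_CONST, dst]);
                out.extend_from_slice(&value.to_le_bytes());
            }
            Instr::LessThan { dst, a, b } => out.extend_from_slice(&[LESS_THAN, dst, a, b]),
            Instr::FailIf { src } => out.extend_from_slice(&[FAIL_IF, src]),
            Instr::Accept => out.push(ACCEPT),
        }
    }
    out
}

/// Instructions equivalent to `minimum: min, maximum: max` on an integer.
fn range_program(min: i64, max: i64) -> Vec<Instr> {
    vec![
        Instr::LoadInput { dst: 0 },
        Instr::LoadConst { dst: 1, value: min },
        Instr::LessThan { dst: 2, a: 0, b: 1 }, // input < min
        Instr::FailIf { src: 2 },
        Instr::LoadConst { dst: 1, value: max },
        Instr::LessThan { dst: 2, a: 1, b: 0 }, // max < input
        Instr::FailIf { src: 2 },
        Instr::Accept,
    ]
}

/// Interprets bytecode directly from a slice, without decoding it first.
/// Returns `None` if the bytecode is malformed.
fn execute(code: &[u8], input: i64) -> Option<bool> {
    let mut regs = [0i64; 4];
    let mut ip = 0usize;
    loop {
        match *code.get(ip)? {
            LOAD_INPUT => {
                *regs.get_mut(*code.get(ip + 1)? as usize)? = input;
                ip += 2;
            }
            LOAD_CONST => {
                let mut imm = [0u8; 8];
                imm.copy_from_slice(code.get(ip + 2..ip + 10)?);
                *regs.get_mut(*code.get(ip + 1)? as usize)? = i64::from_le_bytes(imm);
                ip += 10;
            }
            LESS_THAN => {
                let a = *regs.get(*code.get(ip + 2)? as usize)?;
                let b = *regs.get(*code.get(ip + 3)? as usize)?;
                *regs.get_mut(*code.get(ip + 1)? as usize)? = (a < b) as i64;
                ip += 4;
            }
            FAIL_IF => {
                if *regs.get(*code.get(ip + 1)? as usize)? != 0 {
                    return Some(false);
                }
                ip += 2;
            }
            ACCEPT => return Some(true),
            _ => return None,
        }
    }
}
```

Because `execute` only borrows a `&[u8]`, the same bytes could come from `encode` at runtime, from a file, or from a `static` embedded with `include_bytes!`. Generating Rust code from such a program would be a separate step that this sketch does not attempt.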

Advantages

• Performance: Faster execution through reduced interpretation overhead and precompiled Rust code.
• Lazy Validation: The bytecode design inherently supports lazy validation, simplifying the state management required for pause and resume operations.
• Flexibility: Serialized bytecode enables schema portability and reuse in environments with limited runtime resources.
• Static Optimizations: Procedural macros for static schemas eliminate runtime compilation entirely, further improving efficiency.

jacobzim-stl · 2024-12-09T22:59:06Z

It would be great if the bytecode format were externally accessible, so that you could bring your own interpreter to implement custom functionality, e.g. finding all JSON nodes that match a certain subschema.

It would be amazing if the interpreter were built with a public API (even if unstable) that could be reused.

Stranger6667 added the Priority: High, Topic: Performance, and Difficulty: Hard labels on Nov 22, 2024
