The electrumsv-bsor package

Tokenized have developed a serialisation format called Bitcoin script object representation, or BSOR for short. This package uses the test files from Tokenized's project to verify that it can both decode a binary representation into an object and encode that object back to the same binary representation.

This package loosely follows the dump/dumps/load/loads API that standard library modules like json and pickle provide, although there are some extra requirements for this package due to the way data structures are defined.
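For readers unfamiliar with the convention, here is how the four functions behave in the standard library json module. This only illustrates the naming pattern; it does not show this package's own signatures, which work with bytes and take additional arguments.

```python
import io
import json

record = {"IntField": 7, "StringField": "hello"}

# dumps/loads convert to and from an in-memory value.
text = json.dumps(record)
assert json.loads(text) == record

# dump/load do the same through a file-like stream.
stream = io.StringIO()
json.dump(record, stream)
stream.seek(0)
assert json.load(stream) == record
```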

The package API

  • electrumsv_bsor.dump(object, stream, structure_metadata)
  • data: bytes = electrumsv_bsor.dumps(object, structure_metadata)
  • object = electrumsv_bsor.load(stream, definition, class_reference)
  • object = electrumsv_bsor.loads(data, definition, class_reference)

For now, the nuances of how to use this are best observed in the test files.

Structure markup in Python

The structure_metadata dictionary is how the encoder and decoder work out how to encode or decode different types of objects. Non-structures like PublicKey generally already provide methods that encode and decode directly to and from the desired encoded data type. Structures like XTestSubStruct must meet the definition requirements and are treated as a special serialisation data type, OBJECT.

    structure_metadata: dict[str, tuple[Any, Any, FieldType]] = {
        "PublicKey": (PublicKey.from_bytes, lambda instance: instance.to_bytes(), FieldType.BYTES),
        "XTestSubStruct": (XTestSubStruct, None, FieldType.OBJECT),
    }
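As a rough sketch of how such a dictionary can drive encoding and decoding, the example below looks up the decoder and encoder by type name. The FieldType enum and PublicKey class here are hypothetical stand-ins so the snippet runs on its own, and decode_value and encode_value are illustrative helpers, not functions from this package.

```python
from enum import Enum, auto
from typing import Any


class FieldType(Enum):
    """Hypothetical stand-in for the package's FieldType."""
    BYTES = auto()
    OBJECT = auto()


class PublicKey:
    """Hypothetical stand-in for a real public key class."""

    def __init__(self, raw: bytes) -> None:
        self.raw = raw

    @classmethod
    def from_bytes(cls, raw: bytes) -> "PublicKey":
        return cls(raw)

    def to_bytes(self) -> bytes:
        return self.raw


# type name -> (decoder, encoder, serialised field type)
structure_metadata: dict[str, tuple[Any, Any, FieldType]] = {
    "PublicKey": (PublicKey.from_bytes, lambda instance: instance.to_bytes(), FieldType.BYTES),
}


def decode_value(type_name: str, raw: bytes) -> Any:
    """Look up the decoder by type name and apply it to the raw payload."""
    decoder, _encoder, _field_type = structure_metadata[type_name]
    return decoder(raw)


def encode_value(type_name: str, value: Any) -> bytes:
    """Look up the encoder by type name and apply it to the instance."""
    _decoder, encoder, _field_type = structure_metadata[type_name]
    return encoder(value)


raw_key = b"\x02" + b"\x11" * 32
key = decode_value("PublicKey", raw_key)
assert encode_value("PublicKey", key) == raw_key
```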

Structures define their own BSOR encoding format and this is done using the standard library dataclasses module.

    from dataclasses import dataclass, field

    @dataclass
    class XTestStructSimple:
        IntField: int = field(metadata={"bsor_id": 1})
        StringField: str = field(metadata={"bsor_id": 2})
        IntZeroField: int = field(metadata={"bsor_id": 3})
        SubStruct: XTestSubStruct = field(metadata={"bsor_id": 4})
        BinaryField: bytes = field(metadata={"bsor_id": 5})
        IntPointerField1: int | None = field(metadata={"bsor_id": 6})
        IntPointerField2: int | None = field(metadata={"bsor_id": 7})
        PublicKeyField: PublicKey = field(metadata={"bsor_id": 8})
        ArrayStringPtrField: list[str | None] = field(metadata={"bsor_id": 25})

Each field defined in the structure must have a field identifier (bsor_id in the dataclass field metadata) that matches the value used in any other definitions of the same structure in other projects. The type of the field is drawn from the Python type annotation; for example, the type of IntField is int.
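The field identifiers and annotated types can be read back with the standard library dataclasses.fields function. The sketch below uses a hypothetical Example structure and a hypothetical bsor_fields helper; it is not how this package is known to do it internally, only a demonstration that the information is available.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Example:
    """Hypothetical structure used only for this illustration."""
    IntField: int = field(metadata={"bsor_id": 1})
    StringField: str = field(metadata={"bsor_id": 2})


def bsor_fields(cls) -> dict:
    """Map each bsor_id to a (field name, annotated type) pair."""
    result = {}
    for f in fields(cls):
        bsor_id = f.metadata["bsor_id"]
        if bsor_id in result:
            raise ValueError(f"duplicate bsor_id {bsor_id} in {cls.__name__}")
        result[bsor_id] = (f.name, f.type)
    return result


mapping = bsor_fields(Example)
assert mapping[1][0] == "IntField"
assert mapping[2][0] == "StringField"
```

Rejecting duplicate identifiers early is a design choice in this sketch; two fields sharing a bsor_id could not be told apart in the encoded data.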

Other supported metadata entries are:

  • bsor_length sets a fixed length for a field, whether string, bytes, list or another type.
  • bsor_type can be provided for fields with float annotation, and may be either FieldType.FLOAT or FieldType.DOUBLE.
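The reason a float field may need an explicit bsor_type is that a Python float is always double precision. Assuming FieldType.FLOAT and FieldType.DOUBLE correspond to 32-bit and 64-bit IEEE 754 values (as Go's float32 and float64 do), the standard library struct module shows the difference. The little-endian byte order here is arbitrary and is not a claim about the BSOR wire format.

```python
import struct

value = 0.1

as_float = struct.pack("<f", value)   # 4 bytes, single precision
as_double = struct.pack("<d", value)  # 8 bytes, double precision

round_float = struct.unpack("<f", as_float)[0]
round_double = struct.unpack("<d", as_double)[0]

assert len(as_float) == 4 and len(as_double) == 8
assert round_double == value   # double precision round-trips exactly
assert round_float != value    # single precision loses some digits
```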

Go pointers are represented by optional Python values, indicated with the | None Python type annotation.

Warning

Before relying on this package, prepare test data that exercises every variation of the structures you intend to use.

Implementation notes

Several things need testing and further work, and are not covered by the test files borrowed from the Tokenized project:

  • bsor_type values are not actually tested in the test data, and it is possible that decoding of FieldType.DOUBLE values does not work. Encoding does not check this metadata field yet.
  • bsor_length likely has a lot of edge cases that are not covered and may not even be representable in the current structures (lists of fixed length strings for instance).

The Python dataclasses standard library module keeps type names in string format, which means there is no direct way of matching the type names to the classes they refer to. The structure_metadata dictionary covers this gap: both structures and encoded data types can be declared in this dictionary under the type name used in the dataclass annotations.
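The string behaviour can be seen with explicitly quoted annotations (postponed evaluation via `from __future__ import annotations` has the same effect). The Inner and Outer classes and the name_to_class dictionary below are hypothetical and only mirror the role structure_metadata plays.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Inner:
    """Hypothetical nested structure."""
    Value: "int" = field(metadata={"bsor_id": 1})


@dataclass
class Outer:
    """Hypothetical structure with a string-annotated field."""
    Child: "Inner" = field(metadata={"bsor_id": 1})


# dataclasses stores the annotation exactly as written: a plain string.
type_name = fields(Outer)[0].type
assert isinstance(type_name, str) and type_name == "Inner"

# A name-to-class mapping recovers the class the string refers to.
name_to_class = {"Inner": Inner}
resolved = name_to_class[type_name]
assert resolved is Inner
```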