README: write section on generator
karalabe committed Jul 9, 2024
1 parent abc6648 commit 81e2953
Showing 2 changed files with 208 additions and 28 deletions.
234 changes: 207 additions & 27 deletions README.md

## Design

### Responsibilities

The `ssz` package splits the responsibility between user code and library code in the way pictured below:

![Scope](./docs/scope.svg)

### Weird shit

The [Simple Serialize spec](https://github.com/ethereum/consensus-specs/blob/dev/ssz/simple-serialize.md) has schema definitions for mapping SSZ data to [JSON and YAML](https://github.com/ethereum/consensus-specs/blob/dev/ssz/simple-serialize.md#json-mapping). We believe in separation of concerns. This library does not (nor will ever) concern itself with encoding/decoding from formats other than SSZ.

## How to use

```go
type Address [20]byte

type Withdrawal struct {
	Index     uint64
	Validator uint64
	Address   Address
	Amount    uint64
}
```

```go
type Hash [32]byte
type LogsBLoom [256]byte

type ExecutionPayload struct {
	ParentHash    Hash
	FeeRecipient  Address
	StateRoot     Hash
	ReceiptsRoot  Hash
	LogsBloom     LogsBLoom
	PrevRandao    Hash
	BlockNumber   uint64
	GasLimit      uint64
	GasUsed       uint64
	Timestamp     uint64
	ExtraData     []byte
	BaseFeePerGas *uint256.Int
	BlockHash     Hash
	Transactions  [][]byte
	Withdrawals   []*Withdrawal
}
```


```go
type Withdrawal struct {
	Index     uint64
	Validator uint64
	Address   []byte
	Amount    uint64
}
```


Note that *checked methods* entail a runtime cost. When decoding such opaque slices, we can't blindly fill the fields with data; rather, we need to ensure that they are allocated and of the correct size. Ideally, only use *checked methods* for prototyping, or for pre-existing types where you have to work with what you have and can't change the field to an array.
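To get a feel for where that cost comes from, here is a rough, standard-library-only sketch contrasting the two decoding paths. The helpers `decodeStaticArray` and `decodeCheckedStaticBytes` are hypothetical and only approximate the extra work involved; they are not the library's internals.

```go
package main

import (
	"errors"
	"fmt"
)

// decodeStaticArray fills a fixed-size array field (hypothetical helper).
// The size is known at compile time, so decoding is a plain copy into
// memory that already exists.
func decodeStaticArray(src []byte, dst *[20]byte) error {
	if len(src) < len(dst) {
		return errors.New("short buffer")
	}
	copy(dst[:], src[:len(dst)])
	return nil
}

// decodeCheckedStaticBytes fills an opaque []byte field that must hold
// exactly size bytes (hypothetical helper). Unlike the array case, it has to
// check, and possibly fix up, the slice's allocation before copying.
func decodeCheckedStaticBytes(src []byte, dst *[]byte, size int) error {
	if len(src) < size {
		return errors.New("short buffer")
	}
	if cap(*dst) < size {
		*dst = make([]byte, size) // field was nil or too small: allocate
	}
	*dst = (*dst)[:size] // trim or extend to the exact expected length
	copy(*dst, src[:size])
	return nil
}

func main() {
	blob := make([]byte, 20)
	for i := range blob {
		blob[i] = byte(i)
	}
	var arr [20]byte
	var slice []byte
	if err := decodeStaticArray(blob, &arr); err != nil {
		panic(err)
	}
	if err := decodeCheckedStaticBytes(blob, &slice, 20); err != nil {
		panic(err)
	}
	fmt.Println(arr[19], len(slice), slice[19]) // 19 20 19
}
```

The array path never allocates, whereas the slice path may allocate every time a fresh object is decoded, on top of the extra length checks.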

## Generated encoders

More often than not, the Go structs that you'd like to serialize to/from SSZ are simple data containers. Without some particular quirk you'd like to explicitly support, there's little reason to spend precious time counting the bits and digging through a long list of encoder methods to call.

For those scenarios, the library also supports generating the encoding/decoding code via a Go command:

```
go run github.com/karalabe/ssz/cmd/sszgen --help
```

### Inferred field sizes

Let's go back to our very simple `Withdrawal` type from way back.

```go
type Withdrawal struct {
	Index     uint64
	Validator uint64
	Address   [20]byte
	Amount    uint64
}
```

This seems like a fairly simple thing that we should be able to automatically generate a codec for. Let's try:

```
go run github.com/karalabe/ssz/cmd/sszgen --type Withdrawal
```

Calling the generator on this type will produce the following (very nice, if I may say so) code:

```go
// Code generated by github.com/karalabe/ssz. DO NOT EDIT.

package main

import "github.com/karalabe/ssz"

// SizeSSZ returns the total size of the static ssz object.
func (obj *Withdrawal) SizeSSZ() uint32 {
	return 8 + 8 + 20 + 8
}

// DefineSSZ defines how an object is encoded/decoded.
func (obj *Withdrawal) DefineSSZ(codec *ssz.Codec) {
	ssz.DefineUint64(codec, &obj.Index)          // Field (0) - Index - 8 bytes
	ssz.DefineUint64(codec, &obj.Validator)      // Field (1) - Validator - 8 bytes
	ssz.DefineStaticBytes(codec, obj.Address[:]) // Field (2) - Address - 20 bytes
	ssz.DefineUint64(codec, &obj.Amount)         // Field (3) - Amount - 8 bytes
}
```

It has everything we would have written ourselves (`SizeSSZ` and `DefineSSZ`), plus a lot of useful comments we certainly wouldn't have bothered writing. Generator for the win!
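For intuition on what `8 + 8 + 20 + 8` means on the wire: per the SSZ spec, a static container is simply its fields concatenated in declaration order, with unsigned integers encoded little-endian. The hand-rolled `encodeWithdrawal` below is a hypothetical, standard-library-only illustration of that layout, not the library's codec.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

type Withdrawal struct {
	Index     uint64
	Validator uint64
	Address   [20]byte
	Amount    uint64
}

// encodeWithdrawal (hypothetical) lays the fields out back to back in
// declaration order, with uint64s in little-endian byte order.
func encodeWithdrawal(w *Withdrawal) []byte {
	buf := make([]byte, 0, 8+8+20+8)
	buf = binary.LittleEndian.AppendUint64(buf, w.Index)
	buf = binary.LittleEndian.AppendUint64(buf, w.Validator)
	buf = append(buf, w.Address[:]...)
	buf = binary.LittleEndian.AppendUint64(buf, w.Amount)
	return buf
}

func main() {
	w := &Withdrawal{Index: 1, Validator: 2, Amount: 3}
	w.Address[0] = 0xaa
	enc := encodeWithdrawal(w)
	fmt.Println(len(enc))       // 44
	fmt.Printf("%x\n", enc[:8]) // 0100000000000000
}
```

Since every field has a fixed width, each one lives at a fixed byte position (e.g. `Amount` always starts at byte 36), which is exactly why a static type needs no offsets.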

OK, but this was too easy. All the fields of the `Withdrawal` object were primitive types of known lengths, so there was no heavy lifting involved at all. Let's take a look at a juicier example.

### Explicit field sizes

For a more complex test, let's pick the dynamic `ExecutionPayload` type from before, but make it as hard as possible by removing all size information from the Go types (e.g. using `[]byte` instead of `[32]byte`).

Now, obviously, if we were to write serialization code by hand, we'd take advantage of our knowledge of what each of these fields is semantically, so we could provide the necessary sizes for a decoder to use. If we want to generate the serialization code, however, we need to share all that "insider knowledge" with the code generator somehow.

The standard way in the Go world is through struct tags. Specifically, in the context of this library, that means the `ssz-size` and `ssz-max` tags. These follow the convention previously set by other Go SSZ libraries:

- `ssz-size` declares that a field has a static size.
- `ssz-max` declares that a field has a dynamic size with an upper bound.
- Both tags support multiple dimensions via comma separation, and an individual dimension can be left out via `?`.

```go
type ExecutionPayload struct {
	ParentHash    []byte `ssz-size:"32"`
	FeeRecipient  []byte `ssz-size:"20"`
	StateRoot     []byte `ssz-size:"32"`
	ReceiptsRoot  []byte `ssz-size:"32"`
	LogsBloom     []byte `ssz-size:"256"`
	PrevRandao    []byte `ssz-size:"32"`
	BlockNumber   uint64
	GasLimit      uint64
	GasUsed       uint64
	Timestamp     uint64
	ExtraData     []byte `ssz-max:"32"`
	BaseFeePerGas *uint256.Int
	BlockHash     []byte        `ssz-size:"32"`
	Transactions  [][]byte      `ssz-max:"1048576,1073741824"`
	Withdrawals   []*Withdrawal `ssz-max:"16"`
}
```

Calling the generator as before, but this time on `ExecutionPayload`, yields the following, much more interesting, code:

```go
// Code generated by github.com/karalabe/ssz. DO NOT EDIT.

package main

import "github.com/karalabe/ssz"

// SizeSSZ returns either the static size of the object if fixed == true, or
// the total size otherwise.
func (obj *ExecutionPayload) SizeSSZ(fixed bool) uint32 {
	var size = uint32(32 + 20 + 32 + 32 + 256 + 32 + 8 + 8 + 8 + 8 + 4 + 32 + 32 + 4 + 4)
	if fixed {
		return size
	}
	size += ssz.SizeDynamicBytes(obj.ExtraData)
	size += ssz.SizeSliceOfDynamicBytes(obj.Transactions)
	size += ssz.SizeSliceOfStaticObjects(obj.Withdrawals)

	return size
}

// DefineSSZ defines how an object is encoded/decoded.
func (obj *ExecutionPayload) DefineSSZ(codec *ssz.Codec) {
	// Define the static data (fields and dynamic offsets)
	ssz.DefineCheckedStaticBytes(codec, &obj.ParentHash, 32)      // Field ( 0) - ParentHash - 32 bytes
	ssz.DefineCheckedStaticBytes(codec, &obj.FeeRecipient, 20)    // Field ( 1) - FeeRecipient - 20 bytes
	ssz.DefineCheckedStaticBytes(codec, &obj.StateRoot, 32)       // Field ( 2) - StateRoot - 32 bytes
	ssz.DefineCheckedStaticBytes(codec, &obj.ReceiptsRoot, 32)    // Field ( 3) - ReceiptsRoot - 32 bytes
	ssz.DefineCheckedStaticBytes(codec, &obj.LogsBloom, 256)      // Field ( 4) - LogsBloom - 256 bytes
	ssz.DefineCheckedStaticBytes(codec, &obj.PrevRandao, 32)      // Field ( 5) - PrevRandao - 32 bytes
	ssz.DefineUint64(codec, &obj.BlockNumber)                     // Field ( 6) - BlockNumber - 8 bytes
	ssz.DefineUint64(codec, &obj.GasLimit)                        // Field ( 7) - GasLimit - 8 bytes
	ssz.DefineUint64(codec, &obj.GasUsed)                         // Field ( 8) - GasUsed - 8 bytes
	ssz.DefineUint64(codec, &obj.Timestamp)                       // Field ( 9) - Timestamp - 8 bytes
	ssz.DefineDynamicBytesOffset(codec, &obj.ExtraData)           // Offset (10) - ExtraData - 4 bytes
	ssz.DefineUint256(codec, &obj.BaseFeePerGas)                  // Field (11) - BaseFeePerGas - 32 bytes
	ssz.DefineCheckedStaticBytes(codec, &obj.BlockHash, 32)       // Field (12) - BlockHash - 32 bytes
	ssz.DefineSliceOfDynamicBytesOffset(codec, &obj.Transactions) // Offset (13) - Transactions - 4 bytes
	ssz.DefineSliceOfStaticObjectsOffset(codec, &obj.Withdrawals) // Offset (14) - Withdrawals - 4 bytes

	// Define the dynamic data (fields)
	ssz.DefineDynamicBytesContent(codec, &obj.ExtraData, 32)                            // Field (10) - ExtraData - ? bytes
	ssz.DefineSliceOfDynamicBytesContent(codec, &obj.Transactions, 1048576, 1073741824) // Field (13) - Transactions - ? bytes
	ssz.DefineSliceOfStaticObjectsContent(codec, &obj.Withdrawals, 16)                  // Field (14) - Withdrawals - ? bytes
}
```

Points of interest:

- The generator realized that this type contains dynamic fields (either through `ssz-max` tags or via embedded dynamic objects), so it generated an implementation for `ssz.DynamicObject` (vs. `ssz.StaticObject` in the previous section).
- The generator took all the `ssz-size` and `ssz-max` tags into consideration, generating serialization calls for the appropriate base types, along with runtime size checks.
- *Note, runtime size checks like these are less performant, so if you know the size of a field, arrays are always preferable to dynamic slices.*

### Cross-validated field sizes

We've seen that the size of a field can either be deduced automatically, or it can be provided to the generator explicitly. But what happens if we provide an ssz struct tag for a field of known size?

```go
type Withdrawal struct {
	Index     uint64   `ssz-size:"8"`
	Validator uint64   `ssz-size:"8"`
	Address   [20]byte `ssz-size:"32"` // Deliberately wrong tag size
	Amount    uint64   `ssz-size:"8"`
}
```

```
go run github.com/karalabe/ssz/cmd/sszgen --type Withdrawal

failed to validate field Withdrawal.Address: array of byte basic type tag conflict: field is 20 bytes, tag wants [32] bytes
```

The code generator will take into consideration the information in both the field's Go type and the struct tag, and will cross validate them against each other. If there's a size conflict, it will abort the code generation.
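As a rough runtime analogue of that check, the sketch below uses `reflect` to compare `[N]byte` fields against their `ssz-size` tags. The `checkByteArrayTags` helper is hypothetical: it only handles single-dimension byte arrays, and its error text is not the generator's.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

type Withdrawal struct {
	Index     uint64   `ssz-size:"8"`
	Validator uint64   `ssz-size:"8"`
	Address   [20]byte `ssz-size:"32"` // deliberately wrong tag size
	Amount    uint64   `ssz-size:"8"`
}

// checkByteArrayTags (hypothetical) reports the first [N]byte field whose
// length disagrees with its ssz-size tag. Non-array fields are skipped.
func checkByteArrayTags(v any) error {
	t := reflect.TypeOf(v)
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		tag, ok := f.Tag.Lookup("ssz-size")
		if !ok || f.Type.Kind() != reflect.Array || f.Type.Elem().Kind() != reflect.Uint8 {
			continue
		}
		want, err := strconv.Atoi(tag)
		if err != nil {
			return fmt.Errorf("field %s.%s: bad ssz-size tag %q", t.Name(), f.Name, tag)
		}
		if f.Type.Len() != want {
			return fmt.Errorf("field %s.%s: field is %d bytes, tag wants %d bytes",
				t.Name(), f.Name, f.Type.Len(), want)
		}
	}
	return nil
}

func main() {
	// field Withdrawal.Address: field is 20 bytes, tag wants 32 bytes
	fmt.Println(checkByteArrayTags(Withdrawal{}))
}
```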

This can be very helpful in catching refactoring mistakes, where someone changes the type of a field and thereby, silently, its encoding. If the field is tagged with `ssz-size`, such an error is detected.

As such, we recommend *always* tagging all SSZ-encoded fields with their sizes. It results in code that is both safer and self-documenting.

### Go generate

Just a quick mention: anyone using the code generator should call it from a `//go:generate` directive. It is much simpler, and once the directive is added to the code, the generator can always be re-run via `go generate`.
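A minimal sketch of where such a directive might live, assuming the `--type` flag shown earlier is the relevant part of the invocation. Where the generated code is written, and any output-related flags that may require, are not covered here; check `sszgen --help` for the exact command line.

```go
package main

import "fmt"

// Withdrawal is the simple, static container used earlier in this README.
type Withdrawal struct {
	Index     uint64
	Validator uint64
	Address   [20]byte
	Amount    uint64
}

// The directive below is an ordinary comment to the compiler; only
// `go generate` executes it. Output-related flags (if any) are assumed
// to be omitted; consult `sszgen --help` for the exact invocation.
//
//go:generate go run github.com/karalabe/ssz/cmd/sszgen --type Withdrawal

func main() {
	fmt.Println("run `go generate` in this package to (re)generate the codec")
}
```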

### Multi-type ordering

When generating code for multiple types at once (with one call or many), there's one ordering issue you need to be aware of.

When the code generator finds a field that is a struct of some sort, it needs to decide whether it's a static or a dynamic type. To do that, it checks whether the type implements the `ssz.StaticObject` or the `ssz.DynamicObject` interface. If it implements neither, the generator will error.

This means, however, that if you have a type that's embedded in another type (e.g. in our examples above, `Withdrawal` was embedded inside `ExecutionPayload` in a slice), you need to generate the code for the inner type first, and then the outer type. This ensures that when the generator resolves the inner type's interface while processing the outer type, the inner type's methods have already been generated and are available.
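To see why the order matters, here is a runtime analogy of that decision. The interfaces below are hypothetical, simplified stand-ins for `ssz.StaticObject` and `ssz.DynamicObject` that only mirror the `SizeSSZ` signatures from the generated code above (the real ones also require `DefineSSZ`), and `classify` is an illustrative helper. The generator performs this check while analyzing your package; the sketch does it with a type switch.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for ssz.StaticObject/ssz.DynamicObject.
type staticObject interface{ SizeSSZ() uint32 }
type dynamicObject interface{ SizeSSZ(fixed bool) uint32 }

// rawWithdrawal models Withdrawal before its code has been generated.
type rawWithdrawal struct {
	Index, Validator, Amount uint64
	Address                  [20]byte
}

// genWithdrawal models the same type after generation added SizeSSZ.
type genWithdrawal struct {
	Index, Validator, Amount uint64
	Address                  [20]byte
}

func (w *genWithdrawal) SizeSSZ() uint32 { return 8 + 8 + 20 + 8 }

// classify decides how a nested type would be encoded, based solely on
// which interface it satisfies, and fails if it satisfies neither.
func classify(v any) (string, error) {
	switch v.(type) {
	case staticObject:
		return "static", nil
	case dynamicObject:
		return "dynamic", nil
	default:
		return "", fmt.Errorf("%T implements neither interface", v)
	}
}

func main() {
	if _, err := classify(&rawWithdrawal{}); err != nil {
		fmt.Println(err) // *main.rawWithdrawal implements neither interface
	}
	kind, _ := classify(&genWithdrawal{})
	fmt.Println(kind) // static
}
```

Generating `ExecutionPayload` before `Withdrawal` corresponds to the first call: the inner type has no methods yet, so there is nothing to classify it by.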

## Quick reference

The table below is a summary of the methods available for `SizeSSZ` and `DefineSSZ`:

- The *Size API* is to be used to implement the `SizeSSZ` method's dynamic parts.
- The *Symmetric API* is to be used if the encoding/decoding doesn't require specialised logic.
- The *Asymmetric API* is to be used if encoding or decoding requires special casing.

*If some type you need is missing, please open an issue, so it can be added.*

| Type | Size API | Symmetric API | Asymmetric Encoding | Asymmetric Decoding |
|:---------------------------:|:---------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|