Compress the State Response Message in State Sync #112

Open · wants to merge 1 commit into base: main
text/0112-compress-state-response-message-in-state-sync.md (70 additions, 0 deletions)
# RFC-0112: Compress the State Response Message in State Sync

| | |
| --------------- | ------------------------------------------------------------------------------------------- |
| **Start Date** | 14 August 2024 |
| **Description** | Compress the state response message to reduce data transfer during state syncing |
| **Authors** | Liu-Cheng Xu |

## Summary

This RFC proposes compressing the state response message during the state syncing process to reduce the amount of data transferred.

## Motivation

**Reviewer comment:**

Is the motivation to send less data or to improve state sync performance? Based on what you're saying, I think that implementing a persistence feature would be enough to solve this issue on slower network connections, as it allows you to request subsequent portions of the state incrementally.

Perhaps a better way to solve it is a combination of a persistence layer that stores the state you are retrieving (so you can stop and restart your node) and a change to the request to ask for a bounded amount of state.

Currently, the state request is like this:

```protobuf
message StateRequest {
    // Block header hash.
    bytes block = 1;
    // Start from this key.
    // Multiple keys used for nested state start.
    repeated bytes start = 2; // optional
    // if 'true' indicates that response should contain raw key-values, rather than proof.
    bool no_proof = 3;
}
```

What I'm suggesting is to add a new field with a limit:

```protobuf
message StateRequest {
    // Block header hash.
    bytes block = 1;
    // Start from this key.
    // Multiple keys used for nested state start.
    repeated bytes start = 2; // optional
    // if 'true' indicates that response should contain raw key-values, rather than proof.
    bool no_proof = 3;
    // amount of keys we want to receive in the response
    uint32 max_keys = 4;
}
```

I know this could be raised as a separate proposal, but I think it could be a better way to solve what you are describing without relying on compression algorithms. In the worst case, you could use a combination of both: asking for a partial state plus compressing the response.

The trade-off might be that the number of requests could increase. wdyt?

**Author reply:**

Thanks for your feedback. There is already a persistent state sync tracking issue (paritytech/polkadot-sdk#4) that I'm actively working on. I initiated this RFC because it is low-hanging fruit: reducing the bandwidth leads to immediate improvements in state sync performance, while persistent state sync requires much more effort and complexity (it may take a while to wrap up). As you can see from the patch, a few changed lines significantly reduce bandwidth consumption. The proposed changes in the RFC are minimal yet impactful in terms of bandwidth savings.

In contrast, your suggestion to add a `max_keys` field, while valid, does not reduce bandwidth at all. It may even increase request frequency, which could negate some of the performance gains we seek.


State syncing can require downloading several gigabytes of data, particularly for blockchains with large state sizes, such as Astar, which
has a state size exceeding 5 GiB (https://github.com/AstarNetwork/Astar/issues/1110). This presents a significant
challenge for nodes with slower network connections. Additionally, the current state sync implementation lacks a persistence feature (https://github.com/paritytech/polkadot-sdk/issues/4),
meaning any network disruption forces the node to re-download the entire state, making the process even more difficult.

## Stakeholders

This RFC benefits all projects utilizing the Substrate framework, specifically in improving the efficiency of state syncing.

- Node Operators.
- Substrate Users.

## Explanation
**Contributor comment:**

I think this requires some implementation details, like what compression algorithm is going to be used.

**Author reply:**

As per the code changes linked in L42, we'll use zstd as the compression tool, which is already used in other places in Substrate. I can add a sentence to make this more explicit.


The largest portion of the state response message consists of either `CompactProof` or `Vec<KeyValueStateEntry>`, depending on whether a proof is requested ([source](https://github.com/paritytech/polkadot-sdk/blob/0cd577ba1c4995500eb3ed10330d93402177a53b/substrate/client/network/sync/src/state_request_handler.rs#L216-L241)):

- `CompactProof`: When proof is requested, compression yields a lower ratio but remains beneficial, as shown in warp sync tests in the Performance section below.
- `Vec<KeyValueStateEntry>`: Without proof, this payload should be highly compressible: the entries are generated by iterating the storage sequentially, starting from an empty storage key, so many entries in the message share the same storage prefix, which is ideal for compression.
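
As a rough illustration of the change, the sketch below applies zstd to the already SCALE-encoded response bytes on the sender side and reverses it on the receiver side. It assumes the `zstd` crate (already used elsewhere in Substrate); the function names are illustrative, not taken from the linked patch.

```rust
use std::io;

// Compress a SCALE-encoded state response before sending it over the network.
// Level 0 selects zstd's default compression level.
fn compress_state_response(encoded_response: &[u8]) -> io::Result<Vec<u8>> {
    zstd::stream::encode_all(encoded_response, 0)
}

// Decompress a received state response back into its SCALE-encoded form
// before handing it to the usual decoding path.
fn decompress_state_response(compressed: &[u8]) -> io::Result<Vec<u8>> {
    zstd::stream::decode_all(compressed)
}
```

The exact compression level is an implementation detail of the patch; the default strikes a reasonable balance between ratio and CPU cost.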

## Drawbacks

None identified.

## Testing, Security, and Privacy

The [code changes](https://github.com/liuchengxu/polkadot-sdk/commit/2556fefacd2e817111d838af5f46d3dfa495852d) required for this RFC are straightforward: compress the state response on the sender side and decompress it on the receiver side. Existing sync tests should ensure functionality remains intact.

## Performance, Ergonomics, and Compatibility

### Performance

This RFC optimizes network bandwidth usage during state syncing, particularly for blockchains with gigabyte-sized states, while introducing negligible CPU overhead for compression and decompression. For example, compressing the state response during a recent Polkadot warp sync (around height #22076653) reduces the data transferred from 530,310,121 bytes to 352,583,455 bytes — a 33% reduction, saving approximately 169 MiB of data.

Performance data is based on [this patch](https://github.com/liuchengxu/polkadot-sdk/commit/da93360c9a59c29409061789c598d8f4e55d7856), with logs available [here](https://github.com/liuchengxu/polkadot-sdk/commit/9d98cefd5fac0a001d5910f7870ead05ab99eeba).

### Ergonomics

None.

### Compatibility
**Contributor comment:**

It looks to me that this needs some network protocol versioning. For example what happens if the receiver is not expecting compressed data? Or if the receiver is expecting compressed data but gets uncompressed data?

**Author reply:**

This is a valid concern, and I agree that network versioning would be the ideal solution. However, it seems that Substrate doesn't currently support network protocol versioning, which might deserve another RFC.

In the meantime, we can introduce a backward-compatible workaround. For example, we could define a new `StateRequestV1` with an additional `compress_response: bool` field, defaulting to `false` if not provided. Here's how it would work in practice, assuming an old node (OldNode) and an upgraded node (NewNode):

- OldNode => NewNode: when OldNode sends a `StateRequest` payload to NewNode, NewNode interprets it as `StateRequestV1 { ..., compress_response: false }`. Based on this field, NewNode decides whether to compress the response, ensuring that OldNode receives the data uncompressed and compatibility is maintained.
- NewNode => OldNode: if NewNode sends a `StateRequestV1` payload to OldNode, OldNode fails to decode it because it does not recognize the new `StateRequestV1` structure, leading to an aborted sync.
- OldNode => OldNode and NewNode => NewNode: syncing between nodes of the same version proceeds as expected, with OldNodes using uncompressed data and NewNodes using compressed data when possible.

The main consequence of this approach is that NewNodes won't be able to sync state from OldNodes, but they can still sync with other NewNodes. This workaround isn't perfect, but it ensures some level of compatibility until a more comprehensive network versioning solution can be implemented in Substrate.
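
To make the OldNode => NewNode case concrete, here is a minimal responder-side sketch; `StateRequestV1` and `compress_response` are the hypothetical additions described above, not existing Substrate types.

```rust
use std::io;

// Hypothetical V1 request carrying the extra flag from the workaround above.
struct StateRequestV1 {
    block: Vec<u8>,
    start: Vec<Vec<u8>>,
    no_proof: bool,
    // Defaults to `false` when absent, i.e. when the requester is an OldNode.
    compress_response: bool,
}

// Build the response payload, compressing it only when the requester opted in.
fn build_response_payload(
    request: &StateRequestV1,
    encoded_response: Vec<u8>,
) -> io::Result<Vec<u8>> {
    if request.compress_response {
        // NewNode requester: send zstd-compressed bytes.
        zstd::stream::encode_all(encoded_response.as_slice(), 0)
    } else {
        // OldNode requester: send the payload unchanged for compatibility.
        Ok(encoded_response)
    }
}
```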

**Contributor comment (@alindima, Aug 16, 2024):**

I added some support for req/response protocol versioning in paritytech/polkadot-sdk#2771 and used it for the implementation of #47, although it's not optimal: it first tries a request on the new version and, if that's not supported, retries on the old protocol (so making two round trips if the remote does not support the new version). It was implemented this way mostly due to libp2p not having the most flexible interfaces.

In any case, the RFC should clearly state the interactions between new/old nodes.

CC: @lexnv

**Author reply:**

@bkchr Any more comments apart from the interactions between the old and new nodes? If the workaround I proposed in my last comment is fine, I can update the RFC.

**Reviewer comment:**

This optimization makes sense from my perspective. We are trading off bandwidth for CPU cycles in hopes that this will yield an overall faster warp sync.

It would be interesting to see the comparison with times as well.
For example, the sub-triage-logs tool has a newly added command that measures the warp-sync time:

```bash
# Provide the path to the logs outputted by a substrate-based node
cargo run -- warp-time --file ../substrate-log.txt
```

Indeed, we'd need to think about versioning as well. We'd like to deprecate the older request-response protocols, so it's worth adding as few breaking changes as possible and having a deprecation plan in mind:

- When the NewNode sends the V2 format to the OldNode, the OldNode will fail to decode it. After a decoding error, the OldNode will change the NewNode's reputation and possibly disconnect it from its network.
- The NewNode can retry with the V1 format after the V2 request was dropped, but ideally we should avoid the first case.

Offhand, it looks like a similar approach to paritytech/polkadot-sdk#2771 should solve this issue. Then, if we are to introduce a new request-response version, should we also consider other compression algorithms? I think this complicates things, but it would be interesting to see a comparison between different compression algorithms on Polkadot-specific data (i.e. maybe we end up saving 20% bandwidth instead of 33%, but compression is faster).
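
A throwaway benchmark along the lines below would be enough to get ratio and timing numbers for different zstd levels on captured state responses (other algorithms could be slotted in the same way); how the payload is captured and loaded is left out of this sketch.

```rust
use std::time::Instant;

// Print compression ratio and wall-clock time for several zstd levels
// on a captured state response payload.
fn compare_zstd_levels(payload: &[u8]) {
    for level in [1, 3, 9, 19] {
        let start = Instant::now();
        let compressed = zstd::stream::encode_all(payload, level)
            .expect("compressing an in-memory buffer should not fail");
        println!(
            "level {level}: {} -> {} bytes ({:.1}%), took {:?}",
            payload.len(),
            compressed.len(),
            100.0 * compressed.len() as f64 / payload.len() as f64,
            start.elapsed(),
        );
    }
}
```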


No compatibility issues identified.

## Prior Art and References

None.

## Unresolved Questions

None.

## Future Directions and Related Material

None.