-
Notifications
You must be signed in to change notification settings - Fork 797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sync 2.0 #534
Comments
cc @bkchr |
Is this supposed to work with chain that doesn't have finality? I'm worried about growing number of places where finality is implicitly assumed to exist in Substrate, making it less generic. Also I'm interested in clearer separation between initial and keep-up sync here. |
It may not be finality in the same sense, as the the actual finality engine used. Substrate indeed already has a limit on maximum possible re-org. So in the simplest case sync may consider "final" any block past that threshold. But yes, this strategy uses the notion of finality to make thing simpler. We may make the sync strategy pluggable to support other cases though. |
We must support chains without their own finality, since this describes parachains. We do not afaik need chains without any notion of finality, because afaik they're not really secure anyways. A priori, I think this proposal makes sync easier to understand, which maybe helps us do async time thing by Handan (and Peter Czaban) and do my slashing reform idea. I've not really thought about the details though.. At least slashing reform cares most about sync ahead of finality, which is not really the topic here I guess, so maybe just orthogonal, not sure about async time. |
Do you suggest for Substrate to not support PoW and similar consensus mechanisms that don't have deterministic finality at all? |
Yes exactly.. Proof-of-work is not secure. It's especially bad on smaller chains ala https://www.crypto51.app/ but bitcoin shall be double spent eventually too ala https://economics.princeton.edu/working-papers/on-the-instability-of-bitcoin-without-the-block-reward/ We'll never fully support proof-of-work parachains anyways. We need pre-collation proofs in too many places, like parachain block submission, XCMP message transport, etc, so attackers could halt proof-of-work parachains fairly easily. And proof-of-work parachains would de facto pay 10x more for their parachain slot, assuming they miss like 9/10 relay chain slots. Intuitively, if you've a properly engineered protocol then you've somewhat tight tolerances everywhere, so proof-of-work anywhere typically enables DoS attacks upon your larger protocol. |
We are working on a similar rewrite in aleph-node for probably very similar reasons. Some crucial differences in our approach:
Some of these differences are due to us using our own finality gadget with somewhat different base assumptions than GRANDPA (thus 2 and 4), some are motivated by a much lower block production time (thus 2.1 and 3.2), and some are probably just random results of different approaches of people doing the design. I'm mentioning this in case we might be able to avoid some doubled effort in this area, or at least maybe share some insights. |
We would also like syncing to be decoupled from networking as much as possible but given that syncing is the default set protocol, it has some implications which we can't get rid of such as other protocols expecting it to exist and syncing being "handshake protocol", and it will always be treated as a sort of a different protocol. As far as keep-up/initial sync is concerned, it doesn't have to be as aggressive as it's the document right now. It can be adjusted to any threshold we find suitable. Obviously if the history download takes a long time, some of the announced blocks can spill over into the initial sync and it only switches to keep-up sync only after finalizing a very recent block. One reason to keep the syncing modes separate is to make the code more clearly separate and I fail to see why this would be unfeasible for your version. Could you elaborate on that? |
Why can we not change this?
Other protocols that want to follow sync (like grandpa, transaction sync etc) should use a generic stream of events PeerConnected/PeerDisconnected. This could be provided by everybody. AFAIR we already started doing this and not relying on events from network service telling us that a sync peer was connected. |
The main limitation has to do with handshaking (basically the assumption that the Right now the only protocol that is being followed is the sync protocol because it's considered the default but that already happens through a mechanism you're describing. But it's only implemented for syncing because we only need it for the sync protocol. But it can be implemented for every protocol if it's needed |
In our case we were planning to replace the normal sync/default set protocol with a near-noop, that essentially only performs the handshake.
I meant the other way around – we are trying to have a single mode, which would handle both syncing when a node is up to date, as well as when it's behind. This way we don't need any switching mechanism between modes and we don't need separate logic at all. The thing that might be unfeasible, is to have both that and the ability to "catch up" sufficiently fast from being far behind – we suspect we should be able to do that, but in case it's not possible we will introduce a separate "initial sync" mode. The switching between modes being too aggressive is not hypothetical for us btw – we had some problems with nodes "flapping" between major and normal sync when under load or with some network disruptions (in sync 1.0 with a 1s block prod time, and some other modifications), which sometimes resulted with the node ending up in a broken state. |
…e` (#1650) Move request-response handling from `ChainSync` to `SyncingEngine` as part of [Sync 2.0](#534) refactoring aimed at making `ChainSync` a pure state machine. Resolves #502. --------- Co-authored-by: Aaro Altonen <48052676+altonen@users.noreply.github.com> Co-authored-by: Bastian Köcher <git@kchr.de>
* upped the version of tracing * find/replace * added scale-info * added scale-info to more pallets * more build fixes * fixes * fixed imports * removed metadata * upped serde and codec versions * changed BaseCallFilter to Everything * added max instructions to FixedWeightBounds * added OperationalFeeMultiplier * import Everything for BaseCallFilter * added missing implementations * added force_unreserve implementation * removed commented code * return Vec from WeakBoundedVec * fixed OpaqueMetadata conversion * use BenchmarkError::Stop * using struct instead of method call * use struct patterns * changed mock CallFilter to use Everything * added scale-info/std to std feature * removed native_executor_instance macro * updated jsonrpc * use tokio_handle * renamed task_executor * removed unused imports * updated @polkadot/api to 6.8.1 because of new metadata * rustfmt * rustfmt with the right settings
…e` (paritytech#1650) Move request-response handling from `ChainSync` to `SyncingEngine` as part of [Sync 2.0](paritytech#534) refactoring aimed at making `ChainSync` a pure state machine. Resolves paritytech#502. --------- Co-authored-by: Aaro Altonen <48052676+altonen@users.noreply.github.com> Co-authored-by: Bastian Köcher <git@kchr.de>
This PR is part of [Sync 2.0](paritytech#534) refactoring aimed at making `ChainSync` a pure state machine. Resolves paritytech#501.
This is a tracking for the new implementation of syncing protocol and algorithm in substrate. It mostly concerns full/fast sync, but not warp sync.
Current implementation issues.
Proposed new protocol.
On connection peers exchange their best finalized block and all leaf blocks that are not yet finalized or discarded.
On import of each block, node sends out an announcement to all connected peers. Announcement includes newly imported block information along with the best finalized block for a peer.
Proposed syncing algorithm.
For each connected peer we maintain a set of leaves that they have and common finalized number. These are updated on each announcement.
There are two major syncing modes:
Initial sync. In this mode the node syncs a single finalized chain. It is triggered on startup or when majority of peers start announcing blocks with unknown parents (i.e. after an offline period). Since this is a canonical chain, series of blocks may be requested by number allowing for parallel block download from multiple peers. There's no need to do ancestry search as we can simply assume that the common block is the one with a min number in a finalized chain.
Keep-up sync. Node switches to this mode when it finalizes a recent block. In this mode sync simply requests chains that peers have, starting from each leaf and back to the best finalized block (or existing parent block).
Tasks
SyncingEngine
#503ChainSync
toSyncingEngine
#502ChainSync
toSyncingEngine
#501sc-network-common
#1738The text was updated successfully, but these errors were encountered: