Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prepare syncing for parallel sync strategies #3224

Merged
merged 11 commits into from
Feb 13, 2024

Conversation

dmitry-markin
Copy link
Contributor

@dmitry-markin dmitry-markin commented Feb 6, 2024

This PR should supersede #2814 and accomplish the same with less changes. It's needed to run sync strategies in parallel, like running ChainSync and GapSync as independent strategies, and running ChainSync and Sync 2.0 alongside each other.

The difference with #2814 is that we allow simultaneous requests to remote peers initiated by different strategies, as this is not tracked on the remote node in any way. Therefore, PeerPool is not needed.

CC @skunert

@dmitry-markin dmitry-markin added R0-silent Changes should not be mentioned in any release notes T0-node This PR/Issue is related to the topic “node”. labels Feb 6, 2024
substrate/client/network/sync/src/strategy.rs Show resolved Hide resolved
substrate/client/network/sync/src/strategy.rs Outdated Show resolved Hide resolved
substrate/client/network/sync/src/strategy.rs Outdated Show resolved Hide resolved
substrate/client/network/sync/src/strategy.rs Outdated Show resolved Hide resolved
substrate/client/network/sync/src/engine.rs Outdated Show resolved Hide resolved
substrate/client/network/sync/src/engine.rs Show resolved Hide resolved
Copy link
Contributor

@skunert skunert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic LGTM! Just left some questions and minor remarks.

substrate/client/network/sync/src/strategy.rs Outdated Show resolved Hide resolved
substrate/client/network/sync/src/strategy.rs Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While reading I was asking myself if it was feasible to use a message based system for the individual strategies too. Instead of the syncing engine calling individual on_XY handlers of the strategy, it could pass a message down to the individual strategies, and each strategy decides if it wants to handle it.

Would get rid of strategy knowing which substrategy is interested in the individual results. But yeah this is more an educational question for me (and maybe there are cases were this does not even work), nothing to act upon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly, I'd like to comment on the feasibility of a fully async interface with strategies. When we implemented a bidirectional async interface between ProtocolController and Notifications, it turned out that it breaks the constraints of the originally half-synchronous interface (where we call on_XY handlers on ProtocolController and poll it as a stream for actions), requiring some tricks with handling of duplicate messages and discarding some "invalid" messages. We decided to just live with some smaller inconsistencies, because the proper implementation would require complex ACKing system with state machines in both ProtocolController and Notifications with lots of states. In syncing, our goal is to focus on a syncing state machines, and not on a message passing state machines — this is why we got rid of all the polling in the strategies.

On the other hand, what could work is introducing synchronous subscriber-like system, where we call a generic on_event(event: Event) handler that dispatches the events to specific strategies down the tree. Logically, this would be the same as calling specific on_XY handlers, but could replace the manual matches on active strategies with a subscriber-looking system, where a strategy instead registers itself for specific events. I'm not sure though if it's possible to have a "proper" Rust implementation of this without downcasting of abstract events when they reach specific strategies — otherwise, it looks like event matching would just move to strategies with a burden of them knowing about all the event types.

@altonen do you have something to add?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What Sebastian is proposing is similar to the trait approach I've been harping about. If we store the active strategies in HashMap<StategyKey, Box<dyn Strategy>>, SyncingStrategy could iterate over all active strategies when handling an event and each strategy can decide whether it wants to handle that particular event. Of course if a key is provided, e.g., when a response is received, then SyncingStrategy would only call the specified strategy. This would clean up much of the code in SyncingStrategy and would allow plugging custom syncing implementations.

Like you described in the second paragraph, I don't see how Sebastian's proposal necessarily implies any async code though. I think the code would still work the way it does now but instead of SyncingStrategy checking explicitly if a strategy could be interested in an event, it won't make any assumptions, passes the event to the strategy and if it's not interested, it will just ignore it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the second paragraph captures pretty well what I meant. When we at some point add more strategies with different response handlers we would just add another message instead of adding a new on_XY on Strategy and the concrete implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I got it right, what @altonen is proposing implies that all strategies handle all event types. I.e., should provide on_XY handlers for all XY, implementing some generic Strategy trait, even though they may not be interested in all XY.

@dmitry-markin dmitry-markin added this pull request to the merge queue Feb 13, 2024
Merged via the queue into master with commit 96ebb30 Feb 13, 2024
129 of 130 checks passed
@dmitry-markin dmitry-markin deleted the dm-parallel-sync-strategies branch February 13, 2024 20:25
bgallois pushed a commit to duniter/duniter-polkadot-sdk that referenced this pull request Mar 25, 2024
This PR should supersede
paritytech#2814 and accomplish the
same with less changes. It's needed to run sync strategies in parallel,
like running `ChainSync` and `GapSync` as independent strategies, and
running `ChainSync` and Sync 2.0 alongside each other.

The difference with paritytech#2814
is that we allow simultaneous requests to remote peers initiated by
different strategies, as this is not tracked on the remote node in any
way. Therefore, `PeerPool` is not needed.

CC @skunert

---------

Co-authored-by: Sebastian Kunert <skunert49@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
R0-silent Changes should not be mentioned in any release notes T0-node This PR/Issue is related to the topic “node”.
Projects
Status: Blocked ⛔️
Development

Successfully merging this pull request may close these issues.

3 participants