
Optimize fork sync targets handling #1199

Closed
prybalko wants to merge 6 commits from the optimize-sync-fork-targets branch

Conversation

prybalko

Resolves #525 (Fork sync targets are handled inefficiently).

Utilize a BTreeMap to store fork_targets, providing efficient fork selection based on the block number.
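
A minimal sketch of the data layout, with simplified stand-in types (the real syncing code is generic over the Block type, and ForkTarget here is a trimmed-down illustration, not the actual struct):

```rust
use std::collections::BTreeMap;

// Simplified stand-ins for the real block number and hash types.
type BlockNumber = u64;
type Hash = [u8; 32];

// Trimmed-down illustration of a fork target entry.
struct ForkTarget {
    peers: Vec<&'static str>,
}

fn main() {
    // Keying the map by (number, hash) keeps fork targets ordered by
    // block number, so selecting targets below some limit becomes a
    // cheap range query instead of a scan over an unordered map.
    let mut fork_targets: BTreeMap<(BlockNumber, Hash), ForkTarget> = BTreeMap::new();
    fork_targets.insert((10, [1; 32]), ForkTarget { peers: vec!["alice"] });
    fork_targets.insert((42, [2; 32]), ForkTarget { peers: vec!["bob"] });
    fork_targets.insert((99, [3; 32]), ForkTarget { peers: vec!["carol"] });

    // Only targets with number < limit are worth requesting right now.
    let limit: BlockNumber = 50;

    // `[0u8; 32]` is the smallest possible hash, so this upper bound
    // selects every key with number < limit. Visit the highest
    // eligible target first (see the review thread below).
    for ((number, _hash), target) in fork_targets.range(..(limit, [0u8; 32])).rev() {
        println!("candidate fork head #{number} announced by {:?}", target.peers);
    }
}
```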

@prybalko prybalko added the R0-silent (Changes should not be mentioned in any release notes) and I9-optimisation (An enhancement to provide better overall performance in terms of time-to-completion for a task) labels Aug 28, 2023
@prybalko prybalko requested a review from altonen August 28, 2023 07:01
Comment on lines +3004 to +3006
```rust
// Iterate in reverse order to download target with the highest number first.
// Other targets might belong to the same fork, so we don't need to download them separately.
for ((number, hash), r) in targets.range(range).rev() {
```
Contributor

Could you explain this logic in more detail? It checks the fork targets in reverse order by (number, hash), so the last block of the fork is checked first, and if it's not available, it proceeds to the parent of the latest block. Am I understanding this correctly?

Author
@prybalko prybalko Aug 31, 2023

Not quite. The logic goes like this:

If there are multiple fork_targets, in what order should they be checked?
I believe the optimal approach would be to begin with the fork_target with the highest number (i.e. the latest). If it's available, download (number - finalized) blocks. If not, move on to the next fork_target (which can be unrelated to the first one).
The optimization lies in the fact that other fork_targets with lower numbers might be part of the same fork that has already been downloaded.

Note: at this point fork_targets have already been filtered by number < best_num + max_blocks_per_request
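
A hedged sketch of that selection order, reusing the simplified types from the sketch in the description above (select_fork_request and peer_has are illustrative names and signatures, not the real ChainSync API):

```rust
use std::collections::BTreeMap;

type BlockNumber = u64;
type Hash = [u8; 32];
struct ForkTarget; // payload elided for the sketch

// Hypothetical selection helper; `peer_has` stands in for "this peer
// can serve this block" (not the real ChainSync API).
fn select_fork_request(
    targets: &BTreeMap<(BlockNumber, Hash), ForkTarget>,
    limit: BlockNumber,     // best_num + max_blocks_per_request
    finalized: BlockNumber, // number of the last finalized block
    peer_has: impl Fn(&Hash) -> bool,
) -> Option<(Hash, BlockNumber)> {
    // Walk eligible targets (number < limit) from the highest number down.
    for ((number, hash), _target) in targets.range(..(limit, [0u8; 32])).rev() {
        if peer_has(hash) {
            // Request `number - finalized` blocks descending from `hash`;
            // lower targets on the same fork are covered by this download.
            return Some((*hash, number - finalized));
        }
        // Otherwise fall through to the next-highest target, which may
        // belong to a completely unrelated fork.
    }
    None
}
```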

Contributor

I haven't run the original test with this optimization applied yet, but how much better did it get, do you remember?

It looks like it won't even traverse the fork targets that are too far into the future, which is what we want, but I think this could be optimized even further. Right now, if a peer is not able to provide any fork targets, we still have to go through all of them, which is not optimal. Which fork targets each peer is able to provide should probably be stored, so the list of targets wouldn't have to be looped through each time we call fork_sync_request(); a rough sketch of that idea follows below. Admittedly this will complicate the design, and as @dmitry-markin is already working on the syncing code, it may be preferable not to make significant refactors until we've decided how ChainSync should be structured.
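
A hedged sketch of that per-peer bookkeeping (PeerTargetCache, PeerId, and the invalidation rules are assumptions for illustration, not the actual design):

```rust
use std::collections::{BTreeSet, HashMap};

type BlockNumber = u64;
type Hash = [u8; 32];
type PeerId = u64; // stand-in for the real peer identifier

// Hypothetical cache: remember which fork targets a peer could not
// provide, so fork_sync_request() can skip them next time instead of
// re-walking the whole target list on every call.
#[derive(Default)]
struct PeerTargetCache {
    unavailable: HashMap<PeerId, BTreeSet<(BlockNumber, Hash)>>,
}

impl PeerTargetCache {
    fn mark_unavailable(&mut self, peer: PeerId, target: (BlockNumber, Hash)) {
        self.unavailable.entry(peer).or_default().insert(target);
    }

    fn should_skip(&self, peer: PeerId, target: &(BlockNumber, Hash)) -> bool {
        self.unavailable
            .get(&peer)
            .map_or(false, |set| set.contains(target))
    }

    // Entries must be invalidated when targets are removed or a peer's
    // best block advances; those rules are exactly the added design
    // complexity mentioned above.
    fn forget_peer(&mut self, peer: PeerId) {
        self.unavailable.remove(&peer);
    }
}
```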

Author
@prybalko prybalko Aug 31, 2023

> how much better did it get, do you remember?

Although fork_targets were clearly not stored in an optimal way, I didn't see any significant changes in performance. In my tests, the difference was negligible, as the time spent in fork_sync_request is in microseconds even for 10k blocks.

Contributor

The logic itself looks good, but I'm not sure this improvement makes a significant difference. The thing is, due to the bug paritytech/substrate#12830, fork_sync_request is only capable of downloading forks shorter than max_blocks_per_request (128 blocks), and the performance difference at this scale may not be that big. For longer forks, maybe we somehow invert (or at least should) the notion of fork vs. canonical chain and download the fork as a canonical chain if we are lucky (or get stuck, as in the issue above).

Also, as a downside, it looks like if we have multiple fork targets on the same fork, this change increases the chances that we hit the 128-block limit (as we are starting from the higher fork targets) and never import the fork.

The problems described are due to the bug paritytech/substrate#12830 and not due to this PR per se.

@altonen altonen requested a review from dmitry-markin August 31, 2023 11:35
Member

bkchr commented Nov 8, 2023

@altonen @dmitry-markin what are we gonna do with this?

Contributor

altonen commented Nov 8, 2023

> @altonen @dmitry-markin what are we gonna do with this?

This would need a Versi burn-in at least, and the last time I spoke with Pavel about this, there was some confusion (at least on my side) about why this didn't improve the original test that was very slow when I last debugged it a year ago. I think we can keep this open and revisit once the sync refactoring @dmitry-markin has started is done and we have some time to look into the fork targets.

bkchr pushed a commit that referenced this pull request Apr 10, 2024
@bkchr bkchr closed this Jul 17, 2024
@bkchr bkchr deleted the optimize-sync-fork-targets branch July 17, 2024 12:28