Activate Mempool storage based on a list of recent syncer obtain/extend lengths #2592

Closed
Tracked by #2309
mpguerra opened this issue Aug 9, 2021 · 11 comments · Fixed by #2722

@mpguerra
Contributor

mpguerra commented Aug 9, 2021

Motivation

We want to control when we activate the mempool, both for testing purposes and for when Zebra reaches the chain tip.

Specifications

Designs

Testing

Test that activation happens near the chain tip

Related Work

This change depends on ticket #2595.

@mpguerra mpguerra added C-enhancement Category: This is an improvement S-needs-triage Status: A bug report needs triage labels Aug 9, 2021
@mpguerra mpguerra added this to the 2021 Sprint 16 milestone Aug 9, 2021
@teor2345 teor2345 changed the title from "Implement mechanism to Activate Mempool" to "Activate Mempool based on a list of recent syncer obtain/extend lengths" Aug 9, 2021
@teor2345
Contributor

@oxarbitrage before you start this task, can you write down what you want to do, step by step?
Then we can work out any details that are unclear.

I'm doing some testing on PR #2602 at the moment, and updating this PR with the results.

@teor2345
Contributor

teor2345 commented Aug 12, 2021

Here is how RecentSyncLengths from PR #2602 works:

Startup & Initial Syncs

On startup, the list is empty. Then it gets filled with lengths around 500.

Aug 11 14:13:33.723  INFO {zebrad="54b787a" net="Main"}:sync:obtain_tips:push_obtain_tips_length{sync_length=499}: zebrad::components::sync::recent_sync_lengths: sending recent sync lengths
recent_lengths=[499]
Aug 11 14:13:41.249  INFO {zebrad="54b787a" net="Main"}:sync:extend_tips:push_extend_tips_length{sync_length=498}: zebrad::components::sync::recent_sync_lengths: sending recent sync lengths
recent_lengths=[498, 499]
Aug 11 14:13:48.034  INFO {zebrad="54b787a" net="Main"}:sync:extend_tips:push_extend_tips_length{sync_length=498}: zebrad::components::sync::recent_sync_lengths: sending recent sync lengths
recent_lengths=[498, 498, 499]
Aug 11 14:13:53.848  INFO {zebrad="54b787a" net="Main"}:sync:extend_tips:push_extend_tips_length{sync_length=498}: zebrad::components::sync::recent_sync_lengths: sending recent sync lengths
recent_lengths=[498, 498, 498]
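
The log lines above suggest a small bounded list where the newest length goes to the front and the oldest falls off the end. Here is a minimal sketch of that behaviour, not the code from PR #2602: the `RecentLengths` type, its `push` method, and the cap of 3 (read off the logs) are all assumptions made for illustration.

```rust
/// Hypothetical stand-in for the list kept by `RecentSyncLengths`.
#[derive(Debug)]
struct RecentLengths {
    /// Newest length first, as in the `recent_lengths=[...]` log lines.
    lengths: Vec<usize>,
    /// Maximum number of lengths to keep (assumed; the logs above show 3).
    max_len: usize,
}

impl RecentLengths {
    fn new(max_len: usize) -> Self {
        Self { lengths: Vec::new(), max_len }
    }

    /// Insert the newest sync length at the front, then drop any
    /// entries beyond `max_len` from the back.
    fn push(&mut self, sync_length: usize) {
        self.lengths.insert(0, sync_length);
        self.lengths.truncate(self.max_len);
    }
}

fn main() {
    let mut recent = RecentLengths::new(3);
    for sync_length in [499, 498, 498, 498] {
        recent.push(sync_length);
        println!("recent_lengths={:?}", recent.lengths);
    }
}
```

Fed the four lengths from the startup log, this prints the same four `recent_lengths` lists in the same order.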

Temporary Error

When there is a temporary error, one zero length gets put in the list.
There might also be some shorter lengths in the list as the syncer recovers (around 100 - 400).

Aug 11 15:24:03.338  INFO {zebrad="c9ce29f" net="Main"}:sync:extend_tips:push_extend_tips_length{sync_length=498}: zebrad::components::sync::recent_sync_lengths: sending recent sync lengths
recent_lengths=[498, 498, 498]
Aug 11 15:24:07.763  INFO {zebrad="c9ce29f" net="Main"}:sync:extend_tips: zebrad::components::sync: trying to extend chain tips tips=1
Aug 11 15:24:08.830  INFO {zebrad="c9ce29f" net="Main"}:sync:extend_tips:push_extend_tips_length{sync_length=0}: zebrad::components::sync::recent_sync_lengths: sending recent sync lengths
recent_lengths=[0, 498, 498]
Aug 11 15:24:08.830  INFO {zebrad="c9ce29f" net="Main"}:sync: zebrad::components::sync: exhausted prospective tip set
Aug 11 15:24:08.830  INFO {zebrad="c9ce29f" net="Main"}:sync: zebrad::components::sync: waiting to restart sync timeout=61s
Aug 11 15:25:09.832  INFO {zebrad="c9ce29f" net="Main"}:sync: zebrad::components::sync: starting sync, obtaining new tips
Aug 11 15:25:15.850  INFO {zebrad="c9ce29f" net="Main"}:sync:obtain_tips:push_obtain_tips_length{sync_length=499}: zebrad::components::sync::recent_sync_lengths: sending recent sync lengths
recent_lengths=[499, 0, 498]
Aug 11 15:25:20.298  INFO {zebrad="c9ce29f" net="Main"}:sync:extend_tips: zebrad::components::sync: trying to extend chain tips tips=1
Aug 11 15:25:21.811  INFO {zebrad="c9ce29f" net="Main"}:sync:extend_tips:push_extend_tips_length{sync_length=498}: zebrad::components::sync::recent_sync_lengths: sending recent sync lengths
recent_lengths=[498, 499, 0]
Aug 11 15:25:26.442  INFO {zebrad="c9ce29f" net="Main"}:sync:extend_tips: zebrad::components::sync: trying to extend chain tips tips=1
Aug 11 15:25:27.074  INFO {zebrad="c9ce29f" net="Main"}:sync:extend_tips:push_extend_tips_length{sync_length=498}: zebrad::components::sync::recent_sync_lengths: sending recent sync lengths
recent_lengths=[498, 498, 499]
Aug 11 15:25:32.291  INFO {zebrad="c9ce29f" net="Main"}:sync:extend_tips: zebrad::components::sync: trying to extend chain tips tips=1
Aug 11 15:25:33.175  INFO {zebrad="c9ce29f" net="Main"}:sync:extend_tips:push_extend_tips_length{sync_length=498}: zebrad::components::sync::recent_sync_lengths: sending recent sync lengths
recent_lengths=[498, 498, 498]

Response Lengths During Sync

Some peers give us responses that are around 100, 400, or 500 long.
We can usually combine them into a list that is about 500 long.
But sometimes the list only has 60-400 blocks.

The long tail of obtain tips responses happened after Zebra had stopped downloading lots of blocks.
But it might have been lagging behind the tip by a few thousand blocks.
We can fix these kinds of issues later.

[Image: Screen Shot 2021-08-12 at 12 39 44]

Reaching the Chain Tip

Once Zebra reaches the chain tip, the syncer gets responses around 2-20 blocks long.

[Image: Screen Shot 2021-08-12 at 12 42 56]

@oxarbitrage
Contributor

> @oxarbitrage before you start this task, can you write down what you want to do, step by step?
> Then we can work out any details that are unclear.

Maybe it's better if you do it? Both the list and the PR? If not, please let me handle it my way.

Thank you for the details of RecentSyncLengths

@oxarbitrage oxarbitrage assigned teor2345 and unassigned oxarbitrage Aug 12, 2021
@teor2345 teor2345 changed the title from "Activate Mempool based on a list of recent syncer obtain/extend lengths" to "Activate Mempool storage based on a list of recent syncer obtain/extend lengths" Aug 12, 2021
@upbqdn upbqdn self-assigned this Aug 13, 2021
@mpguerra mpguerra removed the S-needs-triage Status: A bug report needs triage label Aug 16, 2021
@upbqdn
Member

upbqdn commented Aug 19, 2021

Do I understand it correctly that this ticket needs to come up with a suitable heuristic that would decide when it's the right time to start the mempool?

@oxarbitrage
Contributor

This is a very high-level description, but I think something like this is what we need here:

The mempool will have a start() method or similar that we can call at any time. For example, in testing we might want to call it at the genesis block, or whenever we need to.

On the other hand, in the real world we will activate the mempool after Zebra is in sync. The question is how we know that it is in sync. IIRC, we were going to do this by watching the number of blocks being downloaded: while Zebra is syncing, it downloads blocks from the network in batches of 500 (or similar, I don't remember the exact number), but when it gets in sync it slows down.

We need an algorithm for this, in a function named is_zebra_in_sync() or similar, that returns true if we are in sync and false otherwise. It would be good to have some tests for this, but that could be hard.

@oxarbitrage
Contributor

By the way, it seems the task is simplified by the addition of RecentSyncLengths (#2592 (comment)) which is already merged. If you can access the recent_lengths field from your function it should be easy to decide if we are in sync.

Aug 11 15:25:33.175  INFO {zebrad="c9ce29f" net="Main"}:sync:extend_tips:push_extend_tips_length{sync_length=498}: zebrad::components::sync::recent_sync_lengths: sending recent sync lengths
recent_lengths=[498, 498, 498]

@teor2345
Contributor

teor2345 commented Aug 19, 2021

> If you can access the recent_lengths field from your function it should be easy to decide if we are in sync.

We can wrap the heuristic function and recent_lengths watch receiver in a struct, so it's easy to clone it, and pass it to different tasks and services.

But it's ok for the initial draft PR to just have a heuristic function.
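
As a rough illustration of the "wrap it in a struct" idea, here is a std-only sketch. The `SyncStatus` name, the `is_close_to_tip` method, and the cutoff of 20 are hypothetical, and `Arc<RwLock<Vec<usize>>>` only stands in for the watch receiver so that the example has no dependencies; it is not the channel type Zebra uses.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical cloneable handle that bundles the heuristic with the
/// shared list of recent sync lengths.
///
/// `Arc<RwLock<..>>` is a std-only stand-in for the watch receiver.
#[derive(Clone, Debug)]
struct SyncStatus {
    recent_lengths: Arc<RwLock<Vec<usize>>>,
}

impl SyncStatus {
    /// Assumed cutoff, for illustration only.
    const MAX_TIP_AVERAGE: usize = 20;

    fn new(recent_lengths: Arc<RwLock<Vec<usize>>>) -> Self {
        Self { recent_lengths }
    }

    /// Placeholder heuristic: the integer average of the recent lengths
    /// is small. An empty list (startup) is never treated as near the tip.
    fn is_close_to_tip(&self) -> bool {
        let lengths = self.recent_lengths.read().expect("lock poisoned");
        if lengths.is_empty() {
            return false;
        }
        lengths.iter().sum::<usize>() / lengths.len() < Self::MAX_TIP_AVERAGE
    }
}

fn main() {
    let shared = Arc::new(RwLock::new(vec![498, 498, 498]));
    let status = SyncStatus::new(Arc::clone(&shared));

    // Each task or service gets its own cheap clone of the handle.
    let for_mempool = status.clone();
    println!("far from tip: {}", for_mempool.is_close_to_tip());

    // The syncer side updates the shared list; every clone sees the change.
    *shared.write().expect("lock poisoned") = vec![3, 2, 5];
    println!("at tip: {}", for_mempool.is_close_to_tip());
}
```

The point of the wrapper is that callers only see `is_close_to_tip()`, so the heuristic can change later without touching the tasks that hold a clone.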

If you want to see how syncing looks on a graph, you can use Grafana and Prometheus:

@upbqdn
Member

upbqdn commented Aug 22, 2021

The following is a proposal for the heuristic function.

We could utilize a simple moving average (SMA) defined by

$$\mathrm{SMA}_k = \frac{1}{k} \sum_{i=n-k+1}^{n} r_i$$

where $r_i$ is the $i$-th value of recent_lengths, $r_n$ is the most recent one, and $k$ is the number of previous $r_i$ that we consider. It is basically the average of the last k recent_lengths.

I think we want to activate the mempool as early as possible when Zebra reaches the tip. I would therefore set k = 5, and the mempool would activate when SMA5 < 100. Here's an example for better intuition when this happens:
SMA5(500, 10, 10, 10, 10) = 108 => don't activate mempool yet (and wait for the next r),
SMA5(400, 10, 10, 10, 10) = 88 => activate mempool (we made this decision after four short recent_lengths).
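
The two worked examples above can be checked with a short sketch. The function names `sma` and `should_activate_mempool` are hypothetical, and the slice is assumed to be ordered oldest-first, matching the order in which the examples are written.

```rust
/// Average of the `k` most recent lengths, or `None` if fewer than `k`
/// lengths are available (or `k` is zero).
///
/// Assumes `recent_lengths` is ordered oldest-first, like the examples above.
fn sma(recent_lengths: &[usize], k: usize) -> Option<f64> {
    if k == 0 || recent_lengths.len() < k {
        return None;
    }
    let window = &recent_lengths[recent_lengths.len() - k..];
    Some(window.iter().sum::<usize>() as f64 / k as f64)
}

/// Hypothetical activation rule from the proposal: SMA5 < 100.
fn should_activate_mempool(recent_lengths: &[usize]) -> bool {
    matches!(sma(recent_lengths, 5), Some(average) if average < 100.0)
}

fn main() {
    let examples: [[usize; 5]; 2] = [[500, 10, 10, 10, 10], [400, 10, 10, 10, 10]];
    for lengths in examples {
        println!(
            "SMA5{:?} = {:?}, activate = {}",
            lengths,
            sma(&lengths, 5),
            should_activate_mempool(&lengths)
        );
    }
}
```

Returning `None` while fewer than `k` lengths are available keeps the mempool off during startup, when the list is still filling.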

If k = 5 is too much, and we want to decide whether to activate the mempool based on fewer recent_lengths, we can lower k to 3, and set a different threshold for the SMA. If we need to go even lower with k, I wouldn't utilize SMA anymore, and use a simpler heuristic instead.

There is a nice optimization for computing an SMA based on the previous value of it so that it's not necessary to keep all k values in memory but just one.
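
One way to read the incremental update is to keep a running sum and adjust it with the newest value and the value leaving the window. The sketch below uses a hypothetical `RunningSma` type. Note that this form still stores the window, because it needs the value that drops out; a variant that keeps only a single number would be an exponential moving average, which is a different heuristic.

```rust
use std::collections::VecDeque;

/// Hypothetical running average over a fixed-size window.
struct RunningSma {
    window: VecDeque<usize>,
    k: usize,
    sum: usize,
}

impl RunningSma {
    fn new(k: usize) -> Self {
        assert!(k > 0, "window size must be positive");
        Self { window: VecDeque::with_capacity(k + 1), k, sum: 0 }
    }

    /// Add the newest length, evict the oldest once the window is full,
    /// and return the average if the window holds `k` values.
    fn push(&mut self, length: usize) -> Option<f64> {
        self.window.push_back(length);
        self.sum += length;
        if self.window.len() > self.k {
            if let Some(oldest) = self.window.pop_front() {
                self.sum -= oldest;
            }
        }
        if self.window.len() == self.k {
            Some(self.sum as f64 / self.k as f64)
        } else {
            None
        }
    }
}

fn main() {
    let mut sma5 = RunningSma::new(5);
    for length in [500, 10, 10, 10, 10, 10] {
        println!("pushed {:>3}: average = {:?}", length, sma5.push(length));
    }
}
```

Each update costs one addition and one subtraction regardless of `k`, which matters little for k = 5 but keeps the code independent of the window size.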

@oxarbitrage
Contributor

oxarbitrage commented Aug 23, 2021

Here is what i think we can do to get started:

There we have the recent_lengths vector available while zebra is syncing.

  • Call a function there with recent_lengths as the argument:
mempool::is_zebra_in_sync(recent_lengths);
  • Place is_zebra_in_sync() in mempool.rs, at least for now:
pub fn is_zebra_in_sync(recent_lengths: Vec<usize>) -> bool {
    // Element type assumed to be usize; the heuristic is still to be decided.
    todo!()
}
  • Inside, we can use the SMA as suggested, or in a first version just use the last value of the vector, or some other approach.

  • This way, we can add a test in mempool/tests/prop.rs that only tests our is_zebra_in_sync() function by passing it arbitrary vectors of lengths. In the beginning, we can instead write some integration tests (mempool/tests/integration.rs) that call the function with hard-coded lengths, since arbitrary ones could be a bit harder.
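
As a sketch of the "just use the last value" first version with hard-coded inputs: the cutoff `MAX_TIP_SYNC_LENGTH = 20` is an assumption for illustration, the slice parameter is a choice made here so the cases are easy to write, and the newest length is assumed to sit at index 0 (as in the `recent_lengths=[...]` log lines).

```rust
/// Assumed cutoff, for illustration only.
const MAX_TIP_SYNC_LENGTH: usize = 20;

/// First-draft heuristic: look only at the most recent sync length.
/// Assumes the newest length is at index 0; an empty list means "not in sync".
pub fn is_zebra_in_sync(recent_lengths: &[usize]) -> bool {
    match recent_lengths.first() {
        Some(&latest) => latest <= MAX_TIP_SYNC_LENGTH,
        None => false,
    }
}

fn main() {
    // Hard-coded cases taken from the lengths quoted earlier in the thread.
    let cases: [(&[usize], &str); 4] = [
        (&[], "startup, empty list"),
        (&[498, 498, 498], "initial sync"),
        (&[3, 498, 498], "short response"),
        (&[0, 498, 498], "temporary error"),
    ];
    for (lengths, label) in cases {
        println!("{:<20} {:?} -> {}", label, lengths, is_zebra_in_sync(lengths));
    }
}
```

This version reports the `[0, 498, 498]` list from a temporary error as in sync, which is one reason to look at more than the latest value.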

@oxarbitrage
Contributor

Maybe base your initial PR on #2615 so you will have some mempool files.

@dconnolly dconnolly added A-rust Area: Updates to Rust code P-Medium and removed C-enhancement Category: This is an improvement labels Aug 23, 2021
@teor2345
Contributor

teor2345 commented Aug 24, 2021

> I think we want to activate the mempool as early as possible when Zebra reaches the tip. I would therefore set k = 5, and the mempool would activate when SMA5 < 100. Here's an example for better intuition when this happens:
> SMA5(500, 10, 10, 10, 10) = 108 => don't activate mempool yet (and wait for the next r),
> SMA5(400, 10, 10, 10, 10) = 88 => activate mempool (we made this decision after four short recent_lengths).

I would encourage you to use a lower threshold than 100, because the heuristic has to work for 3 different cases:

  • syncing far from the tip: ~500 blocks for most syncs
  • syncing near the tip or after an error: 60-400 blocks for most syncs, but some zeroes
  • syncing at the tip: 0-20 blocks for most syncs

For example, if we have [60, 60, 60, 60, 60], we might not want to activate the mempool, because that's still a lot of blocks. (And we're obviously not at the tip.)
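
To make the effect of the threshold concrete, the sketch below runs one list per case through an average-based rule at two cutoffs. The cutoff of 25 is a made-up value chosen only to sit between the "at the tip" and "near the tip" ranges, and the sample lists are illustrative, not measured.

```rust
/// Integer mean of the recent lengths, or `None` for an empty list.
fn average(recent_lengths: &[usize]) -> Option<usize> {
    if recent_lengths.is_empty() {
        None
    } else {
        Some(recent_lengths.iter().sum::<usize>() / recent_lengths.len())
    }
}

/// Activate only if the average is strictly below `threshold`.
fn activate(recent_lengths: &[usize], threshold: usize) -> bool {
    average(recent_lengths).map_or(false, |avg| avg < threshold)
}

fn main() {
    // Illustrative lists, one per case in the list above.
    let far_from_tip: [usize; 5] = [500, 498, 499, 498, 498];
    let near_tip: [usize; 5] = [60, 60, 60, 60, 60];
    let at_tip: [usize; 5] = [3, 0, 12, 5, 2];

    // 100 is the proposed threshold; 25 is a hypothetical lower one.
    for threshold in [100, 25] {
        println!(
            "threshold {:>3}: far={} near={} tip={}",
            threshold,
            activate(&far_from_tip, threshold),
            activate(&near_tip, threshold),
            activate(&at_tip, threshold)
        );
    }
}
```

With a threshold of 100 the `[60, 60, 60, 60, 60]` list activates the mempool; with the lower cutoff it does not, while the at-tip list still does.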

See the details here:
#2592 (comment)

Note that the latest length is added to the front of the RecentSyncLengths list, unlike the equation above:

/// New lengths are added to the front of the list.

> If k = 5 is too much, and we want to decide whether to activate the mempool based on fewer recent_lengths, we can lower k to 3, and set a different threshold for the SMA. If we need to go even lower with k, I wouldn't utilize SMA anymore, and use a simpler heuristic instead.

k is currently limited to a maximum of 4; feel free to change it in your PR:

pub const MAX_RECENT_LENGTHS: usize = 4;

We'll want to keep k low, because the list is updated:

  • every few seconds while syncing blocks
  • after 61 seconds if there is an error
  • every 61 seconds when Zebra is at the tip
    const SYNC_RESTART_DELAY: Duration = Duration::from_secs(61);
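
The reason to keep k low can be put in numbers: when the list is only updated once per restart delay, replacing every entry takes about k times that delay. A small sketch using the two constants quoted above; the helper `full_refresh_time` is hypothetical.

```rust
use std::time::Duration;

pub const MAX_RECENT_LENGTHS: usize = 4;
const SYNC_RESTART_DELAY: Duration = Duration::from_secs(61);

/// Approximate time to replace every entry in a list of `k` lengths when
/// one update arrives per restart delay (the "at the tip" case above).
fn full_refresh_time(k: usize) -> Duration {
    SYNC_RESTART_DELAY * k as u32
}

fn main() {
    for k in [3, MAX_RECENT_LENGTHS, 5] {
        println!("k = {}: about {:?} to refresh the whole list", k, full_refresh_time(k));
    }
}
```

For k = 4 that is 244 seconds, and for k = 5 it is 305 seconds, so any heuristic that waits for a full window of slow updates reacts on the scale of minutes.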
