
executionclient: add a comment about SubscribeNewHead choice #1996

Merged
merged 2 commits into stage from comment-SubscribeNewHead on Jan 26, 2025

Conversation

@nkryuchkov nkryuchkov (Contributor) commented Jan 21, 2025

Without the comment, it's not clear why we didn't choose SubscribeFilterLogs


codecov bot commented Jan 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 47.0%. Comparing base (3e76188) to head (ef5df09).
Report is 10 commits behind head on stage.


Comment on lines 324 to 325
// Therefore, we decided to implement more atomic behaviour, where we can revert the tx if there was an error in processing all the events of a block.
// So we can restart from this block once everything is good. Doing this based on the event stream was a bit harder.
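For reference, the per-block atomicity these lines describe boils down to roughly the following pattern (a generic sketch with an assumed handleEvent helper and database/sql storage, not the actual ssv-node code):

```go
package sketch

import (
	"context"
	"database/sql"

	"github.com/ethereum/go-ethereum/core/types"
)

// handleEvent is a placeholder for the real per-event handler.
func handleEvent(ctx context.Context, tx *sql.Tx, l types.Log) error { return nil }

// processBlock stores all events of a block in one DB transaction and rolls
// back on any error, so processing can later restart from the same block.
func processBlock(ctx context.Context, db *sql.DB, logs []types.Log) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	for _, l := range logs {
		if err := handleEvent(ctx, tx, l); err != nil {
			_ = tx.Rollback() // revert everything written for this block
			return err        // caller can retry the whole block later
		}
	}
	return tx.Commit() // the block's events become visible atomically
}
```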
@moshe-blox moshe-blox (Contributor) Jan 21, 2025

atomicity refers to how we save an entire block to the database in one tx, which could be done with streaming also

the specific bugs we had with streaming were because of missing blocks:

  • you first sync history from genesis to block 100, but then stream sometimes starts late at 102 (missed 101)
  • you inevitably miss blocks during any stream connection interruptions (such as EL restarts)

which is because you can't specify the fromBlock with streaming; it always starts at the latest block
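For comparison, the block-driven approach that was chosen looks roughly like this with go-ethereum's ethclient: subscribe to new heads, then fetch logs for an explicit block range, so processing can always resume from a known last-processed block (a minimal sketch with illustrative names, not the actual ssv implementation):

```go
package sketch

import (
	"context"
	"log"
	"math/big"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/ethclient"
)

// followBlocks subscribes to new block headers and fetches logs for the exact
// range since the last processed block, so blocks missed during history sync
// or reconnects are never skipped (unlike a log subscription, which always
// starts at the tip).
func followBlocks(ctx context.Context, client *ethclient.Client, contract common.Address, lastProcessed *big.Int) error {
	headers := make(chan *types.Header)
	sub, err := client.SubscribeNewHead(ctx, headers)
	if err != nil {
		return err
	}
	defer sub.Unsubscribe()

	for {
		select {
		case err := <-sub.Err():
			return err // caller reconnects and resumes from lastProcessed
		case header := <-headers:
			from := new(big.Int).Add(lastProcessed, big.NewInt(1))
			logs, err := client.FilterLogs(ctx, ethereum.FilterQuery{
				FromBlock: from,
				ToBlock:   header.Number,
				Addresses: []common.Address{contract},
			})
			if err != nil {
				return err
			}
			log.Printf("processing %d logs for blocks %s..%s", len(logs), from, header.Number)
			lastProcessed = header.Number
		}
	}
}
```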

@nkryuchkov nkryuchkov (Contributor, Author)

updated, please review

@oleg-ssvlabs (Contributor)

(quoting @moshe-blox's comment above about missed blocks and fromBlock with streaming)

It's a very valid point.
It might be solvable in this way:

  • Subscribe to log streams
  • Check whether the block is the latest
  • If the block is not the latest, backfill until the block received from the stream is reached; while doing that, block the streaming Go channel
  • Once backfilled, process the log received from the Go channel and resume the stream listener

Although there are pitfalls, like: how long could the channel be blocked? Aren't event logs missed while it's blocked? It's probably not a buffered channel, and even if it is buffered, what happens when the buffer cap is reached? Also, with streams there seems to be no way to know what the last processed block is: the last processed block would be the last block where an event was published, but what if that event was published 100 blocks back? All things considered, I think the current implementation is just simpler and more robust.
If you understand what I mean :)
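Roughly, the hybrid proposed above would look like this (a sketch using go-ethereum's ethclient and an assumed backfill helper, not actual ssv code; the pitfalls about blocking the channel and tracking the last processed block still apply):

```go
package sketch

import (
	"context"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/ethclient"
)

// backfill is an assumed helper standing in for the existing history sync:
// it fetches and processes logs for blocks (from, to] via FilterLogs.
func backfill(ctx context.Context, client *ethclient.Client, from, to uint64) error { return nil }

// streamWithBackfill subscribes to contract logs and, before processing each
// streamed log, fills any gap since the last processed block. While the gap
// is being backfilled, the subscription channel is not read (back-pressure).
func streamWithBackfill(ctx context.Context, client *ethclient.Client, contract common.Address, lastProcessed uint64) error {
	logsCh := make(chan types.Log)
	sub, err := client.SubscribeFilterLogs(ctx, ethereum.FilterQuery{
		Addresses: []common.Address{contract},
	}, logsCh)
	if err != nil {
		return err
	}
	defer sub.Unsubscribe()

	for {
		select {
		case err := <-sub.Err():
			return err
		case l := <-logsCh:
			if l.BlockNumber > lastProcessed+1 {
				// Blocks were missed between history sync / reconnect and the stream.
				if err := backfill(ctx, client, lastProcessed, l.BlockNumber-1); err != nil {
					return err
				}
			}
			// processing of l would go here
			lastProcessed = l.BlockNumber
		}
	}
}
```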

@moshe-blox moshe-blox (Contributor) Jan 22, 2025

@oleg-ssvlabs I understand it's possible, but what would be the advantage of streaming? If you have to fetch blocks individually anyway to fill streaming gaps, why not just do only that, to keep the system simpler and more predictable (rather than alternating between 2 mechanisms)?

@oleg-ssvlabs (Contributor)

Fetching blocks would only be needed for backfilling. Let's say the application was down for some time and missed some blocks. During startup it would backfill that gap, and once that is done, it would subscribe to the stream. For the entire runtime of the application (potentially days/weeks/months) it would listen to the stream instead of polling for blocks, which is more efficient from a network-traffic perspective (though I'm not sure how Subscribe works under the hood; maybe it just polls?).
Again, it's just an alternative, not necessarily a better one; it has its trade-offs. It would be a very strong option if Subscribe could take a startFromBlock parameter.

@nkryuchkov nkryuchkov (Contributor, Author) Jan 22, 2025

@oleg-ssvlabs I think what you are suggesting could work, but it would definitely add a lot of cognitive complexity and room for potential bugs, and we'd need to add a lot of tests for it. It's something I'd consider only if we can prove that its benefits outweigh the added complexity: we'd need to check whether execution client traffic is a bottleneck now and make sure we'd save a lot with this change.

@nkryuchkov nkryuchkov requested a review from moshe-blox January 21, 2025 19:34
@moshe-blox moshe-blox (Contributor) left a comment

LGTM 🔥

@y0sher y0sher merged commit ff4fa98 into stage Jan 26, 2025
7 checks passed
@y0sher y0sher deleted the comment-SubscribeNewHead branch January 26, 2025 14:56