Skip to content
This repository has been archived by the owner on Jan 24, 2024. It is now read-only.

ref impl: driver agent: respond to head events and stay in sync #131

Merged
merged 9 commits into from
Jan 21, 2022

Conversation

protolambda
Copy link
Collaborator

@protolambda protolambda commented Jan 10, 2022

Part of #119: staging -> main migration

This:

  • EngineDriver is like a binding to a remote engine, tracking to where it's synced.
  • Drive is a sync process that can be started/shutdown per remote engine, and responds to head events from L1, and syncs (happy extend-case, or worst-case sync with reorg) the L2 chain by running a driver-step on the right starting-point. There is only a single active driving process at a time per engine.

Depends on #130

Review: any team, systems team preferred.

Spec: Do we want to go-indepth, like back-off strategy and failure-cases of sync in the spec? Or maybe just the "if out of sync, run a driver-step on with L1 block X_n and L2 block parent Y_{n-1}" summary of driver routine behavior? (we already more or less have the latter)

Testing: this is the most time/event/integration heavy of all of the reference implementation. Only a single event hub, but still challenging to unit-test. It works e2e though. Any suggestions/help how to best test this?

opnode/l2/driver.go Outdated Show resolved Hide resolved
@norswap
Copy link
Contributor

norswap commented Jan 18, 2022

Quick question: this is divergent from staging, I assume this is the canonical version?

Copy link
Contributor

@norswap norswap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I think the architecture that we conceptually have three concurrent loops for checking l1 head, l2 head and taking a sync step (with the sync loop slowing down each time it doesn't move closer to sync) is pretty neat.

Also something that could use a description. I'm thinking we probably need some markdown implementation notes.

opnode/l2/driver.go Outdated Show resolved Hide resolved
opnode/l2/driver.go Outdated Show resolved Hide resolved
opnode/l2/driver.go Outdated Show resolved Hide resolved
opnode/l2/driver.go Outdated Show resolved Hide resolved
e.Log.Error("failed to fetch L2 head info", "err", err)
}
cancel()
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to verify that the L2 head of the execution engine is either:

  • something we're already aware of
  • congruent with the referenced L1 block (i.e. valid) (in the case this is a head that was synced by the execution engine that we didn't set ourselves)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need to track what the engine says it has to determine if we can go through the happy sync path. We trust our engine to have L2 blocks with correct L1 information (since we inserted it and set it with forkchoice updates). FindSyncStart can handle any out-of-sync cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if we're making the assumption that we are the one inserting the L2 blocks, what's the purpose of getting the execution engine to tell us about new heads (which we already know about & track).

I guess the point of this is being a stub for later, when we do have L2-level sync, at which point we will need to validate the block, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the point of this is being a stub for later, when we do have L2-level sync, at which point we will need to validate the block, right?

We don't need to validate, state-sync via engine is always towards a block-hash we already trust (i.e. we messaged the engine with a forkchoice update, finalized or head, depending on trust assumptions). And even when the engine has the wrong head, we can just reorg away if it's not canonical.

opnode/l2/driver.go Show resolved Hide resolved
opnode/l2/driver.go Outdated Show resolved Hide resolved
@protolambda
Copy link
Collaborator Author

Rebased on base branch

@protolambda
Copy link
Collaborator Author

Went back and forth whether or not we need refL1 in the FindSyncStart return params, but we just need a name update to nextRefL1 in the driver.

@protolambda protolambda force-pushed the impl-sync-start branch 2 times, most recently from ad0b2c8 to f5d7631 Compare January 19, 2022 18:17
@protolambda
Copy link
Collaborator Author

Rebased on the updated impl-sync-start PR (rebased that on main)

@trianglesphere
Copy link
Contributor

Testing: this is the most time/event/integration heavy of all of the reference implementation. Only a single event hub, but still challenging to unit-test. It works e2e though. Any suggestions/help how to best test this?

My recommendation for this is to make the Drive loop as small and self contained as possible. With explicit inputs, it's easy too set up a unit test. Unfortunately having timers does make this harder, but not impossible.

Base automatically changed from impl-sync-start to main January 20, 2022 18:42
@protolambda
Copy link
Collaborator Author

@trianglesphere implemented your suggestion (large refactor, no functional changes, but should pay off in test-ability), can you review?

Copy link
Contributor

@trianglesphere trianglesphere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work on the refactor. I think pulling it apart like this teases out the fundamental structure and makes it a bit easier to understand.

One comment was for more comments, and the other is about what I think's a bug. Feel free to merge this once you think you've got it resolved.

opnode/l2/driver_state.go Show resolved Hide resolved
opnode/l2/driver_state.go Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Jan 21, 2022

Codecov Report

Merging #131 (94dba5a) into main (f5d7631) will decrease coverage by 6.29%.
The diff coverage is 8.84%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #131      +/-   ##
==========================================
- Coverage   40.56%   34.26%   -6.30%     
==========================================
  Files           8       11       +3     
  Lines         604      750     +146     
==========================================
+ Hits          245      257      +12     
- Misses        339      468     +129     
- Partials       20       25       +5     
Impacted Files Coverage Δ
opnode/l2/driver.go 0.00% <0.00%> (ø)
opnode/l2/driver_loop.go 0.00% <0.00%> (ø)
opnode/l2/driver_state.go 14.47% <14.47%> (ø)
opnode/l2/sync_start.go 68.57% <100.00%> (+0.45%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f5d7631...94dba5a. Read the comment docs.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants