This repository has been archived by the owner on Nov 26, 2020. It is now read-only.

proper synchronized mining #143

Open · wants to merge 11 commits into master
Conversation

@raulk (Member) commented Jul 13, 2020

This PR introduces proper synchronized mining via two control planes:

  • global clock control plane.
  • mining control plane.

With this PR, we no longer need to backdate genesis or decrease the BlockDelaySecs and PropagationDelaySecs configurations to accelerate mining.

Global clock control plane

Both miners and clients participate in this control plane. It is used to drive time in mining. Time starts at the current wall-clock time and moves into the future, one BlockDelaySecs at a time.

This is essentially modelled as a distributed semaphore, where all miners and clients wait until they have locally advanced their clocks. Once they all signal that fact via a topic, the semaphore allows them all to proceed at once.

The semaphore fires through a channel, which is then fed to the mining control plane.
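
Below is a minimal sketch of what this distributed semaphore could look like in Go. Everything here (the SyncService and MockClock interfaces, function and parameter names) is an illustrative assumption, not the actual testkit code:

package testkit

import (
	"context"
	"time"
)

// SyncService abstracts the pub/sub barrier used as a distributed semaphore.
// This interface is an illustrative assumption, not the real sync API.
type SyncService interface {
	// SignalEpoch announces that this instance has advanced its local clock
	// to the given epoch.
	SignalEpoch(ctx context.Context, epoch int64) error
	// WaitForAll blocks until all instances have signalled the given epoch.
	WaitForAll(ctx context.Context, epoch int64, instances int) error
}

// MockClock is a hypothetical stand-in for the manipulable clock injected
// into the Lotus build (build.Clock = n.MockClock).
type MockClock interface {
	Add(d time.Duration)
}

// synchronizeClock drives the global clock control plane: each time the
// local node asks to advance (localAdvance), it bumps the mock clock by one
// BlockDelaySecs, signals that fact on a topic, waits for every other
// instance to do the same, and only then releases the new epoch on
// globalEpoch (the semaphore firing through a channel described above).
func synchronizeClock(ctx context.Context, sync SyncService, clock MockClock,
	blockDelay time.Duration, instances int,
	localAdvance <-chan int64, globalEpoch chan<- int64) {

	for {
		select {
		case epoch := <-localAdvance:
			clock.Add(blockDelay) // advance our local clock by one epoch
			if err := sync.SignalEpoch(ctx, epoch); err != nil {
				return
			}
			if err := sync.WaitForAll(ctx, epoch, instances); err != nil {
				return // context cancelled or sync service failure
			}
			globalEpoch <- epoch // all instances are now at this epoch
		case <-ctx.Done():
			return
		}
	}
}

The key property is that globalEpoch only fires after every instance has signalled, which is what lets all participants proceed at once.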

Mining control plane

Only miners participate in this control plane. It feeds off the global clock control plane channel to conduct mining rounds.

As soon as the clock progresses, we attempt to mine a block. We then publish a message on a topic indicating who we are (our address) and whether we produced a block.

Next, we subscribe to HEAD changes (which sends us a snapshot of our current head, plus any subsequent changes), and wait until our HEAD becomes the tipset that includes all blocks we know were produced in that round.

At that point, we release a signal via a channel that feeds back into the global clock control plane, which triggers the next collective epoch.
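
A sketch of a single mining round, continuing the sketch above (same hypothetical package and imports); the Miner interface, MiningResult type, and helper names are stand-ins for the LotusMiner/testkit calls, not the PR's actual API:

// Block and MiningResult are illustrative types; MiningResult is what each
// miner publishes on the topic after attempting to mine.
type Block struct{ Cid string }

type MiningResult struct {
	Miner    string // miner address
	MinedCID string // CID of the produced block; empty if nothing was mined
}

// Miner is a hypothetical view of the operations used in a round.
type Miner interface {
	Address() string
	TotalMiners() int
	MineOne(ctx context.Context, epoch int64) (*Block, error)
	PublishMiningResult(ctx context.Context, r MiningResult) error
	AwaitHeadIncludes(ctx context.Context, cids []string) error
}

// collectResults gathers one MiningResult per miner for the current round
// and returns the CIDs of the blocks that were actually produced.
func collectResults(ctx context.Context, results <-chan MiningResult, total int) []string {
	var produced []string
	for i := 0; i < total; i++ {
		select {
		case r := <-results:
			if r.MinedCID != "" {
				produced = append(produced, r.MinedCID)
			}
		case <-ctx.Done():
			return produced
		}
	}
	return produced
}

// mineRound condenses one round of the mining control plane: attempt to mine,
// advertise the result, wait until HEAD includes every block produced this
// round, then signal epoch end so the clock control plane can advance.
func mineRound(ctx context.Context, m Miner, epoch int64,
	results <-chan MiningResult, epochEnd chan<- int64) error {

	blk, err := m.MineOne(ctx, epoch) // may legitimately produce no block
	if err != nil {
		return err
	}

	res := MiningResult{Miner: m.Address()}
	if blk != nil {
		res.MinedCID = blk.Cid
	}
	if err := m.PublishMiningResult(ctx, res); err != nil {
		return err
	}

	produced := collectResults(ctx, results, m.TotalMiners())
	if err := m.AwaitHeadIncludes(ctx, produced); err != nil {
		return err
	}

	epochEnd <- epoch
	return nil
}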


Fixes #135.
Partially addresses #87.
Uses filecoin-project/lotus#2362.


Future

This solution synchronises all miners perfectly, and does not allow us to simulate disruptions, faults, delays, etc. That said, the modularity and the delineation through clean input/output channels should allow us to mangle and manipulate things locally. However, we will need to think of ways to advertise to peers when we deliberately "opt out" of a round, because right now the code expects full participation of all miners in every mining round.

@raulk raulk added this to the Sprint 5 milestone Jul 14, 2020
@raulk raulk marked this pull request as ready for review July 14, 2020 14:11
@raulk raulk changed the title WIP proper synchronized mining proper synchronized mining Jul 14, 2020
@yusefnapora (Contributor) left a comment:


This is awesome.

I notice a lot of messages like this:

Jul 14 14:52:33.298557  INFO    101.6153s      ERROR << instance   3 >> 2020-07-14T10:52:33.298-0400    WARN    miner   miner/miner.go:179      mined block in the past {"block-time": "2020-07-15T10:44:12.000-0400", "time": "2020-07-15T10:44:12.100-0400", "duration": 0.1}   {"req_id": "8ac15ff3"}
Jul 14 14:52:33.312636  INFO    101.6294s      ERROR << instance   4 >> 2020-07-14T10:52:33.312-0400    WARN    sub     sub/incoming.go:81      Received block with large delay 25 from miner t01001    {"req_id": "8ac15ff3"}

I think the mined block in the past message is happening because we start at genesisTime + 100ms and advance by 1 epoch interval at each step, so we're always 100ms after the epoch when we actually mine the block. Maybe we should mine at a point midway between epochs? I think that would be closer to the real-world scenario, since miners want to push blocks before the epoch so they can propagate.
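
For example (illustrative only, assuming the same setup context as the n.MockClock.Set call discussed below), the clock's starting point could be placed halfway into the first epoch rather than 100ms past genesis:

// Hypothetical variant of the clock initialisation: start halfway into the
// first epoch, so blocks are mined (and can propagate) before the next
// epoch boundary rather than just after the previous one.
blockDelay := time.Duration(build.BlockDelaySecs) * time.Second
n.MockClock.Set(genesisTime.Add(blockDelay / 2))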

Review thread on lotus-soup/testkit/node_time.go (outdated, resolved)
@raulk (Member, Author) commented Jul 14, 2020

@yusefnapora

Jul 14 14:52:33.312636  INFO    101.6294s      ERROR << instance   4 >> 2020-07-14T10:52:33.312-0400    WARN    sub     sub/incoming.go:81      Received block with large delay 25 from miner t01001    {"req_id": "8ac15ff3"}

is a bit of a red herring. The block validator is not discarding blocks we've already received, so it appears that gossipsub can deliver the same block multiple times. The first time we receive the block, we incorporate it into our HEAD, and when all blocks are received, we advance time. If, due to gossipsub amplification, a duplicate block is received after we've fast-forwarded the clock by 25s, it will still be delivered and will cause this warning.

@schomatis (Contributor) commented:

The block validator is not discarding blocks we've already received,

Not totally familiar with gossipsub internals but we should drop (valid) repeated blocks in

https://github.com/filecoin-project/lotus/blob/13de81b3b27926fd5db1e7334d540f8e6d72c774/chain/sub/incoming.go#L220-L225

build.Clock = n.MockClock

// start at genesis + 100ms.
n.MockClock.Set(genesisTime.Add(100 * time.Millisecond))
Contributor:

This still triggers (after the last set of patches) the mined block in the past warning and related ones. In total we are advancing BlockDelaySecs plus this slight drift, putting us outside of the mining epoch window: the miner only advances BlockDelaySecs from its tipset and sees that it lost the mining window by 100ms. Ideally this should be a subtraction, but in my local tests that breaks other assumptions (understandably, because I'm drifting before genesis time). Maybe we should jump a smaller amount than BlockDelaySecs the first time (to remain inside the mining window when adding the full BlockDelaySecs amount in subsequent rounds).
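
A standalone illustration of that arithmetic (hypothetical values; BlockDelaySecs assumed to be 25s, consistent with the 25s delays in the logs above):

package main

import (
	"fmt"
	"time"
)

// Illustrates why the 100ms drift puts every advance just past the epoch
// boundary, and how shortening only the first jump keeps later full
// BlockDelaySecs jumps aligned with the boundaries.
func main() {
	genesis := time.Now()
	blockDelay := 25 * time.Second // stand-in for build.BlockDelaySecs
	drift := 100 * time.Millisecond

	// Current behaviour: start at genesis+drift, then add blockDelay per round.
	clock := genesis.Add(drift).Add(blockDelay)
	fmt.Println(clock.Sub(genesis)) // 25.1s: 100ms past the epoch-1 boundary

	// Suggested behaviour: make the first jump shorter by the drift.
	clock = genesis.Add(drift).Add(blockDelay - drift)
	fmt.Println(clock.Sub(genesis)) // exactly 25s: on the boundary
}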

Contributor:

I think the simplest solution here would be to advance the clock after the mining round and not before as it seems to be done now. (I think the change would translate to something like moving the clock advance logic from the local to the global case in SynchronizeClock.)

Member Author:

That would make sense. I did try different variants and in some cases I ended up hitting the "got a block from the future" error. I think we do need to get this absolutely right, so I'm willing to invest time in nailing the state machine right now so that we're all happy and we can build on a solid base.

Member Author:

I think advancing the clock just before the mining round won't work because of races between miners. Miners who advance their clock sooner will end up propagating what appear to be "blocks from the future" to miners who still haven't advanced their clocks. That's why I opted for this approach, as we have the assurance that all miners are in the right epoch when mining commences in concert.

After removing the drift, and testing atop this patch, we no longer have the spurious logs.

The only one we're left with is the occasional:

WARN	sub	sub/incoming.go:81	Received block with large delay 25 from miner t01001	{"req_id": "a305c648"}

which I think has to do with gossipsub propagation and is not worth investigating right now, IMO.

@schomatis (Contributor) commented:

I'm going to give it an in-depth review and run tomorrow but the new logic decoupling of this last iteration is beautiful 🎨

@schomatis (Contributor) left a comment:

Looking good, left some comments before giving this a run, nothing major.

Resolved review threads on lotus-soup/testkit/node_time.go, lotus-soup/testkit/role_miner_sync.go, and lotus-soup/testkit/role_miner.go.
@schomatis (Contributor) left a comment:

This is looking very good. All the major comments have been addressed. I still need to run this and check the "mined in the past" delay but from previous comments it seems this has been fixed.

One last request: I need some more clarity on the channels that control this system. I left some untidy notes about this which might not add too much value (apologies for that; I can add more clarifications later when I have more time); their intent is to illustrate some of the confusion I have when I try to reason about this code. I actually know what the code is doing, but that is mainly because I'm very familiar with the protocol and can imagine how I would go about doing this, which is basically aligned with this implementation. But when I actually read through the synchronization logic, I still take a noticeable cognitive hit following the channels from one function to another.

// all blocks that were produced in this epoch.
// * once that happens, we emit an epoch end signal, which the caller can
// interpret as a signal to advance the clock and begin a new epoch.
func (m *LotusMiner) SynchronizeMiner(ctx context.Context, epochStart <-chan abi.ChainEpoch, epochEnd chan<- abi.ChainEpoch) {
Contributor:

epochStart:

  • what do we start? Mining. It is basically a "go mine" signal.
  • epoch is confusing; it may seem to imply that we mine on the epoch passed in the channel, when in fact the epoch is used only for debugging purposes.
  • the channel seems to be doing two things at once: signalling to mine (this could just be an empty struct) and also telling you which epoch the miner is standing on (do we need to obtain that information from here?)

Contributor:

epochEnd: similar. Do we even read the epoch we're sending, or is this just "I'm done syncing"? The terms epoch and end seem to imply to me that we are controlling the clock here, but I think this is just a signal that I (the miner) did something (either mined or synced the chain), unrelated to starting or ending an epoch.

Member Author:

Basically this routine is intended to behave like an "actor", as in the actor-oriented programming model. It receives signals to work on epochStart and emits "work done" signals on epochEnd. In Akka terms, you would liken the epochStart channel to an inbox and epochEnd to an outbox (using their mailbox analogy).

I haven't thought too much about how these epochs are actually used; I started making some of these channels of element type struct{}, but then I realised that I might as well propagate useful info. Although in hindsight, I see how that decision (and loose usage of the sent/recvd values) might confuse others.

How about we name these channels mineStart and mineDone? From the viewpoint of this routine, that establishes a clear boundary of signals demarcating each unit of work.
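
For reference, a sketch of the renamed signature (mirroring the existing SynchronizeMiner signature quoted above, with only the channel names changed as proposed):

// The channels now signal units of mining work rather than epoch boundaries;
// the epoch value they carry would be kept purely for logging/debugging.
func (m *LotusMiner) SynchronizeMiner(ctx context.Context, mineStart <-chan abi.ChainEpoch, mineDone chan<- abi.ChainEpoch)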

Contributor:

Those names sound good.

//
// This method will return immediately, and will spawn a background goroutine
// that performs the clock synchronisation work.
func (n *LotusNode) SynchronizeClock(ctx context.Context) (localAdvance chan<- abi.ChainEpoch, globalEpoch <-chan abi.ChainEpoch, err error) {
Contributor:

global vs local: I have trouble grappling with the topology implied by this locality.

"the user requests local advancement": it doesn't seem to me that the user (miner) requests advancement; it just says "I've synced to this epoch". Yes, you can interpret that as "I'm done, let's move on", but local/global implies to me some symmetry between peers, differentiated only by how we order the group. Here, instead, we have miners and a central time/chain control; the miners don't have any leverage on that control (in more advanced use cases, even if a miner says "I'm done" in this round we may still not advance until some other event happens). I don't see the relation that could classify them into local/global within a centralized control system.

@raulk (Member, Author) commented Jul 17, 2020:

SynchronizeClock is another actor (actor-oriented programming). It encapsulates all the control of the clock. No other process in the system (not even the SynchronizeMiner actor) is entitled to change the clock (although we don't enforce that prohibition).

The idea behind this is that when the local node decides it's ready to advance the clock (once its enlisted predicates are satisfied), it sends a signal to localAdvance.

Right now we ACTUALLY advance the local clock and advertise that fact to the world. In the future, we could use some form of 3PC (three-phase commit), where we PREPARE [phase 1] the advancement and nodes opt in or out of that round via the sync service; once all nodes have PREPARED, we actually COMMIT [phase 2] the local clock advancement by manipulating the mock clock, and then CERTIFY [phase 3] that we are at the new epoch.

Once all nodes have passed the CERTIFY phase, that's when we release the signal on globalEpoch to trigger mining and/or other downstream processes.
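
A purely hypothetical sketch of those phases; the Barrier and Clock interfaces and all names here are illustrative, and nothing like this exists in the PR:

package sketch

import (
	"context"
	"fmt"
	"time"
)

// Barrier abstracts "every instance signals a state, then everyone proceeds".
type Barrier interface {
	SignalAndWait(ctx context.Context, state string, instances int) error
}

// Clock is a stand-in for the manipulable mock clock.
type Clock interface {
	Add(d time.Duration)
}

// advanceClock3PC walks the three phases described above for a single epoch.
func advanceClock3PC(ctx context.Context, b Barrier, clock Clock,
	blockDelay time.Duration, epoch int64, instances int) error {

	// Phase 1: PREPARE. Announce the intent to advance; nodes could opt in
	// or out of the round here via the sync service.
	if err := b.SignalAndWait(ctx, fmt.Sprintf("prepare-%d", epoch), instances); err != nil {
		return err
	}

	// Phase 2: COMMIT. Every node has prepared, so actually move the clock.
	clock.Add(blockDelay)
	if err := b.SignalAndWait(ctx, fmt.Sprintf("commit-%d", epoch), instances); err != nil {
		return err
	}

	// Phase 3: CERTIFY. Confirm all nodes observe the new epoch; only after
	// this would the globalEpoch signal be released downstream.
	return b.SignalAndWait(ctx, fmt.Sprintf("certify-%d", epoch), instances)
}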

Contributor:

Yes, I think I had the 3PC model you're describing in mind. I'm fine if this is not implemented right now.

@schomatis (Contributor) left a comment:

90% of the Received block with large delay warnings are coming from the client (instance 4), which is running one epoch ahead (not sure why), so I'm not sure I agree with the earlier diagnosis of gossipsub amplification. I would like to understand and preferably fix (or justify) this so I can look in depth at the other 10% of the warnings, which come from miner delays. (From what I've seen, miner delays come from blocks the same miner mined, which the validator does not discard because it is in fact the first time it sees them, but I would like to confirm this.)

edit: There is also a soft race condition (for lack of a better term) where each node advances its clock once all miners are done mining, but not necessarily after all nodes are done syncing, which might also explain the reported delays (not totally sure; I'm still having trouble following the local/end and global/start channels back and forth).

Comment on lines +151 to +158
go func() {
// for now, clients are always one epoch ahead locally with regards to miners.
// keeping the flywheel spinning!
for epoch := range globalEpochCh {
localAdvanceCh <- epoch + 1
}
}()
}
Contributor:

I don't follow the rationale behind this advancement from the comment.

@raulk (Member, Author) commented Jul 17, 2020

@schomatis Good job catching the fact that it's the client that reports the "Received block with large delay" message. In general, we need to revisit the client's behaviour in all this synchronisation. I'm not convinced by the current solution.

The decision I had to take was: "do I want the client to wait until its HEAD is updated, just like miners do?" That seemed a bit too heavy-handed to me, so I decided not to wait for that condition right now, because this PR is just the first step to get fully deterministic tests (a) working [see #146], and (b) doing useful stuff (we will need to simulate faults, forks, mining halting, etc.)

Right now the advancement logic is hardcoded, but to enable (b), we need some richer semantics/API where we can attach dynamically-evaluated predicates such that:

  1. Miners can decide to opt-out of a mining round (advance the clock but do not attempt to mine a block).
  2. Clients can watch the mining happen passively, then 1000 tipsets into the chain, decide to enroll in the ensemble, forward their clock to the current time, bootstrap against the network, and sync.
  3. etc.

In conclusion, I'm not so worried about getting everything right at this precise instance, because this PR is just laying the groundwork and we'll iterate on this code a lot as we introduce those predicates, checkpoints, etc.
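
A purely hypothetical illustration of what such attachable predicates might look like (none of these names exist in the codebase):

// EpochCtx carries whatever a predicate might need to decide about a round.
type EpochCtx struct {
	Epoch     int64
	HeadCount int // tipsets observed so far, for "1000 tipsets in" style rules
}

// MinePredicate decides whether a miner participates in a given round;
// JoinPredicate decides whether a passive client enrolls, fast-forwards its
// clock, bootstraps against the network, and starts syncing.
type MinePredicate func(e EpochCtx) bool
type JoinPredicate func(e EpochCtx) bool

// Examples: a miner that opts out of every 10th round, and a client that
// joins the ensemble after 1000 tipsets.
var (
	skipEveryTenth MinePredicate = func(e EpochCtx) bool { return e.Epoch%10 != 0 }
	joinAfter1000  JoinPredicate = func(e EpochCtx) bool { return e.HeadCount >= 1000 }
)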

@schomatis (Contributor) commented:

In conclusion, I'm not so worried about getting everything right at this precise instance, because this PR is just laying the groundwork and we'll iterate on this code a lot as we introduce those predicates, checkpoints, etc.

I don't fully understand all the details here, so I still don't have complete confidence in this implementation (that is on me, not an issue with the code), but I likely won't have time to review anything else today, so if this is blocking the ongoing effort feel free to go ahead and merge this; we can iterate on it as we go.

Comment on lines +18 to +19
// build.BlockDelaySecs = 2
// build.PropagationDelaySecs = 1
Member Author:

Need to sort this out. Synchronised mining doesn't need to reduce the block delay and propagation delay.

@schomatis (Contributor) left a comment:

In general, none of the future features proposed here seem to require two different nodes to see a different time. The biggest goal of this PR should just be to make sure all the nodes are in sync; what each node does (mine or not mine, stay in consensus or fork) can be controlled by other means. Having these "local" clocks seems like adding too much complexity for a first iteration without much benefit, and as shown in #154 it also causes clock misalignments that result in unwanted behavior. Still, I'm approving this PR to explicitly signal that I don't want to stall this effort, and what I'm describing here could potentially be fixed in later iterations (at a bigger cost, IMO, than discussing and fixing it here).

@raulk raulk added the workstream/e2e-tests Workstream: End-to-end Tests label Jul 28, 2020
Labels: workstream/e2e-tests Workstream: End-to-end Tests

Successfully merging this pull request may close these issues: basic synchronized mining is broken