
fully deterministic, reproducible test scenarios #146

raulk opened this issue Jul 15, 2020 · 2 comments


raulk commented Jul 15, 2020

Context

With #143 (proper synchronized mining), we can synchronise mining across a fleet of miners so that they advance in lockstep, driven by a global clock.

However, there are other chain-dependent async processes, such as window PoSt (fault declaration, recovery declaration, posting proofs) and sealing, that we need to wait for at every chain height. We need to hook into those processes before we allow the clock to advance.

Additionally, some or all of those processes generate messages asynchronously, so we may need to coordinate across miners and only advance the global clock once all those messages have been received in the corresponding mempools.

Dependent downstream processes

Windowed PoSt runner (Lotus: storage package)

It currently subscribes to head changes and uses them to drive (at least) these three processes:

  • sector proving
  • fault declaration
  • recovery declaration

The way it works: upon a new head that opens a new proving window, we wait for StartConfidence epochs before actually doing anything (to avoid wasted computation in case of reorgs). We then:

  1. check recoveries of sectors up for challenge in the NEXT proving window.
  2. check faults of sectors up for challenge in the NEXT proving window.
  3. calculate the proofs of sectors up for challenge in THIS proving window.

Each of those steps generates and broadcasts messages. For each step that generated a message, we wait for build.MessageConfidence epochs on top of it before continuing with the next step.
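To make the shape of the problem concrete, here is a minimal, self-contained sketch of the blocking flow described above. All identifiers are placeholders (the real code lives in Lotus's storage package), and the constants merely stand in for StartConfidence and build.MessageConfidence:

```go
package main

import (
	"context"
	"fmt"
)

type msgCID string

// Each step pushes a message to the mpool and returns its CID.
type step func(ctx context.Context) (msgCID, error)

// Placeholder steps; the real ones live in Lotus's storage package.
func declareRecoveries(ctx context.Context) (msgCID, error) { return "recoveries-msg", nil }
func declareFaults(ctx context.Context) (msgCID, error)     { return "faults-msg", nil }
func submitPoSt(ctx context.Context) (msgCID, error)        { return "post-msg", nil }

// waitEpochs stands in for blocking until n epochs have elapsed.
func waitEpochs(ctx context.Context, n int) error { return nil }

func runDeadline(ctx context.Context) error {
	// Defer work to survive shallow reorgs of the triggering head.
	const startConfidence = 4 // stands in for StartConfidence
	if err := waitEpochs(ctx, startConfidence); err != nil {
		return err
	}
	for _, s := range []step{declareRecoveries, declareFaults, submitPoSt} {
		m, err := s(ctx)
		if err != nil {
			return err
		}
		// The catch-22: each step blocks on chain progress (epochs on
		// top of its message) before the next step is allowed to run.
		const messageConfidence = 5 // stands in for build.MessageConfidence
		if err := waitEpochs(ctx, messageConfidence); err != nil {
			return err
		}
		fmt.Println("completed step, message:", m)
	}
	return nil
}

func main() {
	_ = runDeadline(context.Background())
}
```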

^^ This poses a catch-22 for synchronised mining: the logic whose completion we're waiting on before advancing the chain is, in turn, waiting for the chain to advance. IMO this logic is wrong to begin with: we should have a message sentinel to which we delegate watching and rebroadcasting messages. We should not BLOCK window PoSt waiting for messages to appear. The current logic also has other weaknesses.

Proposed mechanics

  1. Make the window PoSt runner push messages to the mpool, but not wait for them to appear on chain. Make it run linearly, in one shot.
  2. On every run, emit an event that reports how many sectors were faulted, recovered, and which sectors we proved. Also include the messages we posted (both message CID and full message).
  3. An mpool sentinel would subscribe to these events to learn which messages it needs to watch for on chain.
  4. Our synchronised mining logic would subscribe to these events to know when window PoSt has run and which messages were pushed. All instances would wait for window PoSt to run and for all generated messages to appear in their local mpool (via MpoolSub) before advancing to the next epoch; a sketch follows below.
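A minimal sketch of these mechanics, using a plain channel as a stand-in for the event plumbing. WdPoStEvent and its fields are hypothetical, not existing Lotus types; MpoolSub is the real Lotus API that step 4 alludes to:

```go
package main

import "fmt"

// WdPoStEvent is a hypothetical event type. Its fields mirror the
// proposal: fault/recovery counts, proved sectors, and the messages
// that were pushed to the mpool.
type WdPoStEvent struct {
	Faulted, Recovered int
	Proven             []uint64 // sector numbers proved this run
	MessageCIDs        []string // CIDs of the pushed messages
}

func main() {
	events := make(chan WdPoStEvent)
	done := make(chan struct{})

	// (3) The mpool sentinel subscribes to learn which messages it must
	// watch (and rebroadcast) until they land on chain.
	go func() {
		defer close(done)
		for ev := range events {
			for _, c := range ev.MessageCIDs {
				fmt.Println("sentinel: watching", c)
			}
		}
	}()

	// (1)+(2) The runner pushes its messages, runs linearly in one shot,
	// and emits an event instead of blocking on chain progress.
	events <- WdPoStEvent{Recovered: 1, Proven: []uint64{7}, MessageCIDs: []string{"<cid>"}}
	close(events)
	<-done

	// (4) The synchronised-mining logic would subscribe in the same way,
	// holding the clock until every reported message appears in the local
	// mpool (observed via MpoolSub) before releasing the next epoch.
}
```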

Deal/sector sealing

Deals have epoch deadlines. If the chain advances too fast (as is the case with #143), sealing will never have enough time to run, and therefore deals will always fail. Ideas:

  1. Fake/dummy sealing => reduces the time it takes to seal. Oni is not testing the sealing procedures themselves, so we should be fine stubbing this out (see the sketch after this list).
  2. Sealing callbacks/subscriptions.
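For idea 1, the gist is dependency injection of a no-op sealer. A sketch under the assumption of a hypothetical Sealer interface (the real Lotus sealing stack is structured differently):

```go
package main

import (
	"fmt"
	"time"
)

// Sealer is a hypothetical interface; the point is that a no-op
// implementation can be swapped in wherever sealing is invoked.
type Sealer interface {
	Seal(sector uint64) error
}

// realSealer stands in for the proofs-backed sealer, which can take
// far longer than a deal's epoch deadline under a fast global clock.
type realSealer struct{}

func (realSealer) Seal(sector uint64) error {
	time.Sleep(time.Hour) // placeholder for expensive PoRep work
	return nil
}

// fakeSealer completes instantly: Oni isn't testing sealing itself,
// so a dummy result keeps deals within their deadlines.
type fakeSealer struct{}

func (fakeSealer) Seal(sector uint64) error {
	fmt.Println("fake-sealed sector", sector)
	return nil
}

func main() {
	var s Sealer = fakeSealer{} // injected in test builds
	_ = s.Seal(1)
}
```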

raulk commented Jul 15, 2020

Had a chat with @magik6k.


raulk commented Jul 15, 2020

I quite like the journal idea, as it stops the proliferation of ad-hoc subscriptions for watching purposes. I think of it as an authoritative audit trail of system processes and decisions.

A better, normalised event subscription solution long-term might be to use https://github.com/libp2p/go-eventbus for typed events all around (killing the ad-hoc methods like SubHeadChanges, SubscribeHeadChanges). And the audit trail / journal could do a wildcard subscription and dump to disk. But this is out of scope right now — maybe in the future when a refactor is due.
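For reference, typed emit/subscribe with go-eventbus looks roughly like this; EpochAdvanced is a made-up event type for illustration:

```go
package main

import (
	"fmt"

	eventbus "github.com/libp2p/go-eventbus"
)

// EpochAdvanced is a hypothetical typed event.
type EpochAdvanced struct{ Epoch int64 }

func main() {
	bus := eventbus.NewBus()

	// Typed subscription: a journal (or any watcher) consumes from Out().
	sub, err := bus.Subscribe(new(EpochAdvanced))
	if err != nil {
		panic(err)
	}
	defer sub.Close()

	// Typed emitter: would replace ad-hoc methods like SubscribeHeadChanges.
	em, err := bus.Emitter(new(EpochAdvanced))
	if err != nil {
		panic(err)
	}
	defer em.Close()

	if err := em.Emit(EpochAdvanced{Epoch: 42}); err != nil {
		panic(err)
	}

	evt := <-sub.Out() // delivered as interface{}
	fmt.Printf("got %+v\n", evt.(EpochAdvanced))
}
```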

I'll probably create an in-memory journal that I can clear/flush on every epoch to avoid unnecessary history build-up, and on the Oni side we can use https://github.com/mitchellh/mapstructure to consume journal entries into typed structs.
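A sketch of that consumption path, assuming a hypothetical WdPoStEntry schema for journal entries (mapstructure.Decode is the real API):

```go
package main

import (
	"fmt"

	"github.com/mitchellh/mapstructure"
)

// WdPoStEntry is a hypothetical shape for a journal entry about a
// window PoSt run; the real schema would be defined on the Lotus side.
type WdPoStEntry struct {
	Epoch     int64
	Recovered int
	Proven    []uint64
}

func main() {
	// A journal entry as it might arrive on the Oni side: an untyped map.
	raw := map[string]interface{}{
		"Epoch":     120,
		"Recovered": 1,
		"Proven":    []uint64{7, 9},
	}

	// mapstructure decodes the untyped entry into the typed struct.
	var entry WdPoStEntry
	if err := mapstructure.Decode(raw, &entry); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", entry)
}
```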

@raulk raulk added the workstream/e2e-tests Workstream: End-to-end Tests label Jul 28, 2020