-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ChainDB q-s-m failure with GetIsValid
#3689
Comments
With #3743, this will no longer annoy us in CI for the time being, but this should be revisited in the future. |
3743: ChainDB q-s-m test: workaround for flaky `GetIsValid` discrepancy r=amesgen a=amesgen This resolves the flakiness due to #3689 until we have time to properly fix this. To test this, observe that ``` cabal run test-storage -- -p 'ChainDB q-s-m' --quickcheck-replay=455411 ``` fails before this change, but will succeed after. Co-authored-by: Alexander Esgen <alexander.esgen@iohk.io>
3743: ChainDB q-s-m test: workaround for flaky `GetIsValid` discrepancy r=amesgen a=amesgen This resolves the flakiness due to #3689 until we have time to properly fix this. To test this, observe that ``` cabal run test-storage -- -p 'ChainDB q-s-m' --quickcheck-replay=455411 ``` fails before this change, but will succeed after. Co-authored-by: Alexander Esgen <alexander.esgen@iohk.io>
3743: ChainDB q-s-m test: workaround for flaky `GetIsValid` discrepancy r=amesgen a=amesgen This resolves the flakiness due to #3689 until we have time to properly fix this. To test this, observe that ``` cabal run test-storage -- -p 'ChainDB q-s-m' --quickcheck-replay=455411 ``` fails before this change, but will succeed after. Co-authored-by: Alexander Esgen <alexander.esgen@iohk.io>
Difference in behaviourThe difference in behaviour between the actual API implementation and the model with respect to the valid and invalid states in described by tables below, which depict the states in the chain selection algorithms, the first for the actual implementation, the second for the model, (note that invalid states are indicated by lowercase letters, while valid states are indicated by uppercase letters).
There is never a stage in which the GAEf chain is a plausible candidate in the model, due to the fact that it does simultaneous pruning of invalid blocks, before it does selection. It might be difficult to adapt this implementation of the model to precisely match that of the actual one, with respect to the getIsValid not used anywhereAnother point is that the |
The team discussed this today, and came to the following conclusions:
|
I took a closer look today. I think this If I change this Edit: after running the test suite ~100 times, I did see an error. Looks unfamiliar.
Edit: After 325 executions of the test suite, the above is still the only error I saw, with |
^^^ I opened Issue #3999 for the above failure cc @bartfrenk |
To reproduce
Run
on fa10cb4 (master at time of writing). This yields
This is exactly what #3651 ought to have fixed.
(edit NSF: just an update: it also fails on today's
master
tip, 03f935e, if you revert PR 3743, which disables this aspect of the test, before running with that seed)Cursory glance
The QSM commands look like this:
My interpretation: The model is wrong here, we have to validate
E
as the chain diff with rollback 2 and fragmentE >: F
is an improvement over our currently selected chainA >: B >: C
asC < F
. So even though we end up not selecting it asF
is invalid, ChainSel still learns thatE
is valid.We already have some asymmetry when comparing these validity results:
https://github.com/input-output-hk/ouroboros-network/blob/fa10cb4eef1e7d3e095cec3c2bb1210774b7e5fa/ouroboros-consensus-test/test-storage/Test/Ouroboros/Storage/ChainDB/StateMachine.hs#L501-L504
Concretely, it is for the "other way around": when ChainSel does not know whether a block is valid, we intentionally don't check whether the model agrees.
Full log
Click to expand
The text was updated successfully, but these errors were encountered: