
silkworm: inconsistent database content after unwind #2359

Closed
canepat opened this issue Sep 17, 2024 · 0 comments · Fixed by #2360

canepat (Member) commented Sep 17, 2024

After encountering the invalid block issue described in #2358, Silkworm performs an unwind cycle and subsequently a new forward cycle triggered by the Consensus Layer:

...
  INFO [09-12|07:43:57.151 UTC] ExecutionEngine                          verifying chain 0c2f4eeb92fe45bff59e422fef6beddc062758e0e4b098559c523c0591c07bf9
  INFO [09-12|07:43:57.295 UTC] sync::SentryClient                       Peer de753506a7855f1f5be13d9e75a5616a42a00a12 disconnected, active 44, info: -info-not-found-
  INFO [09-12|07:43:57.472 UTC] ExecutionPipeline                        Forward start
  INFO [09-12|07:43:57.472 UTC] [1/12 Headers]                           op=Forward from=20581422 to=20727515 span=146093 
  INFO [09-12|07:43:57.952 UTC] [1/12 Headers]                           op=Forward done=480.476ms 
  INFO [09-12|07:43:57.952 UTC] [2/12 BlockHashes]                       op=Forward from=20581422 to=20727515 span=146093 
  INFO [09-12|07:43:58.726 UTC] [2/12 BlockHashes]                       op=Forward done=773.195ms 
  INFO [09-12|07:43:58.726 UTC] [3/12 Bodies]                            op=Forward from=20581422 to=20727515 span=146093 
...
  INFO [09-12|07:44:12.899 UTC] [3/12 Bodies]                            op=Forward done=14.173s 
  INFO [09-12|07:44:12.899 UTC] [4/12 Senders]                           op=Forward from=20581422 to=20727515 span=146093 max_batch_size=139810 
...
  INFO [09-12|07:45:37.219 UTC] [4/12 Senders]                           op=Forward done=1m 24s 
  INFO [09-12|07:45:37.219 UTC] [5/12 Execution]                         op=Forward from=20580035 to=20727515 span=147480 
 ERROR [09-12|07:45:37.634 UTC] [5/12 Execution]                         function=execute_batch exception=MDBX_EKEYMISMATCH: The given key value is mismatched to the current cursor position 
 ERROR [09-12|07:45:37.634 UTC] [5/12 Execution]                         function=forward exception=kDbError 
 ERROR [09-12|07:45:37.634 UTC] [5/12 Execution]                         op=Forward returned=kDbError 
 ERROR [09-12|07:45:37.634 UTC] ExecPipeline                             Forward interrupted due to stage Execution failure
  INFO [09-12|07:45:37.634 UTC] MainChain::verify_chain forward_result=kDbError
...

The forward cycle fails with the internal error code MDBX_EKEYMISMATCH, which indicates that the database content has become inconsistent. Restarting Silkworm produces the same error during the initial cleanup forward cycle.
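For context, MDBX_EKEYMISMATCH is the error libmdbx returns when a cursor put with the MDBX_APPEND flag receives a key that does not sort after the last key already in the table. One plausible way this surfaces here is that records keyed by a higher block number were left behind, so a later forward cycle trying to append data for a lower block fails. A minimal sketch using the plain libmdbx C API (not Silkworm code; the database path and payload are made up for illustration):

```cpp
#include <cstdint>
#include <cstdio>
#include "mdbx.h"

// Appending a key that sorts *before* the last key already in the table makes
// mdbx_cursor_put(..., MDBX_APPEND) fail with MDBX_EKEYMISMATCH.
int main() {
    MDBX_env* env = nullptr;
    mdbx_env_create(&env);
    mdbx_env_open(env, "./ekeymismatch-demo", MDBX_NOSUBDIR, 0644);  // demo path

    MDBX_txn* txn = nullptr;
    mdbx_txn_begin(env, nullptr, MDBX_TXN_READWRITE, &txn);

    MDBX_dbi dbi{};
    mdbx_dbi_open(txn, "Receipt", MDBX_CREATE, &dbi);

    MDBX_cursor* cursor = nullptr;
    mdbx_cursor_open(txn, dbi, &cursor);

    // Keys are 8-byte big-endian block numbers.
    uint8_t higher_key[8] = {0, 0, 0, 0, 0x01, 0x3a, 0x0c, 0x2f};  // 20581423 (stale record)
    uint8_t lower_key[8]  = {0, 0, 0, 0, 0x01, 0x3a, 0x06, 0xc4};  // 20580036 (new forward cycle)
    char payload[] = "rlp...";

    MDBX_val key{higher_key, sizeof(higher_key)};
    MDBX_val data{payload, sizeof(payload)};
    mdbx_cursor_put(cursor, &key, &data, MDBX_APPEND);  // ok: table was empty

    key = MDBX_val{lower_key, sizeof(lower_key)};
    int rc = mdbx_cursor_put(cursor, &key, &data, MDBX_APPEND);
    std::printf("append lower key: %s\n", mdbx_strerror(rc));  // MDBX_EKEYMISMATCH

    mdbx_cursor_close(cursor);
    mdbx_txn_abort(txn);
    mdbx_env_close(env);
}
```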

This is the status of the pipeline stages in the Silkworm database:

db_toolbox --datadir /Volumes/Extreme\ SSD/Library/silkworm/silkworm stages                                        

 Stage Name                    Block 
 ------------------------ ---------- 
 AccountHistoryIndex        20460999         
 BlockHashes                20727515         
 Bodies                     20727515         
 CallTraces                 20460999         
 Execution                  20580035         
 Finish                     20460999         
 HashState                  20460999         
 Headers                    20727515         
 HistoryIndex               20460999         
 IntermediateHashes         20460999         
 LogIndex                   20460999         
 Senders                    20727515         
 StorageHistoryIndex        20460999         
 TxLookup                   20460999   

Forcing the Execution state progress to 20581423 (the failed block) and performing a stage unwind cycle from 20581423 to 20581422 (the last valid block) reveals something interesting:

db_toolbox --exclusive --datadir /Volumes/Extreme\ SSD/Library/silkworm/silkworm unwind --height 20581422
  INFO [09-16|14:52:31.817 UTC] ExecutionPipeline                        Unwind start
  INFO [09-16|14:52:31.817 UTC] [1/12 Finish]                            op=Unwind done=343us 
  INFO [09-16|14:52:31.817 UTC] [2/12 TxLookup]                          op=Unwind done=31us 
  INFO [09-16|14:52:31.817 UTC] [3/12 CallTraces]                        op=Unwind done=15us 
  INFO [09-16|14:52:31.817 UTC] [4/12 LogIndex]                          op=Unwind done=25us 
  INFO [09-16|14:52:31.817 UTC] [5/12 HistoryIndex]                      op=Unwind done=27us 
  INFO [09-16|14:52:31.817 UTC] [6/12 HashState]                         op=Unwind done=23us 
  INFO [09-16|14:52:31.817 UTC] [7/12 IntermediateHashes]                op=Unwind done=33us 
  INFO [09-16|14:52:56.245 UTC] [8/12 Execution]                         op=Unwind from=20581423 to=20581422 span=1 
  INFO [09-16|14:52:59.462 UTC] Erased 0 records from AccountChangeSet starting key 00000000013a0c2f
  INFO [09-16|14:53:18.273 UTC] Erased 0 records from StorageChangeSet starting key 00000000013a0c2f
  INFO [09-16|14:53:21.348 UTC] Erased 1 records from Receipt starting key 00000000013a0c2f
  INFO [09-16|14:53:24.431 UTC] Erased 97 records from TransactionLog starting key 00000000013a0c2f
  INFO [09-16|14:53:32.280 UTC] Erased 0 records from CallTraceSet starting key 00000000013a0c2f

The database still contains 1 block receipt and 97 transaction logs belonging to block 20581423, which the built-in unwind cycle executed automatically by Silkworm after the forward failure did not remove. In fact, that automatic unwind cycle did the opposite: after detecting the failure it committed the changes that had already been applied on the MDBX RW transaction.
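Note that the starting key 00000000013a0c2f in the unwind log above appears to be simply block number 20581423 (the failed block) encoded as an 8-byte big-endian prefix, which is how the records in these tables are keyed. A small sketch with a hypothetical helper (for illustration only; Silkworm has its own encoding routines):

```cpp
#include <array>
#include <cstdint>
#include <cstdio>

// Hypothetical helper mirroring the 8-byte big-endian block-number prefix
// seen in the unwind log ("starting key 00000000013a0c2f").
std::array<uint8_t, 8> block_key(uint64_t block_number) {
    std::array<uint8_t, 8> key{};
    for (int i = 7; i >= 0; --i) {
        key[i] = static_cast<uint8_t>(block_number & 0xFF);
        block_number >>= 8;
    }
    return key;
}

int main() {
    const auto key = block_key(20'581'423);  // the failed block
    for (uint8_t b : key) std::printf("%02x", b);
    std::printf("\n");  // prints 00000000013a0c2f
}
```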

This is a regression introduced by #1511.

sw_invalid_block.log
sw.log

canepat added the bug (Something isn't working) label Sep 17, 2024
canepat self-assigned this Sep 17, 2024
battlmonstr pushed a commit that referenced this issue Sep 18, 2024
Fixes #2359

This PR restores the invalid block handling logic in stage Execution that was removed in #1511.

It also fixes the invalid block reference lifetime issue that caused an incorrect block hash to be stored as the bad block hash in SyncContext and displayed in the warning log, as described in #2358.
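For reference, a minimal sketch of the two behaviours the fix touches, using hypothetical stand-in types (not Silkworm's actual API): on an invalid block the partially applied changes must not be committed, and the bad block hash must be stored by value rather than through a reference that outlives the block it refers to:

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for illustration only.
struct Hash { uint8_t bytes[32]; };

struct SyncContext {
    // Storing the hash by value avoids the dangling-reference problem from
    // #2358, where a reference to an already-destroyed block yields a wrong hash.
    std::optional<Hash> bad_block_hash;
};

enum class StageResult { kSuccess, kInvalidBlock, kDbError };

struct RwTxn {            // stand-in for an MDBX read-write transaction
    void commit() {}
    void abort() {}
};

// Sketch of the intended control flow for stage Execution: when validation of
// a block fails, record its hash and discard the uncommitted writes for that
// block instead of committing them (the behaviour this issue describes).
StageResult execute_blocks(RwTxn& txn, SyncContext& ctx) {
    bool invalid_block_found = false;  // set by block validation (elided)
    Hash failed_block_hash{};          // hash of the offending block (elided)

    if (invalid_block_found) {
        ctx.bad_block_hash = failed_block_hash;  // copy, not a reference
        txn.abort();  // do NOT commit receipts/logs of the failed block
        return StageResult::kInvalidBlock;
    }
    txn.commit();
    return StageResult::kSuccess;
}

int main() {
    RwTxn txn;
    SyncContext ctx;
    return execute_blocks(txn, ctx) == StageResult::kSuccess ? 0 : 1;
}
```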