Blockchain: fix temp fails causing alt blocks to be permanently invalid #9395

jeffro256 · 2024-07-11T06:07:36Z

Issue appeared in stressnet where nodes would never switch to the valid chain with longest proof of work, but instead log a message like so:

INFO    global  src/cryptonote_core/blockchain.cpp:2125 ###### REORGANIZE on height: 2534772 of 2534772 with cum_difficulty 1072308428953
INFO    global  src/cryptonote_core/blockchain.cpp:2125  alternative blockchain size: 2 with cum_difficulty 1072309197139
ERROR   txpool  src/cryptonote_core/tx_pool.cpp:596     Failed to find tx_meta in txpool
ERROR   verify  src/cryptonote_core/blockchain.cpp:4239 Block with id: <BLKID> has at least one unknown transaction with id: <TXID>
ERROR   blockchain      src/cryptonote_core/blockchain.cpp:1205 Failed to switch to alternative blockchain
ERROR   blockchain      src/cryptonote_core/blockchain.cpp:1217 The block was inserted as invalid while connecting new alternative chain, block_id: <BLKID>

What I believe was happening is this chain of events:
1. Transaction T is added to mempool
2. Block B1, which contains T, is added as an alt block
3. Mempool conditions cause T to be pushed out
4. Block B2, with previous block B1, is added as an alt block
5. Cumulative difficulty of B2 relative to the main chain triggers a reorg attempt
6. Tx validation of B1 begins, but T is not present in the mempool
7. Method handle_block_to_main_chain returns false
8. As a result, method switch_to_alternative_blockchain adds B1 and B2 to m_invalid_blocks list permanently
9. Peer gets blocked, and node never reaches consensus with other nodes because B1 and B2 stay in the invalid blocks list

I could be wrong about the exact trigger, but regardless, handle_block_to_main_chain returns false for many reasons, some of which are non-deterministic (i.e. the node not having a transaction in the mempool). As such, we should not add blocks to the invalid blocks list every time handle_block_to_main_chain returns false. Additionally, handle_block_to_main_chain already adds blocks to the invalid blocks list for transaction input consensus failures (which should be deterministic). The other big type of failure that could occur s a PoW check, but in that case, we shouldn't add those blocks to the invalid blocks list either, since it is 100% free for a peer to create a block that fails the PoW check, and doing so would cause the list to be filled up with garbage quickly.

There is a discussion to be had whether we should have any invalid blocks cache in the first place...

Thanks to @Rucknium for finding this issue, and to everyone running a stressnet node!

iamamyth

I agree with the premise that an invalid block cache doesn't make much sense: If a non-malicious peer sends a block the node deems invalid, validation should fail fast in common scenarios (unsupported transaction versions, missing parent blocks, etc), rather than relying on caching. Caching just tends to obscure flaws in the efficiency of the validation logic (e.g. one might not notice fail-fast conditions being spotted late in the validation because of the caching).

j-berman · 2024-07-17T17:33:42Z

Also looking at the other spot add_block_as_invalid is called:

monero/src/cryptonote_core/blockchain.cpp

Lines 4288 to 4293 in caa62bc

    
           if(!check_tx_inputs(tx, tvc)) 
        
           { 
        
             MERROR_VER("Block with id: " << id  << " has at least one transaction (id: " << tx_id << ") with wrong inputs."); 
        
             //TODO: why is this done?  make sure that keeping invalid blocks makes sense. 
        
             add_block_as_invalid(bl, id);

since check_tx_inputs can be false at one point and become true (example), the cache behavior for invalid blocks looks incorrect.

Note: the fourth and final spot add_block_as_invalid is called:

monero/src/cryptonote_core/blockchain.cpp

Lines 4307 to 4311 in caa62bc

    
           if (tx_index >= m_blocks_txs_check.size() || memcmp(&m_blocks_txs_check[tx_index++], &tx_id, sizeof(tx_id)) != 0) 
        
           { 
        
             MERROR_VER("Block with id: " << id << " has at least one transaction (id: " << tx_id << ") with wrong inputs."); 
        
             //TODO: why is this done?  make sure that keeping invalid blocks makes sense. 
        
             add_block_as_invalid(bl, id);

is slated for removal in #9135

@jeffro256 thoughts on just removing the cache in this PR?

j-berman · 2024-07-17T17:45:24Z

Hmm, actually it would seem the block's prev_id would need to be different for check_tx_inputs to have a different result (higher height = different prev id), which means the block id would be distinct, and the block id with a truthy check_tx_inputs wouldn't be in the invalid cache

Approving this PR as is, though agree the invalid cache generally seems to introduce a wider vector for abuse than benefit

Rucknium · 2024-07-22T21:31:21Z

I ran this patch on a few stressnet nodes (i.e. nodes running https://github.com/spackle-xmr/monero ) and left a few nodes unpatched. My patched nodes never got stuck because of this invalid blocks problem, but my unpatched nodes got stuck four times during this test period. The patch seems to do what it's supposed to do.

AFAIK, the problem is likely to occur when transaction propagation is imperfect. An unpatched node may get a fluffy block, not have all the fluffy block's transactions in its transaction pool, then mark the block as permanently invalid because it can't find all of the block's transactions. On stressnet, this has occurred when (1) there are many large consolidation transactions (e.g. many 144in/2out) or (2) when one or more nodes have their maximum tx pool size set above the default and the aggregate size of unconfirmed txs on the network exceeds the default maximum tx pool size.

Blockchain: fix temp fails causing alt blocks to be permanently invalid

7fdaf61

0xFFFC0000 added easy enhancement labels Jul 11, 2024

Rucknium mentioned this pull request Jul 11, 2024

jeffro256: Blockchain: fix temp fails causing alt blocks to be permanently invalid spackle-xmr/monero#14

Merged

0xFFFC0000 added the pending review label Jul 11, 2024

iamamyth approved these changes Jul 12, 2024

View reviewed changes

selsta added the daemon label Jul 14, 2024

jeffro256 mentioned this pull request Jul 15, 2024

Blockchain: fix temp fails causing alt blocks to be permanently invalid [RELEASE] #9400

Open

j-berman approved these changes Jul 17, 2024

View reviewed changes

Rucknium mentioned this pull request Jul 22, 2024

Monero Research Lab Meeting - Wed 17 July 2024, 17:00 UTC monero-project/meta#1041

Closed

Rucknium mentioned this pull request Jul 26, 2024

Monero Research Lab Meeting - Wed 24 July 2024, 17:00 UTC monero-project/meta#1044

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Blockchain: fix temp fails causing alt blocks to be permanently invalid #9395

Blockchain: fix temp fails causing alt blocks to be permanently invalid #9395

jeffro256 commented Jul 11, 2024

iamamyth left a comment

j-berman commented Jul 17, 2024

j-berman commented Jul 17, 2024

Rucknium commented Jul 22, 2024

Blockchain: fix temp fails causing alt blocks to be permanently invalid #9395

Are you sure you want to change the base?

Blockchain: fix temp fails causing alt blocks to be permanently invalid #9395

Conversation

jeffro256 commented Jul 11, 2024

iamamyth left a comment

Choose a reason for hiding this comment

j-berman commented Jul 17, 2024

j-berman commented Jul 17, 2024

Rucknium commented Jul 22, 2024