Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node gets banned during reorgs if mining #4799

Closed
sdbondi opened this issue Oct 12, 2022 · 5 comments · Fixed by #4949
Closed

Node gets banned during reorgs if mining #4799

sdbondi opened this issue Oct 12, 2022 · 5 comments · Fixed by #4949
Assignees
Labels
A-base_node Area - The Tari base node executable and libraries C-bug Category - fixes a bug, typically associated with an issue. S-high-severity Severity - High

Comments

@sdbondi
Copy link
Member

sdbondi commented Oct 12, 2022

WARN Could not provide requested block d54cd80ab95951fbc8cc605935ee11f9d8f9bdb2cd24ca319643a2bc16198241 to peer because not stored

This results in my node being banned (1 less connection each time) because we sent a block hash, but could not provide it later due to reorg (race condition). Full block requests happen a lot during reorgs because we check if header.prev_hash != *current_meta.best_block() { and then request the full block immediately if true.

The reason for the ban is to prevent spamming of block hashes in the previous scheme.

I think we should:

  1. Validate the header in reconcile block
  2. if valid, we know PoW was done so continue - if we request the block but the peer no longer has it, dont ban and discard the message
  3. if invalid, ban the peer that sent it
  4. Problem: A peer could send a non-tip header it mined that is not part of the chain. I think that we'd have to deal with this heuristically.
@stringhandler
Copy link
Collaborator

Maybe a short ban?

@sdbondi
Copy link
Member Author

sdbondi commented Oct 13, 2022

Good point, we ban for 100s only so that is ok. I'm still concerned about network splits during reorgs and that this is a fairly common case even though there are no malicious nodes. I was mining at a faster rate, and won a few blocks fairly quickly. This resulted in almost 50% of my connections being lost. Perhaps that is ok since it is unusual to produce many blocks (~4) that quickly (within 30-40s).

There are a few options that I can think of (no particular order):

  1. We don't put reorged blocks back into the orphan pool. This allows a sender to provide a reorged-out block and will reduce requests for previously reorged-out blocks. We did this previously but took it out I think?
  2. We could accept the bans as is, and the risk that it could create more network splits during reorg situations (closes this issue).
  3. Validate the header, if the header is valid and within say 5 blocks of our tip (the "reorg zone") we don't ban the peer if they cannot provide the full block
  4. We process one block at a time, so there could be a few nodes ahead in the queue, this could take a few seconds to resolve. By that time the sending node has reorged to the better chain. We could perhaps ignore "stale" newblock messages.
  5. We could ignore non-tip blocks and rely on sync for reorgs (potentially disruptive).
  6. Keep track of recent block misses, and only ban the peer if it happens frequently (overly complex)

I guess we could try selfish mining attacks to see if we can force significant network splits.

@stringhandler
Copy link
Collaborator

I would say:

  1. Keep the block in the reorg pool. When a sync peer asks for it, return it so that you can prove that you have it. I initially thought that you might want to say "here it is, but it's not my longest chain", but I suppose the other node can work that out anyway.

@stringhandler stringhandler added this to the Stagenet Freeze milestone Nov 1, 2022
@stringhandler stringhandler added C-bug Category - fixes a bug, typically associated with an issue. A-base_node Area - The Tari base node executable and libraries labels Nov 1, 2022
@stringhandler
Copy link
Collaborator

IMPACT: This could cause a weird hard fork, so let's bump the priority

@stringhandler stringhandler added the S-high-severity Severity - High label Nov 1, 2022
@stringhandler stringhandler moved this to Must Do in Tari Esme Testnet Nov 15, 2022
@stringhandler stringhandler moved this from Must Do to Selected for development in Tari Esme Testnet Nov 15, 2022
@brianp brianp self-assigned this Nov 15, 2022
@stringhandler stringhandler moved this from Selected for development to In Progress in Tari Esme Testnet Nov 16, 2022
@sdbondi
Copy link
Member Author

sdbondi commented Nov 21, 2022

image

Might be related to the short bans for this case, started merged mining with a sudden high hash rate and saw a sudden drop in peer connections.

stringhandler pushed a commit that referenced this issue Nov 28, 2022
Description
---
Remove the `FetchBlocksByHash` handler. It was only called from a single place and although designed to handle multiple blocks it was only ever sending a single hash at once making the multi-block functionality useless. 
Instead, opt to use the existing `GetBlockByHash` handler and expand that handler to accept a new `orphans` flag. Passing this flag means we'll accept found blocks from the orphan pool,

Motivation and Context
---
Previously if a node had re-orged after a sync had started it may result in not providing the complete block for a block it claimed it had. This results in a brief ban.

Make it also return blocks from the orphan pool and let the peer figure out what to do with it. 

How Has This Been Tested?
---
Tests, and running nodes.

Fixes: #4799
Repository owner moved this from In Progress to Done in Tari Esme Testnet Nov 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-base_node Area - The Tari base node executable and libraries C-bug Category - fixes a bug, typically associated with an issue. S-high-severity Severity - High
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants