Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: ban peer that advertises higher PoW than able to provide #3478

Conversation

sdbondi
Copy link
Member

@sdbondi sdbondi commented Oct 20, 2021

Description

  • Can only transition to HeaderSync if claimed chain metadata is
    advertised
  • HeaderSync is now aware of the claimed ChainMetadata
  • HeaderSync now assumes (invariant) that all sync peers have
    claimed a higher accumulated PoW and will ban them if the validated
    accumulated difficulty does not reach the claimed acc diff.
  • Adds ban condition in determine_sync_status phase, if a peer is not
    able to improve on the local chain strength (because we know that in
    order to be selected for header sync it must have advertised a
    stronger chain)
  • Ban if header sync completes but is less than the claimed PoW.
    This is not strictly necessary since they were still able
    to provide a stronger chain as per Nakamoto consensus, but could still
    indicate some malicious intent.
  • If sync fails for all peers, the state machine continues rather than
    WAITING. This removes the disruption that false metadata can cause.
  • fix select_sync_peers to include peers that claim to provide enough full
    blocks for our pruning horizon (fixes cucumber test)

Motivation and Context

A peer may disrupt a base node by sending false chain metadata. This PR ensures that if the remote peer cannot provide
the claimed chain, it will get banned and the base node will continue in listening state immediately.

How Has This Been Tested?

Ran sync cucumber tests
Manually, by creating a node that inflates it's advertised accumulated Pow by a small amount and checking that the node is banned by the syncing node.
not entirely related: add block sync RPC unit tests

@sdbondi sdbondi force-pushed the core-ban-peer-for-advertising-incorrect-chain-metadata branch 2 times, most recently from 52f1e6f to 2d4a8fb Compare October 20, 2021 14:24
- Can only transition to `HeaderSync` if claimed chain metadata is
  advertised
- `HeaderSync` is now aware of the claimed `ChainMetadata`
- `HeaderSync` now assumes (invariant) that all sync peers have
  claimed a higher accumulated PoW and will ban them if the validated
  accumulated difficulty does not reach the claimed acc diff.
- Adds ban condition in `determine_sync_status` phase, if a peer is not
  able to improve on the local chain strength (because we know that in
  order to be selected for header sync it must have advertised a
  stronger chain)
- Adds ban condition if header sync completes but is less than the
  claimed PoW. This is not strictly necessary since they were still able
  to provide a stronger chain as per Nakamoto consensus, but could still
  indicate some malicious intent.
- If sync fails for all peers, the state machine continues rather than
  `WAITING`. This removes the disruption that false metadata can cause.
- fix `select_sync_peers` to include peers claim that provide a enough full
  blocks for _our_ pruning horison (fixes cucumber test)
  higher than the local pruned
@sdbondi sdbondi force-pushed the core-ban-peer-for-advertising-incorrect-chain-metadata branch from 2d4a8fb to d124e02 Compare October 20, 2021 14:25
Copy link
Collaborator

@stringhandler stringhandler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with the code for the most part, but I don't like the extra pruned_height check in Listening

stringhandler
stringhandler previously approved these changes Oct 21, 2021
…ct-chain-metadata

* development:
  fix: fix recovery test reporting message (tari-project#3479)
  chore: improve cucumber tests to wait for broadcast (tari-project#3461)
  test: use TCP node for daily sync test (tari-project#3464)
Copy link
Collaborator

@SWvheerden SWvheerden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but the attack is still possible. We need to check each peer to its provided meta_data not just what headers it can provide.

actual: None,
local: local_total_accumulated_difficulty,
}),
SyncStatus::WereAhead => Err(BlockHeaderSyncError::PeerSentInaccurateChainMetadata {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when we are the first node in our peer list to receive a block via sync? Won't we ban all our other peers?
We select our sync peers not only on who sent meta_data to us but also on whom we are connected.
So a peer might have provided accurate metadata and did not cause the activation of sync mode, but it can get banned here because some other peer provided fake metadata.

So I am thinking of this, what happens if a peer sends invalid metadata, but does not respond to the dial request. Your node won't sync to that peer because it thinks it's offline and keep it out of sync, but it will ban the other good nodes?

Copy link
Member Author

@sdbondi sdbondi Oct 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check each peer to its provided meta_data not just what headers it can provide.

Not sure I follow, validation of headers (PoW) is the only way to determine if the peer is honest about its metadata claim

We select our sync peers not only on who sent meta_data to us but also on whom we are connected.

Good point, which is why I made this no longer the case. In order for the bn to enter the HeaderSync state, the peer MUST have made a claim to have a better block. This is an invariant of the HeaderSync state. So basically, I removed the code that dials all connected peers and now leave it up to the listening state to provide peers that have made a better metadata claim.

So I am thinking of this, what happens if a peer sends invalid metadata, but does not respond to the dial request.

Another good point. However, a peer is already connected when they make a metadata claim. A peer could time it to disconnect, which, as you say, would cause the header sync to fail. So to combat this, if a header sync fails the state machine immediately continues to listening state. This should cause no disruption (that is, a component listening for state events SHOULD NOT disrupted by the node entering the HeaderSync state - e.g mining should continue unless the chain has been confirmed to be behind)

EDIT: I believe the WAITING state after a failed header sync is the primary cause of disruptions, the change here is to follow a "all peers couldn't live up to their promises" header sync with going straight back to is_synced = true listening state as though nothing happened (and ban the peers that were naughty)

it will ban the other good nodes?

No it will only ban the node that was unable to back up its claim, due to the change that a claim has to be made to be included in HeaderSync sync_peers list.

Copy link
Member Author

@sdbondi sdbondi Oct 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too sure of the behaviour of upstream components (miner, maybe mempool, that's it?). I will look into and run a test on Sunday, but AFAIK it will not interrupt mining. The miner will only not begin if is_bootstrapped is false but after that will always continue on - will run a test to be sure, could be wrong.

EDIT: if necessary, another flag could be added is_synced that is set to false when the headers have proven a stronger PoW
EDIT2: Another potential change would be for header sync to select the active connection, rather than using dial_peer which will dial if necessary, which should never be - but I don't think this can be called an attack if there is no disruption caused.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, then we cover this.

@sdbondi
Copy link
Member Author

sdbondi commented Oct 22, 2021

Pruned mode network only passed locally, seems flakey now? Will check

EDIT: passed every time 🤷 - rerunning on CI

SWvheerden
SWvheerden previously approved these changes Oct 22, 2021
Copy link
Collaborator

@SWvheerden SWvheerden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aviator-app aviator-app bot added mq-failed and removed mq-failed labels Oct 22, 2021
@sdbondi sdbondi dismissed stale reviews from SWvheerden and stringhandler via b177c09 October 24, 2021 06:56
@sdbondi sdbondi force-pushed the core-ban-peer-for-advertising-incorrect-chain-metadata branch from b177c09 to 3018cfe Compare October 24, 2021 06:57
stringhandler
stringhandler previously approved these changes Oct 24, 2021
@aviator-app aviator-app bot added mq-failed and removed mq-failed labels Oct 24, 2021
@aviator-app aviator-app bot removed the mq-failed label Oct 24, 2021
@sdbondi sdbondi force-pushed the core-ban-peer-for-advertising-incorrect-chain-metadata branch from 12b7be7 to 5689976 Compare October 24, 2021 09:49
@sdbondi sdbondi force-pushed the core-ban-peer-for-advertising-incorrect-chain-metadata branch from 5689976 to 9e6fc82 Compare October 24, 2021 12:31
stringhandler
stringhandler previously approved these changes Oct 25, 2021
@aviator-app aviator-app bot removed the mq-failed label Oct 25, 2021
…ct-chain-metadata

* development: (32 commits)
  test: improve cucumber with wallets (tari-project#3507)
  feat: optimize pending transactions inbound query (tari-project#3500)
  test: add trace logs to wallet's base node monitor (tari-project#3502)
  fix: improve test Wallet should display transactions made (tari-project#3501)
  test: change timeouts in ci to reasonable values (tari-project#3494)
  fix: correct panic in tracing for comms (tari-project#3499)
  feat: optimize get transactions query (tari-project#3496)
  fix: fix config file whitespace issue when auto generated in windows (tari-project#3491)
  bump to rerun tests
  fix: improve responsiveness of wallet base node switching (tari-project#3488)
  feat: add decay_params method (tari-project#3454)
  Revert "macos-11"
  macos-11
  clean
  remove scripts after install
  install to tmp then use script to copy to home
  wip
  wip
  path
  wip
  ...
Copy link
Collaborator

@stringhandler stringhandler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Man this PR is getting stuck in the tumbler. I am going to merge it since the CI seems to have broken elsewhere

@stringhandler stringhandler merged commit c04fca5 into tari-project:development Oct 28, 2021
@sdbondi sdbondi deleted the core-ban-peer-for-advertising-incorrect-chain-metadata branch October 28, 2021 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants