
Fix peer down-scoring behaviour when gossip blobs/columns are received after getBlobs or reconstruction #6686

Merged
Merged 1 commit on Dec 16, 2024

Conversation

jimmygchen
Member

@jimmygchen jimmygchen commented Dec 12, 2024

Issue Addressed

Fixed a peer disconnection issue recently reported by @hangleang.

On the PeerDAS devnet, he noticed that the proposer's Grandine node gets disconnected by Lighthouse after publishing the columns. Logs show that Lighthouse performed reconstruction after receiving 50% (64 of 128) of the data columns, and ignored the remaining columns received from gossip. Lighthouse also penalised the peer for each of these ignored columns. Even though each penalty is only a HighToleranceError, the number of messages that can arrive after reconstruction is large (50%, or up to 64 columns per slot), so the node got disconnected almost immediately. He also noticed that this happened not only with Grandine peers, but with Lighthouse peers as well.

Logs:

Nov 19 04:28:44.943 DEBG On-time head block                      set_as_head_time_ms: 7, imported_time_ms: 442, attestable_delay_ms: 1493, available_delay_ms: 1493, execution_time_ms: 3, consensus_time_ms: 65, blob_delay_ms: 1493, observed_delay_ms: 1338, total_delay_ms: 1942, slot: 40, proposer_index: 181, block_root: 0x740b40a44a9611ffddf53c1e7ffbaa88d21592e885e877fd14615e12e3acb426, service: beacon
Nov 19 04:28:44.944 DEBG Issuing engine_forkchoiceUpdated        current_slot: 40, head_block_root: 0x740b40a44a9611ffddf53c1e7ffbaa88d21592e885e877fd14615e12e3acb426, head_block_hash: 0x756e98dc89ccbab138b899d903513299899b5b4a0dbe828413a18e82d3621b76, justified_block_hash: 0x0000000000000000000000000000000000000000000000000000000000000000, finalized_block_hash: 0x0000000000000000000000000000000000000000000000000000000000000000, service: exec
Nov 19 04:28:44.944 DEBG No change in canonical head             head: 0x740b40a44a9611ffddf53c1e7ffbaa88d21592e885e877fd14615e12e3acb426, service: beacon
Nov 19 04:28:44.944 DEBG No change in canonical head             head: 0x740b40a44a9611ffddf53c1e7ffbaa88d21592e885e877fd14615e12e3acb426, service: beacon
Nov 19 04:28:44.944 DEBG Could not verify column sidecar for gossip. Ignoring the column sidecar, index: 117, block_root: 0x740b40a44a9611ffddf53c1e7ffbaa88d21592e885e877fd14615e12e3acb426, slot: 40, error: PriorKnown { proposer: 181, slot: Slot(40), index: 117 }
Nov 19 04:28:44.944 DEBG Peer score adjusted                     score: -5.99, peer_id: 16Uiu2HAmVgsM5dYcu4WL4ww2j3Wp8V5vBNCmkgWRDnPUbPC9CKhh, msg: gossip_data_column_high, service: libp2p

We should not penalise a peer for sending us valid blobs / data columns on time. This is less obvious in Deneb due to the lower message count, but gets worse in PeerDAS, which has a higher message count for columns plus reconstruction.
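The arithmetic behind the disconnect can be sketched as follows. The per-message penalty and disconnect threshold below are invented for illustration; they are not Lighthouse's actual scoring parameters.

```rust
// Back-of-the-envelope illustration of the disconnect: many small
// "high tolerance" penalties accumulate into a ban. The penalty and
// threshold values are hypothetical, not Lighthouse's real parameters.
fn messages_to_disconnect(penalty_per_message: i32, disconnect_threshold: i32) -> u32 {
    let mut score = 0;
    let mut messages = 0;
    while score > disconnect_threshold {
        score += penalty_per_message;
        messages += 1;
    }
    messages
}

fn main() {
    // With an assumed -1 unit per ignored duplicate and a -50 unit
    // disconnect threshold, 50 ignored columns are enough to lose a peer,
    // well within the up-to-64 duplicates a single slot can produce.
    println!("{}", messages_to_disconnect(-1, -50));
}
```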

Proposed Changes

Remove the peer penalty for blobs and columns that are received from gossip for the first time but are already available via other channels, such as getBlobs, supernode column reconstruction, or RPC.
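A minimal sketch of the proposed change, with invented type names (`GossipVerificationOutcome`, `PeerAction`) standing in for Lighthouse's actual gossip verification types: a message that is already known via another channel is dropped without adjusting the sender's score.

```rust
// Sketch of the behaviour change. Type and function names are invented
// for illustration; they are not Lighthouse's real types.
#[derive(Debug, PartialEq)]
enum GossipVerificationOutcome {
    Valid,
    // Already available via getBlobs, column reconstruction, or RPC.
    PriorKnown,
    Invalid,
}

#[derive(Debug, PartialEq)]
enum PeerAction {
    Accept,
    IgnoreWithoutPenalty,
    Penalize,
}

fn action_for(outcome: &GossipVerificationOutcome) -> PeerAction {
    match outcome {
        GossipVerificationOutcome::Valid => PeerAction::Accept,
        // Before this change, PriorKnown incurred a high-tolerance penalty;
        // now the duplicate is simply dropped with no score adjustment.
        GossipVerificationOutcome::PriorKnown => PeerAction::IgnoreWithoutPenalty,
        GossipVerificationOutcome::Invalid => PeerAction::Penalize,
    }
}

fn main() {
    let outcome = GossipVerificationOutcome::PriorKnown;
    println!("{:?}", action_for(&outcome));
}
```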

Note that duplicates received after the first message are filtered by gossipsub. We only send IDONTWANT when publishing columns, and because we publish gradually, IDONTWANT may not be sent to peers immediately. This is probably something to consider: send IDONTWANT immediately but still publish gradually, to avoid excessive bandwidth usage.
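The idea in the note above could be sketched like this. All names here are invented; a real implementation would go through the libp2p gossipsub behaviour, and IDONTWANT is only honoured by peers speaking gossipsub v1.2.

```rust
// Sketch: announce IDONTWANT for every column id up front, then trickle
// out the actual column publications in batches to cap bandwidth.
// `Publisher` and its fields are hypothetical, for illustration only.
#[derive(Default)]
struct Publisher {
    idontwant_sent: Vec<String>, // message ids announced to mesh peers
    published: Vec<String>,      // columns actually pushed to the mesh
}

impl Publisher {
    fn publish_columns(&mut self, column_ids: &[String], batch_size: usize) {
        // 1. Announce all ids immediately so mesh peers stop forwarding
        //    these columns back to us while we are still publishing.
        self.idontwant_sent.extend(column_ids.iter().cloned());

        // 2. Publish gradually, batch by batch; a real node would insert
        //    a delay between batches.
        for batch in column_ids.chunks(batch_size) {
            self.published.extend(batch.iter().cloned());
        }
    }
}

fn main() {
    let ids: Vec<String> = (0..4).map(|i| format!("column-{i}")).collect();
    let mut p = Publisher::default();
    p.publish_columns(&ids, 2);
    println!("{} {}", p.idontwant_sent.len(), p.published.len());
}
```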

Thanks @hangleang for reporting this!

…hey are received from the EL or available via column reconstruction.
@jimmygchen jimmygchen added ready-for-review The code is ready for review das Data Availability Sampling labels Dec 12, 2024
@michaelsproul michaelsproul added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Dec 16, 2024
@jimmygchen
Member Author

@mergify queue


mergify bot commented Dec 16, 2024

queue

🛑 The pull request has been removed from the queue default

Pull request #6686 has been dequeued by a dequeue command.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

@jimmygchen
Member Author

@mergify dequeue


mergify bot commented Dec 16, 2024

This pull request has been removed from the queue for the following reason: pull request dequeued.

Pull request #6686 has been dequeued by a dequeue command.

You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it.

If you want to requeue this pull request, you need to post a comment with the text: @mergifyio requeue


mergify bot commented Dec 16, 2024

dequeue

✅ The pull request has been removed from the queue default

@jimmygchen
Member Author

@mergify requeue


mergify bot commented Dec 16, 2024

requeue

✅ This pull request will be re-embarked automatically

The followup queue command will be automatically executed to re-embark the pull request


mergify bot commented Dec 16, 2024

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at 847c801
