records hash of timed-out pull responses #17263
Conversation
Force-pushed from 96c0150 to 3fd6f5f
Codecov Report
@@ Coverage Diff @@
## master #17263 +/- ##
=======================================
Coverage 82.6% 82.7%
=======================================
Files 423 423
Lines 118111 118111
=======================================
+ Hits 97663 97690 +27
+ Misses 20448 20421 -27
Force-pushed from 3fd6f5f to 46e4b65
let failed_inserts = responses
    .into_iter()
    .filter_map(upsert)
    .map(|resp| hash(&bincode::serialize(&resp).unwrap()))
    .collect();
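The pattern in this snippet can be sketched with std-only stand-ins (a minimal sketch: digest replaces hash(&bincode::serialize(..).unwrap()), and the crds table and upsert check are hypothetical, not the actual crds types):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Stand-in for hash(&bincode::serialize(&resp).unwrap()): any stable
// digest of the value works for this sketch.
fn digest(resp: &str) -> u64 {
    let mut h = DefaultHasher::new();
    resp.hash(&mut h);
    h.finish()
}

// Keep only the responses that fail to insert into the (hypothetical)
// crds table, and record their hashes, mirroring the filter_map above.
fn record_failed_inserts(responses: Vec<&str>, crds: &HashSet<&str>) -> Vec<u64> {
    responses
        .into_iter()
        .filter_map(|resp| if crds.contains(resp) { Some(resp) } else { None })
        .map(|resp| digest(resp))
        .collect()
}

fn main() {
    // "a" and "b" are already in crds, so their upserts fail; "c" inserts.
    let crds: HashSet<&str> = ["a", "b"].into_iter().collect();
    let failed = record_failed_inserts(vec!["a", "b", "c"], &crds);
    assert_eq!(failed.len(), 2);
    println!("recorded {} failed-insert hashes", failed.len());
}
```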
From my understanding, these values eventually make it into CrdsGossipPull::failed_inserts here: https://github.com/solana-labs/solana/blob/master/core/src/crds_gossip_pull.rs#L446-L447. Could this vector now grow large or potentially contain duplicates for the same response hash (if we got the same outdated response from multiple validators, for instance) because we removed the crds.upserts(&response) check?
There are 3 if branches here:
https://github.com/solana-labs/solana/blob/5e6b00fe9/core/src/crds_gossip_pull.rs#L390-L399
The first 2 push the value to failed_inserts if crds.upsert(&response) returns false.
The 3rd one basically discards a pull-response but does not record its hash anywhere. This change updates the 3rd branch to also record the hash of the discarded value, so that it is excluded from the next pull-request.
So for the first 2 branches nothing has changed; only the 3rd branch now also records the hash of the discarded value, like the first 2 branches do.
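To illustrate, here is a heavily simplified, hypothetical sketch of that branch structure (the types, names, and timeout logic are illustrative, not the actual crds_gossip_pull.rs code): the first two cases attempt the upsert and record the hash on failure, while the third, which previously dropped timed-out values silently, now records their hash too.

```rust
use std::collections::HashSet;

// Hypothetical pull-response value: a payload plus its wallclock.
struct Value {
    payload: u64,
    wallclock: u64,
}

// Stand-in for hash(&bincode::serialize(&value).unwrap()).
fn hash_of(value: &Value) -> u64 {
    value.payload.wrapping_mul(31).wrapping_add(value.wallclock)
}

fn process_pull_response(
    crds: &mut HashSet<u64>,      // stand-in crds table keyed by payload
    active_owners: &HashSet<u64>, // owners recently active in crds
    now: u64,
    timeout: u64,
    owner: u64,
    value: Value,
    failed_inserts: &mut Vec<u64>,
) {
    let timed_out = now > value.wallclock.saturating_add(timeout);
    if !timed_out || active_owners.contains(&owner) {
        // Branches 1 and 2: try the upsert; record the hash if it fails.
        let inserted = crds.insert(value.payload); // stand-in for crds.upsert
        if !inserted {
            failed_inserts.push(hash_of(&value));
        }
    } else {
        // Branch 3: timed out and owner not active. Previously the value
        // was discarded without recording anything; with this change its
        // hash is recorded so the next pull-request excludes it.
        failed_inserts.push(hash_of(&value));
    }
}

fn main() {
    let mut crds: HashSet<u64> = HashSet::new();
    let active_owners: HashSet<u64> = HashSet::new();
    let mut failed_inserts = Vec::new();
    // A timed-out value from an inactive owner hits branch 3.
    let stale = Value { payload: 9, wallclock: 0 };
    process_pull_response(&mut crds, &active_owners, 100, 10, 1, stale, &mut failed_inserts);
    assert_eq!(failed_inserts.len(), 1);
    assert!(crds.is_empty());
}
```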
Could this vector now grow large/potentially have duplicates for the same hash response

That is a good point. Ideally, once a hash value is recorded it is excluded from the next pull-request (by adding it to the bloom filters), and so it should not appear in pull responses again (until it is purged from the failed_inserts vec-deque). So as long as the failed_inserts vector is updated before the next pull request, those values should be excluded from later pull responses, and we would not expect duplicates.
But still, I agree that this vector can potentially get too large, and duplicates may creep in somehow (e.g. if pull-responses arrive very late). I took a note to address these, perhaps adding an upper bound on the size of failed_inserts in a follow-up change.
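Such a follow-up could look roughly like the sketch below: a VecDeque with a hypothetical upper bound that evicts the oldest hashes first (MAX_FAILED_INSERTS is an assumed constant for illustration, not a value from the codebase).

```rust
use std::collections::VecDeque;

// Assumed cap for this sketch; the real bound, if added, may differ.
const MAX_FAILED_INSERTS: usize = 4;

// Push a new failed-insert hash, evicting the oldest entries whenever
// the deque exceeds the cap.
fn record_failed_insert(failed_inserts: &mut VecDeque<u64>, hash: u64) {
    failed_inserts.push_back(hash);
    while failed_inserts.len() > MAX_FAILED_INSERTS {
        failed_inserts.pop_front();
    }
}

fn main() {
    let mut failed = VecDeque::new();
    for h in 0..10u64 {
        record_failed_insert(&mut failed, h);
    }
    // Only the most recent MAX_FAILED_INSERTS hashes are kept.
    assert_eq!(failed.len(), MAX_FAILED_INSERTS);
    assert_eq!(failed.front(), Some(&6));
    println!("{:?}", failed);
}
```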
because we removed the crds.upserts(&response) check?

I do not think I removed it. It is still there in the filter_map closure.
Problem
Gossip should record the hash of pull responses which are timed out and
fail to insert:
https://github.com/solana-labs/solana/blob/ed51cde37/core/src/crds_gossip_pull.rs#L397-L400
so that they are excluded from the next pull request:
https://github.com/solana-labs/solana/blob/ed51cde37/core/src/crds_gossip_pull.rs#L486-L491
Otherwise, the next pull request will likely include the same timed-out
values and waste bandwidth.
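The exclusion mechanism described above can be sketched with a HashSet standing in for the actual bloom filters (build_filter_hashes and the peer-side filtering below are hypothetical helpers for illustration, not the crds_gossip_pull.rs API):

```rust
use std::collections::HashSet;

// Hypothetical: hashes of values already in crds, plus hashes of failed
// inserts, both go into the pull-request filter so that peers omit
// those values from their responses.
fn build_filter_hashes(crds_hashes: &[u64], failed_inserts: &[u64]) -> HashSet<u64> {
    crds_hashes
        .iter()
        .chain(failed_inserts.iter())
        .copied()
        .collect()
}

fn main() {
    let crds_hashes = [1u64, 2];
    let failed_inserts = [7u64]; // hash of a timed-out, discarded value
    let filter = build_filter_hashes(&crds_hashes, &failed_inserts);
    // A peer consults the filter and skips values we already hold or
    // have rejected, saving bandwidth.
    let peer_values = [2u64, 7, 9];
    let to_send: Vec<u64> = peer_values
        .iter()
        .copied()
        .filter(|h| !filter.contains(h))
        .collect();
    assert_eq!(to_send, vec![9]);
}
```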
Summary of Changes
record hash of timed-out pull responses