-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Increment timestamp on refreshed votes #31908
Increment timestamp on refreshed votes #31908
Conversation
4aa7dc3
to
74a05df
Compare
core/src/consensus.rs
Outdated
@@ -454,6 +455,40 @@ impl Tower { | |||
self.last_vote_tx_blockhash | |||
} | |||
|
|||
pub fn refresh_last_vote_timestamp( | |||
&mut self, | |||
heaviest_bank_on_same_fork: Slot, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename to heaviest_slot_on_same_fork
core/src/consensus.rs
Outdated
} else if let Some(last_voted_slot) = self.last_vote.last_voted_slot() { | ||
// If the previous vote did not send a timestamp due to clock error, try our | ||
// best to estimate the correct timestamp for the Timestamp Oracle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ick is this because this maybe_timestamp
might return None
here: https://github.com/ilya-bobyr/solana/blob/a872c0a373015a5427bdce238b32d66783595378/core/src/consensus.rs#L538
Shouldn't the if current_slot > self.last_timestamp.slot
always be true since votes are always increasing? Seems like this case we can just submit the current timestamp since it should never be true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah votes should always be increasing so that check should never fail. However the check below it
Line 593 in e0389ba
if timestamp >= self.last_timestamp.timestamp { |
could fail if
Utc::now().timestamp()
is screwed. Previously we would return None
and exclude this vote from the timestamp calculation for the oracle, but to avoid dedup we need to return a value here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm how often is Utc::now().timestamp()
going to go backwards? Feel like we could just panic in that case or retry until we get a proper timestamp. Estimating the slot time here seems kind of hacky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤷. also if it was messed up for the original vote it could also be messed up for the estimation here.
do we want to kill everything over a missed timestamp though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how often is Utc::now().timestamp() going to go backwards?
whenever it wants. underlying time is sampled from std::time::SystemTime
(clock_gettime(CLOCK_REALTIME)
on linux), which is not monotonic. std::time::Instant
uses clock_gettime(CLOCK_MONOTONIC)
, but i don't think there's any way to get a timestamp from one.
i wonder if we'd be better off looking up the txs' rbh in status cache and comparing the slots to tiebreak. abusing the wallclock oracle is starting to feel too dirty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bank-frozen time actually makes a lot of sense for the timestamp oracle.
Would you still want to increment by 1sec for the vote-refresh distinction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, my idea is
- Populate timestamp when freezing bank
- If UTC::now fails, use the best_estimate calc. More likely to be correct because there is no vote delay, and it's an inverse of the calc used in the stake weighted mean anyway.
- If we're worried about the compounding effect, denote which timestamps have been estimated and which have been observed. run best_estimate from the latest bank that has been observed.
- For refresh, during vote generation add a +1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm missing something, I would expect the estimate off an estimate to be the same as an estimate off the original latest observed bank.
But anyway, I think this approach sounds okay, with the caveat that we probably still need to ensure that the timestamp doesn't go backward immediately after a refresh, whether the new vote timestamp is generated by UTC::now or best_estimate, so that those votes do not fail to land.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm going to need to see this to understand it. intuitively sounds way too invasive to backport
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, seems way too much to backport. For now i've stuck with the +1
idea, and updated last_timestamp
such that new vote timestamps cannot go backwards. Vote ts will be increasing or None
.
for the future: #32135 to track how to fix this properly.
74a05df
to
8d22a9f
Compare
Codecov Report
@@ Coverage Diff @@
## master #31908 +/- ##
========================================
Coverage 81.9% 81.9%
========================================
Files 767 767
Lines 208903 209002 +99
========================================
+ Hits 171223 171349 +126
+ Misses 37680 37653 -27 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
derp... left this comment pending 🤦
core/src/consensus.rs
Outdated
} else if let Some(last_voted_slot) = self.last_vote.last_voted_slot() { | ||
// If the previous vote did not send a timestamp due to clock error, try our | ||
// best to estimate the correct timestamp for the Timestamp Oracle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm going to need to see this to understand it. intuitively sounds way too invasive to backport
b535be9
to
78d2c63
Compare
78d2c63
to
24a3af8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic is a little complex, but i can't come up with much better.
it'd be good to get +1 from @CriesofCarrots too
(cherry picked from commit 01d3546)
(cherry picked from commit 01d3546)
@CriesofCarrots if you could leave a post-merge review once back in office, that'd be great! 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I can't think of better logic for now. This seems like a more preferable place to start than the previous estimation code. I do wish we had better visibility into the effects; is it worth adding some metrics that report how often each of these cases are hit?
I've added some metrics here #32206 |
…s#31908) (solana-labs#32156) Increment timestamp on refreshed votes (solana-labs#31908) (cherry picked from commit 01d3546) Co-authored-by: Ashwin Sekar <ashwin@solana.com>
Problem
Follow up to the change #31879, now that we use vote tx timestamp as a tiebreaker in the vote queue deduplication logic, send a different timestamp everytime we refresh a vote.
Summary of Changes
On refresh increment the timestamp by 1 (using the time of refresh has potential to break the Timestamp Oracle). If the previous vote didn't have a timestamp try to estimate as best as possible.
Fixes #