Skip to content
This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

Increment timestamp on refreshed votes #31908

Merged
merged 1 commit into from
Jun 15, 2023

Conversation

AshwinSekar
Copy link
Contributor

@AshwinSekar AshwinSekar commented Jun 1, 2023

Problem

Follow up to the change #31879, now that we use vote tx timestamp as a tiebreaker in the vote queue deduplication logic, send a different timestamp everytime we refresh a vote.

Summary of Changes

On refresh increment the timestamp by 1 (using the time of refresh has potential to break the Timestamp Oracle). If the previous vote didn't have a timestamp try to estimate as best as possible.

Fixes #

@AshwinSekar AshwinSekar force-pushed the send-later-ts-refresh branch from 4aa7dc3 to 74a05df Compare June 1, 2023 18:50
@AshwinSekar AshwinSekar added v1.14 v1.16 PRs that should be backported to v1.16 labels Jun 1, 2023
@AshwinSekar AshwinSekar marked this pull request as ready for review June 1, 2023 18:59
@@ -454,6 +455,40 @@ impl Tower {
self.last_vote_tx_blockhash
}

pub fn refresh_last_vote_timestamp(
&mut self,
heaviest_bank_on_same_fork: Slot,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename to heaviest_slot_on_same_fork

Comment on lines 470 to 472
} else if let Some(last_voted_slot) = self.last_vote.last_voted_slot() {
// If the previous vote did not send a timestamp due to clock error, try our
// best to estimate the correct timestamp for the Timestamp Oracle.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ick is this because this maybe_timestamp might return None here: https://github.com/ilya-bobyr/solana/blob/a872c0a373015a5427bdce238b32d66783595378/core/src/consensus.rs#L538

Shouldn't the if current_slot > self.last_timestamp.slot always be true since votes are always increasing? Seems like this case we can just submit the current timestamp since it should never be true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah votes should always be increasing so that check should never fail. However the check below it

if timestamp >= self.last_timestamp.timestamp {

could fail if Utc::now().timestamp() is screwed. Previously we would return None and exclude this vote from the timestamp calculation for the oracle, but to avoid dedup we need to return a value here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm how often is Utc::now().timestamp() going to go backwards? Feel like we could just panic in that case or retry until we get a proper timestamp. Estimating the slot time here seems kind of hacky.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷. also if it was messed up for the original vote it could also be messed up for the estimation here.
do we want to kill everything over a missed timestamp though?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how often is Utc::now().timestamp() going to go backwards?

whenever it wants. underlying time is sampled from std::time::SystemTime (clock_gettime(CLOCK_REALTIME) on linux), which is not monotonic. std::time::Instant uses clock_gettime(CLOCK_MONOTONIC), but i don't think there's any way to get a timestamp from one.

i wonder if we'd be better off looking up the txs' rbh in status cache and comparing the slots to tiebreak. abusing the wallclock oracle is starting to feel too dirty

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bank-frozen time actually makes a lot of sense for the timestamp oracle.
Would you still want to increment by 1sec for the vote-refresh distinction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my idea is

  • Populate timestamp when freezing bank
  • If UTC::now fails, use the best_estimate calc. More likely to be correct because there is no vote delay, and it's an inverse of the calc used in the stake weighted mean anyway.
  • If we're worried about the compounding effect, denote which timestamps have been estimated and which have been observed. run best_estimate from the latest bank that has been observed.
  • For refresh, during vote generation add a +1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something, I would expect the estimate off an estimate to be the same as an estimate off the original latest observed bank.

But anyway, I think this approach sounds okay, with the caveat that we probably still need to ensure that the timestamp doesn't go backward immediately after a refresh, whether the new vote timestamp is generated by UTC::now or best_estimate, so that those votes do not fail to land.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm going to need to see this to understand it. intuitively sounds way too invasive to backport

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, seems way too much to backport. For now i've stuck with the +1 idea, and updated last_timestamp such that new vote timestamps cannot go backwards. Vote ts will be increasing or None.

for the future: #32135 to track how to fix this properly.

@AshwinSekar AshwinSekar force-pushed the send-later-ts-refresh branch from 74a05df to 8d22a9f Compare June 1, 2023 19:57
@codecov
Copy link

codecov bot commented Jun 1, 2023

Codecov Report

Merging #31908 (24a3af8) into master (c01250f) will increase coverage by 0.0%.
The diff coverage is 69.8%.

@@           Coverage Diff            @@
##           master   #31908    +/-   ##
========================================
  Coverage    81.9%    81.9%            
========================================
  Files         767      767            
  Lines      208903   209002    +99     
========================================
+ Hits       171223   171349   +126     
+ Misses      37680    37653    -27     

Copy link
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

derp... left this comment pending 🤦

Comment on lines 470 to 472
} else if let Some(last_voted_slot) = self.last_vote.last_voted_slot() {
// If the previous vote did not send a timestamp due to clock error, try our
// best to estimate the correct timestamp for the Timestamp Oracle.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm going to need to see this to understand it. intuitively sounds way too invasive to backport

@AshwinSekar AshwinSekar force-pushed the send-later-ts-refresh branch 2 times, most recently from b535be9 to 78d2c63 Compare June 14, 2023 19:35
@AshwinSekar AshwinSekar force-pushed the send-later-ts-refresh branch from 78d2c63 to 24a3af8 Compare June 14, 2023 19:37
@AshwinSekar AshwinSekar requested review from t-nelson and carllin June 14, 2023 19:57
Copy link
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic is a little complex, but i can't come up with much better. :shipit:

it'd be good to get +1 from @CriesofCarrots too

@steviez steviez removed their request for review June 15, 2023 17:16
@AshwinSekar AshwinSekar merged commit 01d3546 into solana-labs:master Jun 15, 2023
mergify bot pushed a commit that referenced this pull request Jun 15, 2023
mergify bot pushed a commit that referenced this pull request Jun 15, 2023
mergify bot added a commit that referenced this pull request Jun 15, 2023
…32157)

Increment timestamp on refreshed votes (#31908)

(cherry picked from commit 01d3546)

Co-authored-by: Ashwin Sekar <ashwin@solana.com>
@t-nelson
Copy link
Contributor

logic is a little complex, but i can't come up with much better. :shipit:

it'd be good to get +1 from @CriesofCarrots too

@CriesofCarrots if you could leave a post-merge review once back in office, that'd be great! 🙏

mergify bot added a commit that referenced this pull request Jun 15, 2023
…32156)

Increment timestamp on refreshed votes (#31908)

(cherry picked from commit 01d3546)

Co-authored-by: Ashwin Sekar <ashwin@solana.com>
Copy link
Contributor

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can't think of better logic for now. This seems like a more preferable place to start than the previous estimation code. I do wish we had better visibility into the effects; is it worth adding some metrics that report how often each of these cases are hit?

@AshwinSekar
Copy link
Contributor Author

AshwinSekar commented Jun 20, 2023

I do wish we had better visibility into the effects; is it worth adding some metrics that report how often each of these cases are hit?

I've added some metrics here #32206

wen-coding pushed a commit to wen-coding/solana that referenced this pull request Aug 15, 2023
wen-coding pushed a commit to wen-coding/solana that referenced this pull request Aug 15, 2023
wen-coding pushed a commit to wen-coding/solana that referenced this pull request Aug 15, 2023
bw-solana pushed a commit to bw-solana/solana that referenced this pull request Jan 10, 2025
…s#31908) (solana-labs#32156)

Increment timestamp on refreshed votes (solana-labs#31908)

(cherry picked from commit 01d3546)

Co-authored-by: Ashwin Sekar <ashwin@solana.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
v1.16 PRs that should be backported to v1.16
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants