Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

eliminate lock on record #15929

Merged
merged 6 commits into from
Mar 23, 2021

Conversation

jeffwashington
Copy link
Contributor

@jeffwashington jeffwashington commented Mar 16, 2021

Problem

PohRecorder.Record requires a PohRecorder mutex lock. This results in contention with poh. Poh locking delays cause drift.

Summary of Changes

Create a new PohRecorder 'Recorder' which can use non-blocking channels to record transactions in poh.

Fixes #

@jeffwashington jeffwashington force-pushed the non_locking_poh branch 2 times, most recently from 2e081c7 to 1551093 Compare March 17, 2021 00:08
@jeffwashington
Copy link
Contributor Author

Here are some rough metrics:
With Stephen's test on the colo machine, throttled to ~3k tps requested, 107600 hashes/tick (should be 100% of machine capability), 5s sample time, I get:
master: 17% of the time spent not hashing (mutexes, sending record entry on channel, etc.). 672 ticks (800 is ideal)
my branch: 6% of the time not hashing. 751 ticks (800 is ideal)

@jeffwashington jeffwashington force-pushed the non_locking_poh branch 13 times, most recently from e187541 to 408fb97 Compare March 19, 2021 18:20
@codecov
Copy link

codecov bot commented Mar 19, 2021

Codecov Report

Merging #15929 (234b4a8) into master (482c027) will decrease coverage by 0.0%.
The diff coverage is 84.7%.

@@            Coverage Diff            @@
##           master   #15929     +/-   ##
=========================================
- Coverage    79.9%    79.9%   -0.1%     
=========================================
  Files         409      409             
  Lines      107723   107864    +141     
=========================================
+ Hits        86153    86247     +94     
- Misses      21570    21617     +47     

@jeffwashington jeffwashington requested a review from sakridge March 22, 2021 02:43
Copy link
Contributor

@sakridge sakridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also add performance information from a gcp bench-tps run and see where we are in terms of slot time under load and just the TPS. Can run it on gcp manually or through the same configuration as #performance-results

@sakridge sakridge requested a review from carllin March 22, 2021 17:42
@sakridge
Copy link
Contributor

Want to mark ready-for-review if that's the case?

@jeffwashington jeffwashington marked this pull request as ready for review March 22, 2021 18:14
@jeffwashington jeffwashington added the CI Pull Request is ready to enter CI label Mar 22, 2021
@solana-grimes solana-grimes removed the CI Pull Request is ready to enter CI label Mar 22, 2021
@jeffwashington
Copy link
Contributor Author

jeffwashington commented Mar 22, 2021

Can we also add performance information from a gcp bench-tps run and see where we are in terms of slot time under load and just the TPS. Can run it on gcp manually or through the same configuration as #performance-results

last perf run:
mean_tps: 15067
max_tps: 28837
mean_confirmation_ms: 1586
max_confirmation_ms: 3513
99th_percentile_confirmation_ms: 2461
max_tower_distance: 31
last_tower_distance: 31
slots_per_second: 1.273

this pr:
mean_tps: 24424
max_tps: 70562
mean_confirmation_ms: 1777
max_confirmation_ms: 12530
99th_percentile_confirmation_ms: 5267
max_tower_distance: 64
last_tower_distance: 31
slots_per_second: 1.523

@jeffwashington jeffwashington requested a review from sakridge March 22, 2021 20:17
@sakridge
Copy link
Contributor

Seems to make poh_service.elapsed_ms a bit worse, but probably because it's going a lot faster.

poh-service-elapsed-ms

Copy link
Contributor

@sakridge sakridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@sakridge sakridge added the v1.6 label Mar 23, 2021
@jeffwashington jeffwashington merged commit 57ba86c into solana-labs:master Mar 23, 2021
mergify bot pushed a commit that referenced this pull request Mar 23, 2021
* eliminate lock on record

* use same error as MaxHeightReached

* clippy

* review feedback

* refactor should_tick code

* pr feedback

(cherry picked from commit 57ba86c)
jeffwashington added a commit that referenced this pull request Mar 23, 2021
mergify bot added a commit that referenced this pull request Mar 30, 2021
* eliminate lock on record

* use same error as MaxHeightReached

* clippy

* review feedback

* refactor should_tick code

* pr feedback

(cherry picked from commit 57ba86c)

Co-authored-by: Jeff Washington (jwash) <75863576+jeffwashington@users.noreply.github.com>
@brooksprumo brooksprumo mentioned this pull request Aug 23, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants