[single node perf] Recalibrate and improve regression perf test #14894

Merged
igor-aptos merged 1 commit into main from igor/recalibrate_for_rg_change
Oct 9, 2024

Conversation

igor-aptos
Contributor

Recalibrate for RG change
Update limits to be based on min_ratio / max_ratio of many runs
update module working set to 100
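One way to read "limits based on min_ratio / max_ratio of many runs" is sketched below. This is only an illustration, not the code in this PR: the `Calibration` class, the `classify` function, and the rule of scaling the median by each ratio are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Calibration:
    """Hypothetical calibration entry; field names are illustrative only."""

    median_tps: float  # median throughput across many calibration runs
    min_ratio: float   # lowest observed tps / median across those runs
    max_ratio: float   # highest observed tps / median across those runs


def classify(measured_tps: float, cal: Calibration) -> str:
    """Compare one measurement against bounds derived from past runs.

    Assumption: the allowed band is [median * min_ratio, median * max_ratio].
    """
    lower = cal.median_tps * cal.min_ratio
    upper = cal.median_tps * cal.max_ratio
    if measured_tps < lower:
        return "regression"
    if measured_tps > upper:
        return "improvement"
    return "ok"


if __name__ == "__main__":
    # Made-up numbers, purely to show the three outcomes.
    cal = Calibration(median_tps=1000.0, min_ratio=0.9, max_ratio=1.1)
    for tps in (850.0, 1000.0, 1200.0):
        print(tps, classify(tps, cal))
```

Deriving the band from the observed spread, rather than from a fixed percentage, would let noisy workloads get wider limits and stable ones get tighter limits.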

Description

How Has This Been Tested?

Key Areas to Review

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Move Compiler
  • Other (specify)

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

@igor-aptos igor-aptos added CICD:run-execution-performance-test Run execution performance test CICD:run-execution-performance-full-test Run execution performance test (full version) labels Oct 8, 2024

trunk-io bot commented Oct 8, 2024

⏱️ 14h 59m total CI duration on this PR
| Slowest 15 Jobs | Cumulative Duration | Recent Runs |
|---|---|---|
| execution-performance / single-node-performance | 5h 42m | 🟥🟥🟥🟥🟩 (+18 more) |
| single-node-performance | 2h 42m | 🟥🟥🟥🟥🟥 (+2 more) |
| execution-performance / test-target-determinator | 1h 17m | 🟩🟩🟩🟩🟩 (+18 more) |
| test-target-determinator | 33m | 🟩🟩🟩🟩🟩 (+5 more) |
| test-target-determinator | 27m | 🟩🟩🟩🟩🟩 (+3 more) |
| rust-cargo-deny | 22m | 🟩🟩🟩🟩🟩 (+7 more) |
| check-dynamic-deps | 20m | 🟩🟩🟩🟩🟩 (+11 more) |
| dispatch_event | 15m | 🟩 |
| dispatch_event | 15m | 🟩 |
| dispatch_event | 15m | 🟩 |
| dispatch_event | 14m | 🟩 |
| check | 11m | 🟩🟩🟩 |
| rust-move-tests | 10m | 🟩 |
| rust-move-tests | 10m | 🟩 |
| rust-move-tests | 10m | 🟩 |


@@ -766,6 +811,12 @@ def print_table(
if errors:
print("Errors: ")
print("\n".join(errors))
print("""If you expect your PR to change the performance, you need to recalibrate the values.
Contributor

Thanks!

Contributor

Thanks!

@igor-aptos igor-aptos force-pushed the igor/recalibrate_for_rg_change branch from bfe3c55 to 1ac500f Compare October 8, 2024 17:09
@@ -766,6 +811,12 @@ def print_table(
if errors:
print("Errors: ")
print("\n".join(errors))
print("""If you expect your PR to change the performance, you need to recalibrate the values.
Contributor

Thanks!

Comment on lines 152 to 154
# (or if from log:
# transaction_type module_working_set_size executor_type block_size expected_tps tps
# )
Contributor

remove?

CALIBRATION_SEPARATOR = " "

- # transaction_type module_working_set_size executor_type min_ratio max_ratio median
+ # transaction_type module_working_set_size executor_type count min_ratio max_ratio median
Contributor

where does "count" come from originally again? (I read the query and it seems the binary prints it?)

Contributor

nvm, it's count(as="count")
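For reference, a row in the new column layout (`transaction_type module_working_set_size executor_type count min_ratio max_ratio median`) could be parsed roughly as follows. This is a hedged sketch, not the script's actual parser: `CalibrationRow` and `parse_calibration_row` are hypothetical names, the sample row is made up, and splitting on any whitespace is an assumption standing in for `CALIBRATION_SEPARATOR`.

```python
from typing import NamedTuple


class CalibrationRow(NamedTuple):
    """Hypothetical typed view of one calibration line."""

    transaction_type: str
    module_working_set_size: int
    executor_type: str
    count: int        # number of runs aggregated into this row
    min_ratio: float
    max_ratio: float
    median: float


def parse_calibration_row(line: str) -> CalibrationRow:
    """Parse one whitespace-separated calibration line into typed fields."""
    parts = line.split()  # assumption: no field contains whitespace
    if len(parts) != 7:
        raise ValueError(f"expected 7 columns, got {len(parts)}: {line!r}")
    txn_type, working_set, executor, count, min_r, max_r, median = parts
    return CalibrationRow(
        transaction_type=txn_type,
        module_working_set_size=int(working_set),
        executor_type=executor,
        count=int(count),
        min_ratio=float(min_r),
        max_ratio=float(max_r),
        median=float(median),
    )


if __name__ == "__main__":
    # Illustrative values only; not taken from any real calibration run.
    print(parse_calibration_row("example-txn\t100\tVM\t38\t0.85\t1.12\t12345.6"))
```

Checking the column count up front means a row pasted in the old six-column layout (without `count`) fails loudly instead of shifting every later field by one.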

@igor-aptos igor-aptos force-pushed the igor/recalibrate_for_rg_change branch from 1ac500f to 4b5fbe1 Compare October 8, 2024 17:26
@igor-aptos igor-aptos requested a review from wrwg October 8, 2024 17:40
@igor-aptos igor-aptos force-pushed the igor/recalibrate_for_rg_change branch from 4b5fbe1 to ecaef0e Compare October 8, 2024 19:29
@igor-aptos igor-aptos requested a review from a team as a code owner October 8, 2024 19:29
@igor-aptos igor-aptos force-pushed the igor/recalibrate_for_rg_change branch from ecaef0e to 2b7d626 Compare October 8, 2024 19:52
@@ -17,6 +17,11 @@ on:
default: false
type: boolean
description: Run complete version of the tests
IS_MAINNET_RUN:
Contributor

Is this used anywhere?

Contributor Author

stale

IGNORE_TARGET_DETERMINATION:
required: false
- default: false
+ default: true
Contributor

Are you sure it's a good idea to ignore by default?

Contributor Author

this is only for jobs triggered from the UI, right?

if we are triggering from the UI, we really shouldn't be skipping if there are no changes?

Contributor

Yes, you're right.

@igor-aptos igor-aptos force-pushed the igor/recalibrate_for_rg_change branch from 2b7d626 to 8164ee9 Compare October 8, 2024 20:10
@igor-aptos igor-aptos enabled auto-merge (squash) October 8, 2024 20:10

@igor-aptos igor-aptos force-pushed the igor/recalibrate_for_rg_change branch 2 times, most recently from f174199 to d07e93a Compare October 8, 2024 21:02
@igor-aptos igor-aptos removed CICD:run-execution-performance-test Run execution performance test CICD:run-execution-performance-full-test Run execution performance test (full version) labels Oct 9, 2024
@igor-aptos igor-aptos force-pushed the igor/recalibrate_for_rg_change branch 2 times, most recently from 8776a21 to 390fa54 Compare October 9, 2024 04:58
@igor-aptos igor-aptos force-pushed the igor/recalibrate_for_rg_change branch from 390fa54 to 955648c Compare October 9, 2024 05:02
@igor-aptos igor-aptos enabled auto-merge (squash) October 9, 2024 05:02

Recalibrate for RG change
Update limits to be based on min_ratio / max_ratio of many runs
update module working set to 100
@igor-aptos igor-aptos force-pushed the igor/recalibrate_for_rg_change branch from 955648c to 419098f Compare October 9, 2024 07:32

Contributor

github-actions bot commented Oct 9, 2024

✅ Forge suite realistic_env_max_load success on 419098febd91af2ced5acef684f4910c138f35dc

two traffics test: inner traffic : committed: 13788.90 txn/s, latency: 2884.92 ms, (p50: 2700 ms, p70: 3000, p90: 3000 ms, p99: 3600 ms), latency samples: 5242860
two traffics test : committed: 100.02 txn/s, latency: 2681.08 ms, (p50: 2400 ms, p70: 2600, p90: 2800 ms, p99: 12000 ms), latency samples: 1680
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.229, avg: 0.216", "QsPosToProposal: max: 0.310, avg: 0.255", "ConsensusProposalToOrdered: max: 0.309, avg: 0.295", "ConsensusOrderedToCommit: max: 0.465, avg: 0.445", "ConsensusProposalToCommit: max: 0.759, avg: 0.740"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 1.16s no progress at version 2841820 (avg 0.20s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 8.39s no progress at version 2841818 (avg 8.39s) [limit 15].
Test Ok

Contributor

github-actions bot commented Oct 9, 2024

✅ Forge suite framework_upgrade success on 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775 ==> 419098febd91af2ced5acef684f4910c138f35dc

Compatibility test results for 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775 ==> 419098febd91af2ced5acef684f4910c138f35dc (PR)
Upgrade the nodes to version: 419098febd91af2ced5acef684f4910c138f35dc
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1293.66 txn/s, submitted: 1296.52 txn/s, failed submission: 2.86 txn/s, expired: 2.86 txn/s, latency: 2302.53 ms, (p50: 2100 ms, p70: 2400, p90: 3600 ms, p99: 4800 ms), latency samples: 117500
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1188.30 txn/s, submitted: 1190.79 txn/s, failed submission: 2.48 txn/s, expired: 2.48 txn/s, latency: 2597.55 ms, (p50: 2400 ms, p70: 3000, p90: 3700 ms, p99: 5100 ms), latency samples: 105220
5. check swarm health
Compatibility test for 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775 ==> 419098febd91af2ced5acef684f4910c138f35dc passed
Upgrade the remaining nodes to version: 419098febd91af2ced5acef684f4910c138f35dc
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1205.82 txn/s, submitted: 1208.28 txn/s, failed submission: 2.46 txn/s, expired: 2.46 txn/s, latency: 2673.66 ms, (p50: 2400 ms, p70: 2700, p90: 4200 ms, p99: 5600 ms), latency samples: 107700
Test Ok

Contributor

github-actions bot commented Oct 9, 2024

✅ Forge suite compat success on 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775 ==> 419098febd91af2ced5acef684f4910c138f35dc

Compatibility test results for 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775 ==> 419098febd91af2ced5acef684f4910c138f35dc (PR)
1. Check liveness of validators at old version: 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775
compatibility::simple-validator-upgrade::liveness-check : committed: 12364.77 txn/s, latency: 2527.14 ms, (p50: 2100 ms, p70: 2200, p90: 5400 ms, p99: 7600 ms), latency samples: 442020
2. Upgrading first Validator to new version: 419098febd91af2ced5acef684f4910c138f35dc
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 6867.90 txn/s, latency: 4015.12 ms, (p50: 4600 ms, p70: 4900, p90: 5300 ms, p99: 5700 ms), latency samples: 123740
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 6874.66 txn/s, latency: 4641.94 ms, (p50: 4800 ms, p70: 5100, p90: 6700 ms, p99: 7000 ms), latency samples: 226720
3. Upgrading rest of first batch to new version: 419098febd91af2ced5acef684f4910c138f35dc
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 7285.00 txn/s, latency: 3868.53 ms, (p50: 4100 ms, p70: 4400, p90: 4800 ms, p99: 5300 ms), latency samples: 140120
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 7070.02 txn/s, latency: 4564.81 ms, (p50: 4800 ms, p70: 5000, p90: 6400 ms, p99: 6900 ms), latency samples: 239920
4. upgrading second batch to new version: 419098febd91af2ced5acef684f4910c138f35dc
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 10984.65 txn/s, latency: 2507.54 ms, (p50: 2800 ms, p70: 2900, p90: 3100 ms, p99: 3200 ms), latency samples: 192740
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 11239.03 txn/s, latency: 2788.76 ms, (p50: 2800 ms, p70: 2900, p90: 3000 ms, p99: 3400 ms), latency samples: 371120
5. check swarm health
Compatibility test for 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775 ==> 419098febd91af2ced5acef684f4910c138f35dc passed
Test Ok

@igor-aptos igor-aptos merged commit f8eef74 into main Oct 9, 2024
50 checks passed
@igor-aptos igor-aptos deleted the igor/recalibrate_for_rg_change branch October 9, 2024 08:05