update schedule and add input checking for replay-verify on archive #15501

Merged (1 commit) on Dec 9, 2024

Conversation

@areshand (Contributor) commented Dec 4, 2024

Description

  1. Switch the main replay-verify and provision workflows to run at night.
  2. Add validation for the end of the range, in case the user enters a range that we cannot replay.
  3. Use the correct argument for the image tag.
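
A minimal sketch of what the end-range check in item 2 could look like. The function name `validate_range` and the idea of comparing against a "latest replayable version" are assumptions for illustration only; the actual check in this PR may be written differently and live elsewhere.

```shell
# Hypothetical helper (not from the PR): reject a replay range whose end
# is before its start or beyond the latest version that can be replayed.
validate_range() {
  local start="$1" end="$2" latest="$3" v
  # Every value must be a non-negative integer.
  for v in "$start" "$end" "$latest"; do
    case "$v" in
      ''|*[!0-9]*)
        echo "not a non-negative integer: '$v'" >&2
        return 2
        ;;
    esac
  done
  if [ "$end" -lt "$start" ]; then
    echo "end ($end) is before start ($start)" >&2
    return 1
  fi
  if [ "$end" -gt "$latest" ]; then
    echo "end ($end) is beyond the latest replayable version ($latest)" >&2
    return 1
  fi
  return 0
}

# Example: an in-bounds range passes, an out-of-bounds end is rejected.
validate_range 100 200 300 && echo "range ok"
validate_range 100 400 300 || echo "range rejected"
```

Failing early like this avoids provisioning resources for a run that cannot complete.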

How Has This Been Tested?

Tested locally.

Key Areas to Review

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Move Compiler
  • Other (specify)

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

@areshand areshand requested a review from a team as a code owner December 4, 2024 23:20

trunk-io bot commented Dec 4, 2024

⏱️ 9h 51m total CI duration on this PR
| Slowest 15 Jobs | Cumulative Duration | Recent Runs |
|---|---|---|
| replay-testnet / run-replay-verify | 1h 22m | 🟥🟥🟩 (+1 more) |
| replay-mainnet / run-replay-verify | 1h 21m | 🟥🟥🟩 (+1 more) |
| provision-testnet / provision | 59m | 🟥🟩 |
| execution-performance / single-node-performance | 48m | 🟩🟩 |
| provision-mainnet / provision | 34m | 🟩 |
| test-target-determinator | 33m | 🟩🟩🟩🟩 (+6 more) |
| rust-cargo-deny | 24m | 🟩🟩🟩🟩 (+10 more) |
| check-dynamic-deps | 17m | 🟩🟩🟩🟩🟩 (+10 more) |
| forge-framework-upgrade-test / forge | 15m | 🟩 |
| forge-e2e-test / forge | 14m | 🟩 |
| rust-move-tests | 13m | |
| rust-move-tests | 13m | 🟩 |
| rust-move-tests | 13m | 🟩 |
| rust-move-tests | 13m | 🟩 |
| rust-move-tests | 13m | 🟩 |

🚨 4 jobs on the last run were significantly faster/slower than expected

| Job | Duration | vs 7d avg | Delta |
|---|---|---|---|
| semgrep/ci | 2m | 23s | +433% |
| fetch-last-released-docker-image-tag | 4m | 2m | +129% |
| execution-performance / single-node-performance | 24m | 17m | +42% |
| execution-performance / test-target-determinator | 3m | 4m | -24% |


@areshand areshand requested review from grao1991 and msmouse December 4, 2024 23:29
@@ -14,7 +14,7 @@ on:
- ".github/workflows/provision-replay-verify-archive-disks.yaml"
- ".github/workflows/workflow-run-replay-verify-archive-storage-provision.yaml"
schedule:
- cron: "0 22 * * 1,3,5" # This runs every Mon,Wed,Fri

is this UTC? (You said you switch to night)
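
For context on the question: as far as I know, GitHub Actions evaluates `schedule` cron expressions in UTC, so `0 22 * * 1,3,5` fires at 22:00 UTC on Mon, Wed, and Fri. The sketch below converts that hour to a local hour for a fixed whole-hour offset. The helper name `utc_hour_to_local` and the two offsets are illustrative assumptions, and it ignores daylight saving time.

```shell
# Hypothetical helper: convert a UTC hour-of-day to local time for a fixed
# whole-hour UTC offset. Prints "<local_hour> <day_shift>", where day_shift
# is -1 (previous day), 0 (same day), or 1 (next day).
utc_hour_to_local() {
  local utc_hour="$1" offset="$2"
  local shifted=$(( utc_hour + offset ))
  local day_shift=0
  if [ "$shifted" -lt 0 ]; then
    shifted=$(( shifted + 24 ))
    day_shift=-1
  elif [ "$shifted" -ge 24 ]; then
    shifted=$(( shifted - 24 ))
    day_shift=1
  fi
  echo "$shifted $day_shift"
}

utc_hour_to_local 22 -8   # UTC-8: prints "14 0" (14:00 the same day)
utc_hour_to_local 22 8    # UTC+8: prints "6 1"  (06:00 the next day)
```

Whether 22:00 UTC counts as "night" therefore depends on which timezone the team has in mind.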

@areshand areshand force-pushed the workflow_fix branch 3 times, most recently from dec6780 to 6d909ec Compare December 9, 2024 02:58
@areshand areshand added the CICD:build-images when this label is present github actions will start build+push rust images from the PR. label Dec 9, 2024
@areshand areshand force-pushed the workflow_fix branch 5 times, most recently from b93f8c7 to d1337b0 Compare December 9, 2024 04:57
# This is in case user manually cancel the step above, we still want to cleanup the resources
- name: Post-run cleanup
env:
GOOGLE_CLOUD_PROJECT: aptos-devinfra-0
if: ${{ always() }}
run: |
cd testsuite/replay-verify
poetry run python main.py --network ${{ inputs.NETWORK }} --cleanup
CMD="poetry run python main.py --network ${{ inputs.NETWORK }}" --cleanup

There appears to be a syntax error in the command string construction. The --cleanup flag is placed outside the quotes, which will cause it to be interpreted as a separate shell command. The corrected version should be:

CMD="poetry run python main.py --network ${{ inputs.NETWORK }} --cleanup"

Spotted by Graphite Reviewer
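
To illustrate the quoting issue in isolation, here is a small sketch. `fake_main` is a hypothetical stand-in for `poetry run python main.py`, and `testnet` stands in for the `inputs.NETWORK` value; how `CMD` is used later in the real workflow is not shown in this hunk, so treat this as a demonstration of shell parsing only.

```shell
# Hypothetical stand-in for the real entry point.
fake_main() { echo "args: $*"; }

broken() {
  # Flag outside the quotes: the shell treats CMD=... as a temporary
  # assignment and then tries to run a command literally named "--cleanup".
  CMD="fake_main --network testnet" --cleanup
}

fixed() {
  local CMD="fake_main --network testnet --cleanup"
  # Left unquoted on purpose so the string word-splits into command + args.
  $CMD
}

broken 2>/dev/null || echo "broken exit status: $?"   # prints "broken exit status: 127"
fixed                                                  # prints "args: --network testnet --cleanup"
```

Exit status 127 is the shell's "command not found" code, which is what a misplaced flag produces here.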


Comment on lines 133 to 136
echo $DISK_URIS

if [ -n "$DISK_URIS" ]; then
gcloud compute disks delete $DISK_URIS

The $DISK_URIS variable needs to be quoted in the gcloud command to properly handle disk names containing spaces. The command should be:

gcloud compute disks delete "$DISK_URIS"

This ensures all disk URIs are passed correctly as a single argument to the delete command.

Spotted by Graphite Reviewer


DISK_URIS=$(gcloud compute disks list --filter="-users:* AND status=READY" --format "value(uri())")
echo "Disks to be deleted:"
echo "$DISK_URIS"


The gcloud compute disks delete command expects each disk URI as a separate argument. The current syntax will pass all URIs as a single argument, causing the command to fail. Consider updating to:

echo "$DISK_URIS" | xargs -r gcloud compute disks delete

The -r flag ensures xargs only runs if there are disk URIs to process.

Spotted by Graphite Reviewer
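
The two suggestions on `$DISK_URIS` pull in different directions, so it may help to see how the shell actually passes the value in each form. This sketch uses a hypothetical `count_args` helper and made-up URIs, and assumes the list command prints one URI per line; `echo delete` stands in for `gcloud compute disks delete`.

```shell
# Hypothetical helper: report how many arguments the shell passed.
count_args() { echo "$#"; }

# Two made-up disk URIs, one per line.
DISK_URIS="https://example.invalid/disks/a
https://example.invalid/disks/b"

count_args $DISK_URIS      # unquoted: word-split, prints 2
count_args "$DISK_URIS"    # quoted: one argument containing a newline, prints 1

# xargs also splits its input on whitespace and passes separate arguments.
printf '%s\n' "$DISK_URIS" | xargs -r echo delete
# With -r (a GNU extension), empty input means the command is not run at all.
printf '' | xargs -r echo delete
```

In other words, quoting the variable collapses the whole list into one argument, while the unquoted and `xargs` forms pass one argument per URI.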


@areshand areshand force-pushed the workflow_fix branch 2 times, most recently from 2897d0d to b52b890 Compare December 9, 2024 16:54
@areshand areshand enabled auto-merge (rebase) December 9, 2024 16:54



github-actions bot commented Dec 9, 2024

✅ Forge suite compat success on 3c6e693a27339e73520f41030dce8fc9cd504967 ==> 0d42bb97d618216ad74698cc6d515076328b8da0

Compatibility test results for 3c6e693a27339e73520f41030dce8fc9cd504967 ==> 0d42bb97d618216ad74698cc6d515076328b8da0 (PR)
1. Check liveness of validators at old version: 3c6e693a27339e73520f41030dce8fc9cd504967
compatibility::simple-validator-upgrade::liveness-check : committed: 16336.90 txn/s, latency: 2072.00 ms, (p50: 2100 ms, p70: 2200, p90: 2400 ms, p99: 3300 ms), latency samples: 526700
2. Upgrading first Validator to new version: 0d42bb97d618216ad74698cc6d515076328b8da0
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 7196.04 txn/s, latency: 3961.76 ms, (p50: 4300 ms, p70: 4500, p90: 4700 ms, p99: 4800 ms), latency samples: 134500
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 7386.37 txn/s, latency: 4367.31 ms, (p50: 4700 ms, p70: 4700, p90: 4800 ms, p99: 4800 ms), latency samples: 253060
3. Upgrading rest of first batch to new version: 0d42bb97d618216ad74698cc6d515076328b8da0
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 6610.93 txn/s, latency: 4345.29 ms, (p50: 5000 ms, p70: 5300, p90: 5400 ms, p99: 5500 ms), latency samples: 121100
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 6647.98 txn/s, latency: 4974.59 ms, (p50: 5400 ms, p70: 5500, p90: 5500 ms, p99: 5800 ms), latency samples: 227140
4. upgrading second batch to new version: 0d42bb97d618216ad74698cc6d515076328b8da0
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 10354.54 txn/s, latency: 2686.51 ms, (p50: 2300 ms, p70: 2700, p90: 4000 ms, p99: 5500 ms), latency samples: 183240
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 11853.04 txn/s, latency: 2699.14 ms, (p50: 2400 ms, p70: 3500, p90: 3700 ms, p99: 3800 ms), latency samples: 390360
5. check swarm health
Compatibility test for 3c6e693a27339e73520f41030dce8fc9cd504967 ==> 0d42bb97d618216ad74698cc6d515076328b8da0 passed
Test Ok



github-actions bot commented Dec 9, 2024

✅ Forge suite realistic_env_max_load success on 0d42bb97d618216ad74698cc6d515076328b8da0

two traffics test: inner traffic : committed: 14786.81 txn/s, latency: 2684.29 ms, (p50: 2700 ms, p70: 2700, p90: 3000 ms, p99: 3200 ms), latency samples: 5622560
two traffics test : committed: 100.02 txn/s, latency: 1416.38 ms, (p50: 1400 ms, p70: 1400, p90: 1500 ms, p99: 2600 ms), latency samples: 1800
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 1.538, avg: 1.475", "ConsensusProposalToOrdered: max: 0.311, avg: 0.289", "ConsensusOrderedToCommit: max: 0.388, avg: 0.379", "ConsensusProposalToCommit: max: 0.675, avg: 0.667"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.71s no progress at version 36883 (avg 0.20s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.63s no progress at version 2392671 (avg 0.63s) [limit 16].
Test Ok


github-actions bot commented Dec 9, 2024

✅ Forge suite framework_upgrade success on 3c6e693a27339e73520f41030dce8fc9cd504967 ==> 0d42bb97d618216ad74698cc6d515076328b8da0

Compatibility test results for 3c6e693a27339e73520f41030dce8fc9cd504967 ==> 0d42bb97d618216ad74698cc6d515076328b8da0 (PR)
Upgrade the nodes to version: 0d42bb97d618216ad74698cc6d515076328b8da0
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1504.71 txn/s, submitted: 1507.21 txn/s, failed submission: 2.50 txn/s, expired: 2.50 txn/s, latency: 2071.90 ms, (p50: 2100 ms, p70: 2200, p90: 2700 ms, p99: 4100 ms), latency samples: 132180
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1431.42 txn/s, submitted: 1433.26 txn/s, failed submission: 1.84 txn/s, expired: 1.84 txn/s, latency: 2182.62 ms, (p50: 2100 ms, p70: 2300, p90: 3300 ms, p99: 4700 ms), latency samples: 124640
5. check swarm health
Compatibility test for 3c6e693a27339e73520f41030dce8fc9cd504967 ==> 0d42bb97d618216ad74698cc6d515076328b8da0 passed
Upgrade the remaining nodes to version: 0d42bb97d618216ad74698cc6d515076328b8da0
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1459.82 txn/s, submitted: 1463.18 txn/s, failed submission: 3.36 txn/s, expired: 3.36 txn/s, latency: 2117.55 ms, (p50: 2000 ms, p70: 2100, p90: 3400 ms, p99: 4800 ms), latency samples: 130460
Test Ok

@areshand areshand merged commit dacbfc3 into main Dec 9, 2024
49 of 52 checks passed
@areshand areshand deleted the workflow_fix branch December 9, 2024 19:12
Labels
CICD:build-images when this label is present github actions will start build+push rust images from the PR.

3 participants