[Bug]: Wrong calculation of allowed pruning height when changing snapshot-interval #23638

Open
1 task done
RogerKSI opened this issue Feb 8, 2025 · 0 comments · May be fixed by #23639
RogerKSI commented Feb 8, 2025

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

The allowed pruning height is calculated incorrectly when a node operator changes the snapshot-interval from one value (A) to a larger value (B). This causes two problems:

  1. A height is pruned while the snapshot at that height is still being processed.
  • Code: https://github.com/cosmos/cosmos-sdk/blob/v0.50.11/store/pruning/manager.go#L130-L136
  • The pruning logic assumes that it can safely delete all states up to pruneSnapshotHeights[0] + snapshotInterval - 1. This assumption fails when the snapshot interval is changed, so the state at a height whose snapshot is still being processed can be deleted.
  • Example Scenario:
    • Block 10: The node operator sets snapshot-interval = 10 and pruning-keep-recent = "5", so the node creates a snapshot at block 10.
    • Block 15: The operator changes snapshot-interval to 20.
    • Block 20: The node creates a new snapshot at block 20.
    • Block 26: The snapshot-based limit allows pruning up to block 29 (10 + 20 - 1 = 29), and pruning-keep-recent caps this at block 20. The state at height 20 is therefore deleted, which breaks the snapshot if the snapshot at block 20 has not finished yet.
  2. The pruning height gets stuck at the previous snapshot height.
  • Code: https://github.com/cosmos/cosmos-sdk/blob/v0.50.11/store/pruning/manager.go#L83-L89
    • The function only updates pruneSnapshotHeights if the next snapshot is at previousSnapshotHeight + snapshotInterval.
    • If the interval changes, this condition fails, meaning pruneSnapshotHeights does not shift forward.
    • As a result, the first value in pruneSnapshotHeights gets stuck at an old height, and the node continues using it to determine which heights to prune up to (same code section as Problem 1).
    • Note: This also happens when the snapshot at some height fails or is skipped.
  • Example Scenario:
    • Block 0: The operator sets snapshot-interval = 10.
    • Block 10: A snapshot is created. pruneSnapshotHeights = [10].
    • Block 15: The operator changes the snapshot-interval to 20.
    • Block 20: A new snapshot is created. pruneSnapshotHeights = [10, 20].
      • The entry 10 is not dropped because 20 (pruneSnapshotHeights[1]) is not equal to 10 (pruneSnapshotHeights[0]) + 20 (snapshotInterval).
    • Block 40: Another snapshot is created. pruneSnapshotHeights = [10, 20, 40].
    • After that, pruning gets stuck:
      • pruneSnapshotHeights remains [10, 20, 40, …], but pruning only happens up to pruneSnapshotHeights[0] + snapshotInterval - 1 = 29. The node never prunes blocks beyond height 29, leading to unexpected storage growth.
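
The stuck list in Problem 2 can be reproduced with a small standalone sketch. This is not the SDK code: `trimSnapshotHeights` is a hypothetical helper that models only the behavior described above (record a finished snapshot height, then drop leading entries while consecutive heights differ by exactly snapshotInterval).

```go
package main

import (
	"fmt"
	"sort"
)

// trimSnapshotHeights is a hypothetical stand-in for the update described in
// this issue, not the actual manager.go implementation. It records a finished
// snapshot height and keeps dropping the head of the list only while the next
// entry equals the previous entry plus snapshotInterval.
func trimSnapshotHeights(heights []int64, finished, snapshotInterval int64) []int64 {
	heights = append(heights, finished)
	sort.Slice(heights, func(i, j int) bool { return heights[i] < heights[j] })

	k := 1
	for ; k < len(heights); k++ {
		if heights[k] != heights[k-1]+snapshotInterval {
			break // gap does not match the current interval: stop advancing
		}
	}
	return heights[k-1:]
}

func main() {
	// Interval stays at 10: the head advances with every finished snapshot.
	var steady []int64
	for _, h := range []int64{10, 20, 30} {
		steady = trimSnapshotHeights(steady, h, 10)
	}
	fmt.Println(steady) // [30]

	// Interval changed from 10 to 20 after the snapshot at height 10:
	// 20 != 10 + 20, so the head stays at 10 for every later snapshot.
	changed := trimSnapshotHeights(nil, 10, 10)
	for _, h := range []int64{20, 40} {
		changed = trimSnapshotHeights(changed, h, 20)
	}
	fmt.Println(changed) // [10 20 40]
}
```

With an unchanged interval the list collapses to the latest height, while after the change the old head (10) is never removed, which matches the scenario above.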

Cosmos SDK Version

v0.50+ with store v1

How to reproduce?

  1. Install simd from Cosmos SDK v0.50.11
  2. Configure the node (~/.simapp/config/app.toml)
    • pruning = "custom"
    • pruning-keep-recent = "5"
    • pruning-interval = "10"
    • snapshot-interval = 10
  3. Start the node and let it run. (The first snapshot will be created at block 10.)
  4. At block 15, stop the node and update snapshot-interval in app.toml to 20.
  5. Start the node again.
  6. At block 26, the node will attempt to prune the state at block 20 (since pruning-keep-recent = 5). If the snapshot at block 20 is still in progress, pruning deletes the state before snapshot completion. (Problem 1)
  7. After block 29, the node stops pruning as it is now limited by the first snapshot height (pruneSnapshotHeights[0]). (Problem 2)
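
The numbers in steps 6 and 7 can be checked with a minimal sketch of the bound. `allowedPruneHeight` and its parameters are hypothetical names, and the keep-recent term (current height - 1 - pruning-keep-recent) is an assumption inferred from the example in which block 26 is limited to height 20. The snapshot term is the pruneSnapshotHeights[0] + snapshotInterval - 1 expression quoted in this issue.

```go
package main

import "fmt"

// allowedPruneHeight is a hypothetical model of the limit described in this
// issue, not the SDK function itself. It returns the smaller of:
//   - the keep-recent bound (assumed to be current - 1 - keepRecent), and
//   - the snapshot bound oldestSnapshot + snapshotInterval - 1.
func allowedPruneHeight(current, keepRecent, oldestSnapshot, snapshotInterval int64) int64 {
	keepRecentBound := current - 1 - keepRecent
	snapshotBound := oldestSnapshot + snapshotInterval - 1
	if snapshotBound < keepRecentBound {
		return snapshotBound
	}
	return keepRecentBound
}

func main() {
	// pruning-keep-recent = 5, pruneSnapshotHeights[0] = 10,
	// snapshot-interval changed to 20.
	for _, h := range []int64{26, 40, 100} {
		fmt.Printf("height %d: prune up to %d\n", h, allowedPruneHeight(h, 5, 10, 20))
	}
	// Output:
	// height 26: prune up to 20
	// height 40: prune up to 29
	// height 100: prune up to 29
}
```

At height 26 the result includes height 20, where a snapshot may still be running (Problem 1). From then on the result stays at 29 regardless of the current height, because the head of the list stays at 10 (Problem 2).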
Status: 📋 Backlog