Align buffer blocks in saved state. #1240

Closed


rndmcnlly
Contributor

This wastes some bytes for the rare irregular buffer, but it opens up the possibility of running block-based deduplication that reaches across multiple large system components. For example, we can detect when a block from disk has one or more copies in main memory.

In a small experiment I ran to choose a block size for memory blocks, I found that 512 was the best (balancing how often blocks repeat against the storage needed for block ids). However, a block size of 256 was nearly as good, and 256 also matches the block size used in the Buffer system for caching lazily loaded disk blocks. As a result, 256 seems like the best overall choice.
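To make the trade-off concrete, here is a rough sketch of how one might estimate the deduplicated size of a byte string for different block sizes. This is not the experiment described above; the function name `deduped_size` and the 4-byte block id are assumptions chosen only for illustration.

```python
def deduped_size(data: bytes, block_size: int, id_bytes: int = 4) -> int:
    """Estimate bytes needed to store `data` as unique blocks plus one id per block.

    Hypothetical cost model: each unique block is stored once at full size
    (a short trailing block is counted as a full block, a slight overestimate),
    and every block position costs `id_bytes` to reference its contents.
    """
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = set(blocks)
    return len(unique) * block_size + len(blocks) * id_bytes


if __name__ == "__main__":
    # Synthetic sample: a run of zeros followed by a repeated 256-byte pattern.
    sample = bytes(1024) + bytes(range(256)) * 2
    for size in (128, 256, 512, 1024):
        print(size, deduped_size(sample, size))
```

Smaller blocks repeat more often but need more ids, while larger blocks need fewer ids but repeat less, so the estimate usually has a minimum somewhere in between.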

Context: I'm working towards losslessly compressing a sequence of savestates to create a video-like record of interactive performances. If potentially repetitive elements of state files are block-aligned, I can blindly deduplicate saves without inventing a new storage format.
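As a minimal sketch of why alignment matters for blind deduplication, the following pads each buffer to a multiple of the block size and then hashes fixed-size blocks of the concatenated state. The helper names (`pad_to_block`, `dedup_blocks`) and the use of SHA-256 as a block id are assumptions for illustration, not part of the actual save format.

```python
import hashlib

BLOCK_SIZE = 256  # block size favored in the discussion above


def pad_to_block(buf: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Zero-pad `buf` so its length is a multiple of `block_size`."""
    remainder = len(buf) % block_size
    if remainder == 0:
        return buf
    return buf + bytes(block_size - remainder)


def dedup_blocks(state: bytes, block_size: int = BLOCK_SIZE):
    """Split `state` into fixed-size blocks and keep one copy of each distinct block.

    Returns (store, ids): `store` maps a digest to block contents and `ids`
    lists the digest of every block in order, enough to rebuild `state`.
    """
    store = {}
    ids = []
    for offset in range(0, len(state), block_size):
        block = state[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        store.setdefault(digest, block)
        ids.append(digest)
    return store, ids


# Hypothetical state: a 100-byte irregular buffer, one disk block,
# and main memory that happens to contain a copy of that disk block.
irregular = b"x" * 100
disk = bytes(range(256))
memory = disk + b"\x01" * 256

aligned = pad_to_block(irregular) + disk + memory
unaligned = irregular + disk + memory

aligned_store, aligned_ids = dedup_blocks(aligned)
unaligned_store, unaligned_ids = dedup_blocks(unaligned)
```

In the aligned layout the disk block and its copy in memory hash to the same id, so only three distinct blocks are stored. In the unaligned layout the 100-byte buffer shifts everything after it, and no two blocks match.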

@rndmcnlly rndmcnlly force-pushed the align-buffer-blocks-in-saves branch 2 times, most recently from 2bc50fc to e35e500 Compare February 1, 2025 21:03
@rndmcnlly rndmcnlly force-pushed the align-buffer-blocks-in-saves branch from e35e500 to f5c2866 Compare February 1, 2025 21:16
@rndmcnlly
Contributor Author

I mistakenly thought that changing the alignment would only impact saving (in the same way that you can often improve an encoder without needing to make analogous changes to a decoder). However, now I see that the restoration process is currently sensitive to the alignment because it needs to derive buffer_block_start given only info_block.length. I can see a way to make the restoration process alignment-agnostic (one simple solution is storing alignment information in the info block itself), but it would involve bumping STATE_VERSION and breaking support for old saves.
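The coupling can be sketched roughly as follows. The layout here is hypothetical (a fixed 16-byte header, a JSON info block, then the buffer block rounded up to an alignment boundary), and the `buffer_alignment` field is an invented name for the "store alignment in the info block" idea; none of this is the project's actual code.

```python
import json

HEADER_SIZE = 16  # hypothetical fixed header size, for illustration only


def align_up(n: int, alignment: int) -> int:
    """Round `n` up to the next multiple of `alignment`."""
    return (n + alignment - 1) // alignment * alignment


def buffer_block_start_implicit(info_block_length: int, alignment: int) -> int:
    """Reader hard-codes `alignment`, so it must match what the writer used."""
    return align_up(HEADER_SIZE + info_block_length, alignment)


def buffer_block_start_explicit(info_block: bytes) -> int:
    """Reader takes the alignment from a (hypothetical) field in the info block."""
    info = json.loads(info_block)
    return align_up(HEADER_SIZE + len(info_block), info["buffer_alignment"])


# A writer that switches to 256-byte alignment disagrees with a reader that
# still assumes 4-byte alignment for the same info block length.
old_reader_offset = buffer_block_start_implicit(100, 4)
new_writer_offset = buffer_block_start_implicit(100, 256)

# With the alignment recorded in the info block, the reader needs no assumption.
info_block = json.dumps({"buffer_alignment": 256}).encode()
explicit_offset = buffer_block_start_explicit(info_block)
```

Old saves would not carry such a field, which is why this approach implies a new state version.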

Meanwhile, I'm getting better at using GitHub, so I'll stop making these dead-end PRs with half-baked ideas. I now know how to actually run the tests for myself.

Closing.

@rndmcnlly rndmcnlly closed this Feb 1, 2025