rdb format optimization: using fixed seed for bloom filters #2

Merged
merged 1 commit into from
Sep 19, 2024

Conversation

YueTang-Vanessa
Contributor

@YueTang-Vanessa YueTang-Vanessa commented Sep 11, 2024

Description

Using a fixed seed and fixed sip keys lets users create bloom objects with the same seed and restore them (through RDB save and load) with the same hashers, since the sip keys are generated from the seed. This saves RDB space: 32 bytes per filter, and a single object can contain multiple filters.
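To illustrate why a fixed seed removes the need to persist hasher state, here is a minimal std-only sketch. It is not the module's code: `FIXED_SEED`, `hash_pair`, and `TinyBloom` are hypothetical names, and `DefaultHasher` stands in for the sip-key hashers of the bloomfilter crate. It also assumes the 32 bytes saved per filter correspond to the seed. The point is only that when the hash inputs are compile-time constants, a filter rebuilt from its bits alone answers membership queries identically.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical fixed seed shared by every filter. The value is illustrative.
const FIXED_SEED: [u8; 32] = [7u8; 32];

/// Derive two 64-bit hashes for an item from the seed.
/// All hashers from DefaultHasher::new() are equivalent within one build, so
/// the result depends only on the seed and the item. Its algorithm is not
/// guaranteed stable across Rust releases; a real on-disk format needs a
/// hasher with a fixed, documented algorithm.
fn hash_pair(seed: &[u8; 32], item: &[u8]) -> (u64, u64) {
    let mut h1 = DefaultHasher::new();
    h1.write(seed);
    h1.write_u8(1);
    h1.write(item);
    let mut h2 = DefaultHasher::new();
    h2.write(seed);
    h2.write_u8(2);
    h2.write(item);
    (h1.finish(), h2.finish())
}

/// Minimal bloom filter: a bit vector plus the number of hash functions.
struct TinyBloom {
    bits: Vec<bool>,
    k: u64,
}

impl TinyBloom {
    fn new(num_bits: usize, k: u64) -> Self {
        TinyBloom { bits: vec![false; num_bits], k }
    }

    /// Rebuild a filter from persisted bits. No seed or keys are passed in,
    /// because they are constants of the program.
    fn from_bits(bits: Vec<bool>, k: u64) -> Self {
        TinyBloom { bits, k }
    }

    /// Double hashing: the i-th bit index is (a + i * b) mod num_bits.
    fn index(&self, item: &[u8], i: u64) -> usize {
        let (a, b) = hash_pair(&FIXED_SEED, item);
        (a.wrapping_add(i.wrapping_mul(b)) % self.bits.len() as u64) as usize
    }

    fn add(&mut self, item: &[u8]) {
        for i in 0..self.k {
            let idx = self.index(item, i);
            self.bits[idx] = true;
        }
    }

    fn contains(&self, item: &[u8]) -> bool {
        (0..self.k).all(|i| self.bits[self.index(item, i)])
    }
}
```

In this sketch, saving a filter means writing only `bits` and `k`. With a random per-filter seed, the seed would have to be written as well for the restored filter to hash items to the same positions.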

This commit makes the following changes:

  • Introduced a seed constant and a sip keys constant, which were generated separately using the getrandom method in the Rust bloomfilter crate
  • Save the "number of items in filter" count only for the final bloom filter in a bloom object's vector
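The second change can be sketched as follows. This is a hedged illustration and not the module's actual RDB layout: `FilterMeta`, `encode`, and `decode` are hypothetical, and the sketch assumes that every filter except the last is full, so its item count equals its capacity and can be derived on load.

```rust
/// Hypothetical per-filter metadata in a scaling bloom object.
#[derive(Debug, Clone, PartialEq)]
struct FilterMeta {
    capacity: u64,
    num_items: u64,
}

/// Encode the filter count, then each filter's capacity.
/// The item count is written only for the final filter.
fn encode(filters: &[FilterMeta]) -> Vec<u64> {
    let mut out = vec![filters.len() as u64];
    for (i, f) in filters.iter().enumerate() {
        out.push(f.capacity);
        if i + 1 == filters.len() {
            out.push(f.num_items);
        }
    }
    out
}

/// Decode the layout produced by `encode`. Returns None on truncated input.
/// Assumption: non-final filters are full, so num_items == capacity.
fn decode(data: &[u64]) -> Option<Vec<FilterMeta>> {
    let mut it = data.iter().copied();
    let n = it.next()? as usize;
    let mut filters = Vec::new();
    for i in 0..n {
        let capacity = it.next()?;
        let num_items = if i + 1 == n { it.next()? } else { capacity };
        filters.push(FilterMeta { capacity, num_items });
    }
    Some(filters)
}
```

Under that assumption, an object with N filters stores one count instead of N, and the round trip through `encode` and `decode` preserves every filter's metadata.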

Test

Loaded the module with valkey-server and verified that the RDB loads successfully, with the same bf.info key and bf.exists key item output before and after the restart:

127.0.0.1:6379> info keyspace
# Keyspace
127.0.0.1:6379> set hello world
OK
127.0.0.1:6379> info keyspace
# Keyspace
db0:keys=1,expires=0,avg_ttl=0
127.0.0.1:6379> bf.add key item
(integer) 1
127.0.0.1:6379> info keyspace
# Keyspace
db0:keys=2,expires=0,avg_ttl=0
127.0.0.1:6379> keys *
1) "hello"
2) "key"
127.0.0.1:6379> bf.exists key item
(integer) 1
127.0.0.1:6379> bf.info key
 1) Capacity
 2) (integer) 100000
 3) Size
 4) (integer) 179952
 5) Number of filters
 6) (integer) 1
 7) Number of items inserted
 8) (integer) 1
 9) Expansion rate
10) (integer) 2
127.0.0.1:6379> bgsave
Background saving started
127.0.0.1:6379> shutdown
not connected> info keyspace
# Keyspace
db0:keys=2,expires=0,avg_ttl=0
127.0.0.1:6379> keys *
1) "key"
2) "hello"
127.0.0.1:6379> bf.exists key item
(integer) 1
127.0.0.1:6379> bf.info key
 1) Capacity
 2) (integer) 100000
 3) Size
 4) (integer) 179952
 5) Number of filters
 6) (integer) 1
 7) Number of items inserted
 8) (integer) 1
 9) Expansion rate
10) (integer) 2

The Python tests and unit tests also pass:

running 3 tests
test bloom::utils::tests::test_sip_keys ... ok
test bloom::utils::tests::test_non_scaling_filter ... ok
test bloom::utils::tests::test_scaling_filter ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 9.92s


valkey-bloom/tests/test_basic.py::TestBloomBasic::test_basic PASSED           [ 50%]
valkey-bloom/tests/test_save_and_restore.py::TestBloomSaveRestore::test_basic_save_and_restore PASSED [100%]

================================================================ 2 passed in 3.61 seconds ================================================================
Build and Integration Tests succeeded

@YueTang-Vanessa YueTang-Vanessa force-pushed the unstable branch 2 times, most recently from 2503813 to 17e10b3 September 18, 2024 18:53
Signed-off-by: Vanessa Tang <yuetan@amazon.com>

add restore unit test and update restore integration test

Signed-off-by: Vanessa Tang <yuetan@amazon.com>
@YueTang-Vanessa YueTang-Vanessa changed the title from "rdb optimization" to "rdb format optimization: using fixed seed for bloom filters" Sep 19, 2024
@KarthikSubbarao KarthikSubbarao merged commit bc178be into valkey-io:unstable Sep 19, 2024
1 check passed
KarthikSubbarao pushed a commit that referenced this pull request Sep 19, 2024
Signed-off-by: Vanessa Tang <yuetan@amazon.com>