core: add support for bloom filters #99

Merged on Jan 20, 2023 (21 commits)

manav2401 commented Nov 12, 2022

This PR adds bloom filter support to DiceDB (see issue #36).

The commands supported are mentioned below:

  1. BFINIT <key> <error_rate> <capacity>
| Parameter | Description |
| --- | --- |
| key | The key which acts as the filter identifier |
| error_rate | (optional) The desired/acceptable probability of false positives |
| capacity | (optional) The number of entries expected to be added to the filter |

Returns "OK" if the bloom filter is created successfully, and a relevant error otherwise.

0.0.0.0:7379> BFINIT bf 0.01 10000
OK
  2. BFADD <key> <element>
| Parameter | Description |
| --- | --- |
| key | The key which acts as the filter identifier |
| element | The element to be added to the filter |

Returns "1" if the element was added to the filter, and "0" if the element (or other elements that set the same bits) had already been added. Returns "-1" on error. If the filter does not exist, it is created with default parameters.

0.0.0.0:7379> BFADD bf hello
(integer) 1
0.0.0.0:7379> BFADD bf world
(integer) 1
0.0.0.0:7379> BFADD bf hello
(integer) 0
0.0.0.0:7379> BFADD bf test
(integer) 1
  3. BFEXISTS <key> <element>
| Parameter | Description |
| --- | --- |
| key | The key which acts as the filter identifier |
| element | The element whose existence is to be checked in the filter |

Returns "1" if the element may exist in the filter (false positives are possible) and "0" if the element definitely does not exist. Returns "-1" on error.

0.0.0.0:7379> BFEXISTS bf world
(integer) 1
0.0.0.0:7379> BFEXISTS bf hello
(integer) 1
0.0.0.0:7379> BFEXISTS bf programming
(integer) 0
0.0.0.0:7379> BFEXISTS bf dice
(integer) 0
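
The return values of BFADD and BFEXISTS described above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: the `bloom` type, `newBloom`, and the FNV-based double hashing are assumptions chosen only to keep the example short. It assumes BFADD reports "1" when at least one bit was newly set and "0" otherwise, which matches the description of the command.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a hypothetical minimal filter with m bits and k probes per element.
type bloom struct {
	bits []byte
	m    uint64
	k    uint64
}

func newBloom(m, k uint64) *bloom {
	return &bloom{bits: make([]byte, (m+7)/8), m: m, k: k}
}

// positions derives k bit positions from two FNV hashes (double hashing).
// The hashing scheme is an assumption, not the one used in the PR.
func (b *bloom) positions(element string) []uint64 {
	h1 := fnv.New64a()
	h1.Write([]byte(element))
	h2 := fnv.New64()
	h2.Write([]byte(element))
	base, step := h1.Sum64(), h2.Sum64()|1 // force a non-zero step
	out := make([]uint64, b.k)
	for i := uint64(0); i < b.k; i++ {
		out[i] = (base + i*step) % b.m
	}
	return out
}

// add sets the element's bits and returns 1 if any bit was newly set, else 0.
func (b *bloom) add(element string) int {
	changed := 0
	for _, p := range b.positions(element) {
		mask := byte(1) << (7 - p%8)
		if b.bits[p/8]&mask == 0 {
			b.bits[p/8] |= mask
			changed = 1
		}
	}
	return changed
}

// exists returns 1 if every bit for the element is set (possible member),
// and 0 if any bit is clear (definitely not a member).
func (b *bloom) exists(element string) int {
	for _, p := range b.positions(element) {
		if b.bits[p/8]&(byte(1)<<(7-p%8)) == 0 {
			return 0
		}
	}
	return 1
}

func main() {
	f := newBloom(100992, 7)
	fmt.Println(f.add("hello"))    // 1: first insertion sets new bits
	fmt.Println(f.add("hello"))    // 0: all bits were already set
	fmt.Println(f.exists("hello")) // 1: added elements are never missed
	fmt.Println(f.exists("dice"))  // very likely 0, but a false positive is possible
}
```

Because a lookup only checks bits, a "1" from `exists` can be caused by other elements that happen to cover the same positions, while a "0" is always conclusive.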
  4. BFINFO <key>
| Parameter | Description |
| --- | --- |
| key | The key which acts as the filter identifier |

Returns the parameters and metadata of the filter.

0.0.0.0:7379> BFINFO bf
"name: bf, error rate: 0.010000, capacity: 10000, total bits reserved: 100992, bits per element: 9.585058, hash functions: 7"

Unit tests covering most of the functions have been added in this PR.


Future work (not necessary in this PR)

  1. Add benchmarks for each command and bit manipulation (read/write).
  2. Prepare charts / tables for user reference to help them choose appropriate values of error rate and capacity.
  3. Add option to DELETE and RESET a bloom filter.
  4. Add option to "save" and "load" bloom filter to/from a file (for persistence).

core/bloom.go Outdated
Comment on lines 329 to 358
func setBit(buf []byte, b int) {
	if b < 0 {
		return
	}

	idx, offset := b/8, 7-b%8
	if idx < 0 || idx >= len(buf) {
		return
	}

	buf[idx] = buf[idx] | 1<<offset
}

// isBitSet checks if the bit at index `b` is set to "1" or not in `buf`.
func isBitSet(buf []byte, b int) bool {
	if b < 0 {
		return false
	}

	idx, offset := b/8, 7-b%8
	if idx >= len(buf) {
		return false
	}

	if buf[idx]&(1<<offset) == 1<<offset {
		return true
	} else {
		return false
	}
}
Contributor

There is a util directory where you can put byte level utils; also we have something similar for varint and bytelist. Check that implementation once and see if we need to converge.

Contributor Author

Hi, at the moment I don't think there is any code that can be converged. Bytelist holds some extra metadata, and we might not need any of it for bloom bytes. I can, however, move some of the logic to bloom_utils to keep it separate.

Contributor

arpitbbhayani left a comment

Such beautifully written code. Loved reading through it 🥇

@@ -24,6 +24,9 @@ var OBJ_ENCODING_QREF uint8 = 1
var OBJ_ENCODING_STACKINT uint8 = 2
var OBJ_ENCODING_STACKREF uint8 = 3

var OBJ_TYPE_BITSET uint8 = 1 << 5 // 00100000
Contributor

This is a great decision, as we cannot easily use Go strings as byte arrays.

Comment on lines +22 to +24
} else {
return false
}
Contributor

don't need an else here; just a blanket return false
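
One way to apply this suggestion is to return the result of the bit test directly, which removes both the `else` and the explicit `true`/`false` branches. The sketch below is an illustration, not necessarily the version that was merged. It keeps the most-significant-bit-first layout (`7-b%8`) from the reviewed code and folds the bounds checks into one condition.

```go
package main

import "fmt"

// setBit sets bit b of buf, counting from the most significant bit of buf[0].
// Out-of-range indexes are ignored.
func setBit(buf []byte, b int) {
	if b < 0 || b/8 >= len(buf) {
		return
	}
	buf[b/8] |= byte(1) << uint(7-b%8)
}

// isBitSet reports whether bit b of buf is set.
// Out-of-range indexes read as unset.
func isBitSet(buf []byte, b int) bool {
	if b < 0 || b/8 >= len(buf) {
		return false
	}
	return buf[b/8]&(byte(1)<<uint(7-b%8)) != 0
}

func main() {
	buf := make([]byte, 2)
	setBit(buf, 0)
	setBit(buf, 9)
	fmt.Printf("%08b %08b\n", buf[0], buf[1])                           // 10000000 01000000
	fmt.Println(isBitSet(buf, 9), isBitSet(buf, 10), isBitSet(buf, 99)) // true false false
}
```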

arpitbbhayani merged commit 6f6ea03 into DiceDB:master Jan 20, 2023
manav2401 deleted the bloom-filter-support branch July 31, 2024 13:33
2 participants