Skip to content

Commit

Permalink
Incremental initialization API, fixing test macros, fixing contain w/…
Browse files Browse the repository at this point in the history
… default value infinite loop. (#13)

* Adding incremental init
  • Loading branch information
mpope9 authored Nov 8, 2020
1 parent 112ef97 commit 8dd3baa
Show file tree
Hide file tree
Showing 8 changed files with 643 additions and 115 deletions.
60 changes: 42 additions & 18 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,24 @@ Nif wrapper for the xor_filter: https://github.com/FastFilter/xor_singleheader

They're 'Faster and Smaller Than Bloom and Cuckoo Filters'.

Be wary of memory usage when using this module.

This library uses dirty nifs for initializing filters over 10K elements! Make sure your environment is setup correctly. Filters of 10M elements can be initialized within 4 seconds. Within 2.5 seconds if the library is used unsafely.

## Table of Contents
* [Benchmarks](#benchmarks)
* [Installation](#installation)
* [Example Usage](#example-usage)
* [Basic Usage](#basic-usage)
* [Incremental Initialization](#incremental-initialization)
* [Hashing](#hashing)
* [Hashing Example](#hashing-example)
* [Hashing API](#hashing-api)
* [Pre-Hashing and Custom Hashing](pre-hashing-and-custom-hashing)
* [Elixir Example](#elixir-example)
* [Custom Return Values](#custom-return-values)
* [Serialization](#serialization)
* [xor16](#xor16)
* [Buffered Initialization](#buffered-initialization)

## Benchmarks
The [exor_benchmark](https://github.com/mpope9/exor_bechmark) repo was used to compare access times to popular bloom filter libraries.

Expand All @@ -21,7 +35,7 @@ For rebar3:
%% rebar.config

{deps, [
{exor_filter, {git, "git://github.com/mpope9/exor_filter", {tag, "v0.5.2"}}}
{exor_filter, {git, "git://github.com/mpope9/exor_filter", {tag, "v0.6.0"}}}
]}.
```

Expand All @@ -31,20 +45,20 @@ For Mix:

defp deps do
[
{:exor_filter, github: "mpope9/exor_filter", tag: "v0.5.2"}
{:exor_filter, github: "mpope9/exor_filter", tag: "v0.6.0"}
]
end
```

Note, if you're using Erlang below version 23, then use this version of this library: `v0.5.2`. Otherwise, use the latest version.

## Example Usage
### Basic Usage
Basic usage with default hashing is as follows:
```erlang
Filter = xor8:new(["cat", "dog", "mouse"]),
true = xor8:contain(Filter, "cat"),
false = xor8:contain(Filter, "goose"),
ok = xor8:free(Filter).
false = xor8:contain(Filter, "goose").
```

Filters are initialized independently:
Expand All @@ -56,13 +70,26 @@ false = xor8:contain(Filter1, 6),
true = xor8:contain(Filter1, 2),

false = xor8:contain(Filter2, 2),
true = xor8:contain(Filter2, 5),
true = xor8:contain(Filter2, 5).
```

ok = xor8:free(Filter1),
ok = xor8:free(Filter2).
### Incremental Initialization
To create a filter incrementally, the following API should be used. It is more memory efficient than providing the entire list at initialization time.
Only the default hashing method is supported. [See the hashing section](#hashing) for more details.
**WARNING**: Currently, the incremental API does not use dirty nifs for large input sizes. Be cautious of this.
Deduplication / duplicate checks on the data are not implemented yet. `xor8:finalize/1` will fail if the data contains duplicates.
```erlang
Filter0 = xor8:new_empty(2), %% new_empty/0 defaults to 64 elements. Either function
%% will dynamically allocate more space as
%% needed while elements are added.
Filter1 = xor8:add(Filter0, [1, 2]),
Filter2 = xor8:add(Filter1, [3, 4]), %% More space allocated here.
Filter3 = xor8:finalize(Filter3), %% finalize/1 MUST be called to actually intialize the filter.
true = xor8:contain(Filter3, 1),
false = xor8:contain(Filter3, 6).
```

Do not modify the return value of the `xor8:new/1` or `/2` functions. The other APIs will not function correctly.
Do not modify the return value of any of the functions. The other APIs will not function correctly.

## Hashing
* The function `xor8:new/1` uses the default hash algorithm.
Expand All @@ -72,12 +99,11 @@ Do not modify the return value of the `xor8:new/1` or `/2` functions. The other
* **Do not pre-hash the value** being passed to `xor8:contain/2` or `/3`. **Pass the raw value!**
* (Unless you've explicitly set that you're using pre-hashed data. See below).
* The default hashing mechanisms remove duplicate keys. Pre-hashed data should be checked by the user. The libary will return an error on initialization if dupes are detected.
### Example
### Hashing Example
```erlang
Filter = xor8:new([1, 2, 3], none),
true = xor8:contain(Filter, 1),
false = xor8:contain(Filter, 6),
ok = xor8:free(Filter).
false = xor8:contain(Filter, 6).
```

### Hashing API
Expand All @@ -98,8 +124,7 @@ ok = xor8:free(Filter).
Fun = fun(X) -> X + 1 end,
Filter = xor8:new([1, 2, 3], Fun),
true = xor8:contain(Filter, 4),
false = xor8:contain(Filter, 1),
ok = xor8:free(Filter).
false = xor8:contain(Filter, 1).
```

* To pass pre-hashed data, use the hash option `none`. The `xor8:contain/2` and `/3` functions must be passed pre-hashed data in this case.
Expand All @@ -122,8 +147,7 @@ true =
```erlang
Filter1 = xor8:new(["Ricky Bobby", "Cal Naughton Jr."]),
true = xor8:contain(Filter1, "Ricky Bobby", {error, not_found}),
{error, not_found} = xor8:contain(Filter1, "Reese Bobby", {error, not_found}),
ok = xor8:free(Filter1).
{error, not_found} = xor8:contain(Filter1, "Reese Bobby", {error, not_found}).
```

## Serialization
Expand Down Expand Up @@ -159,7 +183,7 @@ Docs

$ rebar3 edoc

## Implements of xor filters in other programming languages
## Implementations of xor filters in other languages
* [Go](https://github.com/FastFilter/xor_filter)
* Rust: [1](https://github.com/bnclabs/xorfilter) and [2](https://github.com/codri/xorfilter-rs)
* [C++](https://github.com/FastFilter/fastfilter_cpp)
Expand Down
Loading

0 comments on commit 8dd3baa

Please sign in to comment.