Incremental initialization API, fixing test macros, fixing contain w/…

… default value infinite loop. (#13) * Adding incremental init
mpope9 · Nov 8, 2020 · 8dd3baa · 8dd3baa
1 parent 112ef97
commit 8dd3baa
Show file tree

Hide file tree

Showing 8 changed files with 643 additions and 115 deletions.
diff --git a/README.md b/README.md
@@ -5,10 +5,24 @@ Nif wrapper for the xor_filter: https://github.com/FastFilter/xor_singleheader
 
 They're 'Faster and Smaller Than Bloom and Cuckoo Filters'.
 
-Be wary of memory usage when using this module.
-
 This library uses dirty nifs for initializing filters over 10K elements!  Make sure your environment is setup correctly.  Filters of 10M elements can be initialized within 4 seconds.  Within 2.5 seconds if the library is used unsafely.
 
+## Table of Contents
+* [Benchmarks](#benchmarks)
+* [Installation](#installation)
+* [Example Usage](#example-usage)
+   * [Basic Usage](#basic-usage)
+   * [Incremental Initialization](#incremental-initialization)
+* [Hashing](#hashing)
+   * [Hashing Example](#hashing-example)
+   * [Hashing API](#hashing-api)
+      * [Pre-Hashing and Custom Hashing](pre-hashing-and-custom-hashing)
+* [Elixir Example](#elixir-example)
+* [Custom Return Values](#custom-return-values)
+* [Serialization](#serialization)
+* [xor16](#xor16)
+* [Buffered Initialization](#buffered-initialization)
+
 ## Benchmarks
 The [exor_benchmark](https://github.com/mpope9/exor_bechmark) repo was used to compare access times to popular bloom filter libraries.
 
@@ -21,7 +35,7 @@ For rebar3:
 %% rebar.config
 
 {deps, [
-  {exor_filter, {git, "git://github.com/mpope9/exor_filter", {tag, "v0.5.2"}}}
+  {exor_filter, {git, "git://github.com/mpope9/exor_filter", {tag, "v0.6.0"}}}
 ]}.
 ```
 
@@ -31,20 +45,20 @@ For Mix:
 
 defp deps do
   [
-    {:exor_filter, github: "mpope9/exor_filter", tag: "v0.5.2"}
+    {:exor_filter, github: "mpope9/exor_filter", tag: "v0.6.0"}
   ]
 end
 ```
 
 Note, if you're using Erlang below version 23, then use this version of this library: `v0.5.2`.  Otherwise, use the latest version.
 
 ## Example Usage
+### Basic Usage
 Basic usage with default hashing is as follows:
 ```erlang
 Filter = xor8:new(["cat", "dog", "mouse"]),
 true   = xor8:contain(Filter, "cat"),
-false  = xor8:contain(Filter, "goose"),
-ok     = xor8:free(Filter).
+false  = xor8:contain(Filter, "goose").
 ```
 
 Filters are initialized independently:
@@ -56,13 +70,26 @@ false   = xor8:contain(Filter1, 6),
 true    = xor8:contain(Filter1, 2),
 
 false   = xor8:contain(Filter2, 2),
-true    = xor8:contain(Filter2, 5),
+true    = xor8:contain(Filter2, 5).
+```
 
-ok      = xor8:free(Filter1),
-ok      = xor8:free(Filter2).
+### Incremental Initialization
+To create a filter incrementally, the following API should be used.  It is more memory efficient than providing the entire list at initialization time.
+Only the default hashing method is supported.  [See the hashing section](#hashing) for more details.
+**WARNING**: Currently, the incremental API does not use dirty nifs for large input sizes.  Be cautious of this.
+Deduplication / duplicate checks on the data are not implemented yet.  `xor8:finalize/1` will fail if the data contains duplicates.
+```erlang
+Filter0 = xor8:new_empty(2),           %% new_empty/0 defaults to 64 elements.  Either function
+                                       %% will dynamically allocate more space as 
+                                       %% needed while elements are added.
+Filter1 = xor8:add(Filter0, [1, 2]),
+Filter2 = xor8:add(Filter1, [3, 4]),   %% More space allocated here.
+Filter3 = xor8:finalize(Filter3),      %% finalize/1 MUST be called to actually intialize the filter.
+true    = xor8:contain(Filter3, 1),
+false   = xor8:contain(Filter3, 6).
 ```
 
-Do not modify the return value of the `xor8:new/1` or `/2` functions.  The other APIs will not function correctly.
+Do not modify the return value of any of the functions.  The other APIs will not function correctly.
 
 ## Hashing
 * The function `xor8:new/1` uses the default hash algorithm.
@@ -72,12 +99,11 @@ Do not modify the return value of the `xor8:new/1` or `/2` functions.  The other
     * **Do not pre-hash the value** being passed to `xor8:contain/2` or `/3`.  **Pass the raw value!**
     *  (Unless you've explicitly set that you're using pre-hashed data.  See below).
 * The default hashing mechanisms remove duplicate keys.  Pre-hashed data should be checked by the user.  The libary will return an error on initialization if dupes are detected.
-### Example
+### Hashing Example
 ```erlang
 Filter = xor8:new([1, 2, 3], none),
 true   = xor8:contain(Filter, 1),
-false  = xor8:contain(Filter, 6),
-ok     = xor8:free(Filter).
+false  = xor8:contain(Filter, 6).
 ```
 
 ### Hashing API
@@ -98,8 +124,7 @@ ok     = xor8:free(Filter).
 Fun    = fun(X) -> X + 1 end,
 Filter = xor8:new([1, 2, 3], Fun),
 true   = xor8:contain(Filter, 4),
-false  = xor8:contain(Filter, 1),
-ok     = xor8:free(Filter).
+false  = xor8:contain(Filter, 1).
 ```
 
 * To pass pre-hashed data, use the hash option `none`.  The `xor8:contain/2` and `/3` functions must be passed pre-hashed data in this case.
@@ -122,8 +147,7 @@ true =
 ```erlang
 Filter1            = xor8:new(["Ricky Bobby", "Cal Naughton Jr."]),
 true               = xor8:contain(Filter1, "Ricky Bobby", {error, not_found}),
-{error, not_found} = xor8:contain(Filter1, "Reese Bobby", {error, not_found}),
-ok                 = xor8:free(Filter1).
+{error, not_found} = xor8:contain(Filter1, "Reese Bobby", {error, not_found}).
 ```
 
 ## Serialization
@@ -159,7 +183,7 @@ Docs
 
     $ rebar3 edoc
 
-## Implements of xor filters in other programming languages
+## Implementations of xor filters in other languages
 * [Go](https://github.com/FastFilter/xor_filter)
 * Rust: [1](https://github.com/bnclabs/xorfilter) and [2](https://github.com/codri/xorfilter-rs)
 * [C++](https://github.com/FastFilter/fastfilter_cpp)