This repository contains C++ implementation of a persistent integer set based on integer Patricia tree data structure.
- Persistency.
- Big-endian order of traversal.
- Support for bitmaps in the leaves of the tree.
- Can store keys of any unsigned integer type.
- Bitmaps can also be any unsigned integer types.
- Allocator support (I would recommend to use TwoPoolsAllocator - one pool for leaf nodes, and one for branch nodes).
- Not thread safe (but it's probably not very hard to implement thread-safe version).
- No
erase
andunion
support yet.
Examples of how to use this data structure are provided in main.cpp file.
IntSet
class defined in PatriciaSet.h implements following methods:
IntSet()
- constructs empty setvoid insert(T key)
- insertskey
into set. Note thatT
must be an unsigned integer type (satisfystd::unsigned_integral<T>
concept).bool contains(T key) const
- checks whetherkey
is present in the set, or not (i.e.key
was previously inserted).void clear()
- clears the set.- Copy-constructor, assignment operator, etc are generated by compiler.
As you can see, void erase(T key)
is not implemented yet, but pull requests are welcome (in my use case, erase
method was not required, so it was not implemented).
Other methods, such as IntSet union(IntSet other) const
(that returns union of two sets), should also be relatively easy to implement.
insert
, contains
have worst-case complexity of O(min(n, W))
, where n
is number of elements stored in a set, W
is number of bits in T
. In practice, it's reasonable to assume that they have complexity O(1)
.
clear
has worst-case complexity of O(n)
when it needs to delete all of the elements in the set.
Copying the data-structure has worst-case complexity of O(1)
.
Performance was not tested via microbenchmarks, however, in my use case it performed better than existing implementations of Patricia trees (NASA-SW-VnV/ikos and facebook/sparta) - see performance.md (3d vs 4th and 5th columns).
Originally, Microsoft Visual Studio 2022 was used to build this project (as a CMake project).
However, code should be crossplatform, and it should be possible to build this project using other methods (with CMake).
Alternatively, you can simply copy header files from PersistentSet
and Allocators
folders, as well as External
folder (make sure to use git clone
with --recurse-submodules
option) and import them into your project.
Using custom allocators from Allocators
folder is optional, but is recommended for performance reasons.
The only depencency is Immer, and it's only purpose is to provide implementation for FreeListAllocator. Feel free to remove this dependency, if you choose not to use FreeListAllocator (and for example use TwoPoolsAllocator instead).
Originally, immer was installed using vcpkg.
Code is based on Haskell's Data.IntSet implementation.
- NASA-SW-VnV/ikos - good implementation, but uses little-endian key traversal, and no support for bitmaps in leaves and allocators (so it's a little bit slower).
- facebook/sparta - similar to ikos.
- sikol/patricia - contains bugs as far as I know
- libstdc++/pat_trie_.hpp - not sure how to build this and include in your own project.
-
arximboldi/immer -
immer::set
. Actually, it is implemented with CHAMP (Compressed Hash-Array Mapped Prefix-tree) and not Patricia tree, so it's more similar to Data.HashSet than Data.IntSet. However, it is very well implemented, and actually performs better than this implementation. It also supportserase
operation, and non-integer keys. However,union
operation would probably be more expensive for CHAMP compared to Patricia trees. -
arximboldi/immer -
immer::vector
. It is implemented using Relaxed Radix Balanced Trees (see authors video and paper on the topic). If the keys are in some continuous range (e.g$0, \ldots, n - 1$ ), then you can store set of them in avector<bool>
of size$n$ ($x \in s \iff s[x] == true$ ), orvector<uint64_t>
of size$n / 64$ . This is the most efficient approach according to our benchmarks (see performance.md, 1st column).
-
I recommend to read the paper:
- Chris Okasaki and Andy Gill, "Fast Mergeable Integer Maps", Workshop on ML, September 1998, pages 77-86
It's relatively short, explains Patricia trees very well, and includes implementations of methods in Standard ML language.
-
Original paper on Patricia trees:
- D.R. Morrison, "PATRICIA -- Practical Algorithm To Retrieve Information Coded In Alphanumeric", Journal of the ACM, 15(4), October 1968, pages 514-534.
- https://github.com/liuxinyu95/AlgoXY/files/11413742/algoxy-en.pdf - great book on algorithms and data structures, also contains description (and sample implementation) of Patricia trees.
- Lambda World 2018 - The Radix Trees How IntMap Works - Tikhon Jelvis
- C++Now 2017: Phil Nash “The Holy Grail!? A Persistent Hash-Array-Mapped Trie for C++" - about HAMT (not Patricia tree)
- Postmodern immutable data structures - Juan Pedro Bolivar Puente [C++ on Sea 2019] - about
immer::vector
(RRB trees), but still very interesting video
- Functional Programming in C++
- libstdc++ Chapter 21. Policy-Based Data Structures Prev Part III. Extensions
- libstdc++ Trie design
- Other implementations and descriptions of RRB trees - great collection of resources on RRB trees