
Implement the tree bin optimization (#72)
domenicquirl authored Apr 7, 2020
1 parent 7a27f38 commit dcb5a1c
Showing 12 changed files with 2,636 additions and 295 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
/target
**/*.rs.bk
Cargo.lock
**/.vscode/*
48 changes: 39 additions & 9 deletions src/iter/traverser.rs
@@ -1,4 +1,4 @@
use crate::node::{BinEntry, Node, TreeNode};
use crate::raw::Table;
use crossbeam_epoch::{Guard, Shared};
use std::sync::atomic::Ordering;
@@ -116,19 +116,28 @@ impl<'g, K, V> Iterator for NodeIter<'g, K, V> {
if let Some(prev) = self.prev {
    let next = prev.next.load(Ordering::SeqCst, self.guard);
    if !next.is_null() {
        // we have to check if we are iterating over a regular bin or a
        // TreeBin. the Java code gets away without this due to
        // inheritance (everything is a node), but we have to explicitly
        // check
        // safety: flurry does not drop or move until after guard drop
        match unsafe { next.deref() } {
            BinEntry::Node(node) => {
                e = Some(node);
            }
            BinEntry::TreeNode(tree_node) => {
                e = Some(&tree_node.node);
            }
            BinEntry::Moved => unreachable!("Nodes can only point to Nodes or TreeNodes"),
            BinEntry::Tree(_) => unreachable!("Nodes can only point to Nodes or TreeNodes"),
        }
    }
}

loop {
    if e.is_some() {
        self.prev = e;
        return e;
    }

// safety: flurry does not drop or move until after guard drop
@@ -160,6 +169,27 @@ impl<'g, K, V> Iterator for NodeIter<'g, K, V> {
BinEntry::Node(node) => {
    e = Some(node);
}
BinEntry::Tree(tree_bin) => {
    // since we want to iterate over all entries, TreeBins
    // are also traversed via the `next` pointers of their
    // contained node
    e = Some(
        // safety: `bin` was read under our guard, at which
        // point the tree was valid. Since our guard pins
        // the current epoch, the TreeNodes remain valid for
        // at least as long as we hold onto the guard.
        // Structurally, TreeNodes always point to TreeNodes, so this is sound.
        &unsafe {
            TreeNode::get_tree_node(
                tree_bin.first.load(Ordering::SeqCst, self.guard),
            )
        }
        .node,
    );
}
BinEntry::TreeNode(_) => unreachable!(
    "The head of a bin cannot be a TreeNode directly without BinEntry::Tree"
),
}

97 changes: 63 additions & 34 deletions src/lib.rs
@@ -55,13 +55,20 @@
//! operation. When possible, it is a good idea to provide a size estimate by using the
//! [`with_capacity`](HashMap::with_capacity) constructor. Note that using many keys with
//! exactly the same [`Hash`](std::hash::Hash) value is a sure way to slow down performance of any
//! hash table. To ameliorate impact, keys are required to be [`Ord`](std::cmp::Ord). This is used
//! by the map to more efficiently store bins that contain a large number of elements with
//! colliding hashes using the comparison order on their keys.
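The degenerate hashing this guards against is easy to simulate with a key type whose values all hash identically. The `BadKey` and `hash_of` names below are hypothetical illustrations for this page, not flurry types: the point is that even when hashes never differ, `Ord` still gives the map a total order to arrange colliding entries in a tree.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical worst-case key: every value hashes to the same bucket.
#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct BadKey(u32);

impl Hash for BadKey {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // deliberately ignore self.0, so all keys collide
        0u32.hash(state);
    }
}

fn hash_of<T: Hash>(t: &T) -> u64 {
    let mut h = DefaultHasher::new();
    t.hash(&mut h);
    h.finish()
}

fn main() {
    // identical hashes, but the keys remain totally ordered via `Ord`
    assert_eq!(hash_of(&BadKey(1)), hash_of(&BadKey(2)));
    assert!(BadKey(1) < BadKey(2));
}
```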
//!
/*
//! TODO: dynamic load factor
//! */
//! # Hash Sets
//!
//! Flurry also supports concurrent hash sets, which may be created through [`HashSet`]. Hash sets
//! offer the same instantiation options as [`HashMap`], such as [`new`](HashSet::new) and
//! [`with_capacity`](HashSet::with_capacity).
//!
/*
//! TODO: frequency map through computeIfAbsent
//!
//! TODO: bulk operations like forEach, search, and reduce
@@ -76,16 +83,18 @@
//! file](http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/main/java/util/concurrent/ConcurrentHashMap.java?view=markup).
//!
//! The primary design goal of this hash table is to maintain concurrent readability (typically
//! method [`get`](HashMap::get), but also iterators and related methods) while minimizing update contention.
//! Secondary goals are to keep space consumption about the same or better than java.util.HashMap,
//! and to support high initial insertion rates on an empty table by many threads.
//!
//! This map usually acts as a binned (bucketed) hash table. Each key-value mapping is held in a
//! `BinEntry`. Most nodes are of type `BinEntry::Node` with hash, key, value, and a `next` field.
//! However, some other types of nodes exist: `BinEntry::TreeNode`s are arranged in balanced trees
//! instead of linear lists. Bins of type `BinEntry::Tree` hold the roots of sets of `BinEntry::TreeNode`s.
//! Some nodes are of type `BinEntry::Moved`; these "forwarding nodes" are placed at the
//! heads of bins during resizing. The Java version also has other special node types, but these
//! have not yet been implemented in this port. These special nodes are all either uncommon or
//! transient.
//!
/*
//! TODO: TreeNodes, ReservationNodes
@@ -95,8 +104,8 @@
//! `BinEntry`). Table accesses require atomic reads, writes, and CASes.
//!
//! Insertion (via `put`) of the first node in an empty bin is performed by just CASing it to the
//! bin. This is by far the most common case for put operations under most key/hash distributions.
//! Other update operations (insert, delete, and replace) require locks. We do not want to waste
//! the space required to associate a distinct lock object with each bin, so we instead embed a
//! lock inside each node, and use the lock in the first node of a bin list as the lock for the
//! bin.
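A minimal sketch of that layout, with hypothetical types rather than flurry's actual `Node`: the head node of each bin carries the mutex that writers take, so no separate lock object per bin is needed.

```rust
use std::sync::Mutex;

// Sketch only: the first node of a bin doubles as the bin's lock.
struct Node<K, V> {
    key: K,
    value: V,
    next: Option<Box<Node<K, V>>>,
    lock: Mutex<()>, // writers lock the bin's head node before updating the list
}

fn main() {
    let head = Node { key: 1u32, value: "a", next: None, lock: Mutex::new(()) };
    // an updater would hold this guard for the duration of the bin update
    let _guard = head.lock.lock().unwrap();
    assert_eq!(head.value, "a");
}
```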
@@ -109,7 +118,7 @@
//! The main disadvantage of per-bin locks is that other update operations on other nodes in a bin
//! list protected by the same lock can stall, for example when user `Eq` implementations or
//! mapping functions take a long time. However, statistically, under random hash codes, this is
//! not a common problem. Ideally, the frequency of nodes in bins follows a Poisson distribution
//! (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average,
//! given the resizing threshold of 0.75, although with a large variance because of resizing
//! granularity. Ignoring variance, the expected occurrences of list size `k` are `exp(-0.5) *
@@ -132,30 +141,33 @@
//! #elements)` under random hashes.
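The distribution quoted above is easy to check numerically. This standalone sketch (not crate code) evaluates the standard Poisson probability mass function `e^-λ · λ^k / k!` at the stated parameter of 0.5:

```rust
// Poisson pmf: expected fraction of bins holding exactly k nodes, lambda = 0.5
fn poisson(lambda: f64, k: u32) -> f64 {
    let k_factorial: f64 = (1..=k).map(f64::from).product();
    (-lambda).exp() * lambda.powi(k as i32) / k_factorial
}

fn main() {
    // roughly 61% of bins empty and 30% with a single node under random hashes
    assert!((poisson(0.5, 0) - 0.6065).abs() < 1e-3);
    assert!((poisson(0.5, 1) - 0.3033).abs() < 1e-3);
    // bins with 8 nodes are vanishingly rare, which is why long chains
    // signal non-random hashing rather than bad luck
    assert!(poisson(0.5, 8) < 1e-7);
}
```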
//!
//! Actual hash code distributions encountered in practice sometimes deviate significantly from
//! uniform randomness. This includes the case when `N > (1<<30)`, so some keys MUST collide.
//! Similarly for dumb or hostile usages in which multiple keys are designed to have identical hash
//! codes or ones that differ only in masked-out high bits. So we use a secondary strategy that
//! applies when the number of nodes in a bin exceeds a threshold. These `BinEntry::Tree` bins use
//! a balanced tree to hold nodes (a specialized form of red-black trees), bounding search time to
//! `O(log N)`. Each search step in such a bin is at least twice as slow as in a regular list, but
//! given that N cannot exceed `(1<<64)` (before running out of addresses) this bounds search steps,
//! lock hold times, etc., to reasonable constants (roughly 100 nodes inspected per operation worst
//! case). `BinEntry::Tree` nodes (`BinEntry::TreeNode`s) also maintain the same `next` traversal
//! pointers as regular nodes, so they can be traversed in iterators in a similar way.
//!
//! The table is resized when occupancy exceeds a percentage threshold (nominally, 0.75, but see
//! below). Any thread noticing an overfull bin may assist in resizing after the initiating thread
//! allocates and sets up the replacement array. However, rather than stalling, these other threads
//! may proceed with insertions etc. The use of `BinEntry::Tree` bins shields us from the worst case
//! effects of overfilling while resizes are in progress. Resizing proceeds by transferring bins,
//! one by one, from the table to the next table. However, threads claim small blocks of indices to
//! transfer (via the field `transfer_index`) before doing so, reducing contention. A generation
//! stamp in the field `size_ctl` ensures that resizings do not overlap. Because we are using
//! power-of-two expansion, the elements from each bin must either stay at same index, or move with
//! a power of two offset. We eliminate unnecessary node creation by catching cases where old nodes
//! can be reused because their next fields won't change. On average, only about one-sixth of them
//! need cloning when a table doubles. The nodes they replace will be garbage collectible as soon
//! as they are no longer referenced by any reader thread that may be in the midst of concurrently
//! traversing the table. Upon transfer, the old table bin contains only a special forwarding node
//! (`BinEntry::Moved`) that contains the next table as its key. On encountering a forwarding node,
//! access and update operations restart, using the new table.
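The stay-or-move-by-`n` property follows directly from power-of-two index masking. This self-contained check (not crate code) verifies it for a doubling from 16 to 32 bins:

```rust
// With a power-of-two table of n bins, an entry's bin is hash & (n - 1).
fn bin_index(hash: u64, n: usize) -> usize {
    debug_assert!(n.is_power_of_two());
    (hash as usize) & (n - 1)
}

fn main() {
    let (old_n, new_n) = (16, 32);
    for hash in 0u64..1024 {
        let old = bin_index(hash, old_n);
        let new = bin_index(hash, new_n);
        // after doubling, each entry either keeps its index or moves by
        // exactly old_n, depending on the single newly unmasked hash bit
        assert!(new == old || new == old + old_n);
    }
}
```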
//!
//! Each bin transfer requires its bin lock, which can stall waiting for locks while resizing.
//! However, because other threads can join in and help resize rather than contend for locks,
@@ -190,10 +202,26 @@
//! occurring at threshold is around 13%, meaning that only about 1 in 8 puts check threshold (and
//! after resizing, many fewer do so).
//! */
/* NOTE that we don't actually use most of the Java code's complicated comparisons and tiebreakers
* since we require total ordering among the keys via `Ord` as opposed to a runtime check against
* Java's `Comparable` interface. */
//! `BinEntry::Tree` bins use a special form of comparison for search and related operations (which
//! is the main reason we cannot use existing collections such as tree maps). The contained tree
//! is primarily ordered by hash value, then by [`cmp`](std::cmp::Ord::cmp) order on keys. The
//! red-black balancing code is updated from pre-JDK collections (http://gee.cs.oswego.edu/dl/classes/collections/RBCell.java)
//! based in turn on Cormen, Leiserson, and Rivest "Introduction to Algorithms" (CLR).
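The comparison scheme just described can be sketched as a plain function: order primarily by hash value, falling back to `Ord` on the keys only when hashes tie. `entry_cmp` is a hypothetical helper for illustration, not the crate's internal API.

```rust
use std::cmp::Ordering;

// Order tree-bin entries by hash first, then by key comparison.
fn entry_cmp<K: Ord>(hash_a: u64, key_a: &K, hash_b: u64, key_b: &K) -> Ordering {
    hash_a.cmp(&hash_b).then_with(|| key_a.cmp(key_b))
}

fn main() {
    // different hashes: the hash alone decides
    assert_eq!(entry_cmp(1, &"z", 2, &"a"), Ordering::Less);
    // equal hashes: key order decides, keeping the tree totally ordered
    // even when every entry in the bin collides
    assert_eq!(entry_cmp(7, &"a", 7, &"b"), Ordering::Less);
}
```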
//!
//! `BinEntry::Tree` bins also require an additional locking mechanism. While list traversal is
//! always possible by readers even during updates, tree traversal is not, mainly because of
//! tree-rotations that may change the root node and/or its linkages. Tree bins include a simple
//! read-write lock mechanism parasitic on the main bin-synchronization strategy: Structural
//! adjustments associated with an insertion or removal are already bin-locked (and so cannot
//! conflict with other writers) but must wait for ongoing readers to finish. Since there can be
//! only one such waiter, we use a simple scheme using a single `waiter` field to block writers.
//! However, readers need never block. If the root lock is held, they proceed along the slow
//! traversal path (via next-pointers) until the lock becomes available or the list is exhausted,
//! whichever comes first. These cases are not fast, but maximize aggregate expected throughput.
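The reader fallback can be sketched with `try_read` on a standard `RwLock` standing in for the tree's root lock. The `Bin` type here is hypothetical and a sorted `Vec` stands in for the tree; flurry's real mechanism is the single-`waiter` scheme described above, but the control flow is the same: readers never block, they just take the linear path while a writer holds the lock.

```rust
use std::sync::RwLock;

// Sketch: entries keyed by hash, stored both tree-ordered and as a list.
struct Bin {
    tree: RwLock<Vec<(u64, i32)>>, // sorted by hash, stand-in for the tree
    list: Vec<(u64, i32)>,         // the `next`-pointer chain
}

fn get(bin: &Bin, hash: u64) -> Option<i32> {
    if let Ok(tree) = bin.tree.try_read() {
        // fast path: O(log n) search while no writer holds the root lock
        tree.binary_search_by_key(&hash, |&(h, _)| h)
            .ok()
            .map(|i| tree[i].1)
    } else {
        // slow path: a writer is restructuring the tree, so walk the
        // list instead of blocking; readers never wait
        bin.list.iter().find(|&&(h, _)| h == hash).map(|&(_, v)| v)
    }
}

fn main() {
    let bin = Bin {
        tree: RwLock::new(vec![(1, 10), (2, 20)]),
        list: vec![(1, 10), (2, 20)],
    };
    assert_eq!(get(&bin, 2), Some(20)); // fast path

    let _writer = bin.tree.write().unwrap();
    // while a writer holds the root lock, readers fall back to the list
    std::thread::scope(|s| {
        s.spawn(|| assert_eq!(get(&bin, 2), Some(20)));
    });
}
```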
//!
//! ## Garbage collection
//!
@@ -213,6 +241,7 @@
intra_doc_link_resolution_failure
)]
#![warn(rust_2018_idioms)]
#![allow(clippy::cognitive_complexity)]
use crossbeam_epoch::Guard;
use std::ops::Deref;

