Perceptronix: Linear model template library for C++
This library provides C++ classes for linear classifiers over binary-valued (i.e., nominal, "one-hot") features, such as those encountered in natural language processing. Models are trained from labeled examples using the perceptron learning algorithm, with or without weight averaging. Training without averaging produces an informal form of L1 regularization (all else held equal, it favors zero weights over non-zero weights, since weights are initialized to zero and a large margin is not part of the objective), whereas averaging produces an informal form of L2 regularization (all else held equal, it prefers weights with smaller magnitudes, again since weights are initialized to zero). Both binomial and multinomial classifiers are supported. Weights may be stored either in sparse (string-keyed hash table) or dense (integer-keyed array) containers. Models may be serialized and deserialized using protocol buffers. Users may specify a large-margin constant "C".
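For intuition, here is a minimal standalone sketch of a binomial perceptron update over one-hot features. This is not the library's API, and the role of the constant `c` as a margin threshold follows the usual large-margin perceptron variant; treat it as an assumption for illustration only.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

using Weights = std::unordered_map<std::string, float>;

// Sums the weights of the active (one-hot) features.
float Score(const Weights &weights, const std::vector<std::string> &features) {
  float score = 0;
  for (const auto &f : features) {
    const auto it = weights.find(f);
    if (it != weights.end()) score += it->second;
  }
  return score;
}

// Perceptron rule: weights start at zero and are adjusted only when the
// signed margin falls below c, so features never involved in an error
// keep exactly-zero weights.
void Update(Weights *weights, const std::vector<std::string> &features,
            bool label, float c) {
  const float y = label ? 1.0f : -1.0f;
  if (y * Score(*weights, features) > c) return;  // Margin satisfied.
  for (const auto &f : features) (*weights)[f] += y;
}
```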
The normal workflow is to create a `...Model` object, train it, and then average it, creating an immutable, serializable object; e.g.:

```cpp
::perceptronix::SparseDenseMultinomialModel model(nfeats, nlabels, c);
// ... training ...
model.Average();
// ... inference ...
```
The major classes are:
DenseBinomialModel
: A binomial classifier using a dense weight table; users must set the maximum number of unique features at construction time.

SparseBinomialModel
: A binomial classifier using a dynamically-sized sparse weight table.

DenseMultinomialModel
: A multinomial classifier using dense weight tables; users must set the maximum number of unique features and labels at construction time.

SparseDenseMultinomialModel
: A multinomial classifier using a dynamically-sized hash table for the outer table and dense arrays for the inner tables.

SparseMultinomialModel
: A multinomial classifier using dynamically-sized hash tables for both the outer and inner tables.
There are also HMM-like classes for greedy sequential decoding:
SparseBinomialSequentialModel
SparseDenseMultinomialSequentialModel
SparseMultinomialSequentialModel
Each of the above classifiers has a corresponding type in the Python API. Take a look at `apps/sentence_tokenizer/model.py` for a worked example.
Weight averaging is done using a lazy strategy. An averaged weight consists of:
- The true weight (`weight_`), used for inference during training and updated using the perceptron learning rule
- The summed weight (`summed_weight_`), which may or may not be "fresh"
- A timestamp (`time_`) indicating when the averaged weight was last "freshened"
At time `time`, the time elapsed since the summed weight was last freshened is given by `time - time_`. The summed weight is freshened by adding to it the true weight multiplied by the time elapsed since the timestamp.
At averaging time, weights are freshened and then the summed weight is divided by the current time.
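A minimal sketch of this lazy strategy, using the member names above; this is not the library's actual class, just a translation of the description:

```cpp
#include <cstdint>

// Sketch of a single lazily-averaged weight.
struct AveragedWeight {
  float weight_ = 0;         // True weight, used for inference during training.
  float summed_weight_ = 0;  // Running sum; may or may not be "fresh".
  std::uint64_t time_ = 0;   // When the summed weight was last freshened.

  // The true weight has been constant for (time - time_) steps, so freshening
  // adds that many copies of it to the sum.
  void Freshen(std::uint64_t time) {
    summed_weight_ += weight_ * (time - time_);
    time_ = time;
  }

  // Applies a perceptron update of size tau at the given time, freshening
  // first so the sum still reflects the old weight for the elapsed interval.
  void Update(float tau, std::uint64_t time) {
    Freshen(time);
    weight_ += tau;
  }

  // At averaging time: freshen, then divide the sum by the current time.
  float Average(std::uint64_t time) {
    Freshen(time);
    return summed_weight_ / time;
  }
};
```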
The header `table.h` wraps the two types of weight containers so that they provide a common interface for access, mutation, and iteration. The resulting classes are then used as template template parameters (no, that's not a typo) to the classifiers themselves. Virtual dispatch is not used anywhere, so there is no performance penalty for this polymorphism.
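For illustration only (this is not the contents of `table.h`), the pattern looks roughly like this: both table wrappers expose the same interface, and the classifier is parameterized on the table template itself, so storage is selected at compile time.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Two table wrappers sharing one interface (hypothetical names).
template <class Weight>
class DenseTable {
 public:
  using Key = std::size_t;

  explicit DenseTable(std::size_t nfeats) : table_(nfeats) {}

  Weight &operator[](Key f) { return table_[f]; }

 private:
  std::vector<Weight> table_;
};

template <class Weight>
class SparseTable {
 public:
  using Key = std::string;

  explicit SparseTable(std::size_t nfeats) { table_.reserve(nfeats); }

  Weight &operator[](const Key &f) { return table_[f]; }

 private:
  std::unordered_map<Key, Weight> table_;
};

// The classifier takes the table *template*, not a concrete type, as its
// parameter; the chosen storage is instantiated at compile time, so no
// virtual dispatch is involved.
template <template <class> class Table, class Weight>
class BinomialClassifier {
 public:
  using Key = typename Table<Weight>::Key;

  explicit BinomialClassifier(std::size_t nfeats) : table_(nfeats) {}

  Weight &weight(const Key &f) { return table_[f]; }

 private:
  Table<Weight> table_;
};

using DenseBinomial = BinomialClassifier<DenseTable, float>;
using SparseBinomial = BinomialClassifier<SparseTable, float>;
```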
Labels and features are either unsigned integers (for dense tables) or strings (for sparse tables), and therefore are passed by value in both cases. In the latter case, it is assumed that the strings are short enough to take advantage of "short string" optimizations.
There is a built-in bias term, so users do not need to provide one.
For dense tables, passing a feature integer greater than or equal to `nfeats` is Undefined Behavior.
For sparse tables, the empty string is reserved for internal use, and attempting to use it as either a label or feature is Undefined Behavior.
Instead of the naïve strategy of treating labels as the outer indices and features as the inner indices, this library uses outer feature indices and inner label indices in the multinomial case. This allows sparse multinomial models to perform particularly well in the very common case that most features have zero weights for all labels: the cost of inference for an "irrelevant" feature (one which has zero weights for all labels, or which has a non-zero weight for some labels but not for the one under consideration) is merely that of a hash table miss.
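A sketch of this layout (not the library's code), using a fully sparse inner table as in `SparseMultinomialModel`; the names and types here are illustrative assumptions:

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Outer table keyed on features; each inner table maps labels to weights.
using InnerTable = std::unordered_map<std::string, float>;
using OuterTable = std::unordered_map<std::string, InnerTable>;

// Scores one label: a feature with all-zero weights is simply absent from
// the outer table, so it costs only a single failed hash lookup.
float Score(const OuterTable &table, const std::vector<std::string> &features,
            const std::string &label) {
  float score = 0;
  for (const auto &f : features) {
    const auto outer = table.find(f);
    if (outer == table.end()) continue;  // Irrelevant feature: one hash miss.
    const auto inner = outer->second.find(label);
    if (inner == outer->second.end()) continue;  // No weight for this label.
    score += inner->second;
  }
  return score;
}
```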
When you have sparse features but know the maximum number of labels at construction time (and this number is small, say, less than 64), consider using `SparseDenseMultinomialModel` rather than `SparseMultinomialModel`. This model uses `std::valarray` for inner weight tables, which should allow the "dot product" during scoring to use fast vectorized arithmetic operations.
If you do use `SparseMultinomialModel`, consider tuning the `nlabels` constructor argument, which is used to reserve memory for the inner tables.
The library depends on C++14 and protobuf 3.0 or greater. It should compile with `g++` or `clang++`. Users may also need the protobuf compilation tool `protoc`, which is sometimes distributed separately from protobuf itself.
The Python wrapper and applications are written for Python 3.7+. Compiling the wrapper requires Cython, and the applications require the `nlup` and `regex` Python libraries, all of which are available from PyPI.
See `LICENSE`.