A C++ implementation of the structured perceptron.
- googletest (Optional. Used for unit tests.)
- gperftools (Optional. Used for profiling.)
$ autoreconf -iv
$ ./configure --prefix=/path/to/install
$ make
$ make install
<FEATURE_1>[SPACE]<FEATURE_2>[SPACE] … [SPACE]<LABEL>
<FEATURE_1>[SPACE]<FEATURE_2>[SPACE] … [SPACE]<LABEL>
<FEATURE_1>[SPACE]<FEATURE_2>[SPACE] … [SPACE]<LABEL>
…
You can download a well-known benchmark dataset for text chunking, the CoNLL-2000 data, as follows:
$ cd scripts
$ ./download_conll2000.sh
Then, train a model and tag the test data as follows:
$ train_strpercpp -e 10 ./data/train.txt template.conll model
$ test_strpercpp -v 1 -d 0 model ./data/test.txt > output.txt
Evaluation metrics are computed with the conlleval script as follows:
$ ./scripts/conlleval -d " " < output.txt
processed 47377 tokens with 23852 phrases; found: 23843 phrases; correct: 22295.
accuracy: 95.85%; precision: 93.51%; recall: 93.47%; FB1: 93.49
ADJP: precision: 77.54%; recall: 74.89%; FB1: 76.19 423
ADVP: precision: 83.70%; recall: 80.02%; FB1: 81.82 828
CONJP: precision: 55.56%; recall: 55.56%; FB1: 55.56 9
INTJ: precision: 100.00%; recall: 50.00%; FB1: 66.67 1
LST: precision: 0.00%; recall: 0.00%; FB1: 0.00 0
NP: precision: 93.75%; recall: 93.70%; FB1: 93.72 12415
PP: precision: 96.87%; recall: 97.92%; FB1: 97.40 4863
PRT: precision: 77.98%; recall: 80.19%; FB1: 79.07 109
SBAR: precision: 88.27%; recall: 85.79%; FB1: 87.01 520
VP: precision: 93.56%; recall: 93.90%; FB1: 93.73 4675
Results of existing models are listed on this page.
To run the unit tests:
$ autoreconf -iv
$ ./configure --prefix=/path/to/install --with-gtest=/path/to/gtest
$ make check
A profiling result can be obtained as follows:
$ autoreconf -iv
$ ./configure --prefix=/path/to/install --with-gperf=/path/to/gperf
$ make
$ make install
$ export CPUPROFILE=prof.out; time /path/to/install/bin/train_strpercpp [options]
$ pprof /path/to/install/bin/train_strpercpp prof.out
- Michael Collins, "Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms", EMNLP, 2002.
- Michael Collins and Brian Roark, "Incremental Parsing with the Perceptron Algorithm", ACL, 2004.