04/05/2012
Nile -- a hierarchical, syntax-based discriminative alignment package
This document describes how to use and run Nile.
============================================
CONTENTS
============================================
I. REQUIREMENTS
II. PREPARING YOUR DATA
III. TRAINING
IV. USER-DEFINED FEATURES
V. ITERATIVE VITERBI TRAINING & INFERENCE
VI. TESTING
VII. OTHER OPTIONS
VIII. QUESTIONS/COMMENTS
IX. REFERENCES
============================================
I. REQUIREMENTS
============================================
Nile currently depends on a few packages for
logging, ui, implementation, and parallelization:
A. python-gflags: a commandline flags module for python
(http://code.google.com/p/python-gflags/)
B. pyglog: a logging facility for python based on google-glog
(http://www.isi.edu/~riesa/software/pyglog)
C. svector: a python module for sparse vectors, by David Chiang
A version is included in this distribution under svector/
You can download the latest version from:
http://www.isi.edu/~chiang/software/svector.tgz
D. An MPI Implementation. We use MPICH2.
(http://www.mcs.anl.gov/research/projects/mpich2/)
Download the latest stable release for your architecture.
Then follow the Installer's Guide, available in the documentation
section of the website.
See Section III(B) for more information.
E. Boost MPI Python bindings.
(http://www.boost.org/users/download/)
Installation instructions:
http://www.boost.org/doc/libs/1_49_0/doc/html/mpi.html
============================================
II. PREPARING YOUR DATA
============================================
A. To train an alignment model you will need some data.
We use some simple canonical filenames below in describing each, but
you can call them anything you'd like.
1. train.f: a file of source-language sentences, one per line.
2. train.e: a file of target-language sentences, one per line.
3. train.a: a file of gold-standard alignments for each sentence pair
in train.f and train.e; each line in the file should be a
sequence of space-separated strings encoding a single link in
f-e format.
e.g.: 0-1 1-2 2-2 2-3 4-5
4. train.e-parse: a file of target-language parse trees, one for each line
in train.e; trees should be in standard Penn Treebank format, e.g.:
(TOP (S (NP (DT the) (NN man)) (VP (VBD ate))))
We use tokens -RRB- and -LRB- to represent right and left parentheses,
respectively (see below).
5. train.f-parse: a file of source-language parse-trees, one for each line
in train.f (OPTIONAL)
Also prepare heldout development and test data in the same manner.
Source-tree files are optional, but all others are required.
Throughout the rest of this document we use the same filename extensions
as above for our development and test data, e.g.:
dev.e <-- target-language sentences in heldout development data
dev.f <-- source-language sentences in heldout development data
test.e <-- target-language sentences in heldout test data
test.f <-- source-language sentences in heldout test data
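Before training, it can help to sanity-check the alignment files against
the sentence files. The following is a minimal sketch, not part of Nile:
the function names are hypothetical, and it assumes that link indices are
zero-based word positions, as the example line above suggests.

```python
def parse_links(line):
    """Parse one line of f-e links such as '0-1 1-2' into (f, e) index pairs."""
    links = []
    for token in line.split():
        f_index, e_index = token.split("-")
        links.append((int(f_index), int(e_index)))
    return links


def links_in_range(links, f_sentence, e_sentence):
    """Return True if every link points at an existing word in both sentences.

    Assumes zero-based indices (an assumption, not confirmed by this README).
    """
    f_len = len(f_sentence.split())
    e_len = len(e_sentence.split())
    return all(0 <= f < f_len and 0 <= e < e_len for f, e in links)


links = parse_links("0-1 1-2 2-2 2-3 4-5")
print(links)
```

Running links_in_range over each line of train.a, train.f, and train.e
would flag links that fall outside either sentence.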
ADDITIONAL NOTES:
(i) Why use a heldout development (dev) and test set?
After every epoch of training, Nile checks its
current performance on this dev set. When performance is no longer
increasing on this dev set, we say that we've converged and we stop
training. (Section III(D) and III(E))
Since we select the alignment model to ultimately use based on
performance on our development set, we need a second heldout set
as a way to predict performance on truly unseen future data.
Using the model we select based on development performance in
Section III(E), we align our test.* data and note the accuracy.
(ii) We relabel parentheses tokens before parsing,
i.e. "(" -> -LRB- and ")" -> -RRB-. For example:
$ sed -e 's/(/-LRB-/g' -e 's/)/-RRB-/g' < input > input.clean
And then we parse, by doing:
java -Xmx2600m -Xms2600m -jar berkeleyParser.jar \
-gr eng_grammar.gr \
-binarize \
-maxLength 1000 < input.clean > output
Because of the way cube pruning works, you will encounter far fewer
search errors if you binarize your trees before training by using the
-binarize flag.
(iii) In case of sentences that failed to parse:
Use a blank line, a 0 on a line by itself,
or the Berkeley parser default failure string: (())
to tell Nile to skip the affected sentence pair.
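To see how many sentence pairs will be skipped, you can scan a parse file
for the three markers listed above. This is only an illustrative sketch;
the helper names are hypothetical and Nile's own check may differ.

```python
# The three skip markers described above: blank line, a lone 0, or (())
FAILURE_MARKERS = {"", "0", "(())"}


def is_failed_parse(tree_line):
    """Return True if the line is one of the skip markers."""
    return tree_line.strip() in FAILURE_MARKERS


def failed_line_numbers(tree_lines):
    """Return the 1-based line numbers of sentences that failed to parse."""
    return [
        number
        for number, line in enumerate(tree_lines, start=1)
        if is_failed_parse(line)
    ]


example = ["(TOP (S (NP (DT the) (NN man)) (VP (VBD ate))))", "(())", "0", ""]
print(failed_line_numbers(example))
```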
C. Tables from GIZA++ output (Brown et al., 1993; Och and Ney, 2003)
We run GIZA++ Model-4 on a large corpus, and compute p(e|f) and p(f|e)
word association tables by simply counting links in the final Viterbi
alignment.
If you don't have time to run Model-4, that's fine. We've seen benefits
from using counts from just HMM or Model-1 training.
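The counting step can be sketched as follows. This is an illustration
under stated assumptions, not the script we used: it assumes alignments
in f-e format with zero-based indices and a plain relative-frequency
estimate, p(e|f) = count(e, f) / count(f). All function names are
hypothetical.

```python
from collections import Counter


def count_links(f_sentences, e_sentences, alignments):
    """Count aligned (e-word, f-word) pairs and how often each f-word is linked."""
    pair_counts = Counter()
    f_counts = Counter()
    for f_line, e_line, a_line in zip(f_sentences, e_sentences, alignments):
        f_words = f_line.split()
        e_words = e_line.split()
        for token in a_line.split():
            f_index, e_index = (int(x) for x in token.split("-"))
            pair_counts[(e_words[e_index], f_words[f_index])] += 1
            f_counts[f_words[f_index]] += 1
    return pair_counts, f_counts


def p_e_given_f(pair_counts, f_counts):
    """Relative-frequency estimate of p(e|f) from link counts."""
    return {(e, f): n / f_counts[f] for (e, f), n in pair_counts.items()}


pairs, f_totals = count_links(
    ["la maison", "la"], ["the house", "a"], ["0-0 1-1", "0-0"]
)
table = p_e_given_f(pairs, f_totals)
for (e_word, f_word), prob in sorted(table.items()):
    # One row per pair: e-word, f-word, probability
    print(e_word, f_word, prob)
```

The p(f|e) table follows by swapping the roles of e and f.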
p(e|f) file format:
<e-word> <f-word> p(e|f)
p(f|e) file format:
<f-word> <e-word> p(f|e)
D. Alignment files from GIZA++ (OPTIONAL)
You can pass up to two third-party alignment files to the trainer with flags
--a1 and --a2 in nile.py. For --a1 we use intersection of Model-4 alignments
from e->f and f->e directions. For --a2 we use grow-diag-final-and
symmetrized alignments.
These alignments will allow the trainer to fire indicator features for making
the same predictions as your supplied alignments. Feel free to substitute any
other type of alignments here as input. Using GIZA++ Model-4 intersection and
grow-diag-final-and alignments here, we generally see a large F-score increase.
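Intersection is the simpler of the two symmetrizations: keep only the
links that both directional alignments agree on. A minimal sketch,
assuming both inputs have already been converted to the same f-e format
(the function names are hypothetical, and grow-diag-final-and is not
shown):

```python
def link_set(line):
    """Parse a line of links such as '0-1 1-2' into a set of (f, e) pairs."""
    return {tuple(int(x) for x in token.split("-")) for token in line.split()}


def intersect(line_a, line_b):
    """Return the links present in both alignments, sorted by position."""
    return sorted(link_set(line_a) & link_set(line_b))


def format_links(links):
    """Format (f, e) pairs back into a space-separated line."""
    return " ".join("%d-%d" % (f, e) for f, e in links)


print(format_links(intersect("0-0 1-1 2-1 3-2", "0-0 1-1 3-3")))
```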
E. Vocabulary files. We'll need to give the trainer (and aligner) some
vocabulary files it will use to filter potentially large p(e|f) and p(f|e)
data files. Keeping these full data files in memory can be prohibitively
expensive.
Concatenate your training and development e and f files and run
prepare-vocab.py:
$ cat train.e dev.e | ./prepare-vocab.py > e.vcb
$ cat train.f dev.f | ./prepare-vocab.py > f.vcb
Use these files as input to nile.py with flags --evcb and --fvcb.
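To illustrate why the vocabulary files help, the sketch below keeps only
the table rows whose words both occur in the data to be aligned. It is
not Nile's implementation: the vocabularies are built here as plain
Python sets, which is an assumption and says nothing about the actual
format written by prepare-vocab.py.

```python
def vocabulary(sentences):
    """Collect the set of distinct words in an iterable of sentences."""
    return {word for sentence in sentences for word in sentence.split()}


def filter_table(rows, e_vocab, f_vocab):
    """Yield (e-word, f-word, prob) for p(e|f) rows whose words are both in vocabulary."""
    for row in rows:
        e_word, f_word, prob = row.split()
        if e_word in e_vocab and f_word in f_vocab:
            yield e_word, f_word, float(prob)


e_vocab = vocabulary(["the man ate"])
f_vocab = vocabulary(["el hombre"])
rows = ["the el 0.9", "dog perro 0.8", "man hombre 0.7"]
print(list(filter_table(rows, e_vocab, f_vocab)))
```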
============================================
III. TRAINING
============================================
Training a new model with Nile involves (1) specifying your data files
as commandline arguments, and (2) invoking training mode. We provide a
sample training script, train.sh, in this distribution invoking only the
flags required to get going.
A. Cluster computing
The sample training script uses the Portable Batch System (PBS), a popular
networked subsystem for controlling jobs on a computing cluster. You can
remove the PBS directives at the top of the file if you are running locally
on a single machine (we strongly recommend machines with multiple CPUs), or
just modify the file to suit your architecture.
B. MPI
Take note of where your MPI binaries, libraries,
and MPI Python bindings live. Then modify the MPI Initialization section
with the appropriate paths.
C. Training Name
Every training run has a name. Your run's default name is:
d<date>.k<beam-size>.n<cpu-pool-size>.<langpair>.target-tree.0
D. Running the program. On PBS, do:
$ qsub train.sh
Or, on a local machine with multiple CPUs, do:
$ ./train.sh
D. Inspecting accuracy on the held-out data:
To inspect held-out F-scores, do:
$ grep F-score-dev <name>.err
To sort held-out F-scores in descending order do:
$ grep F-score-dev <name>.err | awk '{print $2}' | cat -n | sort -nr -k 2
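The same inspection can be done in Python. This sketch mirrors the
grep/awk pipeline above: it takes the second whitespace-separated field
of every line containing F-score-dev. Numbering epochs from 1 in log
order is an assumption (it matches what cat -n prints), and the function
names are hypothetical.

```python
def dev_scores(log_lines):
    """Extract held-out F-scores, in log order, from lines containing F-score-dev."""
    scores = []
    for line in log_lines:
        if "F-score-dev" in line:
            # Same field that awk '{print $2}' selects
            scores.append(float(line.split()[1]))
    return scores


def best_epoch(scores):
    """Return (epoch, score) for the highest score; epochs are counted from 1."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index + 1, scores[best_index]


log = [
    "F-score-dev 0.70",
    "some other log line",
    "F-score-dev 0.75",
    "F-score-dev 0.74",
]
print(best_epoch(dev_scores(log)))
```

In practice, log_lines would be the open <name>.err file.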
E. Convergence
If the highest-scoring epoch, H, is much earlier than your current
epoch number, you have probably converged. Kill the training job
and extract weights from epoch H:
$ ./weights <name> H
Weights will be written to file:
<name>.weights-H
=====================================================
IV. USER-DEFINED FEATURES
=====================================================
You can add your own feature functions to Features.py
or maintain several different Feature modules for different
language pairs.
If you want to have Feature modules for, say,
Arabic-English and Chinese-English, name them:
Features_ar_en.py and Features_zh_en.py respectively.
Setting the --langpair flag with argument LANG1_LANG2 will
tell Nile to use these modules. Nile will look for a file called:
Features_LANG1_LANG2.py.
nile.py --e train.e \
--f train.f \
...
--langpair ar_en
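The naming convention can be pictured with the following sketch. It is
not Nile's actual loading code; the function names are hypothetical and
only illustrate how a --langpair value maps to a module name.

```python
import importlib


def feature_module_name(langpair=None):
    """Map a --langpair value such as 'ar_en' to a feature module name."""
    if not langpair:
        return "Features"
    return "Features_" + langpair


def load_feature_module(langpair=None):
    """Import the feature module; it must be importable from the current path."""
    return importlib.import_module(feature_module_name(langpair))


print(feature_module_name("ar_en"))
```

Because the value becomes part of a Python module name, it should use an
underscore rather than a hyphen.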
=====================================================
V. ITERATIVE VITERBI TRAINING & INFERENCE (optional)
=====================================================
This procedure is somewhat time-consuming because you will need to train several
models, and align your data several times. However, if you have the time, the
improvement in alignment quality may be worth it.
Parse trees for both target and source text are required for this procedure.
1. Train a target-tree model as in section III.
2. Train a source-tree model by:
(a) Transform your gold-standard data to e-f format;
source-tree models will read and output alignments in
e-f format as opposed to f-e format.
$ perl -pe 's/(\d+)-(\d+)/$2-$1/g' < train.a.f-e > train.a.e-f
(b) flip the argument flags for your e and f data when you run Nile. For example:
python nile.py \
--e train.f \
--f train.e \
--gold train.a.e-f \
--ftrees train.e-parse \
--etrees train.f-parse \
--fdev dev.e \
--edev dev.f \
--ftreesdev dev.e-parse \
--etreesdev dev.f-parse \
--golddev dev.a.e-f \
--fvcb e.vcb \
--evcb f.vcb \
--pfe GIZA++.m4.pef \
--pef GIZA++.m4.pfe \
--a1 train.m4i.e-f \
--a2 train.m4gdfa.e-f \
--a1_dev dev.m4i.e-f \
--a2_dev dev.m4gdfa.e-f \
--langpair zh_en \
--train \
--k 128
3. The next step of training involves learning target-tree and
source-tree models again, but this time giving as input the
outputs of the models learned in the first round. You do this
with the --inverse and --inverse_dev flags.
(a) Run Nile in --align mode and align your training data and then dev data with the
source-tree model you've learned.
(b) Flip the alignment links to f-e format and supply these to your next target-tree
training with --inverse and --inverse_dev, e.g.:
nile.py --e train.e --f train.f --a train.a.f-e --inverse train-st.a.f-e ... etc.
Nile will fire features to softly enforce agreement between the two models.
(c) Analogously, for your next source-tree model, flip the aligned 1-best alignments
of your training and dev data from the target-tree model to e-f format, and supply
it to Nile with the --inverse and --inverse_dev flags:
nile.py --e train.f --f train.e --a train.a.e-f --inverse train-tt.a.e-f ... etc.
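Several steps above flip alignment links between f-e and e-f format. A
Python equivalent of the perl one-liner in step 2(a) is sketched below;
the function name is hypothetical.

```python
import re


def flip_links(line):
    """Swap the two indices of every link, e.g. '0-1 2-3' becomes '1-0 3-2'."""
    return re.sub(r"(\d+)-(\d+)", r"\2-\1", line)


print(flip_links("0-1 1-2 2-2 2-3 4-5"))
```

Flipping twice returns the original line, so the same function converts
in either direction.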
============================================
VI. TESTING
============================================
At test time, it is important to use the same types of parameters and input data you used
during training. If you trained a model with a beam of K=128, then keep that beam at test time.
If you used GIZA++ Model-4 alignments as input with flags --a1 and --a2, then similarly also
supply alignment predictions from GIZA++ at test time. Finally, binarize your trees on test
data the same way you did for training and development data.
A. Preparing Vocabulary files:
As with the training and development data, prepare source and target vcb files.
$ ./prepare-vocab.py < test.e > test.e.vcb
$ ./prepare-vocab.py < test.f > test.f.vcb
Use these files as arguments for the --evcb and --fvcb flags to nile.py
in your testing script.
B. Editing test.sh
Edit the WEIGHTS= line in test.sh for your weights filename.
C. Set Nile to "align" mode.
At test time, we replace the --train flag with the
--align flag when running nile.py.
D. Running test.sh
Then run the test script, test.sh. Using PBS, do:
$ qsub test.sh
Or, on a local machine with multiple CPUs:
$ ./test.sh
By default, alignment output is written in f-e format to:
<name>.weights-H.test-output.a
E. Evaluation
If you are aligning data for which you have gold-standard alignments,
you can calculate F-measure using our provided scripts. Remember,
alignments are output in f-e format and should be compared against
data in the same format.
Usage: ./Fmeasure.py <your-file> <gold-file>
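For reference, the computation can be sketched as below. This is not the
code in Fmeasure.py: it assumes balanced F-measure with link counts
pooled over the whole file, and the provided script may differ in such
details.

```python
def f_measure(predicted_lines, gold_lines):
    """Return (precision, recall, F) with link counts pooled over all lines."""
    correct = predicted_total = gold_total = 0
    for predicted_line, gold_line in zip(predicted_lines, gold_lines):
        predicted = set(predicted_line.split())
        gold = set(gold_line.split())
        correct += len(predicted & gold)
        predicted_total += len(predicted)
        gold_total += len(gold)
    precision = correct / predicted_total if predicted_total else 0.0
    recall = correct / gold_total if gold_total else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


precision, recall, f_score = f_measure(["0-0 1-1 2-2"], ["0-0 1-1 2-3 3-3"])
print(precision, recall, f_score)
```

Both files must use the same link format, since links are compared as
strings.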
============================================
VII. OTHER OPTIONS
============================================
A. L1 Regularization (Feature Selection; experimental)
Nile implements a parallelized version of L1 Regularization via projection after each epoch.
(Tibshirani, 1996; Duchi et al., 2008; Martins et al., 2011)
Enabling this feature will allow you to learn a much smaller model that, in our experiments,
achieves essentially the same accuracy or better. This is useful for scaling to very large
training sets, and you may also see some generalization benefits.
To enable, set Nile's L1 Tau coefficient variable to 1 with commandline flag:
--tau 1
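The projection step can be pictured with the sorting-based algorithm of
Duchi et al. (2008), sketched below for a sparse weight vector stored as
a dict. This is an illustration, not Nile's parallelized implementation;
in particular, how --tau relates to the radius parameter used here is an
assumption we do not spell out.

```python
def project_onto_l1_ball(weights, radius):
    """Euclidean projection of a sparse vector onto the L1 ball of the given radius."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    magnitudes = sorted((abs(v) for v in weights.values()), reverse=True)
    if sum(magnitudes) <= radius:
        return dict(weights)  # already inside the ball
    # Find the soft-threshold theta from the sorted magnitudes.
    running_sum = 0.0
    theta = 0.0
    for count, magnitude in enumerate(magnitudes, start=1):
        running_sum += magnitude
        candidate = (running_sum - radius) / count
        if magnitude - candidate > 0:
            theta = candidate
    # Shrink every weight towards zero by theta; drop those that reach zero.
    projected = {}
    for name, value in weights.items():
        shrunk = abs(value) - theta
        if shrunk > 0:
            projected[name] = shrunk if value > 0 else -shrunk
    return projected


print(project_onto_l1_ball({"a": 3.0, "b": -1.0, "c": 0.5}, 1.0))
```

Small weights are driven exactly to zero, which is what makes the
learned model sparse.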
B. Debiasing (experimental)
While L1 yields sparse solutions, these solutions are known to be biased in magnitude, which may
negatively affect accuracy. After learning a sparse model with L1, we train a new model, but this
time we only allow the features we learned in our sparse model to fire. This is called Debiasing.
To enable:
1. Turn debiasing mode on:
--debiasing
2. Tell Nile about the weight vector you learned during the L1 feature selection step:
use --debiasing_weights and supply a weight vector in svector format:
--debiasing_weights <sparse-model weights>
(Make sure you have removed the --tau flag from your Nile invocation.)
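Conceptually, debiasing restricts every feature vector to the support of
the sparse model. A minimal sketch with plain dicts follows; the names
are hypothetical, and Nile itself uses svector rather than dicts.

```python
def allowed_features(sparse_weights):
    """Return the names of features with nonzero weight in the sparse model."""
    return {name for name, value in sparse_weights.items() if value != 0.0}


def restrict(feature_vector, allowed):
    """Drop every feature that the sparse model did not select."""
    return {name: value for name, value in feature_vector.items() if name in allowed}


sparse_model = {"lex_prob": 0.8, "tree_dist": 0.0, "same_link_a1": 1.2}
fired = {"lex_prob": 0.3, "tree_dist": 2.0, "new_feature": 1.0}
print(restrict(fired, allowed_features(sparse_model)))
```

The feature names in this example are made up for illustration.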
C. Advanced Perceptron Updates
1. Changing the default Oracle.
In selecting the oracle towards which we update, Chiang et al. (2008)
modify the traditional selection criterion from minimized loss to a
linear combination of minimized loss and model score. We call this the
"hope" oracle, because we have more of a chance to reach it; it has
high model score and low loss.
To use a "hope" oracle, use flag and argument:
--oracle hope
2. Changing the default hypothesis.
In selecting the hypothesis that we update our model away from
(and towards the oracle), we can select a hypothesis somewhat
analogously to selecting the "hope" oracle as described above.
In this case, we modify the default selection criterion from
maximum model score to a linear combination of maximum model
score and maximum loss. We call this the "fear" hypothesis;
it has the nefarious property of having both a high model score
(our model likes it), but also very high loss (it is a bad alignment).
To use a "fear" hypothesis, use flag and argument:
--hyp fear
3. Changing the default learning rate.
There is a single learning rate parameter used in the standard perceptron
update which affects the magnitude of each update. It is set to 1.0 by default.
To use a different learning rate, use:
--learning_rate <new learning rate>
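The three options above can be pictured together in the sketch below. It
is an illustration, not Nile's code: the candidate structure and function
names are hypothetical, and weighting model score and loss equally in the
linear combinations is an assumption.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    model_score: float
    loss: float
    features: dict


def select_oracle(candidates, mode="min_loss"):
    """Default: lowest loss. 'hope': high model score and low loss."""
    if mode == "hope":
        return max(candidates, key=lambda c: c.model_score - c.loss)
    return min(candidates, key=lambda c: c.loss)


def select_hypothesis(candidates, mode="max_score"):
    """Default: highest model score. 'fear': high model score and high loss."""
    if mode == "fear":
        return max(candidates, key=lambda c: c.model_score + c.loss)
    return max(candidates, key=lambda c: c.model_score)


def perceptron_update(weights, oracle, hypothesis, learning_rate=1.0):
    """Move weights towards the oracle's features and away from the hypothesis'."""
    updated = dict(weights)
    for name, value in oracle.features.items():
        updated[name] = updated.get(name, 0.0) + learning_rate * value
    for name, value in hypothesis.features.items():
        updated[name] = updated.get(name, 0.0) - learning_rate * value
    return updated


a = Candidate(5.0, 1.0, {"x": 1.0})
b = Candidate(3.0, 0.5, {"y": 1.0})
c = Candidate(1.0, 0.0, {"z": 1.0})
d = Candidate(4.5, 3.0, {"x": 1.0, "w": 1.0})
beam = [a, b, c, d]
print(perceptron_update({}, select_oracle(beam, "hope"), select_hypothesis(beam, "fear")))
```

A smaller learning rate scales both halves of the update by the same
factor.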
============================================
VIII. QUESTIONS/COMMENTS
============================================
Troubleshooting:
If you've gone through this brief guide and are having trouble getting
this software to work for you, send mail to Jason Riesa <riesa@isi.edu>.
Technical correspondence is also welcome.
If you are interested in contributing to this project please also let us know!
============================================
IX. REFERENCES
============================================
Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, Robert L. Mercer.
The Mathematics of Statistical Machine Translation: Parameter Estimation.
Computational Linguistics, Volume 19, Number 2, pages 263-311. June 1993.
David Chiang, Yuval Marton, and Philip Resnik. Online Large-Margin Training
of Syntactic and Structural Translation Features. 2008. Proceedings of EMNLP,
pages 224-233.
John Duchi, Shai Shalev-Shwartz, Yoram Singer, and Tushar Chandra. Efficient
Projections onto the L1-Ball for Learning in High Dimensions. 2008. Proceedings of ICML.
Andre F. T. Martins, Noah A. Smith, Pedro M. Q. Aguiar, Mario A. T. Figueiredo.
Structured Sparsity in Structured Prediction. 2011. Proceedings of EMNLP,
pages 1500-1511.
Franz Josef Och and Hermann Ney. A Systematic Comparison of Various Statistical Alignment Models.
Computational Linguistics, Volume 29, Number 1, pages 19-51. March 2003.
Jason Riesa and Daniel Marcu. Hierarchical Search for Word Alignment. 2010.
Proceedings of ACL, pages 157-166.
Jason Riesa, Ann Irvine, and Daniel Marcu. Feature-Rich Language-Independent
Syntax-Based Alignment for Statistical Machine Translation. 2011.
Proceedings of EMNLP, pages 497-507.
Jason Riesa and Daniel Marcu. Automatic Parallel Fragment Extraction from Noisy Data.
2012. Proceedings of the NAACL HLT. To appear.
Robert Tibshirani. Regression shrinkage and selection via the lasso. 1996.
J. Royal. Statist. Soc B., Vol. 58, No. 1, pages 267-288.