Evaluation

matthew edited this page Jan 6, 2017 · 61 revisions

Unlexicalised tree linearisation grammars

Language Coverage (Test) Mean BLEU Score Sample Mean (Test) Mean BLEU Score Sample Mean (Train)
Basque 0.7718 0.7728 0.7853
Catalan 0.8515 0.7208 0.7366
Chinese 0.7631 0.7407 0.9008
Czech 0.9021 0.7683 0.7580
English 0.7501 0.8200 0.8798
Finnish 0.8139 0.7797 0.8238
German 0.7292 0.7718
Latin 0.6907 0.8245 0.7622
Spanish 0.8119 0.7334
Turkish 0.7364 0.8336 0.9029
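
The BLEU columns above are sample means over per-sentence scores. This page does not say which BLEU implementation or smoothing was used, so the following is only a minimal sketch of how such a mean could be computed. The function names (`sentence_bleu`, `mean_bleu`) and the add-one smoothing for higher-order n-grams are assumptions for illustration, not the project's actual evaluation code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU against one reference.

    Assumption: add-one smoothing for orders above 1, so that short
    sentences without any matching 4-gram do not collapse to zero.
    """
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        # Clipped matches: an n-gram is credited at most as often as
        # it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n > 1:
            matches += 1
            total += 1
        if matches == 0:
            return 0.0
        log_precision_sum += math.log(matches / total)
    # Brevity penalty; it is 1.0 when the output is a reordering of the
    # reference, because both then have the same length.
    if len(hypothesis) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(hypothesis))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


def mean_bleu(pairs):
    """Sample mean of sentence-level BLEU over (reference, hypothesis) pairs."""
    scores = [sentence_bleu(ref, hyp) for ref, hyp in pairs]
    return sum(scores) / len(scores)


# Toy data, not taken from any treebank.
reference = ["the", "cat", "sat", "on", "the", "mat"]
reordered = list(reversed(reference))
perfect_score = sentence_bleu(reference, reference)
reordered_score = sentence_bleu(reference, reordered)
average = mean_bleu([(reference, reference), (reference, reordered)])
```

A linearisation that reproduces the reference word order scores 1.0, while a reordering that shares no bigram with the reference scores well below it, so the mean reflects how often the predicted order matches the original.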

Greedy trigram model (inspired by Bohnet et al., 2012)

| Language | BLEU |
| --- | --- |
| Basque | 0.83370 |
| Catalan | 0.91623 |
| Chinese | 0.95922 |
| Czech | 0.82162 |
| German | 0.78782 |
| English | 0.68675 |
| Spanish | 0.88910 |
| Finnish | 0.79517 |
| Turkish | 0.76655 |
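
This page does not describe the greedy trigram model beyond its name. As a rough illustration of the general idea only, the sketch below orders a bag of words by repeatedly choosing the remaining word with the highest trigram count given the two words already placed. Everything in it is an assumption: the function names (`train_trigram_counts`, `greedy_linearise`), the use of raw counts instead of smoothed probabilities, the alphabetical tie-breaking, and the fact that it ignores tree structure entirely.

```python
from collections import Counter


def train_trigram_counts(sentences):
    """Count word trigrams, padding each sentence with boundary markers."""
    counts = Counter()
    for tokens in sentences:
        padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
        for i in range(len(padded) - 2):
            counts[tuple(padded[i:i + 3])] += 1
    return counts


def greedy_linearise(bag, counts):
    """Order a bag of words greedily, one word at a time.

    At each step the remaining word with the highest trigram count in
    the context of the previous two output words is appended. Ties go
    to the alphabetically first word so the result is deterministic.
    """
    remaining = Counter(bag)
    output = ["<s>", "<s>"]
    while remaining:
        prev2, prev1 = output[-2], output[-1]
        best = max(sorted(remaining), key=lambda w: counts[(prev2, prev1, w)])
        output.append(best)
        remaining[best] -= 1
        if remaining[best] == 0:
            del remaining[best]
    return output[2:]


# Toy training data, not taken from any treebank.
training = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
trigram_counts = train_trigram_counts(training)
ordered = greedy_linearise(["sat", "cat", "the"], trigram_counts)
```

Because each choice is final, a greedy search of this kind can commit to a locally likely word that blocks a better overall order; beam search is the usual alternative when that matters.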

Treebank statistics

Basque

| Sentence Length Statistic | Test | Train |
| --- | --- | --- |
| Size | 1799 | 5396 |
| Minimum | 3 | 3 |
| Maximum | 39 | 64 |
| Range | 36 | 61 |
| Median | 13 | 12 |
| First Quartile | 9 | 9 |
| Third Quartile | 17 | 17 |
| Inter-Quartile Range | 8 | 8 |
| Mean | 13.5486 | 13.5237 |
| Standard Deviation | 6.2585 | 6.4141 |

Catalan

| Sentence Length Statistic | Test | Train |
| --- | --- | --- |
| Size | 834 | 5809 |
| Minimum | 2 | 2 |
| Maximum | 103 | 151 |
| Range | 101 | 149 |
| Median | 23 | 22 |
| First Quartile | 15.25 | 14 |
| Third Quartile | 32 | 32 |
| Inter-Quartile Range | 16.75 | 18 |
| Mean | 24.8261 | 24.6741 |
| Standard Deviation | 13.3844 | 14.1310 |

Chinese

| Sentence Length Statistic | Test | Train |
| --- | --- | --- |
| Size | 500 | 3997 |
| Minimum | 7 | 4 |
| Maximum | 97 | 111 |
| Range | 90 | 107 |
| Median | 21.5 | 22 |
| First Quartile | 15 | 16 |
| Third Quartile | 30 | 31 |
| Inter-Quartile Range | 15 | 15 |
| Mean | 24.0240 | 24.6705 |
| Standard Deviation | 11.8940 | 12.3369 |

Czech

Sentence Length Statistic Test Train
Size 9835
Minimum 1
Maximum 132
Range 131
Median 15
First Quartile 9
Third Quartile 23
Inter-Quartile Range 14
Mean 16.7784
Standard Deviation 10.7498

English

| Sentence Length Statistic | Test | Train |
| --- | --- | --- |
| Size | 2077 | 12543 |
| Minimum | 1 | 1 |
| Maximum | 81 | 159 |
| Range | 80 | 158 |
| Median | 9 | 14 |
| First Quartile | 4 | 7 |
| Third Quartile | 17 | 23 |
| Inter-Quartile Range | 13 | 16 |
| Mean | 12.0828 | 16.3108 |
| Standard Deviation | 10.6025 | 12.4011 |

Finnish

| Sentence Length Statistic | Test | Train |
| --- | --- | --- |
| Size | 587 | 11042 |
| Minimum | 1 | 1 |
| Maximum | 79 | 136 |
| Range | 78 | 135 |
| Median | 12 | 11 |
| First Quartile | 8 | 7 |
| Third Quartile | 17 | 16 |
| Inter-Quartile Range | 9 | 9 |
| Mean | 13.0528 | 12.3861 |
| Standard Deviation | 7.8022 | 7.9240 |

German

Sentence Length Statistic Test Train
Size 746
Minimum 1
Maximum 49
Range 48
Median 13
First Quartile 8
Third Quartile 20
Inter-Quartile Range 12
Mean 14.8029
Standard Deviation 8.7081

Latin

| Sentence Length Statistic | Test | Train |
| --- | --- | --- |
| Size | 230 | 2660 |
| Minimum | 5 | 1 |
| Maximum | 44 | 78 |
| Range | 39 | 77 |
| Median | 21 | 12 |
| First Quartile | 16 | 7 |
| Third Quartile | 25 | 18 |
| Inter-Quartile Range | 9 | 11 |
| Mean | 21.0087 | 14.2177 |
| Standard Deviation | 6.5342 | 9.6283 |

Spanish

Sentence Length Statistic Test Train
Size 164
Minimum 3
Maximum 76
Range 73
Median 19
First Quartile 12
Third Quartile 29
Inter-Quartile Range 17
Mean 23.0244
Standard Deviation 15.3432

Turkish

| Sentence Length Statistic | Test | Train |
| --- | --- | --- |
| Size | 612 | 3022 |
| Minimum | 1 | 1 |
| Maximum | 47 | 49 |
| Range | 46 | 48 |
| Median | 7 | 7 |
| First Quartile | 4.75 | 5 |
| Third Quartile | 11 | 11 |
| Inter-Quartile Range | 6.25 | 6 |
| Mean | 8.5588 | 9.1343 |
| Standard Deviation | 5.8055 | 6.6422 |
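
The sentence length statistics in this section can be recomputed from a list of per-sentence token counts. The sketch below shows one way to do so with Python's standard library. Two points are assumptions, since the page does not state them: quartiles use linear interpolation (`method="inclusive"`), which is consistent with fractional values such as 15.25 and 4.75 above, and the standard deviation is the sample version with an n - 1 denominator. The function name `sentence_length_stats` is hypothetical.

```python
import statistics


def sentence_length_stats(lengths):
    """Summarise sentence lengths (tokens per sentence) for one data split."""
    # Quartiles by linear interpolation between order statistics.
    q1, median, q3 = statistics.quantiles(lengths, n=4, method="inclusive")
    return {
        "Size": len(lengths),
        "Minimum": min(lengths),
        "Maximum": max(lengths),
        "Range": max(lengths) - min(lengths),
        "Median": median,
        "First Quartile": q1,
        "Third Quartile": q3,
        "Inter-Quartile Range": q3 - q1,
        "Mean": statistics.mean(lengths),
        # Assumption: sample standard deviation (n - 1 denominator).
        "Standard Deviation": statistics.stdev(lengths),
    }


# Toy lengths, not taken from any treebank.
toy_lengths = [3, 5, 7, 8, 12, 13, 14, 18, 21]
stats = sentence_length_stats(toy_lengths)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Swapping `statistics.stdev` for `statistics.pstdev` gives the population standard deviation instead, which differs only slightly at the corpus sizes listed here.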