Commit

Update MERLIN URL and add m2_to_parallel step
Adriane Boyd committed Sep 4, 2018
1 parent e7327a6 commit d4a4094
Showing 1 changed file with 9 additions and 2 deletions.
11 changes: 9 additions & 2 deletions README.md
@@ -49,8 +49,8 @@ files for the experiment.

The original corpora are available at:

-- Falko: https://www.linguistik.hu-berlin.de/de/institut/professuren/korpuslinguistik/forschung/falko/zugang
-- MERLIN: https://merlin-platform.eu
+- [Falko](https://www.linguistik.hu-berlin.de/de/institut/professuren/korpuslinguistik/forschung/falko/zugang)
+- [MERLIN](http://hdl.handle.net/20.500.12124/6)

Tables linking the Falko/MERLIN sentence pairs to their text IDs from
the original corpora are in `data/source/`. For both corpora, the `ctok`
@@ -187,3 +187,10 @@ data:
```
python filter_m2.py -filt wiki-unfiltered.m2 -ref fm-train.m2 -out wiki-filtered.m2
```

+Convert the filtered wiki m2 back to a plaintext file of parallel
+sentences:
+
+```
+python m2_to_parallel.py -m2 wiki-filtered.m2 -out wiki-filtered.src-trg.txt
+```
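
For readers unfamiliar with the M2 format, the following is a rough sketch of what an M2-to-parallel conversion involves. It is not the repository's `m2_to_parallel.py`: the function name `m2_to_pairs`, the `annotator` parameter, and the tab-separated output layout are assumptions made for illustration. It assumes the usual M2 conventions: an `S` line holding the tokenized source, `A` lines holding `start end|||type|||correction|||...|||annotator`, edits sorted by position and non-overlapping, and `-1 -1` spans marking "no edit".

```python
def m2_to_pairs(m2_text, annotator=0):
    """Return (source, target) sentence pairs from M2-formatted text.

    Hypothetical helper: applies one annotator's edits to each source sentence.
    """
    pairs = []
    for block in m2_text.strip().split("\n\n"):
        lines = block.strip().split("\n")
        if not lines or not lines[0].startswith("S "):
            continue  # skip anything that is not a sentence block
        source = lines[0][2:].split()
        target = list(source)
        offset = 0  # shift caused by edits already applied to the left
        for line in lines[1:]:
            if not line.startswith("A "):
                continue
            fields = line[2:].split("|||")
            start, end = (int(i) for i in fields[0].split())
            # Skip "noop" edits (negative span) and other annotators.
            if start < 0 or int(fields[-1]) != annotator:
                continue
            # An empty or -NONE- correction means the span is deleted.
            correction = [] if fields[2] == "-NONE-" else fields[2].split()
            target[start + offset:end + offset] = correction
            offset += len(correction) - (end - start)
        pairs.append((" ".join(source), " ".join(target)))
    return pairs


if __name__ == "__main__":
    example = (
        "S This are a sentence .\n"
        "A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0\n"
    )
    # Assumed output layout: source and target separated by a tab.
    for src, trg in m2_to_pairs(example):
        print(f"{src}\t{trg}")
```

Applying edits left to right with a running offset keeps the token indices in each `A` line valid even after earlier insertions or deletions have changed the sentence length.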
