forked from jweese/thrax
-
Notifications
You must be signed in to change notification settings - Fork 6
Hadoop-based tool for extraction of large scale synchronous grammars for paraphrasing and machine translation
License
joshua-decoder/thrax
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Thrax uses Apache hadoop (an open-source implementation of MapReduce) to efficiently extract a synchronous context-free grammar translation model for use in modern machine translation systems. Thrax currently has support for both Hiero-style grammars (with a single non-terminal symbol) and SAMT-style grammars (where non-terminal symbols are calculated by projecting onto the span from a target-side parse tree). COMPILING: First, you need to set two environment variables: $HADOOP should point to the directory where Hadoop is installed. $AWS_SDK should point to the directory where the Amazon Web Services SDK is installed. To compile, type ant This will compile all classes and package them into a jar for use on a Hadoop cluster. At the end of the compilation, ant should report that the build was successful. RUNNING THRAX: Thrax can be invoked with hadoop jar $THRAX/bin/thrax.jar <configuration file> Some example configuration files have been included with this distribution: example/hiero.conf example/samt.conf COPYRIGHT AND LICENSE: Copyright (c) 2010-13 by the Thrax team: Jonny Weese <jonny@cs.jhu.edu> Juri Ganitkevitch <juri@cs.jhu.edu> See LICENSE.txt (included with this distribution) for the complete terms.
About
Hadoop-based tool for extraction of large scale synchronous grammars for paraphrasing and machine translation
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published
Languages
- Java 84.6%
- JavaScript 14.6%
- Other 0.8%