forked from xutaima/jhu-mt-hw

Implementation of reordering-capable decoding step for Statistical Machine Translation.


tsimafeip/WiDec

 
 


WiDec (Wizard Decoder) 🧙‍♂️

This project implements the decoding step of a statistical machine translation (SMT) system, with support for reordering phrases.

It is based on HW3 (description, github) from the JHU Machine Translation class.

I implemented two decoding approaches: beam search and greedy decoding. Combining the two techniques significantly improved translation quality over the baseline. Detailed evaluation results can be found in the report.
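To make the beam search idea concrete, here is a minimal sketch of a stack decoder that may translate source phrases in any order. It is not the code from this repository: the toy translation table `TM`, the bigram table `BIGRAMS`, and the names `lm_score` and `beam_decode` are all hypothetical, and the sketch has no distortion limit or future-cost estimate.

```python
import heapq
from collections import namedtuple

# Hypothetical toy models (assumptions for illustration only).
# TM maps a French phrase (tuple of words) to (English phrase, log-probability) pairs.
TM = {
    ("la",): [("the", -0.1)],
    ("maison",): [("house", -0.2)],
    ("bleue",): [("blue", -0.2)],
}
# Bigram log-probabilities; unseen bigrams get a flat penalty in lm_score.
BIGRAMS = {
    ("<s>", "the"): -0.1, ("the", "blue"): -0.5, ("blue", "house"): -0.3,
    ("house", "</s>"): -0.2, ("the", "house"): -0.4,
    ("house", "blue"): -3.0, ("blue", "</s>"): -2.0,
}

def lm_score(prev_word, word):
    """Toy bigram language model score."""
    return BIGRAMS.get((prev_word, word), -5.0)

Hyp = namedtuple("Hyp", "logprob covered last_word output")

def beam_decode(source, tm, lm, beam_size=10, max_phrase_len=3):
    """Return (translation, logprob), or None if the source cannot be covered."""
    n = len(source)
    full = (1 << n) - 1
    # stacks[k] holds hypotheses that cover exactly k source words,
    # keyed by (coverage bitmask, last English word) for recombination.
    stacks = [dict() for _ in range(n + 1)]
    stacks[0][(0, "<s>")] = Hyp(0.0, 0, "<s>", ())
    for count in range(n):
        # Keep only the beam_size best hypotheses of this stack.
        beam = heapq.nlargest(beam_size, stacks[count].values(),
                              key=lambda h: h.logprob)
        for hyp in beam:
            # Any uncovered span may be translated next: this allows reordering.
            for i in range(n):
                for j in range(i + 1, min(n, i + max_phrase_len) + 1):
                    span = ((1 << (j - i)) - 1) << i
                    if hyp.covered & span:
                        continue  # span overlaps already translated words
                    for english, tm_logprob in tm.get(tuple(source[i:j]), []):
                        logprob = hyp.logprob + tm_logprob
                        last = hyp.last_word
                        for word in english.split():
                            logprob += lm(last, word)
                            last = word
                        covered = hyp.covered | span
                        if covered == full:
                            logprob += lm(last, "</s>")
                        key = (covered, last)
                        stack = stacks[count + (j - i)]
                        if key not in stack or stack[key].logprob < logprob:
                            stack[key] = Hyp(logprob, covered, last,
                                             hyp.output + (english,))
    if not stacks[n]:
        return None
    best = max(stacks[n].values(), key=lambda h: h.logprob)
    return " ".join(best.output), best.logprob

translation, _ = beam_decode(["la", "maison", "bleue"], TM, lm_score)
print(translation)  # the blue house
```

In this toy setup the language model prefers "blue house" over "house blue", so the decoder translates "bleue" before "maison". A monotone decoder restricted to left-to-right order could not produce that output from these word-level phrase pairs.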

Repository

The folder structure is as follows:

  • 'data'

    • input French sentences
    • language model in ARPA format
    • translation model
  • 'meta' - meta-information; currently it contains only the report file with a full description of the project

  • 'model_translations' - translations produced by different decoding algorithms

  • 'src/cpp' - initial version of the C++ code; currently only the translation model is implemented

  • 'src/py' - main code in Python. There are several Python programs here (run with -h for usage):

    • decode translates input sentences from French to English using monotone decoding.
    • widecode translates input sentences from French to English using beam search decoding.
    • widecode_greedy translates input sentences from French to English using greedy decoding.
    • compute-model-score computes the model score of a translated sentence.
    • helper.py holds common functions for different models in one place.
    • models.py implements very simple interfaces for language models and translation models.

    These commands work in a pipeline or via files. For example:

    python3 decode | python3 compute-model-score
    python3 decode > output.txt
    python3 compute-model-score < output.txt
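For intuition about what a greedy decoder such as widecode_greedy might do, the sketch below hill-climbs from a seed translation by swapping adjacent English phrases while the score improves. This is an assumption about the general technique, not the repository's implementation: `BIGRAMS`, `lm_score`, `order_score`, and `greedy_improve` are hypothetical names, and swaps are the only move considered. Because a swap keeps the same phrase pairs, the translation model score is unchanged, so the sketch scores candidates with the language model alone.

```python
# Hypothetical toy bigram log-probabilities (assumption for illustration only).
BIGRAMS = {
    ("<s>", "the"): -0.1, ("the", "blue"): -0.5, ("blue", "house"): -0.3,
    ("house", "</s>"): -0.2, ("the", "house"): -0.4,
    ("house", "blue"): -3.0, ("blue", "</s>"): -2.0,
}

def lm_score(prev_word, word):
    """Toy bigram language model score with a flat penalty for unseen bigrams."""
    return BIGRAMS.get((prev_word, word), -5.0)

def order_score(phrases):
    """Language model log-probability of the phrases read in the given order."""
    total, last = 0.0, "<s>"
    for word in " ".join(phrases).split() + ["</s>"]:
        total += lm_score(last, word)
        last = word
    return total

def greedy_improve(phrases, score_fn):
    """Repeatedly apply the best adjacent swap until no swap improves the score."""
    current = list(phrases)
    current_score = score_fn(current)
    while True:
        best, best_score = current, current_score
        for i in range(len(current) - 1):
            candidate = current[:]
            candidate[i], candidate[i + 1] = candidate[i + 1], candidate[i]
            candidate_score = score_fn(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
        if best_score <= current_score:
            return current, current_score  # local optimum reached
        current, current_score = best, best_score

# Seed: a monotone (word-order preserving) translation of "la maison bleue".
seed = ["the", "house", "blue"]
improved, score = greedy_improve(seed, order_score)
print(" ".join(improved))  # the blue house
```

Hill climbing of this kind stops at a local optimum, so its result depends on the seed. One plausible way to combine it with beam search is to use the beam search output as the seed.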


Languages

  • Python 94.1%
  • C++ 5.6%
  • CMake 0.3%