As part of the paper "DEplain: A German Parallel Corpus with Intralingual Translations into Plain Language for Sentence and Document Simplification", we developed and evaluated a simple method that uses sentence transformers to automatically align German text datasets.
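The general idea of embedding-based alignment can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure in this repository. Each sentence is mapped to a vector, and every plain-language sentence is paired with its most similar complex sentence by cosine similarity, provided the score clears a threshold. In practice the vectors would come from the sentence-transformers library (e.g. `SentenceTransformer(model_name).encode(sentences)`). Here, toy 2-D vectors stand in so the sketch runs without downloading a model. The function names, the "best match per plain sentence" rule, and the `0.5` threshold are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_sentences(
    complex_emb: Sequence[Vector],
    plain_emb: Sequence[Vector],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[int, int, float]]:
    """Return (complex_idx, plain_idx, score) pairs.

    Assumption: each plain sentence is linked to its single most similar
    complex sentence, so one complex sentence may align to several plain
    ones (1:n). Pairs scoring below `threshold` are discarded.
    """
    pairs = []
    for j, p in enumerate(plain_emb):
        best_i, best_score = -1, float("-inf")
        for i, c in enumerate(complex_emb):
            score = cosine(c, p)
            if score > best_score:
                best_i, best_score = i, score
        if best_i >= 0 and best_score >= threshold:
            pairs.append((best_i, j, best_score))
    return pairs


# Toy embeddings standing in for sentence-transformer output.
COMPLEX = [[1.0, 0.0], [0.0, 1.0]]
PLAIN = [[0.9, 0.1], [0.1, 0.9], [-1.0, 0.0]]

if __name__ == "__main__":
    pairs = align_sentences(COMPLEX, PLAIN)
    print([(i, j) for i, j, _ in pairs])  # [(0, 0), (1, 1)]
```

The third plain vector points away from both complex vectors, so it stays unaligned. Raising or lowering the threshold trades precision for recall.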
After cloning the repository:
- Set up the environment:
python3 -m venv env
source env/bin/activate
pip install -U pip setuptools
pip install -r requirements.txt
- Work through the `procedure.ipynb` notebook to align your documents.
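Before opening the notebook, it can help to confirm that the interpreter in use is the one from the `env` virtual environment created during setup. The small check below is our own addition, not part of the repository. It relies on the standard-library behaviour that inside a venv `sys.prefix` points at the environment directory while `sys.base_prefix` still points at the base installation. The function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True if the running interpreter belongs to a virtual environment."""
    # venv sets sys.prefix to the environment directory; sys.base_prefix
    # keeps pointing at the Python installation the venv was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Using virtual environment at {sys.prefix}")
    else:
        print("No virtual environment active; run `source env/bin/activate` first.")
```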