- First you'll have to download the Tatoeba dataset. Run

  ```sh
  ./download-data.sh
  ```

  You will find the data under `data-tatoeba/` inside this folder.
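
  If you want to check what the script fetched, a minimal sketch (the exact file names under `data-tatoeba/` depend on what the script downloads):

  ```python
  from pathlib import Path

  # List whatever download-data.sh placed in data-tatoeba/.
  for path in sorted(Path("data-tatoeba").iterdir()):
      print(path.name, f"{path.stat().st_size:,} bytes")
  ```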
- Download any required sentence pairs you are interested in and place them in `data-tatoeba/`. As these sentence pairs are always created on the fly, this step has to happen manually. Sentence pair data should be named `sentences_{SRC_LANG}_{TGT_LANG}.tsv`, where `SRC_LANG` and `TGT_LANG` are lowercased two-letter language codes, for example `sentences_uk_de.tsv`.
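
  Tatoeba pair exports are plain tab-separated text, but the exact columns depend on the fields you selected when creating the export. A minimal sketch for peeking at a downloaded file before processing it (uses the example file name from above):

  ```python
  import csv

  # Print the first rows of a sentence pair export so you can check
  # the column layout before running the pipeline.
  with open("data-tatoeba/sentences_uk_de.tsv", encoding="utf-8", newline="") as f:
      for i, row in enumerate(csv.reader(f, delimiter="\t")):
          print(row)
          if i == 4:
              break
  ```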
- Run the `./generate_data.ipynb` notebook to process raw data into TSVs. Change input variables as needed.
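
  The notebook's actual variable names aren't documented here; hypothetically, its inputs follow the same language-code convention as the file names above:

  ```python
  # Hypothetical input variables -- adjust to the names actually used
  # in generate_data.ipynb.
  SRC_LANG = "uk"  # lowercased two-letter source language code
  TGT_LANG = "de"  # lowercased two-letter target language code
  INPUT_FILE = f"data-tatoeba/sentences_{SRC_LANG}_{TGT_LANG}.tsv"
  ```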
- Run the `./import_sql.ipynb` notebook to import the generated TSVs into a local SQLite database.
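
  The notebook encapsulates the actual import; conceptually it boils down to something like this sketch (database, table, and column names here are hypothetical, not the ones the notebook uses):

  ```python
  import csv
  import sqlite3

  # Hypothetical TSV -> SQLite import; the real schema lives in
  # import_sql.ipynb.
  conn = sqlite3.connect("local.db")
  conn.execute("CREATE TABLE IF NOT EXISTS sentence_pairs (src TEXT, tgt TEXT)")
  with open("data-tatoeba/sentences_uk_de.tsv", encoding="utf-8", newline="") as f:
      rows = [(r[0], r[1]) for r in csv.reader(f, delimiter="\t") if len(r) >= 2]
  conn.executemany("INSERT INTO sentence_pairs VALUES (?, ?)", rows)
  conn.commit()
  conn.close()
  ```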
- Run the `./generate_exercise_precursors.ipynb` notebook to generate exercise precursors.
- Run the `./similar_words.ipynb` notebook to generate similar words.
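
  How the notebook defines "similar" isn't spelled out here; purely as an illustration, naive look-alike candidates can be produced with the standard library (this is not necessarily the notebook's method):

  ```python
  import difflib

  # Naive string-similarity illustration; similar_words.ipynb may use
  # a different notion of similarity entirely.
  vocabulary = ["Haus", "Maus", "Baum", "Traum", "Hose"]
  print(difflib.get_close_matches("Haus", vocabulary, n=3, cutoff=0.6))
  # -> ['Haus', 'Maus']
  ```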
- Optionally do a manual quality control check over the generated data.
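
  A quick programmatic spot check can complement the manual pass, e.g. counting rows and flagging empty fields (the path is an assumption -- point it at wherever your notebook wrote its output):

  ```python
  import csv

  # Flag rows with empty fields in a generated TSV.
  with open("exercise-import.tsv", encoding="utf-8", newline="") as f:
      rows = list(csv.reader(f, delimiter="\t"))
  bad = [i for i, r in enumerate(rows) if any(not field.strip() for field in r)]
  print(f"{len(rows)} rows, {len(bad)} with empty fields: {bad[:10]}")
  ```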
- To populate the exercise table, which is the main entity in the API server, run

  ```sh
  python3 populate_exercise_table.py
  ```

  The script expects the following two files to exist (they have to be moved to that folder manually):
  - `data-import/exercise-import.tsv`: the output of the `./generate_exercise_precursors.ipynb` notebook
  - `data-import/similar-words-import.tsv`: the output of the `./similar_words.ipynb` notebook
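
  Before running the script, you can confirm that both expected files are in place:

  ```python
  from pathlib import Path

  # Verify the manually moved inputs exist before running
  # populate_exercise_table.py.
  for name in ("exercise-import.tsv", "similar-words-import.tsv"):
      path = Path("data-import") / name
      print(path, "ok" if path.is_file() else "MISSING")
  ```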
- Optionally run the `./generate_sentence_audio` notebook to generate audio files.
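
  Which TTS backend the notebook uses isn't documented here; purely as an illustration, synthesizing one sentence with the gTTS package would look like this:

  ```python
  # Illustrative only: gTTS (pip install gTTS) is one possible TTS
  # backend; generate_sentence_audio may use something else.
  from gtts import gTTS

  gTTS("Guten Morgen!", lang="de").save("guten_morgen.mp3")
  ```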
After all of the above steps, you should have a ready-to-use `taskpool.db` SQLite DB in the parent folder, which will be used by the API server.
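
A quick way to confirm the database came out right is to list its tables (the relative path assumes you run this from inside this folder):

```python
import sqlite3

# Final sanity check: list the tables in the generated database.
conn = sqlite3.connect("../taskpool.db")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print([name for (name,) in tables])
conn.close()
```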