- More info in the model card: https://huggingface.co/maximxls/text-normalization-ru-terrible
- Clone this repo:
  ```bash
  git clone https://github.com/maximxlss/text_normalization
  cd text_normalization
  ```
- Install requirements:
  ```bash
  pip install -r requirements.txt
  ```
- Install PyTorch (choose the build that matches your platform and CUDA version)
- Download `ru_train.csv` from the Kaggle Text Normalization Challenge (Russian Language)
- Run `python preprocess.py` (takes time)
- Run `python train_tokenizer.py` (also takes time)
- Tweak settings in `train.py`
- Run `python train.py`
- I reset the scheduler manually during training (see `train.py`), so keep that in mind. Details of the training process are available in the metrics.
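For orientation, `ru_train.csv` stores one token per row. The sketch below assumes the Kaggle columns `sentence_id`, `token_id`, `class`, `before` and `after`, and shows one plausible way to regroup the rows into unnormalized/normalized sentence pairs. It is not taken from `preprocess.py`, and the helper `sentence_pairs` is hypothetical.

```python
import csv
import io
from collections import defaultdict


# Hypothetical helper, not the repo's preprocess.py.
# Assumes the columns: sentence_id, token_id, class, before, after.
def sentence_pairs(f):
    """Group token rows into (unnormalized, normalized) sentence strings."""
    before = defaultdict(list)
    after = defaultdict(list)
    for row in csv.DictReader(f):
        sid = int(row["sentence_id"])
        tid = int(row["token_id"])
        before[sid].append((tid, row["before"]))
        after[sid].append((tid, row["after"]))
    pairs = []
    for sid in sorted(before):
        # Sort by token_id so tokens come back in sentence order.
        src = " ".join(tok for _, tok in sorted(before[sid]))
        tgt = " ".join(tok for _, tok in sorted(after[sid]))
        pairs.append((src, tgt))
    return pairs


# Tiny in-memory sample standing in for the real file.
sample = io.StringIO(
    "sentence_id,token_id,class,before,after\n"
    "0,0,PLAIN,В,В\n"
    "0,1,CARDINAL,2,два\n"
    "0,2,PLAIN,года,года\n"
)
print(sentence_pairs(sample))  # [('В 2 года', 'В два года')]
```

On the real data you would pass `open("ru_train.csv", encoding="utf-8", newline="")` instead of the sample. This sketch keeps every row in memory, so expect it to be slow on the full file.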
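Tokenizer training is the other slow preparation step. As a generic illustration of what it can involve, the sketch below learns byte-pair encoding (BPE) merges on a toy corpus in pure Python. Whether `train_tokenizer.py` uses BPE, and which library it calls, is an assumption. The functions `most_frequent_pair`, `merge_pair` and `train_bpe` are hypothetical.

```python
from collections import Counter


# Hypothetical illustration of BPE training; not the repo's train_tokenizer.py.
def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words (each word is a tuple)."""
    pairs = Counter()
    for word, freq in words.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for word, freq in words.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Return the list of learned merges, most frequent first."""
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        if all(len(w) == 1 for w in words):  # nothing left to merge
            break
        pair = most_frequent_pair(words)
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges


print(train_bpe("два два две двадцать", 2))  # [('д', 'в'), ('дв', 'а')]
```

Real tokenizer trainers do the same kind of counting over millions of sentences, which is why this step can take a while.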
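Resetting a learning-rate scheduler typically means restarting its step counter, so the learning rate follows the schedule again from the beginning while the model weights stay trained. The sketch below assumes a linear warmup followed by linear decay. The actual schedule and values in `train.py` are not documented here, and `lr_at` and `Scheduler` are hypothetical names. PyTorch's schedulers in `torch.optim.lr_scheduler` keep a comparable counter in `last_epoch`.

```python
# Hypothetical schedule for illustration; train.py may use a different one.
def lr_at(step, base_lr=1e-3, warmup=100, total=1000):
    """Linear warmup to base_lr, then linear decay to 0 at `total` steps."""
    if step < warmup:
        return base_lr * (step + 1) / warmup
    remaining = max(total - step, 0)
    return base_lr * remaining / (total - warmup)


class Scheduler:
    """Minimal stand-in for a step-based LR scheduler."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs
        self.step_count = 0

    def step(self):
        self.step_count += 1

    def lr(self):
        return lr_at(self.step_count, **self.kwargs)

    def reset(self):
        # Restarting the counter sends the LR back into warmup,
        # even though the model has already been trained for many steps.
        self.step_count = 0


sched = Scheduler(base_lr=1e-3, warmup=100, total=1000)
for _ in range(600):
    sched.step()
print(sched.lr())  # decaying phase: base_lr * 400 / 900
sched.reset()
print(sched.lr())  # back at the first warmup step: base_lr * 1 / 100
```

If a reset like this happened partway through, the logged learning-rate curve (and possibly the loss) would show a jump at that point.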