text_normalization

More info in the model card: https://huggingface.co/maximxls/text-normalization-ru-terrible

Clone this repo: git clone https://github.com/maximxlss/text_normalization
cd text_normalization
Install requirements: pip install -r requirements.txt
Install PyTorch
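The snippet below is a quick sanity check, not part of the repo. It assumes nothing beyond the standard library and an optional PyTorch install. It reports whether PyTorch can be imported and whether CUDA is visible before you start the longer steps.

```python
import importlib.util


def torch_status() -> str:
    """Describe the local PyTorch install without failing if it is missing."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch  # imported lazily so the check works without PyTorch

    return f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"


print(torch_status())
```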
Download ru_train.csv from the Kaggle Russian text normalization challenge
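To look at the data before preprocessing, here is a minimal sketch that groups the token-level rows of ru_train.csv into (before, after) sentence pairs. It assumes the file has the columns sentence_id, token_id, class, before and after. The helper name load_sentences and the sample row values are made up for illustration, and this is not necessarily what preprocess.py does.

```python
import csv
import io
from collections import defaultdict


def load_sentences(f, limit=None):
    """Group token rows into (before, after) sentence strings.

    Assumes the CSV columns sentence_id, token_id, class, before, after.
    """
    sentences = defaultdict(list)
    for row in csv.DictReader(f):
        sid = int(row["sentence_id"])
        sentences[sid].append((int(row["token_id"]), row["before"], row["after"]))

    pairs = []
    for sid in sorted(sentences):
        tokens = sorted(sentences[sid])  # restore token order within the sentence
        before = " ".join(tok[1] for tok in tokens)
        after = " ".join(tok[2] for tok in tokens)
        pairs.append((before, after))
        if limit is not None and len(pairs) >= limit:
            break
    return pairs


# Made-up sample in the assumed format
sample = (
    '"sentence_id","token_id","class","before","after"\n'
    '0,0,"PLAIN","Ему","Ему"\n'
    '0,1,"CARDINAL","5","пять"\n'
    '0,2,"PLAIN","лет","лет"\n'
)
pairs = load_sentences(io.StringIO(sample))
print(pairs)
```

On the real file you would open it with `open("ru_train.csv", encoding="utf-8", newline="")` and pass a small `limit` to keep inspection cheap.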
Run python preprocess.py (takes time)
Run python train_tokenizer.py (also takes time)
Tweak settings in train.py
Run python train.py
Note: I reset the scheduler (see train.py) manually during training, so keep that in mind if you try to reproduce the results. The details of the training process are available in the metrics.
