Baseline model, assuming everything works here :)
Also good for seeing how much time training actually takes with HF
- BERT WordPiece tokenizer with proper pre-tokenization
- trained on Latin OSCAR + Wikipedia
- training script based on run_mlm.py
- maybe add DeepSpeed, sbatch scripts, etc.
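A minimal sketch of the tokenizer step, using `BertWordPieceTokenizer` from the Hugging Face `tokenizers` library, which bundles BERT-style normalization with whitespace/punctuation pre-tokenization. The corpus file, vocab size, and output path here are placeholders, not the actual training setup:

```python
from pathlib import Path
from tokenizers import BertWordPieceTokenizer

# Tiny stand-in corpus; the real run would point at the Latin
# OSCAR + Wikipedia text files.
Path("corpus.txt").write_text("Gallia est omnis divisa in partes tres.\n" * 100)

# Lowercasing plus BERT pre-tokenization (split on whitespace/punctuation)
tokenizer = BertWordPieceTokenizer(lowercase=True)
tokenizer.train(
    files=["corpus.txt"],
    vocab_size=1000,       # placeholder; a real vocab would be larger
    min_frequency=2,
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)

enc = tokenizer.encode("Gallia est omnis divisa")
print(enc.tokens)
```

The saved vocab (`tokenizer.save_model(...)`) can then be passed to the training script via `--tokenizer_name`.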
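For the "maybe" items, a rough sketch of what an sbatch script launching run_mlm.py through the DeepSpeed launcher could look like; the job name, resource numbers, file paths, and the `ds_config.json` are all placeholders to be filled in for the actual cluster:

```shell
#!/bin/bash
#SBATCH --job-name=latin-bert-mlm   # placeholder job name
#SBATCH --gres=gpu:4                # placeholder resources
#SBATCH --time=48:00:00

# DeepSpeed launcher spawns one process per GPU; run_mlm.py picks up
# the --deepspeed flag via its TrainingArguments.
deepspeed run_mlm.py \
    --model_type bert \
    --tokenizer_name ./tokenizer_out \
    --train_file ./latin_corpus.txt \
    --do_train \
    --output_dir ./latin-bert-baseline \
    --deepspeed ds_config.json
```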