🤗 Back translation: performs back translation by web-crawling the Papago translator.
🤗 Pre-trainer: pre-trains a HuggingFace masked language model.
🤗 Parameters: loads the pre-trained model into the best model.
pip install -r requirements.txt
- chromedriver
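Back translation drives the Papago web page through a browser, which is why chromedriver is needed on top of the Python packages. Below is a minimal sketch for checking that setup, assuming the crawler controls Chrome through Selenium; the URL and the use of Selenium itself are assumptions, not taken from back_translation.py.

# Check that Selenium can launch chromedriver and reach Papago (assumed setup).
from selenium import webdriver
driver = webdriver.Chrome()               # finds chromedriver on PATH
driver.get("https://papago.naver.com/")   # Papago translator page used for crawling
print(driver.title)
driver.quit()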
# Basic usage
python back_translation.py
# If you want to run with nohup and watch the log
bt.sh
Options
- --remove_stop_words: remove stop words (default: False)
- --only_kor_to_en: translate only kor to en (default: False)
- --only_en_to_kor: translate only en to kor (default: False)
- --len: specify the length (number of rows) of the csv file to process (default: False)
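For orientation, a rough sketch of how these options are presumably parsed is shown below; the defaults mirror the list above, but the actual parser in back_translation.py may differ.

# Hedged sketch of the option parsing (assumed, not copied from back_translation.py).
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--remove_stop_words", default=False)  # remove stop words
parser.add_argument("--only_kor_to_en", default=False)     # translate only kor -> en
parser.add_argument("--only_en_to_kor", default=False)     # translate only en -> kor
parser.add_argument("--len", default=False)                 # length of the csv file to process
args = parser.parse_args()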
Outputs
- final_kor_to_eng_{file_time}.npy
- final_en_to_kor_{file_time}.npy
- back_translation_result.csv: contains both kor_to_eng and eng_to_kor results.
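As a usage example (not from the repository), the outputs can be loaded back like this; file_time is a placeholder for the timestamp in the actual filenames, and allow_pickle is only needed if the arrays hold Python objects.

# Hedged example of reading the back-translation outputs listed above.
import numpy as np
import pandas as pd
file_time = "<timestamp of your run>"   # placeholder; match the suffix of the saved files
kor_to_eng = np.load(f"final_kor_to_eng_{file_time}.npy", allow_pickle=True)  # kor -> eng sentences
en_to_kor = np.load(f"final_en_to_kor_{file_time}.npy", allow_pickle=True)    # eng -> kor sentences
result = pd.read_csv("back_translation_result.csv")                           # both directions in one table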
python pretrain.py
The pre-trained model is saved to './pretrined_model'.
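For orientation, here is a minimal sketch of what masked-language-model pre-training of 'klue/roberta-large' looks like with HuggingFace; the toy corpus, column name, and hyperparameters are placeholders, and pretrain.py may be set up differently.

# Hedged sketch of MLM pre-training; corpus and hyperparameters are placeholders.
from datasets import Dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

MODEL_NAME = "klue/roberta-large"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)

texts = ["Example sentence one.", "Example sentence two."]   # placeholder corpus
dataset = Dataset.from_dict({"text": texts}).map(
    lambda x: tokenizer(x["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
training_args = TrainingArguments(output_dir="./pretrined_model", num_train_epochs=1)
trainer = Trainer(model=model, args=training_args,
                  data_collator=collator, train_dataset=dataset)
trainer.train()
trainer.save_model("./pretrined_model")   # same directory the README points to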
- Import BackTransPreTrain from back_trans/parameters.py into your model code.
- MODEL_NAME: must be 'klue/roberta-large'.
- The pre-trained model can only be used when 'klue/roberta-large' is loaded with transformers.AutoModel or transformers.AutoModelForMaskedLM.
from transformers import AutoModel
from back_trans import BackTransPreTrain
MODEL_NAME = 'klue/roberta-large'          # required model name
pretrain_path = './pretrined_model'        # directory written by pretrain.py
model = AutoModel.from_pretrained(MODEL_NAME)
bpt = BackTransPreTrain(pretrain_path)
model.load_state_dict(bpt.load_parameters(MODEL_NAME))
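The same pattern should also work when the model is loaded with transformers.AutoModelForMaskedLM instead of transformers.AutoModel, as noted above, if the masked-language-model head is needed downstream.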