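🤓 A task of performing Open-Domain Question Answering (ODQA) on the KLUE MRC (Machine Reading Comprehension) dataset.
🤓 The system consists of a Retriever, which finds documents relevant to a question, and a Reader, which reads those documents and produces the answer.
🤓 The leaderboard evaluates submissions on 240 Public and 360 Private questions.
🤓 Model submissions are limited to 10 per day.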
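The Retriever/Reader split can be illustrated with a toy sketch. It is not the project's code: a shared-token count stands in for the Elasticsearch retriever, and `reader` is a hypothetical callable returning `(answer, score)` in place of the trained MRC model.

```python
from typing import Callable, List, Tuple

# Hypothetical reader interface: (question, passage) -> (answer_text, score).
Reader = Callable[[str, str], Tuple[str, float]]


def retrieve(question: str, passages: List[str], top_k: int = 2) -> List[str]:
    """Toy retriever: rank passages by how many whitespace tokens they share with the question."""
    q_tokens = set(question.split())
    # sorted() is stable (also with reverse=True), so ties keep their original order.
    ranked = sorted(passages, key=lambda p: len(q_tokens & set(p.split())), reverse=True)
    return ranked[:top_k]


def answer(question: str, passages: List[str], reader: Reader, top_k: int = 2) -> str:
    """Retrieve top_k passages, read each one, and return the highest-scoring answer."""
    candidates = [reader(question, p) for p in retrieve(question, passages, top_k)]
    best_text, _ = max(candidates, key=lambda c: c[1])
    return best_text
```

Keeping the two stages behind separate functions mirrors the architecture: the retriever can be swapped for a BM25 index and the reader for a fine-tuned model without touching the other stage.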
- Retrieval
  - Elastic search
- Pororo NER
- Data Augmentation
  - Negative Sampling
  - Question Generation
- Post Processing
  - Top-k Passages Separate
  - Answer scoring with softmax
  - Similarity scoring with KSS (Korean Sentence Splitter)
  - Other post-processing via Mecab
- Ensemble
  - Hard voting
  - Post processing
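A minimal sketch of answer scoring with softmax over separately read top-k passages, assuming the Reader returns per-token start and end logits for each passage. Raw logits from separate forward passes are not directly comparable, so each passage's logits are normalised with a softmax before the best spans are compared across passages. `best_span`, `pick_passage`, and `max_answer_len` are illustrative names, not the repository's API.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,
) -> Tuple[int, int, float]:
    """Best (start, end, probability) in one passage, with end >= start."""
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    best = (0, 0, -1.0)
    for i, ps in enumerate(p_start):
        # Only consider spans of at most max_answer_len tokens.
        for j in range(i, min(i + max_answer_len, len(p_end))):
            score = ps * p_end[j]
            if score > best[2]:
                best = (i, j, score)
    return best


def pick_passage(
    per_passage_logits: Sequence[Tuple[Sequence[float], Sequence[float]]],
) -> Tuple[int, int, int, float]:
    """(passage index, start, end, probability) of the best span over the top-k passages."""
    results = [(k, *best_span(s, e)) for k, (s, e) in enumerate(per_passage_logits)]
    return max(results, key=lambda r: r[3])
```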
- python module: requirements.txt
- Elastic Search: Installation docs
- Mecab: Korean mecab doc
- Elastic Search
- root user
python elastic_test.py
- non-root user
./bin/elasticsearch -d -p pid
- Generate NER tagged files
# outputs = train_tagged.csv, inference_tagged.csv
python make_ner_tag.py
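The NER tagging step can be sketched as below, assuming Pororo's Korean NER returns a list of `(surface, tag)` pairs with `"O"` for non-entity tokens. How the tags are actually stored in `train_tagged.csv` is not documented here, so `entity_keywords` (a hypothetical helper) simply joins the entity mentions, which could, for instance, be appended to a retrieval query.

```python
from typing import List, Tuple

# Assumed shape of Pororo NER output: (surface, tag) pairs, "O" = not an entity.
NerOutput = List[Tuple[str, str]]


def extract_entities(ner_output: NerOutput) -> List[Tuple[str, str]]:
    """Keep only the (surface, tag) pairs that belong to named entities."""
    return [(word, tag) for word, tag in ner_output if tag != "O" and word.strip()]


def entity_keywords(ner_output: NerOutput) -> str:
    """Join entity surfaces into a space-separated keyword string (hypothetical format)."""
    return " ".join(word for word, _ in extract_entities(ner_output))


# Possible usage (requires the pororo package):
#   from pororo import Pororo
#   ner = Pororo(task="ner", lang="ko")
#   entity_keywords(ner("이순신은 조선 중기의 무신이다."))
```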
- Generate K-fold training files
# outputs = fold{n}.csv
python make_ner_tag.py
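A minimal K-fold split in plain Python, assuming `fold{n}.csv` holds the n-th shuffled fold of the training rows (whether each file is the held-out fold or the remaining training split is not stated). `kfold_split`, `write_folds`, the seed, and numbering folds from 0 are all assumptions.

```python
import csv
import random
from typing import Dict, List, Sequence


def kfold_split(rows: List[Dict[str, str]], k: int = 5, seed: int = 42) -> List[List[Dict[str, str]]]:
    """Shuffle a copy of the rows once, then deal them round-robin into k folds."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def write_folds(folds: List[List[Dict[str, str]]], fieldnames: Sequence[str], prefix: str = "fold") -> None:
    """Write each fold to {prefix}{n}.csv (numbering from 0 is an assumption)."""
    for n, fold in enumerate(folds):
        with open(f"{prefix}{n}.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(fold)
```

A fixed seed keeps the folds identical across runs, so per-fold models stay comparable.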
- Index the Wikipedia documents using Elastic search
python elastic_search.py
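Indexing into Elasticsearch typically goes through its `_bulk` API, which takes newline-delimited JSON: an action line followed by the document source, with a mandatory trailing newline. The sketch below only builds request bodies; the index name `wiki`, the `title`/`text` fields, and the assumption that the Wikipedia dump is a dict keyed by document id are hypothetical.

```python
import json
from typing import Any, Dict


def build_bulk_body(docs: Dict[str, Dict[str, Any]], index: str = "wiki") -> str:
    """NDJSON body for the _bulk API: one action line, then one source line, per document."""
    lines = []
    for doc_id, doc in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def build_match_query(question: str, size: int = 10, field: str = "text") -> Dict[str, Any]:
    """Full-text match query on one field; Elasticsearch scores it with BM25 by default."""
    return {"size": size, "query": {"match": {field: question}}}
```

The bulk body would be sent with `POST /_bulk` and `Content-Type: application/x-ndjson`; the query body goes to `POST /wiki/_search`.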
- wandb setting
# default wandb setting in train.py
import wandb

run = wandb.init(project='klue', entity='quarter100', name='Any training name')
- Copy qg_dataset into the ../data directory
python train.py
Models are saved in "./models/train_dataset_{experiment_name}/".
python inference.py --output_dir ./outputs/test_dataset/ --dataset_name ../data/test_dataset/ --model_name_or_path ./models/train_dataset/ --do_predict
Prediction csv files are saved in "./outputs/test_dataset/".
- Hard voting
- Ensemble result is saved in "./submission_fold_total.csv".
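Hard voting keeps, for each question, the answer predicted by the most models. A minimal sketch, assuming each model's prediction CSV has already been loaded into a dict mapping question id to answer text; `hard_vote` and its tie-breaking rule (the answer appearing first in model order wins) are assumptions, not necessarily what produced `submission_fold_total.csv`.

```python
from collections import Counter
from typing import Dict, List


def hard_vote(predictions: List[Dict[str, str]]) -> Dict[str, str]:
    """Per question id, keep the answer predicted by the most models."""
    merged: Dict[str, str] = {}
    for qid in predictions[0]:
        answers = [p[qid] for p in predictions if qid in p]
        # most_common() lists equal counts in first-seen order, so ties go to the earlier model.
        merged[qid] = Counter(answers).most_common(1)[0][0]
    return merged
```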