Code for - 'FastDoc: Domain-Specific Fast Continual Pre-training Technique using Document-Level Metadata and Taxonomy'
Please run `pip install -r requirements.txt` (python3 required). For fine-tuning on the TechQA Dataset, use this.
- Proposed RoBERTa-based variants
- Proposed BERT-based variants
- Baselines
- Proposed Variants
- Baseline
- To download the training set, run `wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json`.
- Run `python3 finetune_squad.py <MODEL_TYPE> <MODEL_PATH>`, where `<MODEL_TYPE>` can be `bert` or `roberta`, and `<MODEL_PATH>` is the model path/HuggingFace model name (see the example invocation below).
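For example, an illustrative invocation that fine-tunes the FastDocRoBERTa checkpoint referenced in the next paragraph (substitute any other model path or HuggingFace model name as needed):

```
# Fine-tune the FastDocRoBERTa checkpoint on SQuAD 2.0 (example model name; any HuggingFace model name works)
python3 finetune_squad.py roberta AnonymousSub/rule_based_roberta_hier_triplet_epochs_1_shard_1
```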
To get the models fine-tuned on SQuAD 2.0, use the following format to get the link: `https://huggingface.co/AnonymousSub/<SUBSTRING AFTER THE LAST '/' IN PRE-TRAINED MODEL LINK>_squad2.0`. (For example, fine-tuning FastDocRoBERTa - https://huggingface.co/AnonymousSub/rule_based_roberta_hier_triplet_epochs_1_shard_1 - on SQuAD 2.0 gives https://huggingface.co/AnonymousSub/rule_based_roberta_hier_triplet_epochs_1_shard_1_squad2.0.)
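As a minimal sketch, one way to fetch such a fine-tuned checkpoint locally is to clone its HuggingFace Hub repository (assuming `git` and `git-lfs` are installed; the model name can equally be passed straight to the fine-tuning script above):

```
# Clone a SQuAD 2.0 fine-tuned checkpoint from the HuggingFace Hub (example repository following the naming scheme above)
git lfs install
git clone https://huggingface.co/AnonymousSub/rule_based_roberta_hier_triplet_epochs_1_shard_1_squad2.0
```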
- Go to this link
- Go to this link
- Check all notebooks here.
- Check all notebooks here.