
Code for the paper 'FastDoc: Domain-Specific Fast Continual Pre-training Technique using Document-Level Metadata and Taxonomy'

Required dependencies

Please run pip install -r requirements.txt (python3 required). For fine-tuning on the TechQA Dataset, use this.

Links to models pre-trained on the EManuals Corpus

  • Proposed RoBERTa-based variants
  1. FastDocRoBERTa (hier.)
  2. FastDocRoBERTa (triplet)
  3. FastDocRoBERTa
  • Proposed BERT-based variants
  1. FastDocBERT (hier.)
  2. FastDocBERT (triplet)
  3. FastDocBERT
  • Baselines
  1. BERTBASE
  2. RoBERTaBASE
  3. Longformer
  4. EManualsBERT
  5. EManualsRoBERTa
  6. DeCLUTR
  7. ConSERT
  8. SPECTER
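A minimal loading sketch for any of the checkpoints listed above, assuming the models are publicly hosted on the HuggingFace Hub and that transformers (installed via requirements.txt) is available; the identifier shown is the FastDocRoBERTa checkpoint referenced in the SQuAD 2.0 section below:

    from transformers import AutoTokenizer, AutoModel

    # FastDocRoBERTa checkpoint referenced in the SQuAD 2.0 section below;
    # substitute the identifier of any other model listed above.
    model_name = "AnonymousSub/rule_based_roberta_hier_triplet_epochs_1_shard_1"

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)

    # Encode a sample sentence from the E-Manuals domain and inspect the output shape.
    inputs = tokenizer("Press and hold the power button for five seconds.", return_tensors="pt")
    outputs = model(**inputs)
    print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)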

Links to models pre-trained in the Scientific Domain

  • Proposed Variants
  1. FastDoc(Sci.)BERT
  2. FastDoc(Sci.)BERT (hier.)
  3. FastDoc(Sci.)BERT (triplet)
  • Baseline
  1. SciBERT

Fine-tuning on SQuAD 2.0

  • To download the training set, run wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json.

  • Run python3 finetune_squad.py <MODEL_TYPE> <MODEL_PATH>

    • <MODEL_TYPE> can be bert or roberta
    • <MODEL_PATH> is the local model path or HuggingFace model name (see the example invocation below).
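For example, to fine-tune the FastDocRoBERTa checkpoint referenced in the next paragraph (assuming it is available as a HuggingFace model name):

    python3 finetune_squad.py roberta AnonymousSub/rule_based_roberta_hier_triplet_epochs_1_shard_1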

To get a model fine-tuned on SQuAD 2.0, construct the link as https://huggingface.co/AnonymousSub/<SUBSTRING AFTER THE LAST '/' IN PRE-TRAINED MODEL LINK>_squad2.0. For example, fine-tuning FastDocRoBERTa (https://huggingface.co/AnonymousSub/rule_based_roberta_hier_triplet_epochs_1_shard_1) on SQuAD 2.0 gives https://huggingface.co/AnonymousSub/rule_based_roberta_hier_triplet_epochs_1_shard_1_squad2.0.
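A minimal sketch of this naming convention, assuming the fine-tuned checkpoints are public on the HuggingFace Hub and that transformers is installed:

    from transformers import AutoTokenizer, AutoModelForQuestionAnswering

    # Pre-trained model link from the lists above (FastDocRoBERTa here).
    pretrained_link = "https://huggingface.co/AnonymousSub/rule_based_roberta_hier_triplet_epochs_1_shard_1"

    # Append "_squad2.0" to the substring after the last '/' to obtain the
    # name of the corresponding SQuAD 2.0 fine-tuned model.
    repo_id = "AnonymousSub/" + pretrained_link.rsplit("/", 1)[-1] + "_squad2.0"

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForQuestionAnswering.from_pretrained(repo_id)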

Fine-tuning on TechQA Dataset

Fine-tuning on S10 QA Dataset

Fine-tuning on some of the SciBERT Paper Datasets

  • Check all notebooks here.

Fine-tuning on GLUE Benchmark Datasets

  • Check all notebooks here.
