This study aims to compare three NLP methods for extracting named entities with complex labels and very limited training data.
We compare:
- Fine-tuning BERT for NER
- Data augmentation with GPT-3.5, then fine-tuning BERT on both the original and the augmented training datasets
- OpenAI models (GPT-3.5 and GPT-4) with retrieval-augmented generation (RAG) based on the same training dataset
We trained each method on an Influenza corpus and evaluated its ability to generalize to other diseases (Leptospirosis and Chikungunya).
Download the manual annotations:
- Manual scoping review conducted by the MOOD project: annotations at the document level, with normalization of the extracted entities
- Manual annotation at the sentence level: work in progress
Download the papers used for the manual annotations:
- Download all the papers mentioned in the manual annotations using their DOIs.
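A minimal sketch of this step (the example DOI is a placeholder): it resolves each DOI through doi.org to the publisher landing page; actual PDF retrieval depends on publisher access rights and is not shown here.

```python
# Minimal sketch: resolve each DOI via doi.org to find the publisher landing page.
import requests

def resolve_doi(doi: str) -> str:
    """Return the publisher URL that a DOI redirects to."""
    resp = requests.get(f"https://doi.org/{doi}", allow_redirects=True, timeout=30)
    resp.raise_for_status()
    return resp.url

for doi in ["10.1000/example-doi"]:  # replace with the DOIs from the annotation table
    print(doi, "->", resolve_doi(doi))
```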
Convert PDFs into TEI:
- We suggest using GROBID through its HuggingFace space.
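If you prefer to script the conversion, here is a minimal sketch that calls GROBID's `processFulltextDocument` REST service; `GROBID_URL` is a placeholder for your own instance (a local Docker container or the HuggingFace space endpoint).

```python
# Minimal sketch: send each PDF to a GROBID server and save the returned TEI XML.
import pathlib
import requests

GROBID_URL = "http://localhost:8070"  # placeholder: point at your GROBID instance

def pdf_to_tei(pdf_path: str, out_dir: str = "tei") -> pathlib.Path:
    """Convert one PDF into TEI XML via GROBID's processFulltextDocument service."""
    out = pathlib.Path(out_dir)
    out.mkdir(exist_ok=True)
    with open(pdf_path, "rb") as fh:
        resp = requests.post(
            f"{GROBID_URL}/api/processFulltextDocument",
            files={"input": fh},
            timeout=120,
        )
    resp.raise_for_status()
    tei_path = out / (pathlib.Path(pdf_path).stem + ".tei.xml")
    tei_path.write_text(resp.text, encoding="utf-8")
    return tei_path

if __name__ == "__main__":
    for pdf in pathlib.Path("papers").glob("*.pdf"):
        print(pdf_to_tei(str(pdf)))
```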
The two methods described below can be run using this notebook: generate_annotation.ipynb.
Work in progress: the notebook needs to be adapted to the data from the Zenodo repository.
From Manual Annotations:
The manual annotations are at the document level. To fine-tune BERT-like pre-trained models, we convert them into the spaCy annotation format.
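A minimal sketch of that conversion, assuming the annotations can be expressed as character spans (the example text, labels, and field names are illustrative): it writes a spaCy `DocBin` file that the models can be trained on.

```python
# Minimal sketch: turn character-span annotations into spaCy's binary DocBin format.
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")  # tokenizer only; no trained components needed

# Each record: raw text plus (start, end, label) character spans (illustrative).
records = [
    {
        "text": "Avian influenza spread was associated with wild bird migration.",
        "entities": [(0, 15, "DISEASE"), (43, 62, "COVARIATE")],
    },
]

db = DocBin()
for rec in records:
    doc = nlp.make_doc(rec["text"])
    spans = []
    for start, end, label in rec["entities"]:
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is not None:  # skip spans that do not align with token boundaries
            spans.append(span)
    doc.ents = spans
    db.add(doc)

db.to_disk("train.spacy")
```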
From Data Augmentation Using GPT-3.5:
Use GPT-3.5 to create synthetic data from the manual annotations.
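As an illustration of the idea (the prompt and helper below are not the exact ones used in the notebook), one can ask GPT-3.5 to paraphrase annotated sentences while keeping the entity mentions intact, which yields additional labelled examples:

```python
# Minimal sketch: paraphrase an annotated sentence with GPT-3.5 while preserving
# the entity mentions. Requires OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def augment(sentence: str, entities: list[str], n: int = 3) -> list[str]:
    """Return n paraphrases of `sentence` that keep the listed entity mentions verbatim."""
    prompt = (
        f"Paraphrase the following sentence {n} times, one paraphrase per line. "
        f"Keep these entity mentions verbatim: {', '.join(entities)}.\n\n{sentence}"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.9,
    )
    lines = resp.choices[0].message.content.splitlines()
    return [line.strip("- ").strip() for line in lines if line.strip()]

# Example:
# augment("Avian influenza spread was associated with wild bird migration.",
#         ["avian influenza", "wild bird migration"])
```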
Train BERT-like Models:
Train 3 models:
- roberta-base
- microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
- FacebookAI/xlm-roberta-base
On two datasets:
- From the manual annotations
- From the manual annotations + synthetic data
All six trainings can be run with this notebook: train_models.ipynb.
Work in progress: the path to the training dataset needs to be adapted to the current environment.
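For reference, a minimal training sketch in the spirit of the notebook (the label set, hyperparameters, and toy example are placeholders), using the Hugging Face `Trainer` for token classification:

```python
# Minimal sketch: fine-tune one of the three encoders for token classification
# on a word-tokenised NER dataset in the common {"tokens", "ner_tags"} layout.
from datasets import Dataset
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          DataCollatorForTokenClassification, Trainer,
                          TrainingArguments)

MODEL = "roberta-base"  # or the BiomedBERT / xlm-roberta checkpoints
LABELS = ["O", "B-COVARIATE", "I-COVARIATE"]  # illustrative label set

# add_prefix_space is required by RoBERTa-style tokenizers for pre-split words.
tokenizer = AutoTokenizer.from_pretrained(MODEL, add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained(MODEL, num_labels=len(LABELS))

# Toy dataset; replace with the real training split.
train = Dataset.from_dict({
    "tokens": [["Influenza", "spread", "follows", "bird", "migration", "."]],
    "ner_tags": [[0, 0, 0, 1, 2, 0]],
})

def encode(batch):
    # Tokenise pre-split words and align word-level tags to sub-word tokens.
    enc = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    labels = []
    for i, tags in enumerate(batch["ner_tags"]):
        word_ids, prev, row = enc.word_ids(batch_index=i), None, []
        for wid in word_ids:
            if wid is None:
                row.append(-100)       # special tokens are ignored by the loss
            elif wid != prev:
                row.append(tags[wid])  # first sub-word keeps the word's tag
            else:
                row.append(-100)       # later sub-words are ignored
            prev = wid
        labels.append(row)
    enc["labels"] = labels
    return enc

train = train.map(encode, batched=True, remove_columns=train.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=train,
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```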
Then run inference with the trained models on the full datasets (all three diseases) using this script: full_article_inference.py.
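A minimal inference sketch (the checkpoint path and chunking strategy are placeholders), using the `transformers` token-classification pipeline:

```python
# Minimal sketch: run a trained checkpoint over an article and collect predicted spans.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="out/checkpoint-best",    # placeholder: directory with the fine-tuned model and tokenizer
    aggregation_strategy="simple",  # merge sub-word pieces into full entity spans
)

article_text = open("article.txt", encoding="utf-8").read()
predictions = ner(article_text[:2000])  # long articles need proper chunking in practice
for ent in predictions:
    print(ent["entity_group"], round(ent["score"], 3), ent["word"])
```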
RAG Process for LLMs:
Create a RAG database (FAISS) and a LangChain pipeline for:
- GPT-3.5
- GPT-4
Using this notebook: RAG.ipynb.
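A minimal sketch of such a pipeline, assuming the current LangChain package split (`langchain-community`, `langchain-openai`) plus `faiss-cpu`; the example sentence and prompt are illustrative, not the ones used in RAG.ipynb:

```python
# Minimal sketch: index annotated training sentences in FAISS and query GPT-3.5
# or GPT-4 with the retrieved examples as few-shot context.
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Annotated examples from the Influenza training set (illustrative).
examples = [
    "Sentence: Avian influenza spread was associated with wild bird migration. "
    "Covariate: wild bird migration",
]

store = FAISS.from_texts(examples, OpenAIEmbeddings())
retriever = store.as_retriever(search_kwargs={"k": 4})

def extract_covariates(text: str, model: str = "gpt-3.5-turbo") -> str:
    """Retrieve similar annotated examples and ask the LLM to extract covariates."""
    llm = ChatOpenAI(model=model, temperature=0)  # swap in "gpt-4" for GPT-4
    context = "\n".join(d.page_content for d in retriever.invoke(text))
    prompt = (
        "Using the annotated examples below, extract the covariates of disease "
        f"spread mentioned in the text.\n\nExamples:\n{context}\n\nText:\n{text}"
    )
    return llm.invoke(prompt).content
```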
Compute the cosine similarity between each (annotation, prediction) pair and keep only the best match per article (even when an article has several annotated covariates).
Run this script: Evaluate_at_document_level.py.
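A minimal sketch of the matching step (the embedding model and data layout are assumptions, not necessarily what the script uses):

```python
# Minimal sketch: embed annotations and predictions, compute pairwise cosine
# similarity, and keep only the best-matching pair for the article.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

def best_match(annotations: list[str], predictions: list[str]) -> tuple[str, str, float]:
    """Return the (annotation, prediction, cosine) pair with the highest similarity."""
    ann_emb = model.encode(annotations, convert_to_tensor=True)
    pred_emb = model.encode(predictions, convert_to_tensor=True)
    sims = util.cos_sim(ann_emb, pred_emb)       # shape: |annotations| x |predictions|
    i, j = divmod(int(sims.argmax()), sims.shape[1])
    return annotations[i], predictions[j], float(sims[i, j])

# An article may have several annotated covariates; only its best pair is scored.
print(best_match(["wild bird migration", "poultry density"],
                 ["migration of wild birds"]))
```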
Acknowledgement:
This study was partially funded by EU grant 874850 MOOD. The contents of this publication are the sole responsibility of the authors and do not necessarily reflect the views of the European Commission.