This repo contains code for the paper GPT-NER: Named Entity Recognition via Large Language Models.
```bibtex
@article{wang2023gpt,
  title={GPT-NER: Named Entity Recognition via Large Language Models},
  author={Wang, Shuhe and Sun, Xiaofei and Li, Xiaoya and Ouyang, Rongbin and Wu, Fei and Zhang, Tianwei and Li, Jiwei and Wang, Guoyin},
  journal={arXiv preprint arXiv:2304.10428},
  year={2023}
}
```
- python>=3.7.3
- openai==0.27.2
- simcse==0.4
This repo mainly uses two additional packages: SimCSE and OpenAI. If you want to know more about the arguments used in the code, please refer to their corresponding documentation.
For the full NER datasets, we follow MRC-NER for preprocessing, and you can directly download them here.
For the sampled 100-dataset, we have put it on Google Drive.
For sentence-level embeddings, run `openai_access/extract_mrc_knn.py`. Note that you should change the paths of the input/output files and of the SimCSE model. In this repo, the model `sup-simcse-roberta-large` is used for SimCSE, and you can find it here.
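To illustrate what the kNN step computes, here is a minimal sketch of retrieving the most similar training sentences by cosine similarity over pre-computed embeddings. It uses plain NumPy instead of the SimCSE package, and the function name and toy vectors are illustrative, not taken from `extract_mrc_knn.py`:

```python
import numpy as np

def knn_demonstrations(query_emb, train_embs, k=3):
    """Return the indices of the k training sentences whose embeddings
    are most cosine-similar to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    t = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    sims = t @ q                   # cosine similarity to every train sentence
    return np.argsort(-sims)[:k]   # indices of the k nearest neighbours

# Toy example: four 2-d "sentence embeddings" and one query.
train = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.5, 0.5]])
query = np.array([1.0, 0.1])
print(knn_demonstrations(query, train, k=2))  # -> [2 0]
```

The retrieved indices are then used to pick the few-shot demonstrations that are prepended to the GPT prompt.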
We follow the official steps to access the GPT-* models, and the documentation can be found here. Before you run our scripts, you need to add `OPENAI_API_KEY`, which you can find in your account profile, to the environment via `export OPENAI_API_KEY="YOUR_KEY"`.
To get predictions, please run `openai_access/scripts/access_ai.sh`; the arguments used are listed in `openai_access/get_results_mrc_knn.py`.
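As a rough sketch of the kind of few-shot prompt assembled before calling the API (the exact template lives in `openai_access/get_results_mrc_knn.py`; the function name, task description, and example sentences below are illustrative, though the `@@ ... ##` entity markers follow the paper's output format):

```python
def build_prompt(task_desc, demos, query):
    """Assemble a few-shot NER prompt: a task description, the retrieved
    demonstrations, then the query sentence.  Entities in the expected
    output are wrapped in @@ ... ## markers."""
    lines = [task_desc]
    for sent, labeled in demos:
        lines.append(f"Input: {sent}")
        lines.append(f"Output: {labeled}")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

demos = [("Obama visited Paris .", "Obama visited @@Paris## .")]
prompt = build_prompt(
    "The task is to label location entities in the given sentence.",
    demos,
    "She flew to Berlin .",
)
print(prompt)
```

The model's completion is then parsed by locating the `@@ ... ##` spans to recover the predicted entities.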
For self-verification, please run `openai_access/scripts/verify.sh`; the arguments used are listed in `openai_access/verify_results.py`.
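Self-verification asks the model a yes/no question about each extracted entity to filter out false positives. A minimal sketch of such a verification prompt, assuming a hypothetical template (the real one is in `openai_access/verify_results.py`):

```python
def verification_prompt(sentence, entity, entity_type, demos=()):
    """Build a yes/no question asking whether an extracted entity is
    really an entity of the given type.  `demos` holds optional
    (question, answer) pairs for the few-shot variant."""
    lines = [
        "The task is to verify whether the word is a "
        f"{entity_type} entity extracted from the given sentence."
    ]
    for question, answer in demos:   # empty in the zero-shot setting
        lines.append(question)
        lines.append(answer)
    lines.append(f"The input sentence: {sentence}")
    lines.append(
        f'Is the word "{entity}" in the input sentence a '
        f"{entity_type} entity? Please answer with yes or no."
    )
    return "\n".join(lines)

print(verification_prompt("She flew to Berlin .", "Berlin", "location"))
```

Entities for which the model answers "no" are dropped from the final predictions.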
Note that accessing GPT-3 is very expensive, so we strongly advise you to start from our sampled 100-dataset.
We use span-level precision, recall, and F1-score for evaluation. To compute them, please run the script `openai_access/scripts/compute_f1.sh`.
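Span-level scoring counts a prediction as correct only if its boundaries and type exactly match a gold span. A self-contained sketch of the metric (the function name and span encoding are illustrative, not taken from `compute_f1.sh`):

```python
def span_f1(pred_spans, gold_spans):
    """Span-level precision/recall/F1: a prediction is a true positive
    only if its (start, end, type) triple exactly matches a gold span."""
    pred, gold = set(pred_spans), set(gold_spans)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = [(0, 1, "PER"), (3, 4, "LOC")]
pred = [(0, 1, "PER"), (3, 4, "ORG")]   # right span, wrong type -> not counted
print(span_f1(pred, gold))  # -> (0.5, 0.5, 0.5)
```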
Table 1: Results of sampled 100 pieces of data for two Flat NER datasets: CoNLL2003 and OntoNotes5.0.
| Model | English CoNLL2003 (Sampled 100) | | | English OntoNotes5.0 (Sampled 100) | | |
| --- | --- | --- | --- | --- | --- | --- |
| | Precision | Recall | F1 | Precision | Recall | F1 |
| Baselines (Supervised Model) | | | | | | |
| ACE+document-context | 97.8 | 98.28 | 98.04 (SOTA) | - | - | - |
| BERT-MRC+DSC | - | - | - | 93.81 | 93.95 | 93.88 (SOTA) |
| GPT-NER | | | | | | |
| GPT-3 + random retrieval | 88.18 | 78.54 | 83.08 | 64.21 | 65.51 | 64.86 |
| GPT-3 + sentence-level embedding | 90.47 | 95 | 92.68 | 76.08 | 83.06 | 79.57 |
| GPT-3 + entity-level embedding | 94.06 | 96.54 | 95.3 | 78.38 | 83.9 | 81.14 |
| Self-verification (zero-shot) | | | | | | |
| GPT-3 + random retrieval | 88.95 | 79.73 | 84.34 | 64.94 | 65.90 | 65.42 |
| GPT-3 + sentence-level embedding | 91.77 | 96.36 | 94.01 | 77.33 | 83.29 | 80.31 |
| GPT-3 + entity-level embedding | 94.15 | 96.77 | 95.46 | 79.05 | 83.71 | 81.38 |
| Self-verification (few-shot) | | | | | | |
| GPT-3 + random retrieval | 90.04 | 80.14 | 85.09 | 65.21 | 66.25 | 65.73 |
| GPT-3 + sentence-level embedding | 92.92 | 95.45 | 94.17 | 77.64 | 83.22 | 80.43 |
| GPT-3 + entity-level embedding | 94.73 | 96.97 | 95.85 | 79.25 | 83.73 | 81.49 |
Table 2: Results of full data for two Flat NER datasets: CoNLL2003 and OntoNotes5.0.
| Model | English CoNLL2003 (FULL) | | | English OntoNotes5.0 (FULL) | | |
| --- | --- | --- | --- | --- | --- | --- |
| | Precision | Recall | F1 | Precision | Recall | F1 |
| Baselines (Supervised Model) | | | | | | |
| BERT-Tagger | - | - | 92.8 | 90.01 | 88.35 | 89.16 |
| BERT-MRC | 92.33 | 94.61 | 93.04 | 92.98 | 89.95 | 91.11 |
| GNN-SL | 93.02 | 93.40 | 93.2 | 91.48 | 91.29 | 91.39 |
| ACE+document-context | - | - | 94.6 (SOTA) | - | - | - |
| BERT-MRC+DSC | 93.41 | 93.25 | 93.33 | 91.59 | 92.56 | 92.07 (SOTA) |
| GPT-NER | | | | | | |
| GPT-3 + random retrieval | 77.04 | 68.69 | 72.62 | 53.8 | 59.36 | 56.58 |
| GPT-3 + sentence-level embedding | 81.04 | 88.00 | 84.36 | 66.87 | 73.77 | 70.32 |
| GPT-3 + entity-level embedding | 88.54 | 91.4 | 89.97 | 74.17 | 79.29 | 76.73 |
| Self-verification (zero-shot) | | | | | | |
| GPT-3 + random retrieval | 77.13 | 69.23 | 73.18 | 54.14 | 59.44 | 56.79 |
| GPT-3 + sentence-level embedding | 83.31 | 88.11 | 85.71 | 67.29 | 73.81 | 70.55 |
| GPT-3 + entity-level embedding | 89.47 | 91.77 | 90.62 | 74.64 | 79.52 | 77.08 |
| Self-verification (few-shot) | | | | | | |
| GPT-3 + random retrieval | 77.50 | 69.38 | 73.44 | 54.23 | 59.65 | 56.94 |
| GPT-3 + sentence-level embedding | 83.73 | 88.07 | 85.9 | 67.35 | 73.79 | 70.57 |
| GPT-3 + entity-level embedding | 89.76 | 92.06 | 90.91 | 74.89 | 79.51 | 77.20 |
Table 3: Results of full data for three Nested NER datasets: ACE2004, ACE2005 and GENIA.
| Model | English ACE2004 (FULL) | | | English ACE2005 (FULL) | | | English GENIA (FULL) | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | Precision | Recall | F1 | Precision | Recall | F1 | Precision | Recall | F1 |
| Baselines (Supervised Model) | | | | | | | | | |
| BERT-MRC | 85.05 | 86.32 | 85.98 | 87.16 | 86.59 | 86.88 | 85.18 | 81.12 | 83.75 (SOTA) |
| Triaffine+BERT | 87.13 | 87.68 | 87.40 | 86.70 | 86.94 | 86.82 | 80.42 | 82.06 | 81.23 |
| Triaffine+ALBERT | 88.88 | 88.24 | 88.56 | 87.39 | 90.31 | 88.83 | - | - | - |
| BINDER | 88.3 | 89.1 | 88.7 (SOTA) | 89.1 | 89.8 | 89.5 (SOTA) | - | - | - |
| GPT-NER | | | | | | | | | |
| GPT-3 + random retrieval | 55.04 | 41.76 | 48.4 | 44.5 | 46.24 | 45.37 | 44.1 | 38.64 | 41.37 |
| GPT-3 + sentence-level embedding | 65.31 | 53.67 | 60.68 | 58.04 | 58.97 | 58.50 | 63.43 | 44.17 | 51.68 |
| GPT-3 + entity-level embedding | 72.23 | 75.01 | 73.62 | 71.72 | 74.2 | 73.96 | 61.38 | 66.74 | 64.06 |
| Self-verification (zero-shot) | | | | | | | | | |
| GPT-3 + random retrieval | 55.44 | 42.22 | 48.83 | 45.06 | 46.62 | 45.84 | 44.31 | 38.79 | 41.55 |
| GPT-3 + sentence-level embedding | 69.64 | 54.98 | 62.31 | 59.49 | 60.17 | 59.83 | 59.54 | 44.26 | 51.9 |
| GPT-3 + entity-level embedding | 73.58 | 74.74 | 74.16 | 72.63 | 75.39 | 73.46 | 61.77 | 66.81 | 64.29 |
| Self-verification (few-shot) | | | | | | | | | |
| GPT-3 + random retrieval | 55.63 | 42.49 | 49.06 | 45.49 | 46.73 | 46.11 | 44.68 | 38.98 | 41.83 |
| GPT-3 + sentence-level embedding | 70.17 | 54.87 | 62.52 | 59.69 | 60.35 | 60.02 | 59.87 | 44.39 | 52.13 |
| GPT-3 + entity-level embedding | 73.29 | 75.11 | 74.2 | 72.77 | 75.51 | 73.59 | 61.89 | 66.95 | 64.42 |
If you have any issues or questions about this repo, feel free to contact wangshuhe@stu.pku.edu.cn.