This repository contains the QEMP corpus and the BiodivTagger. The QEMP corpus is a biodiversity research metadata corpus of 50 metadata files selected from 5 different repositories and biodiversity-related projects. The BiodivTagger is a text mining pipeline that extracts biological entities.
- Pipeline contains the text mining pipeline that annotates biological named entities.
- Evaluation contains the Python script that evaluates the pipeline against the gold standard, together with the evaluation results.
- QEMP Corpus contains the raw metadata XML files, grouped by data repository, and the gold standard in JSON format.
- Ontological Issues List lists missing ontological entries and ontological conflicts.
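
To illustrate what evaluating pipeline output against a gold standard involves, the following minimal sketch computes exact-match precision, recall, and F1 over two sets of annotations. It is not the script in the Evaluation folder: the function name `score`, the `(start, end, label)` tuple representation, and the placeholder labels are assumptions made for this example and do not reflect the corpus's actual JSON schema.

```python
def score(gold, predicted):
    """Exact-match precision, recall and F1 between two annotation collections."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)  # annotations present in both sets
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical annotations: (start offset, end offset, placeholder label)
gold = {(0, 7, "LABEL_A"), (12, 20, "LABEL_B"), (25, 31, "LABEL_C")}
predicted = {(0, 7, "LABEL_A"), (12, 20, "LABEL_C")}

precision, recall, f1 = score(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this sketch an annotation only counts as correct when both its span and its label match the gold standard; a looser scheme (for example, partial span overlap) would need a different matching rule.
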
- Pipeline: The BiodivTagger is distributed under the GNU GPL v3.0 license.
- QEMP Corpus: The QEMP corpus is distributed under the CC-BY-4.0 license.
Löffler, F., Abdelmageed, N., Babalou, S., Kaur, P., König-Ries, B.: Tag Me If You Can! Semantic Annotation of Biodiversity Metadata with the QEMP Corpus and the BiodivTagger, Language Resources and Evaluation Conference (LREC), Marseille, France, 2020