COVID19-Entity-Recognition uses scispaCy to locate entities in the COVID-19 Open Research Dataset (CORD-19) corpus, then looks each entity up on Wikidata to determine whether it is a possible symptom or a medication.
docker-compose -f docker-compose.yml up -d
To create an Elasticsearch index and index the documents in the dataset, use the index_manager.py script.
python index_manager.py [-n NAME] [-d DATASET] [-r] [-c] --host HOST
python index_manager.py -n covid19-index -d covid19_dataset -c --host localhost
python index_manager.py -n covid19-index -r --host localhost
docker-compose -f docker-compose.yml down
To extract the entities from the CORD-19 corpus, use the entity_recognition.py script.
python entity_recognition.py -n INDEX_NAME --host HOST
The output of this script is a file called entities.txt that lists the entities in descending order of the number of occurrences in the paragraphs.
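The counting and ordering step can be illustrated with a minimal sketch. It is not the code of entity_recognition.py: the function names `rank_entities` and `format_lines` are hypothetical, the lowercasing step is an assumption, and the input is assumed to be one list of entity strings per paragraph.

```python
from collections import Counter


def rank_entities(paragraph_entities):
    """Count entity mentions across paragraphs and sort by frequency."""
    counts = Counter()
    for entities in paragraph_entities:
        # Assumption: lowercase so "Fever" and "fever" are counted together.
        counts.update(entity.lower() for entity in entities)
    # most_common() returns (entity, count) pairs, highest count first.
    return counts.most_common()


def format_lines(ranked):
    """Render (entity, count) pairs as 'entity,count' lines."""
    return [f"{entity},{count}" for entity, count in ranked]


if __name__ == "__main__":
    # Hypothetical sample: entities found in three paragraphs.
    sample = [["fever", "cough"], ["Fever"], ["cough", "fever"]]
    for line in format_lines(rank_entities(sample)):
        print(line)
```

With real data, each inner list would hold the entity texts that scispaCy found in one paragraph.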
Once the entities have been extracted, use the search.py script to check which of them are symptoms and which are medications.
python search.py ENTITIES_FILE [-s] [-m]
python search.py entities.txt -s
python search.py entities.txt -m
time,1046
data,971
model,917
model-the',901
number,893
time-and,876
infected,746
covid,724
covid-,724
covid',724
covid-19,666
infection,588
cases,583
case,558
different,550
population,545
disease,500
epidemic,490
rate,463
individuals,456
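Lines such as `time,1046` follow an entity,count layout, so they can be loaded for further filtering with a few lines of Python. The sketch below is an illustration only: `parse_entity_line`, `load_entities`, and the `min_count` parameter are hypothetical names that are not part of this repository.

```python
def parse_entity_line(line):
    """Split one 'entity,count' line into (entity, count)."""
    # Split on the last comma so entity text containing commas stays intact.
    entity, count = line.rstrip("\n").rsplit(",", 1)
    return entity, int(count)


def load_entities(lines, min_count=1):
    """Parse non-empty lines and keep entities seen at least min_count times."""
    parsed = (parse_entity_line(line) for line in lines if line.strip())
    return [(entity, count) for entity, count in parsed if count >= min_count]


if __name__ == "__main__":
    # A few lines copied from the sample output; a real run would instead
    # iterate over an open file handle for entities.txt.
    sample = ["time,1046\n", "covid-19,666\n", "case,558\n"]
    print(load_entities(sample, min_count=600))
```

Raising `min_count` is one simple way to drop rare entities before looking them up on Wikidata.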
[Original entity, Wikidata entity, Relevance]
cough,cough,19
confusion,mental confusion,18
fever,fever,15
coughing,cough,12
collapse,collapse,7
intermittent,intermittent claudication,4
inflammation,inflammation,3
fatigue,fatigue,3
nostrils,nostrils distended,3
contraction,uterine contraction,2
burnin,heartburn,1
[Original entity, Wikidata entity, Relevance]
veneto,venetoclax,10
serine,L-serine,4
arginine,L-Arginine,4
nitrogen,nitrous oxide,3
phyton,(E)-phytonadione,2
antibacterial,antibiotic,1
hydroxychloroquine,hydroxychloroquine,1
methionine,L-methionine,1
betax,betaxolol,0
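The result rows above use three comma-separated fields under a bracketed header line. A minimal sketch for reading them and dropping weak matches follows. The names `Match`, `parse_matches`, and `filter_by_relevance` are hypothetical, and treating a higher relevance value as a stronger match is an assumption, since the score is not defined here.

```python
from typing import NamedTuple


class Match(NamedTuple):
    original: str
    wikidata_label: str
    relevance: int


def parse_matches(lines):
    """Parse 'original,wikidata label,relevance' rows, skipping header lines."""
    matches = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the bracketed header row.
        if not line or line.startswith("["):
            continue
        # First comma ends the original entity, last comma starts the score,
        # so a Wikidata label that itself contains commas is kept whole.
        original, rest = line.split(",", 1)
        label, relevance = rest.rsplit(",", 1)
        matches.append(Match(original, label, int(relevance)))
    return matches


def filter_by_relevance(matches, threshold):
    """Keep matches whose relevance is at least threshold (assumed: higher is better)."""
    return [match for match in matches if match.relevance >= threshold]


if __name__ == "__main__":
    rows = [
        "[Original entity, Wikidata entity, Relevance]",
        "cough,cough,19",
        "burnin,heartburn,1",
        "betax,betaxolol,0",
    ]
    for match in filter_by_relevance(parse_matches(rows), threshold=2):
        print(match.original, "->", match.wikidata_label, match.relevance)
```

A threshold is only a rough heuristic: a high score does not guarantee that the Wikidata entity is the intended one, so the remaining rows still benefit from manual review.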