Releases: allenai/scispacy
v0.5.5
Support for python 3.12
This release adds support for python 3.12 by updating scipy and using nmslib-metabrainz rather than nmslib.
What's Changed
- Fix export_umls_json.py by @ethanhkim in #511
- Add support matrix for nmslib installation by @dakinggg in #524
- Update Dockerfile by @dakinggg in #525
- Support Python 3.12 via newer scipy and nmslib-metabrainz by @jason-nance in #523
- Add shorter version of pip installing nmslib from source by @svlandeg in #529
- Version bump by @dakinggg in #530
New Contributors
- @ethanhkim made their first contribution in #511
- @jason-nance made their first contribution in #523
- @svlandeg made their first contribution in #529
Full Changelog: v0.5.4...v0.5.5
v0.5.4
Update for spacy 3.7.x
What's Changed
- Fixes #485 Project Page URL in setup.py by @sajedjalil in #495
- add progress bar to http_get by @WeixiongLin in #499
- Update for spacy 3.7 compatibility by @dakinggg in #507
- Update publish workflow to trusted publisher by @dakinggg in #508
New Contributors
- @sajedjalil made their first contribution in #495
- @WeixiongLin made their first contribution in #499
Full Changelog: v0.5.3...v0.5.4
Version 0.5.3
Retrains the models with spacy 3.6.x to be compatible with the latest spacy version
What's Changed
- Update README.md by @dakinggg in #476
- Update EntityLinker docstring by @andyjessen in #472
- Support UMLS filtering by language (Solves #477) by @nachollorca in #478
- Add a note about make_serializable argument by @JohnGiorgi in #484
- Drop umls and umls_ents attributes in linker by @JohnGiorgi in #489
- Updating nmslib hyperparameters guide url by @kaushikacharya in #493
- Update to latest spacy version by @dakinggg in #494
New Contributors
- @nachollorca made their first contribution in #478
- @JohnGiorgi made their first contribution in #484
Full Changelog: v0.5.2...v0.5.3
v0.5.2
This release includes an update of the entity linkers to use the latest UMLS release (2022AB), which includes information about newer entities like COVID-19.
In [10]: doc = nlp("COVID-19 is a global pandemic.")
In [11]: linker = nlp.get_pipe('scispacy_linker')
In [12]: linker.kb.cui_to_entity[doc.ents[0]._.kb_ents[0][0]]
Out[12]:
CUI: C5203670, Name: COVID19 (disease)
Definition: A viral disorder generally characterized by high FEVER; COUGH; DYSPNEA; CHILLS; PERSISTENT TREMOR; MUSCLE PAIN; HEADACHE; SORE THROAT; a new loss of taste and/or smell (see AGEUSIA and ANOSMIA) and other symptoms of a VIRAL PNEUMONIA. In severe cases, a myriad of coagulopathy associated symptoms often correlating with COVID-19 severity is seen (e.g., BLOOD COAGULATION; THROMBOSIS; ACUTE RESPIRATORY DISTRESS SYNDROME; SEIZURES; HEART ATTACK; STROKE; multiple CEREBRAL INFARCTIONS; KIDNEY FAILURE; catastrophic ANTIPHOSPHOLIPID ANTIBODY SYNDROME and/or DISSEMINATED INTRAVASCULAR COAGULATION). In younger patients, rare inflammatory syndromes are sometimes associated with COVID-19 (e.g., atypical KAWASAKI SYNDROME; TOXIC SHOCK SYNDROME; pediatric multisystem inflammatory disease; and CYTOKINE STORM SYNDROME). A coronavirus, SARS-CoV-2, in the genus BETACORONAVIRUS is the causative agent.
TUI(s): T047
Aliases (abbreviated, total: 47):
2019 Novel Coronavirus Infection, SARS-CoV-2 Disease, Human Coronavirus 2019 Infection, SARS-CoV-2 Infection, Disease caused by severe acute respiratory syndrome coronavirus 2 (disorder), Disease caused by SARS-CoV-2, 2019 nCoV Disease, 2019 Novel Coronavirus Disease, COVID-19 Virus Disease, Virus Disease, COVID-19
It also includes a small bug fix to the abbreviation detector.
Note: The models (e.g. en_core_sci_sm) are still labeled as version v0.5.1, as this release did not involve retraining the base models, only the entity linkers.
What's Changed
- Fix typo by @andyjessen in #453
- Update README.md by @dakinggg in #456
- Update to the latest UMLS version by @dakinggg in #474
New Contributors
- @andyjessen made their first contribution in #453
Full Changelog: v0.5.1...v0.5.2
Version 0.5.1
Retrains the models with spacy 3.4.x to be compatible with the latest spacy version
Release v0.5.0
Updates scispacy to be compatible with the latest spacy version (3.2.3)
Scispacy 0.4.0 - Compatible with Spacy 3
This release of scispacy is compatible with Spacy 3. It also includes a new model 🥳, en_core_sci_scibert, which uses scibert base uncased to do parsing and POS tagging (but not NER yet; this will come in a later release).
Version 0.3.0
New Features
Hearst Patterns
This component implements Automatic Acquisition of Hyponyms from Large Text Corpora using the SpaCy Matcher component.
Passing extended=True to the HyponymDetector will use the extended set of hearst patterns, which include higher recall but lower precision hyponymy relations (e.g. X compared to Y, X similar to Y, etc).
This component produces a doc level attribute on the spacy doc: doc._.hearst_patterns, which is a list containing tuples of extracted hyponym pairs. The tuples contain:
- The relation rule used to extract the hyponym (type: str)
- The more general concept (type: spacy.Span)
- The more specific concept (type: spacy.Span)
Usage:
import spacy
from scispacy.hyponym_detector import HyponymDetector
nlp = spacy.load("en_core_sci_sm")
hyponym_pipe = HyponymDetector(nlp, extended=True)
nlp.add_pipe(hyponym_pipe, last=True)
doc = nlp("Keystone plant species such as fig trees are good for the soil.")
print(doc._.hearst_patterns)
>>> [('such_as', Keystone plant species, fig trees)]
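To make the tuple layout concrete, here is a minimal standard-library sketch of a single "X such as Y" rule. It is not how HyponymDetector works internally: the component matches on spaCy tokens and returns spacy.Span objects, while this stand-in uses a regular expression and returns plain strings. The names SUCH_AS and find_such_as are hypothetical and exist only for this illustration.

```python
import re
from typing import List, Tuple

# Hypothetical, simplified stand-in for one Hearst pattern ("X such as Y").
# Assumption: the general concept is everything before "such as" and the
# specific concept runs up to "is"/"are" or the next punctuation mark.
SUCH_AS = re.compile(
    r"(?P<general>\w[\w ]*?) such as (?P<specific>\w[\w ]*?)"
    r"(?=\s+(?:are|is)\b|[.,;]|$)"
)


def find_such_as(text: str) -> List[Tuple[str, str, str]]:
    """Return (rule, general concept, specific concept) tuples as strings."""
    return [
        ("such_as", m.group("general").strip(), m.group("specific").strip())
        for m in SUCH_AS.finditer(text)
    ]


if __name__ == "__main__":
    print(find_such_as("Keystone plant species such as fig trees are good for the soil."))
```

A regular expression cannot find noun phrase boundaries, so on longer sentences the general concept will swallow leading words; the real component avoids this by working on parsed tokens.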
Ontonotes Mixin: Clear Format > UD
Thanks to Yoav Goldberg for this fix! Yoav noticed that the dependency labels for the OntoNotes data use a different format than the converted GENIA Trees. Yoav wrote some scripts to convert between them, including normalising of some syntactic phenomena that were being treated inconsistently between the two corpora.
Bug Fixes
#252 - removed duplicated aliases in the entity linkers, reducing the size of the UMLS linker by ~10%
#249 - fix the path to the rxnorm linker
Version 0.2.5
New Features 🥇
New Models
- Models compatible with Spacy 2.3.0 🥳
Entity Linkers
- Updated the UMLS KB to use the 2020AA release, categories 0,1,2,9.
- umls: Links to the Unified Medical Language System, levels 0,1,2 and 9. This has ~3M concepts.
- mesh: Links to the Medical Subject Headings. This contains a smaller set of higher quality entities, which are used for indexing in Pubmed. MeSH contains ~30k entities. NOTE: The MeSH KB is derived directly from MeSH itself, and as such uses different unique identifiers than the other KBs.
- rxnorm: Links to the RxNorm ontology. RxNorm contains ~100k concepts focused on normalized names for clinical drugs. It comprises several other drug vocabularies commonly used in pharmacy management and drug interaction, including First Databank, Micromedex, and the Gold Standard Drug Database.
- go: Links to the Gene Ontology. The Gene Ontology contains ~67k concepts focused on the functions of genes.
- hpo: Links to the Human Phenotype Ontology. The Human Phenotype Ontology contains 16k concepts focused on phenotypic abnormalities encountered in human disease.
Bug Fixes 🐛
#217 - Fixes a bug in the Abbreviation detector
API Changes
- Entity Linkers now modify the Span._.kb_ents rather than the Span._.umls_ents to reflect the fact that we now have more than one entity linker. Span._.umls_ents will be deprecated in v1.0.
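Code that has to run across this change can read whichever attribute is present. The sketch below is one possible approach, not part of scispacy: linked_entities is a hypothetical helper, the spans are SimpleNamespace stubs rather than real spaCy spans, and the 0.98 score is a made-up illustrative value. It assumes each entry is a (concept id, score) pair.

```python
from types import SimpleNamespace
from typing import Any, List, Tuple


def linked_entities(span: Any) -> List[Tuple[str, float]]:
    """Return (concept_id, score) pairs, preferring kb_ents over umls_ents."""
    extensions = span._
    ents = getattr(extensions, "kb_ents", None)
    if ents is None:
        # Fall back to the older attribute; return [] if neither is set.
        ents = getattr(extensions, "umls_ents", [])
    return list(ents)


# Stub objects standing in for spaCy spans (assumption: illustration only).
new_style = SimpleNamespace(_=SimpleNamespace(kb_ents=[("C5203670", 0.98)]))
old_style = SimpleNamespace(_=SimpleNamespace(umls_ents=[("C5203670", 0.98)]))

if __name__ == "__main__":
    print(linked_entities(new_style))
    print(linked_entities(old_style))
```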