Releases: allenai/scispacy

v0.5.5

27 Oct 05:42
b5687f5

Support for Python 3.12

This release adds support for Python 3.12 by updating scipy and using nmslib-metabrainz rather than nmslib.

Full Changelog: v0.5.4...v0.5.5

v0.5.4

08 Mar 05:57
29b1e46

Update for spacy 3.7.x

Full Changelog: v0.5.3...v0.5.4

Version 0.5.3

30 Sep 19:50
7da5117

Retrains the models with spacy 3.6.x to be compatible with the latest spacy version

Full Changelog: v0.5.2...v0.5.3

v0.5.2

29 Apr 21:21
5368cc3

This release includes an update of the entity linkers to use the latest UMLS release (2022AB), which includes information about newer entities like COVID-19.

In [10]: doc = nlp("COVID-19 is a global pandemic.")

In [11]: linker = nlp.get_pipe('scispacy_linker')

In [12]: linker.kb.cui_to_entity[doc.ents[0]._.kb_ents[0][0]]
Out[12]:
CUI: C5203670, Name: COVID19 (disease)
Definition: A viral disorder generally characterized by high FEVER; COUGH; DYSPNEA; CHILLS; PERSISTENT TREMOR; MUSCLE PAIN; HEADACHE; SORE THROAT; a new loss of taste and/or smell (see AGEUSIA and ANOSMIA) and other symptoms of a VIRAL PNEUMONIA. In severe cases, a myriad of coagulopathy associated symptoms often correlating with COVID-19 severity is seen (e.g., BLOOD COAGULATION; THROMBOSIS; ACUTE RESPIRATORY DISTRESS SYNDROME; SEIZURES; HEART ATTACK; STROKE; multiple CEREBRAL INFARCTIONS; KIDNEY FAILURE; catastrophic ANTIPHOSPHOLIPID ANTIBODY SYNDROME and/or DISSEMINATED INTRAVASCULAR COAGULATION). In younger patients, rare inflammatory syndromes are sometimes associated with COVID-19 (e.g., atypical KAWASAKI SYNDROME; TOXIC SHOCK SYNDROME; pediatric multisystem inflammatory disease; and CYTOKINE STORM SYNDROME). A coronavirus, SARS-CoV-2, in the genus BETACORONAVIRUS is the causative agent.
TUI(s): T047
Aliases (abbreviated, total: 47):
         2019 Novel Coronavirus Infection, SARS-CoV-2 Disease, Human Coronavirus 2019 Infection, SARS-CoV-2 Infection, Disease caused by severe acute respiratory syndrome coronavirus 2 (disorder), Disease caused by SARS-CoV-2, 2019 nCoV Disease, 2019 Novel Coronavirus Disease, COVID-19 Virus Disease, Virus Disease, COVID-19

It also includes a small bug fix to the abbreviation detector.

Note: The models (e.g. en_core_sci_sm) are still labeled as version v0.5.1, as this release did not involve retraining the base models, only the entity linkers.
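
The session above takes the first candidate's CUI with `kb_ents[0][0]`. As a minimal sketch of handling the candidate list more defensively, the snippet below assumes each entry of `kb_ents` is a `(CUI, score)` pair and uses plain tuples as stand-ins, so it runs without scispacy installed. The helper name `best_candidate`, the scores, and the second CUI are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Tuple

# Assumed shape of one linker candidate: (CUI, similarity score).
Candidate = Tuple[str, float]


def best_candidate(kb_ents: List[Candidate], min_score: float = 0.0) -> Optional[str]:
    """Return the CUI of the highest-scoring candidate at or above min_score."""
    qualifying = [cand for cand in kb_ents if cand[1] >= min_score]
    if not qualifying:
        # No candidates at all, or none confident enough.
        return None
    return max(qualifying, key=lambda cand: cand[1])[0]


# Stand-in for `doc.ents[0]._.kb_ents`; scores and the second CUI are made up.
kb_ents = [("C5203670", 0.98), ("C0000000", 0.71)]

print(best_candidate(kb_ents))                  # C5203670
print(best_candidate(kb_ents, min_score=0.99))  # None
```

With a real pipeline, the returned CUI could then be looked up in `linker.kb.cui_to_entity`, as in the session above.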

Full Changelog: v0.5.1...v0.5.2

Version 0.5.1

07 Sep 00:26
e30b8f4

Retrains the models with spacy 3.4.x to be compatible with the latest spacy version

Release v0.5.0

10 Mar 20:15
cc1a717

Updates scispacy to be compatible with the latest spaCy version (3.2.3)

Scispacy 0.4.0 - Compatible with Spacy 3

12 Feb 22:55
aad640f

This release of scispacy is compatible with spaCy 3. It also includes a new model 🥳, en_core_sci_scibert, which uses scibert base uncased for parsing and POS tagging. It does not do NER yet; that will come in a later release.

Version 0.3.0

16 Oct 17:13
1b456f5

New Features

Hearst Patterns

This component implements the approach from "Automatic Acquisition of Hyponyms from Large Text Corpora" using the spaCy Matcher component.

Passing extended=True to the HyponymDetector uses the extended set of Hearst patterns, which adds higher-recall but lower-precision hyponymy relations (e.g. X compared to Y, X similar to Y).

This component adds a doc-level attribute to the spaCy Doc, doc._.hearst_patterns, which is a list of tuples of extracted hyponym pairs. Each tuple contains:

  • The relation rule used to extract the hyponym (type: str)
  • The more general concept (type: spacy.Span)
  • The more specific concept (type: spacy.Span)

Usage:

import spacy
from scispacy.hyponym_detector import HyponymDetector

nlp = spacy.load("en_core_sci_sm")
hyponym_pipe = HyponymDetector(nlp, extended=True)
nlp.add_pipe(hyponym_pipe, last=True)

doc = nlp("Keystone plant species such as fig trees are good for the soil.")

print(doc._.hearst_patterns)
# [('such_as', Keystone plant species, fig trees)]
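
As a rough sketch of consuming these tuples, the snippet below groups the specific concepts under each general concept. It assumes the `(rule, general, specific)` ordering described above and uses plain strings as stand-ins for the `spacy.Span` objects, so it runs without a model; with real spans you would likely group on `span.text`. The helper name `group_hyponyms` and the second example tuple are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (relation rule, more general concept, more specific concept).
# Plain strings stand in for spacy.Span objects in this sketch.
Pattern = Tuple[str, str, str]


def group_hyponyms(patterns: Iterable[Pattern]) -> Dict[str, List[str]]:
    """Map each general concept to its specific concepts, in first-seen order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for _rule, general, specific in patterns:
        if specific not in grouped[general]:
            grouped[general].append(specific)
    return dict(grouped)


patterns = [
    ("such_as", "Keystone plant species", "fig trees"),
    ("such_as", "Keystone plant species", "oak trees"),  # made-up second match
]

print(group_hyponyms(patterns))
# {'Keystone plant species': ['fig trees', 'oak trees']}
```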

Ontonotes Mixin: Clear Format > UD

Thanks to Yoav Goldberg for this fix! Yoav noticed that the dependency labels for the OntoNotes data use a different format than the converted GENIA Trees. Yoav wrote some scripts to convert between them, including normalising some syntactic phenomena that were being treated inconsistently between the two corpora.

Bug Fixes

#252 - removed duplicated aliases in the entity linkers, reducing the size of the UMLS linker by ~10%
#249 - fixed the path to the rxnorm linker

Version 0.2.5

08 Jul 16:12

New Features 🥇

New Models

  • Models compatible with Spacy 2.3.0 🥳

Entity Linkers

#246, #233

  • Updated the UMLS KB to use the 2020AA release, categories 0,1,2,9.

  • umls: Links to the Unified Medical Language System, levels 0,1,2 and 9. This has ~3M concepts.

  • mesh: Links to the Medical Subject Headings. This contains a smaller set of higher quality entities, which are used for indexing in PubMed. MeSH contains ~30k entities. NOTE: The MeSH KB is derived directly from MeSH itself, and as such uses different unique identifiers than the other KBs.

  • rxnorm: Links to the RxNorm ontology. RxNorm contains ~100k concepts focused on normalized names for clinical drugs. It comprises several other drug vocabularies commonly used in pharmacy management and drug interaction, including First Databank, Micromedex, and the Gold Standard Drug Database.

  • go: Links to the Gene Ontology. The Gene Ontology contains ~67k concepts focused on the functions of genes.

  • hpo: Links to the Human Phenotype Ontology. The Human Phenotype Ontology contains 16k concepts focused on phenotypic abnormalities encountered in human disease.

Bug Fixes 🐛

#217 - Fixes a bug in the Abbreviation detector

API Changes

  • Entity Linkers now modify the Span._.kb_ents rather than the Span._.umls_ents to reflect the fact that we now have more than one entity linker. Span._.umls_ents will be deprecated in v1.0.
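
For code that has to work on both sides of this change, one option is to read `kb_ents` and fall back to `umls_ents`. The sketch below is only an illustration: `linked_entities` is a hypothetical helper, the objects are `SimpleNamespace` stand-ins for `span._`, and the `(CUI, score)` values are made up.

```python
from types import SimpleNamespace
from typing import Any, List, Tuple


def linked_entities(extensions: Any) -> List[Tuple[str, float]]:
    """Prefer the newer `kb_ents` attribute; fall back to the older `umls_ents`."""
    kb_ents = getattr(extensions, "kb_ents", None)
    if kb_ents is not None:
        return kb_ents
    # Older pipelines only set `umls_ents`; default to an empty list if neither exists.
    return getattr(extensions, "umls_ents", [])


# Stand-ins for `span._` under the new and old attribute names.
new_style = SimpleNamespace(kb_ents=[("C5203670", 0.98)])
old_style = SimpleNamespace(umls_ents=[("C5203670", 0.98)])

print(linked_entities(new_style))  # [('C5203670', 0.98)]
print(linked_entities(old_style))  # [('C5203670', 0.98)]
```

With a real pipeline the argument would be `span._`; whether an unregistered extension behaves like a missing attribute there is an assumption worth checking against your spaCy version.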

v0.2.4

23 Oct 02:12

Retrains the models to be compatible with spacy 2.2.1 and rewrites the optional sentence splitting pipe to use pysbd. This pipe is experimental at this point and may be rough around the edges.