Release 0.99: Improve span merging, internal refactoring · explosion/spaCy

0.99
49aa9b3

syllog1sm tagged this 08 Nov 15:47

* Merging multi-word tokens into one, via the doc.merge() and span.merge() methods, no longer invalidates existing Span objects. This makes it much easier to merge multiple spans, e.g. to merge all named entities, or all base noun phrases. Thanks to @andreasgrv for help on this patch.
* Lots of internal refactoring, especially around the machine learning module, thinc. The thinc API has now been improved, and the spacy._ml wrapper module is no longer necessary.
* The lemmatizer now lower-cases non-noun, non-verb and non-adjective words.
* A new attribute, .rank, is added to Token and Lexeme objects, giving the frequency rank of the word.
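
To illustrate why span merging is easier when existing spans stay valid, here is a minimal, self-contained sketch. It does not use spaCy and is not spaCy's implementation: the `Token`, `Doc` and `Span` classes below are hypothetical toy stand-ins that only borrow the method name `merge()`. The sketch assumes one possible design, in which a span remembers character offsets rather than token indices, so that merging one span does not shift the boundaries of another.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """Toy token: a piece of text plus its character offset in the document."""
    text: str
    start_char: int

    @property
    def end_char(self) -> int:
        return self.start_char + len(self.text)


class Doc:
    """Toy document that tokenizes on single spaces (illustration only)."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.tokens: List[Token] = []
        offset = 0
        for word in text.split(" "):
            self.tokens.append(Token(word, offset))
            offset += len(word) + 1  # skip the separating space

    def span(self, start: int, end: int) -> "Span":
        """Build a span from token indices, but store character offsets."""
        return Span(self, self.tokens[start].start_char, self.tokens[end - 1].end_char)

    def merge(self, start_char: int, end_char: int) -> None:
        """Replace every token inside [start_char, end_char) with one token."""
        kept = [t for t in self.tokens
                if t.end_char <= start_char or t.start_char >= end_char]
        merged = Token(self.text[start_char:end_char], start_char)
        self.tokens = sorted(kept + [merged], key=lambda t: t.start_char)


class Span:
    """Toy span anchored by character offsets, so token merges cannot shift it."""

    def __init__(self, doc: Doc, start_char: int, end_char: int) -> None:
        self.doc = doc
        self.start_char = start_char
        self.end_char = end_char

    @property
    def text(self) -> str:
        return self.doc.text[self.start_char:self.end_char]

    def merge(self) -> None:
        self.doc.merge(self.start_char, self.end_char)


doc = Doc("I flew from New York to Los Angeles")
# Both spans are created up front, before any merging happens.
entities = [doc.span(3, 5), doc.span(6, 8)]
for ent in entities:
    ent.merge()

print([t.text for t in doc.tokens])
# ['I', 'flew', 'from', 'New York', 'to', 'Los Angeles']
```

If the spans had stored token indices instead, the first merge would have shortened the token list from eight to seven entries, and the second span (tokens 6 to 8) would no longer point at "Los Angeles". Anchoring spans to positions that merging does not change is what lets the loop above merge every entity in a single pass.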
