Releases: proycon/colibri-core
v2.4.10
Important bugfix release:
- Fixes data-clipping bug on loading large corpora in memory (used by indexed patternmodels) #41
(All users are urged to upgrade!)
v2.4.8
- Minor update: made setup.py more robust for manual installation mode (without compiling C++ lib) (v2.4.7 was skipped)
v2.4.6
- Fix: colibri-classencode -t (threshold) behaviour was wrong (was interpreted as +1)
v2.4.5
- Refactored alignment model
- added BasicPatternAlignmentModel
- Major cleanup of warnings and possible issues (thanks to @kosloot)
v2.4.4
- Bugfix: fixes covered token count per category/n (issue #26)
- New feature: colibri-patternmodeller has a --simplereport (-r) option that generates a report without coverage information (more limited but a lot faster)
v2.4.3
v2.4.2 was released prematurely; one minor test was corrupt. This release fixes it.
v2.4.2
Bugfix release, fixes issue #25
v2.4.1
Minor fix release prior to paper publication:
- Python 2.7 compatibility fix
- Updated Python tutorial
- Added benchmarks
v2.4.0
Various fixes:
- Speed up in ngrams() computation (issue #21)
- Performance fix for processing long lines
- Pattern.instanceof() should be faster and is now available from Python too
- Attempt to fix compilation issue on certain platforms (issue #22), unconfirmed
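For readers unfamiliar with the instance-of relation mentioned above, here is a minimal conceptual sketch in plain Python. It does not use or reproduce the Colibri Core API: the `GAP` marker, the tuple representation, and the `is_instance_of` helper are all hypothetical and only illustrate the idea that an n-gram is an instance of a skipgram when it matches every fixed token and fills each gap with exactly one token.

```python
from typing import Optional, Sequence

# Hypothetical marker for a single-token gap (not Colibri Core's own encoding).
GAP = None


def is_instance_of(ngram: Sequence[str], skipgram: Sequence[Optional[str]]) -> bool:
    """Return True if `ngram` matches `skipgram`, with each gap filled by one token."""
    if len(ngram) != len(skipgram):
        return False
    # Every non-gap slot must equal the token at the same position.
    return all(slot is GAP or slot == token for token, slot in zip(ngram, skipgram))


skipgram = ("to", GAP, "or", GAP, "to", "be")
matches = is_instance_of("to be or not to be".split(), skipgram)
differs = is_instance_of("to be and not to be".split(), skipgram)
print(matches, differs)
```

In the library itself the check is performed on class-encoded patterns via Pattern.instanceof(); the sketch above only mirrors the relation on plain strings.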
New features:
- Implemented new filtering mechanism that supports actively checking whether patterns are instances of a limited set of specified skipgrams, or a superset of specified ngrams.
- Implemented ignorenewlines option in class encoding. Useful if your source text is split into, for instance, sentences (one per line), but you want a model that crosses sentence boundaries.
- Implemented vocabulary import for the class encoding stage (issue #2)
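The effect of ignoring newlines can be illustrated with a small plain-Python sketch. It is an assumption-laden illustration, not the library's implementation: `ngrams`, `corpus_ngrams`, and the `ignore_newlines` parameter are hypothetical names, and whitespace tokenisation is assumed. It shows that n-grams normally stop at line ends, whereas with newlines ignored they can span two lines.

```python
from typing import Iterator, List, Tuple


def ngrams(tokens: List[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield all contiguous n-grams in a token list."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def corpus_ngrams(text: str, n: int, ignore_newlines: bool = False) -> List[Tuple[str, ...]]:
    """Extract n-grams per line, or across lines when ignore_newlines is set."""
    if ignore_newlines:
        units = [text.split()]  # treat the whole text as one token stream
    else:
        units = [line.split() for line in text.splitlines()]
    return [gram for unit in units for gram in ngrams(unit, n)]


text = "the cat sat\non the mat"
per_line = corpus_ngrams(text, 2)
crossing = corpus_ngrams(text, 2, ignore_newlines=True)
print(("sat", "on") in per_line, ("sat", "on") in crossing)
```

Only the second call produces the bigram that spans the line break, which is the kind of pattern a model crossing sentence boundaries would be able to capture.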