2.0.3

Release date 21/03/2022

#384: Unpin tokenizers to allow users to upgrade to latest transformers

2.0.2

Release date 01/01/2022

#379: Fixes a problem in the required dependencies, moving flake8 and black to tests

2.0.1

Release date; 24/11/2021

#376: Fixes a bug with relative import of EPMC API client

2.0.0

Release date: 05/11/2021

Improvements:

#328: Expose build_model to CNNClassifier and don't rebuild in fit
#372: Add more fine grained extras to make vanilla WellcomeML light (109MB)
#368: Expand LOGGING_LEVEL to TF_CPP_MIN_LOG_LEVEL
#332: Add wellcomeml.viz to vizualize clusters
#344: Add filter by variable in visualize clusters

Breaking changes:

#371: Delete wellcomeml.ml.__init__ so all ml models need to be explicitly imported

Bug fixes:

#357: Fix sent2vec test error

1.2.1

Release date: 05/08/2021

Bug fixes:

#346: Fixes problem with tf-idf vectoriser lemmatising twice
#337: Pin spacy to fix problem with enity-linking test

1.2.0

Release date: 22/07/2021

Improvements:

#327: Adds verbose and tensorboard_log_path to CNN and SemanticEquivalenceClassifier
#315: Implements decode to KerasTokenizer and TransformersTokenizer.
#308: Adds EPMCClient to download data from EPMC
#283: Disable umap to make imports in wellcomeml faster
#279: Extend LOGGING_LEVEL env variable to control more libraries logger
#306: Break down deep-learning extra to tensorflow,torch,spacy for more control over what's installed
#289: Re-factors frequency vectoriser saving function to make it more efficient

Bug fixes:

#313: Fix concat feature_approach in CNN
#297: Fix OOM error in CNN predict
#292: Fix clustering pipeline when umap used and input length > 4096

1.1.0

Release date: 26/04/2021

Improvements

#140: Additions to text clustering, exposing each step of the pipeline and adding load/save
#272: Spacy lemmatiser speed greatly improved
#255: Improve memory efficiency of TransformersTokenizer
#245: Voting classifier extras (better input flexibility, parameter for how many models should agree, etc)
General improvements to continuous testing pipelines

Bug fixes:

#233: Fix CNN's dataset generator
#240: Fix multiGPU in SemanticEquivalenceBer
#237: Fix KerasVectorizer return length

1.0.2

Release date: 25/02/2021

Due to an error, some of the improvements in the previous release weren't built in the PyPI whl.

#233 CNN predict when X tf.data.Dataset
#214 Configurable logs
#204 Efficient CNN enabling tensorflow datasets as input
#193 TransfomersTokenizer class

Bug fixes:

Disables HDBScan for clustering, to solve #197.

1.0.1

Release date: 24/02/2021

From now on we will be using Semantic Versioning, instead of the previous CalVer system. Previous versions will still be installabe by pinning it via pip (e.g. pip install wellcomeml==2021.2.0), but won't be listed in the PyPI registry anymore. See discussion here for more details.

Improvements

#211 Batch prediction for wellcomeml.ml.SemanticSimilarityClassifier
~~#214 Configurable logs~~ only avaliable in 1.0.2
~~#204 Efficient CNN enabling tensorflow datasets as input~~ only available in 1.0.2
~~#193 TransfomersTokenizer class~~ only available in 1.0.2

Bug fixes:

~~Disables HDBScan for clustering, to solve #197.~~

Notice: for a technical reason, we are starting with 1.0.1 and not 1.0.0

2021.2.1

Bug fixes:

Update version to 2021.2.1 which was out of sync in latest release #206

2021.2.0

Major changes

Upgrade spacy to v3.0
Add native HuggingFace support (#191), re-writting BertClassifier using transformers
Disables HDBscan from the possible clustering techniques due to a conflict with the new numpy version (#197)

Bug fixes

Resolves issues #195 and #198 with thew pip reference resolver, introduced in pip>20.3

2021.1.1

Major changes

Upgrade spacy to 2.3, transformers to < 2.9 and spacy-tranformers to 0.6
Transformers functionality (e.g. SemanticEquivalenceClassifier and SemanticEquivalenceClassifier) are now enabled automatically with extras 'deep-learning'

Bug fixes

#173 Fix BertVectorizer for long sequences

2020.11.1

Bug fixes

#179 Dataclasses dependency version conflict coming from spacy-transformers
#177 BertClassifier path for scibert when fine tuning not working

2020.11.0

Models

CNN/BiLSTM

predict_proba
threshold param default 0.5
learning_rate decay param default 1
run training on multiple GPUs if more than one GPUs available
early_stopping param default false
sparse_y param default false
save and load methods

SemanticEquivalenceClassifier

callbacks param with default tensorboard and minibatch history
dropout_rate param default 0.1
dropout param default true
batch_norm param default true

KerasVectorizer

accepts gensim compatible word vector names and downloads and caches them

Metrics

f1 loss and metric

Bug fixes

#157 about saving CNN/BiLSTM with attention layer
#80 that fixes indentation and other doc issues
#134 which allows to use tsne as reducer in clustering

2020.9.0

Models

Add param l2, dense_size, attention_heads, metrics, callbacks, feature_approach to cnn and bilstm classifiers
Faster predictions in BertClassifier through the use of spacy's pipe
Adds TextClustering

2020.07.1

Bugs

Fixes pypi conflict by pinning down dependencies.

2020.07.0

Models

Adds Doc2VecVectorizer
Adds WellcomeVotingClassifier
Adds Sent2VecVectorizer
Adds SemanticEquivalenceMetaClassifier
Adds CategoricalMetrics and MetricMiniBatchHistory

Datasets

Adds CONLL dataset
Adds Winer dataset

Features

Automatically load models like en_core_web_sm and en_trf_bertbaseuncased_lg but also download packages like sent2vec, only when needed
Adds docs based on sphinx and read the docs

Repo

Adds pep8 / flake8 checks and address violations
Adds badges for build, codecov and license
Adds pull request template that forces link to issue or trello

Bugs

Fix dependency on non pypi packages for tests
Pin spacy transformers to 0.5.1
Fix codecov running in separate travis venv

2020.5.1

ML

Add l2 and validation_split to BertClassifier
Add kwargs and load model to semantic similarity

Extras

Add command line interface to download models
Removed models from deployment
Moved package to pypi

2020.5.0

ML

Add CNNClassifier
Add BiLSTMClassifier
Add attention layers
Add Semantic equivalence classifier
Add embedding based entity linker

Datasets

Add Hoc dataset

2020.4.0

Add partial_fit to BERTClassifier
Add mean_last_four embedding to BertVectorizer
Use nlp.pipe for prediction as its quicker
Add generator to transform data on demand for spacy to reduce memory usage
Add multilabel and architecture parameter in SpacyClassifier
Modify SpacyClassifier to accept sparse Y for multilabel classification
Add pretrain_vectors_path parameters to SpacyClassifier
Add speed metric to SpacyClassifier and BertClassifier
Fix tests in BertClassifier to check for loss reduction after 5 iterations

Files

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

2.0.3

2.0.2

2.0.1

2.0.0

1.2.1

1.2.0

1.1.0

1.0.2

1.0.1

2021.2.1

2021.2.0

Major changes

Bug fixes

2021.1.1

Major changes

Bug fixes

2020.11.1

Bug fixes

2020.11.0

Models

CNN/BiLSTM

SemanticEquivalenceClassifier

KerasVectorizer

Metrics

Bug fixes

2020.9.0

Models

2020.07.1

Bugs

2020.07.0

Models

Datasets

Features

Repo

Bugs

2020.5.1

ML

Extras

2020.5.0

ML

Datasets

2020.4.0