Releases · J535D165/recordlinkage

28 Dec 12:57

J535D165

v0.9.0

2b4d1ac

Version 0.9.0 (21 June 2017)

A new index API. The new index API is no longer a single class
(recordlinkage.Pairs(...)) containing all the functionality. The new API
is based on TensorFlow and FEBRL. With the new structure, it is easier to
parallelise the record linkage process. In future releases, this will be
implemented natively. See the reference page for more information and
migration instructions: http://recordlinkage.readthedocs.io/en/latest/ref-index.html
Significant speed improvement of the Sorted Neighbourhood Indexing
algorithm. Thanks to @perryvais (PR #32).
The function binary_comparisons is renamed to binary_vectors.
Documentation added to RTD.
Added unit tests to test the generation of random comparison vectors.
Logging module added to separate module logs from user logs. The
implementation is based on Tensorflow.
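
For readers unfamiliar with Sorted Neighbourhood Indexing, the idea can be sketched in plain Python. This is an illustrative sketch only, not the recordlinkage implementation: the function name sorted_neighbourhood_pairs, the dict input format and the window parameter are assumptions made for this example, and the window slides over records rather than over unique key values.

```python
def sorted_neighbourhood_pairs(keys, window=3):
    """Return candidate pairs of record ids whose sorting keys are close.

    keys: dict mapping record id -> sorting key (hypothetical input format).
    window: odd window size; records within window // 2 positions of each
    other in the sorted order become candidate pairs.
    """
    # Sort the record ids by their key value.
    ordered = sorted(keys, key=lambda rid: keys[rid])
    reach = window // 2
    pairs = set()
    for pos, rid in enumerate(ordered):
        # Pair the record with the next `reach` records in sorted order.
        for other in ordered[pos + 1 : pos + 1 + reach]:
            pairs.add(tuple(sorted((rid, other))))
    return pairs
```

Unlike exact blocking, records with slightly different keys (for example "adams" and "adamz") can still end up as a candidate pair because they sit next to each other after sorting.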

27 Jan 10:54

J535D165

v0.8.1

d8c92f6

Version 0.8.1 (27 Jan 2017)

Solved issues with rendering the docs on ReadTheDocs. It is still not clear
what is going on with autodoc_mock_imports in the Sphinx conf.py file.
Possibly a bug in Sphinx.
Move six to dependencies.
The reference part of the docs is split into separate subsections. This
makes the reference more readable.
The landing page of the docs is slightly changed.

23 Jan 12:34

J535D165

v0.8.0

bb374b5

Version 0.8.0 (22 Jan 2017)

Add additional arguments to the function that downloads and loads the
krebsregister data. The argument missing_values is used to fill missing
values. Default: nothing is done. The argument shuffle is used to
shuffle the records. Default is True.
Remove the last traces of the old package name. The new package name is
'Python Record Linkage Toolkit'.
Better error messages when only matches or only non-matches are passed
to train the classifier.
Add AirSpeedVelocity tests to test the performance.
Compare for deduplication fixed. It was broken.
Parameterized tests for the Compare class and its algorithms. Making use
of nose-parameterized module.
Update documentation about contributing.
Bugfix/improvement when blocking on multiple columns with missing values.
Fix bug #29. Package
not working with pandas 0.18 and 0.17. Dropped support for pandas 0.17 and fixed
support for 0.18. Also added multi-dependency tests for TravisCI.
Support for dedicated deduplication algorithms.
Special algorithm for full index in case of finding duplicates. Performance is
100x better.
Function max_number_of_pairs to get the maximum number of pairs.
low_memory for compare class.
Improved performance in case of comparing a large number of record pairs.
New documentation about custom algorithms
New documentation about the use of classifiers.
Possible to compare arrays and series directly without using labels.
Make a dataframe with random comparison vectors with the function
binary_comparisons in the recordlinkage.datasets.random module.
Set KMeans cluster centers by hand.
Various documentation updates and improvements.
Jellyfish is now a required dependency. Fixes bug #30.
Added tox.ini to test packaging and installation of package.
Drop requirements.txt file.
Many small fixes and changes. Most of the changes cover the Compare
module. Especially label handling is improved.
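
The maximum number of pairs differs between linking two files and deduplicating a single file. The sketch below shows only that arithmetic; max_pairs is a hypothetical name chosen for this example and is not claimed to match the signature of max_number_of_pairs.

```python
def max_pairs(n_a, n_b=None):
    """Maximum number of candidate record pairs.

    Linking two datasets: every record in A can pair with every record in B.
    Deduplicating one dataset: unordered pairs of distinct records.
    """
    if n_b is None:
        # Deduplication: n choose 2.
        return n_a * (n_a - 1) // 2
    # Linking: full Cartesian product.
    return n_a * n_b
```

For 1000 records, deduplication gives at most 499500 pairs, roughly half of the 1000000 pairs a naive self-join would generate.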

09 Nov 11:30

J535D165

v0.7.2

c37f82f

Version 0.7.2 (9 Nov 2016)

v0.7.2

Bugfix in levenshtein algorithms

09 Nov 10:58

J535D165

v0.7.1

d24466d

Version 0.7.1 (9 Nov 2016)

v0.7.1

Improve importing workflow + dist bug fix

12 Oct 12:32

J535D165

v0.6.0

3c7c411

Version 0.6.0

This version includes the following updates:

Reformatting the code such that it follows PEP8.
Add Travis-CI and codecov support.
Switch to distributing wheels.
Fix bugs with deprecated pandas functions. __sub__ is no longer used for computing the difference of Index objects. It is now replaced by INDEX.difference(OTHER_INDEX).
Exclude pairs with NaN's on the index-key in Q-gram indexing.
Add tests for krebsregister dataset.
Fix Python3 bug on krebsregister dataset.
Improve unicode handling in phonetic encoding functions.
Strip accents with the clean function.
Add documentation
Bug for random indexing with incorrect arguments fixed and tests added.
Improved deployment workflow
And much more
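
Accent stripping of the kind mentioned above is commonly done with Unicode normalisation. The following is a minimal sketch using only the standard library; strip_accents is a hypothetical helper and is not claimed to match what the clean function does internally.

```python
import unicodedata


def strip_accents(text):
    # Decompose characters into base characters plus combining marks ...
    decomposed = unicodedata.normalize("NFKD", text)
    # ... then drop the combining marks (the accents).
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Normalising names this way before comparing them avoids treating "José" and "Jose" as different strings.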

09 Sep 12:13

J535D165

v0.5.0

149a734

Version 0.5.0 (9 Sep 2016)

Batch comparing added. Significant speed improvement.
rldatasets are now included in the package itself.
Added an experimental gender imputation tool.
Blocking and SNI skip missing values
Different index names are no longer needed
FEBRL datasets included
Unit tests for indexing and comparing improved
Documentation updated
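
What "skip missing values" means for blocking can be illustrated with a small standalone sketch. It is not the library's code: block_pairs and the dict-based input are assumptions for this example, with None standing in for a missing blocking key.

```python
from collections import defaultdict


def block_pairs(left, right):
    """left, right: dicts mapping record id -> blocking key (None if missing)."""
    # Group the right-hand records by key, ignoring missing keys.
    by_key = defaultdict(list)
    for rid, key in right.items():
        if key is not None:
            by_key[key].append(rid)
    pairs = []
    for lid, key in left.items():
        if key is None:
            continue  # a missing key never forms a candidate pair
        pairs.extend((lid, rid) for rid in by_key.get(key, []))
    return pairs
```

Without the two None checks, records with a missing key on both sides would be paired with each other, even though a missing value says nothing about whether they refer to the same entity.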

20 Aug 20:44

J535D165

v0.4.0

4faba70

Version 0.4.0 (20 Aug 2016)

Fixes a serious bug with deduplication (thanks to https://github.com/dserban).
Fixes undesired behaviour for sorted neighbourhood indexing with missing values.
Add new datasets to the package, such as the Febrl datasets.
Move Krebsregister dataset to this package.
Improve and add some tests
Various documentation updates

15 Jun 18:32

J535D165

v0.3.1

48fa135

Version 0.3.1: Fix installation bug

v0.3.1

Fix problems with installing with pip

11 Jun 11:36

J535D165

v0.3

561788c

Version 0.3 (11 June 2016)

This version contains a lot of changes to the API. Hopefully, no further large API changes are needed for now.

Total restructure of the compare functions (the API changes are nearly finished).
Compare method numerical is now named numeric and fuzzy is now named string.
Add haversine formula to compare geographical records.
Use numexpr for computing numeric comparisons.
Add step, linear and squared comparing.
Add eye index method.
Improve, update and add new tests.
Remove iterative indexing functions.
New: chunks for indexing functions. These chunks are defined in the class Pairs. If chunks are defined, then the indexing functions return a generator with an Index for each element.
Update documentation.
Various bug fixes.
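
For reference, the haversine formula mentioned above computes the great-circle distance between two points given as latitude and longitude. The sketch below is a plain-Python version of the textbook formula, not the library's implementation; the function name haversine_km and the Earth radius constant are assumptions for this example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points in degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    # Haversine of the central angle between the two points.
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))
```

A distance computed this way can then be turned into a similarity score, for instance by treating two addresses within a few kilometres of each other as a likely match.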
