Commit

Merge branch 'new-compare-api'
J535D165 committed Aug 24, 2017
2 parents 5feedb7 + 9577d0b commit 7b89603
Showing 17 changed files with 2,475 additions and 1,064 deletions.
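The README.rst and docs/about.rst hunks in this commit show the main change: `Compare()` is now constructed without data, and the candidate links and data frames are passed to `compute(...)` instead. The pattern can be sketched roughly as follows. This is a stdlib-only illustration, not the recordlinkage implementation; the class `LazyCompare`, its `exact` method signature, and the dict-based records are hypothetical stand-ins chosen for the example.

```python
# Hypothetical illustration of "configure first, compute later".
# This is NOT recordlinkage code; all names here are assumptions.

class LazyCompare:
    """Collects comparison rules, then applies them to candidate pairs."""

    def __init__(self):
        # Each rule is (label, column in dataset A, column in dataset B).
        self._rules = []

    def exact(self, col_a, col_b, label=None):
        # Register an exact-match comparison; no data is touched yet.
        self._rules.append((label or col_a, col_a, col_b))

    def compute(self, pairs, records_a, records_b):
        # Apply every registered rule to every candidate pair.
        vectors = {}
        for id_a, id_b in pairs:
            rec_a, rec_b = records_a[id_a], records_b[id_b]
            vectors[(id_a, id_b)] = {
                label: int(rec_a[col_a] == rec_b[col_b])
                for label, col_a, col_b in self._rules
            }
        return vectors


records_a = {1: {"surname": "smith", "sex": "f"},
             2: {"surname": "jones", "sex": "m"}}
records_b = {7: {"surname": "smith", "gender": "f"},
             8: {"surname": "jones", "gender": "f"}}
candidate_pairs = [(1, 7), (2, 8)]

comp = LazyCompare()
comp.exact("surname", "surname")
comp.exact("sex", "gender")
feature_vectors = comp.compute(candidate_pairs, records_a, records_b)
```

Because the rules are stored on the object and the data arrives only in `compute`, the same configured comparison can be reused on a different set of candidate pairs or records.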
11 changes: 4 additions & 7 deletions README.rst
@@ -50,15 +50,12 @@ are returned.
block_class = recordlinkage.BlockIndex('surname')
candidate_links = block_class.index(df_a, df_b)
-**Older versions of Python Record Linkage Toolkit use a different syntax for
-indexing.** `More info about migrating can be found here. <http://recordlinkage.readthedocs.io/en/latest/ref-index.html#migrating>`_

For each candidate link, compare the records with one of the
comparison or similarity algorithms in the Compare class.

.. code:: python
-c = recordlinkage.Compare(candidate_links, df_a, df_b)
+c = recordlinkage.Compare()
c.string('name_a', 'name_b', method='jarowinkler', threshold=0.85)
c.exact('sex', 'gender')
@@ -68,7 +65,7 @@ comparison or similarity algorithms in the Compare class.
c.numeric('income', 'income', method='gauss', offset=3, scale=3, missing_value=0.5)
# The comparison vectors
-c.vectors
+feature_vectors = c.compute(candidate_links, df_a, df_b)
This Python Record Linkage Toolkit contains multiple classification algorithms.
Plenty of the algorithms do need training data (supervised learning) while
@@ -79,14 +76,14 @@ others are unsupervised. An example of supervised learning:
logrg = recordlinkage.LogisticRegressionClassifier()
logrg.learn(TRAINING_COMPARISON_VECTORS, TRAINING_CLASSES)
-logrg.predict(c.vectors)
+logrg.predict(feature_vectors)
and an example of unsupervised learning (the well known ECM-algorithm):

.. code:: python
ecm = recordlinkage.ECMClassifier()
-ecm.learn(c.vectors)
+ecm.learn(feature_vectors)
Main Features
-------------
10 changes: 5 additions & 5 deletions docs/about.rst
@@ -107,7 +107,7 @@ and similarity measures.

.. code:: python
-compare = recordlinkage.Compare(candidate_links, df_a, df_b)
+compare = recordlinkage.Compare()
compare.string('name', 'name', method='jarowinkler', threshold=0.85)
compare.exact('sex', 'gender')
@@ -117,7 +117,7 @@ and similarity measures.
compare.exact('haircolor', 'haircolor', missing_value=9)
# The comparison vectors
-compare.vectors
+compare_vectors = compare.compute(candidate_links, df_a, df_b)
This record linkage package contains several classification algorithms.
Plenty of the algorithms need trainings data (supervised learning) while
@@ -128,15 +128,15 @@ some others are unsupervised. An example of supervised learning:
true_linkage = pandas.Series(YOUR_GOLDEN_DATA, index=pandas.MultiIndex(YOUR_MULTI_INDEX))
logrg = recordlinkage.LogisticRegressionClassifier()
-logrg.learn(compare.vectors[true_linkage.index], true_linkage)
+logrg.learn(compare_vectors[true_linkage.index], true_linkage)
-logrg.predict(compare.vectors)
+logrg.predict(compare_vectors)
and an example of unsupervised learning (the well known ECM-algorithm):

.. code:: python
ecm = recordlinkage.BernoulliEMClassifier()
-ecm.learn(compare.vectors)
+ecm.learn(compare_vectors)
5 changes: 5 additions & 0 deletions docs/changelog.rst
@@ -2,6 +2,11 @@
Release notes
*************

+Version 0.10.0
+==============
+
+- A new compare API.
+
Version 0.9.0
=============

84 changes: 22 additions & 62 deletions docs/notebooks/classifiers.ipynb
@@ -43,9 +43,7 @@
{
"cell_type": "code",
"execution_count": 2,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [
{
"data": {
@@ -284,9 +282,7 @@
{
"cell_type": "code",
"execution_count": 3,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [
{
"data": {
@@ -454,9 +450,7 @@
{
"cell_type": "code",
"execution_count": 4,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [],
"source": [
"golden_pairs = krebs_data[0:5000]\n",
@@ -474,9 +468,7 @@
{
"cell_type": "code",
"execution_count": 5,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [
{
"name": "stdout",
@@ -502,9 +494,7 @@
{
"cell_type": "code",
"execution_count": 6,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [
{
"data": {
@@ -527,9 +517,7 @@
{
"cell_type": "code",
"execution_count": 7,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [
{
"data": {
@@ -551,9 +539,7 @@
{
"cell_type": "code",
"execution_count": 8,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [
{
"data": {
@@ -588,9 +574,7 @@
{
"cell_type": "code",
"execution_count": 9,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [
{
"name": "stdout",
@@ -614,9 +598,7 @@
{
"cell_type": "code",
"execution_count": 10,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [
{
"data": {
@@ -655,9 +637,7 @@
{
"cell_type": "code",
"execution_count": 11,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [
{
"data": {
@@ -684,9 +664,7 @@
{
"cell_type": "code",
"execution_count": 12,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [
{
"data": {
@@ -708,9 +686,7 @@
{
"cell_type": "code",
"execution_count": 13,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [
{
"data": {
@@ -740,9 +716,7 @@
{
"cell_type": "code",
"execution_count": 14,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [
{
"data": {
@@ -769,9 +743,7 @@
{
"cell_type": "code",
"execution_count": 15,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [
{
"data": {
@@ -793,9 +765,7 @@
{
"cell_type": "code",
"execution_count": 16,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [
{
"data": {
@@ -833,9 +803,7 @@
{
"cell_type": "code",
"execution_count": 17,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [
{
"data": {
@@ -866,9 +834,7 @@
{
"cell_type": "code",
"execution_count": 18,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [
{
"data": {
@@ -898,9 +864,7 @@
{
"cell_type": "code",
"execution_count": 19,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [
{
"data": {
@@ -924,9 +888,7 @@
{
"cell_type": "code",
"execution_count": 20,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [
{
"data": {
@@ -948,9 +910,7 @@
{
"cell_type": "code",
"execution_count": 21,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
"outputs": [
{
"data": {
@@ -971,7 +931,7 @@
],
"metadata": {
"kernelspec": {
-"display_name": "Python 3",
+"display_name": "Anaconda Python 3",
"language": "python",
"name": "python3"
},
@@ -989,5 +949,5 @@
}
},
"nbformat": 4,
-"nbformat_minor": 0
+"nbformat_minor": 1
}