First of all, thank you for pandas-dedupe. It has greatly simplified using the Python library dedupe and has let me make real progress on a long-running set of data linkage problems I face in cultural non-profit CRM databases. Thank you.
Right now, I'm working with data exported from Salesforce, which includes Salesforce IDs. Given how I'm approaching the current set of records, I want to use the Salesforce IDs as one of my dedupe criteria for clustering: if two records share the same Salesforce ID, they are the same record and should be clustered together. Some of my Salesforce records have anywhere from several to dozens of plausible variations, each on a different line.
Unfortunately, Salesforce IDs are apparently encoded in base 62, so the case of the letters in ID columns is significant in their scheme.
By default, pandas-dedupe appears to apply a variety of cleanup steps to records (thank you for that). However, converting values to lower case breaks Salesforce ID columns: two different IDs that differ only in letter case end up identical.
As an enhancement request, it would be great to be able to exclude specific columns from the lowercasing step.
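To illustrate both the problem and the requested option, here is a minimal sketch in plain pandas. `clean_columns` and its `preserve_case` parameter are hypothetical names used only for illustration; they are not part of pandas-dedupe's API, and the cleanup shown (strip whitespace, then lowercase) is an assumed stand-in for whatever pandas-dedupe actually does internally. The sample IDs are made up.

```python
import pandas as pd


def clean_columns(df, columns, preserve_case=()):
    """Hypothetical cleanup: strip whitespace from each listed column and
    lowercase it, except for columns named in preserve_case."""
    out = df.copy()
    for col in columns:
        s = out[col].astype(str).str.strip()
        if col not in preserve_case:
            s = s.str.lower()
        out[col] = s
    return out


# Made-up IDs that differ only in letter case, i.e. two distinct records.
df = pd.DataFrame({
    "sf_id": ["001A000000abcDE", "001A000000ABCde"],
    "name": ["  Jane Doe", "JANE DOE "],
})

lowered = clean_columns(df, ["sf_id", "name"])
kept = clean_columns(df, ["sf_id", "name"], preserve_case=("sf_id",))

print(lowered["sf_id"].nunique())  # 1: the two distinct IDs now collide
print(kept["sf_id"].nunique())     # 2: case-sensitive IDs survive cleanup
```

The name column is still normalized in both cases; only the ID column listed in `preserve_case` keeps its original casing.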
As a workaround, I'm creating a new numeric key for just the dataset I'm working with, which I plan to use as an Exact field type.
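A minimal sketch of that kind of workaround, assuming the IDs live in a column named `sf_id` (a hypothetical name) and using `pandas.factorize` to assign the integer key. `factorize` compares values exactly, so IDs that differ only in case receive different codes, and because the resulting key contains only digits, a lowercasing step cannot merge two keys. How the key column is then declared as an Exact field in the pandas-dedupe configuration is not shown here.

```python
import pandas as pd

# Made-up sample: rows 0 and 2 share an ID; row 1 differs only in letter case.
df = pd.DataFrame({
    "sf_id": ["001A000000abcDE", "001A000000ABCde", "001A000000abcDE"],
    "name": ["Jane Doe", "Jane Doe", "J. Doe"],
})

# factorize assigns integer codes in order of first appearance, using exact
# (case-sensitive) comparison of the original ID strings.
codes, uniques = pd.factorize(df["sf_id"])
df["sf_key"] = codes

print(df["sf_key"].tolist())  # [0, 1, 0]

# `uniques` maps each code back to the original, case-preserved ID.
print(uniques[df["sf_key"][1]])  # 001A000000ABCde
```

Keeping `uniques` around makes it easy to translate clustered keys back to the original Salesforce IDs after deduplication.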