CRM-18842 Dedupe query: remove OR join in favour of more performant UNION

Unions are much faster than OR joins. This change cut the time taken by the query that fetches dedupes on a large database from
'as long as it took for the server to fall over' to less than one second for a small group of contacts.

This change only affects one path (i.e. Individuals) at the moment, as I can only extend it as fast as I can write tests.
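
To illustrate the difference described above, here is a minimal standalone sketch of the two query shapes. The function names, the base query and the `intval` cast are assumptions made for illustration and are not part of CiviCRM; in particular, the OR form is a guess at the kind of filter being avoided, since the original OR query is not shown in this commit.

```php
<?php
// Standalone sketch; function names are hypothetical, not CiviCRM API.

/**
 * Assumed shape of the slow form: a single SELECT whose filter ORs
 * across both contact columns.
 */
function dedupeQueryWithOr(string $query, array $contactList): string {
  // intval cast is an addition for this sketch, not taken from the commit.
  $ids = implode(',', array_map('intval', $contactList));
  return "$query AND contact1.id > contact2.id"
    . " AND (contact1.id IN ($ids) OR contact2.id IN ($ids))";
}

/**
 * Shape used by the commit: two SELECTs, each filtering the list on a
 * single column, combined with UNION.
 */
function dedupeQueryWithUnion(string $query, array $contactList): string {
  if (empty($contactList)) {
    // No list given: just avoid returning each pair twice.
    return "$query AND (contact1.id < contact2.id)";
  }
  $ids = implode(',', array_map('intval', $contactList));
  // Branch 1: contact1 is in the list.
  // Branch 2: contact2 is in the list and contact1 is not.
  return "$query AND contact1.id IN ($ids) AND contact1.id > contact2.id\n"
    . "UNION $query AND contact1.id > contact2.id"
    . " AND contact2.id IN ($ids) AND contact1.id NOT IN ($ids)";
}

// Simplified base query, assumed for the example.
$base = "SELECT contact1.id AS id1, contact2.id AS id2"
  . " FROM civicrm_contact contact1"
  . " JOIN civicrm_email email1 ON email1.contact_id = contact1.id"
  . " JOIN civicrm_email email2 ON email2.email = email1.email"
  . " JOIN civicrm_contact contact2 ON contact2.id = email2.contact_id"
  . " WHERE contact1.contact_type = 'Individual'";

echo dedupeQueryWithUnion($base, [3, 7]), "\n";
```

Both forms select pairs where at least one contact is in the list; the UNION form lets each branch filter on one column only, which is what the commit relies on for performance.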
eileenmcnaughton committed May 3, 2016
1 parent 1cdbae9 commit 668f6e4
Showing 2 changed files with 27 additions and 3 deletions.
24 changes: 24 additions & 0 deletions CRM/Dedupe/BAO/QueryBuilder.php
@@ -22,4 +22,28 @@ public static function internalFilters($rg, $strID1 = 'contact1.id', $strID2 = '
}
}

/**
* If a contact list is specified then adjust the query to ensure one contact is in that list.
*
* Doing an OR join here will lead to a server-killing unindexed query. However, a union will
* perform better.
*
* @param array $contactList
* @param string $query
* @param string $strID1
* @param string $strID2
*
* @return string
*/
protected static function filterQueryByContactList(array $contactList, $query, $strID1 = 'contact1.id', $strID2 = 'contact2.id') {
if (empty($contactList)) {
return $query . " AND ($strID1 < $strID2)";
}
$contactIDs = implode(',', $contactList);
return "$query AND $strID1 IN ($contactIDs) AND $strID1 > $strID2
UNION $query AND $strID1 > $strID2 AND $strID2 IN ($contactIDs) AND $strID1 NOT IN ($contactIDs)
";

}

}
6 changes: 3 additions & 3 deletions CRM/Dedupe/BAO/QueryBuilder/IndividualSupervised.php
@@ -53,7 +53,7 @@ public static function record($rg) {
* @return array
*/
public static function internal($rg) {
- $query = "
+ $query = self::filterQueryByContactList($rg->contactIds, "
SELECT contact1.id as id1, contact2.id as id2, {$rg->threshold} as weight
FROM civicrm_contact as contact1
JOIN civicrm_email as email1 ON email1.contact_id=contact1.id
@@ -63,8 +63,8 @@ public static function internal($rg) {
JOIN civicrm_email as email2 ON
email2.contact_id=contact2.id AND
email1.email=email2.email
- WHERE contact1.contact_type = 'Individual'
- AND " . self::internalFilters($rg);
+ WHERE contact1.contact_type = 'Individual'");

return array(
"civicrm_contact.{$rg->name}.{$rg->threshold}" => $query,
);