
Improve ACL contact cache building performance by splitting insert and select #15480

Conversation

seamuslee001
Contributor

@seamuslee001 seamuslee001 commented Oct 10, 2019

Overview

This aims to improve the performance of cache building by splitting the query that finds contacts from the actual insertion. We have found that building the cache locks tables such as group_contact for too long, causing other processes to fail.

Before

Tables being locked for too long

After

Table locks improved

Technical Details

The MySQL docs recommend:

If you use INSERT INTO … SELECT to copy some or all rows from one table to another, consider using a lesser locking transaction isolation level (e.g. READ_COMMITTED) and set the binary log format to row/mixed for that transaction. Alternatively, design your application to put a single INSERT statement in a loop and copy row(s) into the table.

Source https://www.percona.com/community-blog/2018/09/24/minimize-mysql-deadlocks-3-steps/
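The pattern the Percona article recommends can be sketched as follows. This is an illustrative sketch only: sqlite3 stands in for MySQL (so the InnoDB locking behaviour under discussion is not reproduced), and the table names are hypothetical, not CiviCRM's real schema; it only shows the shape of "SELECT first, then insert row by row".

```python
import sqlite3

# Sketch: split one INSERT INTO ... SELECT into a plain SELECT followed
# by a loop of single INSERTs. Schema is illustrative, not CiviCRM's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_contacts (id INTEGER PRIMARY KEY);
CREATE TABLE acl_contact_cache (user_id INTEGER, contact_id INTEGER, operation TEXT);
INSERT INTO source_contacts (id) VALUES (1), (2), (3);
""")

user_id, operation = 42, "View"

# Step 1: read the candidate rows with a plain SELECT (no insert lock held).
rows = conn.execute("SELECT id FROM source_contacts").fetchall()

# Step 2: insert row by row, so each statement takes (and releases) its own
# short-lived lock instead of one long INSERT ... SELECT holding locks throughout.
for (contact_id,) in rows:
    conn.execute(
        "INSERT INTO acl_contact_cache (user_id, contact_id, operation) VALUES (?, ?, ?)",
        (user_id, contact_id, operation),
    )
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM acl_contact_cache").fetchone()[0]
print(count)  # 3
```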

I believe we have sufficient tests in this area to prove whether this works or not.

ping @eileenmcnaughton @totten

@civibot civibot bot added the master label Oct 10, 2019
@civibot

civibot bot commented Oct 10, 2019

(Standard links)

@seamuslee001
Contributor Author

ping @johntwyman

@seamuslee001 seamuslee001 force-pushed the improve_acl_insert_performance branch from eaa7e56 to 393372a Compare October 10, 2019 16:23
CRM_Core_DAO::executeQuery("
INSERT INTO civicrm_acl_contact_cache ( user_id, contact_id, operation )
SELECT DISTINCT $userID as user_id, contact_a.id as contact_id, '{$operation}' as operation
$contats = CRM_Core_DAO::excuteQuery("SELECT DISTINCT $userID as user_id, contact_a.id as contact_id
Contributor


@seamuslee001 typos.. :-)

Contributor Author


dang fixed @mattwire

@seamuslee001 seamuslee001 force-pushed the improve_acl_insert_performance branch from 393372a to bcfc188 Compare October 10, 2019 17:08
@seamuslee001 seamuslee001 force-pushed the improve_acl_insert_performance branch from bcfc188 to 697f356 Compare October 10, 2019 17:09
@mattwire
Contributor

@seamuslee001 @johntwyman @eileenmcnaughton @totten I agree with this change... because I've already made it :-) However, instead of duplicating effort could you take a look at https://github.com/mattwire/civicrm-groupaclcache That's been working well on quite a few sites that had deadlock issues but it is nominally version specific (5.13) because it overrides 3 core files. I should get a bit more funding to work on it soon, however we are pretty much at the point where we need to do a big replace (with what I've done) rather than incremental bits. With a bit of luck this diff will apply cleanly to latest: https://github.com/mattwire/civicrm-groupaclcache/blob/master/patches/master...mattwire:refactor_groupaclcache.diff

@seamuslee001
Contributor Author

@mattwire how is doing the insert into a temporary table any more performant? The docs suggest looping the insertion, as I am doing here; I would have thought that would be the more performant way?

@mattwire
Contributor

@mattwire how is doing the insert into a temporary table any more performant? The docs suggest looping the insertion, as I am doing here; I would have thought that would be the more performant way?

@seamuslee001 I'm not sure which is best for performance. However this code (rightly or wrongly) gets called a lot, and you can easily end up with more than one process trying to update the tables. By looping and running an INSERT for each contact we lock and unlock the table multiple times, and can end up (if there are deadlocks etc.) with a half-finished update. Writing all the changes to a temporary table and bulk copying them across with a single INSERT...SELECT avoids that. Frustratingly, I've used the slower INSERT IGNORE because it seems that race conditions can often cause duplicates to appear, and we don't want to trigger an exception if the value we are adding already exists.
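The staging-table approach described above can be sketched like this. Again sqlite3 stands in for MySQL, the table names and the UNIQUE index are illustrative (not CiviCRM's real schema), and sqlite's INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE:

```python
import sqlite3

# Sketch: collect changes in a temp table, then copy them across in one
# bulk INSERT; duplicates already added by a concurrent process are
# silently skipped instead of raising an error.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acl_contact_cache (
    user_id INTEGER, contact_id INTEGER, operation TEXT,
    UNIQUE (user_id, contact_id, operation)
);
CREATE TEMP TABLE acl_cache_staging (user_id INTEGER, contact_id INTEGER, operation TEXT);
""")

# Simulate a concurrent process having already cached contact 2.
conn.execute("INSERT INTO acl_contact_cache VALUES (7, 2, 'View')")

# Stage the freshly computed rows, duplicates included.
conn.executemany(
    "INSERT INTO acl_cache_staging VALUES (?, ?, ?)",
    [(7, 1, "View"), (7, 2, "View"), (7, 3, "View")],
)

# One bulk copy; the duplicate row is skipped by OR IGNORE.
conn.execute("""
INSERT OR IGNORE INTO acl_contact_cache (user_id, contact_id, operation)
SELECT user_id, contact_id, operation FROM acl_cache_staging
""")
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM acl_contact_cache").fetchone()[0]
print(count)  # 3
```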

@seamuslee001
Contributor Author

@mattwire I suppose the potential answer is to run that part in a transaction, right? The biggest problem for us, I think, is that because we use the multisite extension it does a LEFT JOIN onto the group_contact table, which can be pretty slow. A sample query of ours is:

INSERT INTO civicrm_acl_contact_cache ( user_id, contact_id, operation )
SELECT DISTINCT 153312 as user_id, contact_a.id as contact_id, 'View' as operation
FROM civicrm_contact contact_a
LEFT JOIN civicrm_group_contact multisiteGroupTable ON contact_a.id = multisiteGroupTable.contact_id
LEFT JOIN civicrm_acl_contact_cache ac ON ac.user_id = 153312 AND ac.contact_id = contact_a.id AND ac.operation = 'View'
WHERE (multisiteGroupTable.group_id IN (116) AND multisiteGroupTable.status IN ('Added') ) AND (contact_a.is_deleted = 0)
AND ac.user_id IS NULL

@mattwire
Contributor

By the way, I need to switch the temp table over to use CRM_Utils_SQL_TempTable.

@eileenmcnaughton
Contributor

@davejenx this might interest you

@seamuslee001
Contributor Author

@mattwire we have also considered doing it like this:

    $from = CRM_Contact_BAO_Query::fromClause($whereTables);
    $values = '';
    $valuesKey = 2;
    $params = [1 => [$operation, 'String']];
    $contacts = CRM_Core_DAO::executeQuery("SELECT DISTINCT $userID as user_id, contact_a.id as contact_id
         $from
         LEFT JOIN civicrm_acl_contact_cache ac ON ac.user_id = $userID AND ac.contact_id = contact_a.id AND ac.operation = '{$operation}'
WHERE    $permission
AND ac.user_id IS NULL");
    while ($contacts->fetch()) {
      if (!empty($values)) {
        $values .= ', ';
      }
      $key1 = $valuesKey;
      $valuesKey++;
      $key2 = $valuesKey;
      $valuesKey++;
      $values .= "(%{$key1}, %{$key2}, %1)";
      $params[$key1] = [$contacts->user_id, 'Positive'];
      $params[$key2] = [$contacts->contact_id, 'Positive'];
    }
    CRM_Core_DAO::executeQuery("INSERT INTO civicrm_acl_contact_cache ( user_id, contact_id, operation ) VALUES " . $values, $params);

This puts more of the work onto the PHP layer.
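The batched multi-row INSERT that the PHP snippet above builds can be sketched in Python like this. sqlite3 stands in for MySQL and the schema is hypothetical; it only shows the shape of "SELECT first, then one parameterised multi-row INSERT built on the application side":

```python
import sqlite3

# Sketch: fetch the candidate rows, then build one INSERT with a
# "(?, ?, ?)" placeholder group per row, all values bound as parameters.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY);
CREATE TABLE acl_contact_cache (user_id INTEGER, contact_id INTEGER, operation TEXT);
INSERT INTO contacts (id) VALUES (10), (11);
""")

user_id, operation = 5, "View"

# Step 1: plain SELECT for the rows to cache.
rows = conn.execute(
    "SELECT ? AS user_id, id AS contact_id FROM contacts", (user_id,)
).fetchall()

# Step 2: one multi-row INSERT built in application code.
if rows:
    placeholders = ", ".join(["(?, ?, ?)"] * len(rows))
    params = [v for uid, cid in rows for v in (uid, cid, operation)]
    conn.execute(
        "INSERT INTO acl_contact_cache (user_id, contact_id, operation) "
        "VALUES " + placeholders,
        params,
    )
conn.commit()

inserted = conn.execute("SELECT COUNT(*) FROM acl_contact_cache").fetchone()[0]
print(inserted)  # 2
```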

@artfulrobot
Contributor

Has anyone looked into what EXPLAIN says about the slow SELECT?

Also, why LEFT JOIN the multisite table? It looks like it should be an INNER JOIN, no?

And I often find that WHERE NOT EXISTS is more performant than LEFT JOIN...WHERE leftjoined.field IS NULL.

On the PHP insert loop, you could use a prepared statement if you didn't mind stepping away from the PEAR DB layer. You could also trust MySQL to produce valid, SQL-safe data instead of running the params validation on what's already there.

I haven't looked in detail at these though.
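The NOT EXISTS rewrite suggested above can be checked for equivalence like this. sqlite3 is used as a stand-in and the schema is illustrative; on MySQL it would be worth comparing EXPLAIN output for the two forms rather than relying on this:

```python
import sqlite3

# Sketch: the anti-join can be phrased as LEFT JOIN ... IS NULL or as
# WHERE NOT EXISTS; both should return the uncached contacts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contact (id INTEGER PRIMARY KEY);
CREATE TABLE cache (user_id INTEGER, contact_id INTEGER, operation TEXT);
INSERT INTO contact (id) VALUES (1), (2), (3);
INSERT INTO cache VALUES (9, 2, 'View');
""")

left_join = conn.execute("""
SELECT c.id FROM contact c
LEFT JOIN cache ac ON ac.user_id = 9 AND ac.contact_id = c.id AND ac.operation = 'View'
WHERE ac.user_id IS NULL
ORDER BY c.id
""").fetchall()

not_exists = conn.execute("""
SELECT c.id FROM contact c
WHERE NOT EXISTS (
    SELECT 1 FROM cache ac
    WHERE ac.user_id = 9 AND ac.contact_id = c.id AND ac.operation = 'View'
)
ORDER BY c.id
""").fetchall()

print(left_join == not_exists)  # True: both return contacts 1 and 3
```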

@johntwyman
Contributor

I've had a brief look at the EXPLAIN statement. The largest cost component is multisiteGroupTable. In our case, the query planner sees UI_contact_group and FK_civicrm_group_contact_group_id among the possible indexes. It's picking FK_civicrm_group_contact_group_id.

All that said, it's not like it's a particularly expensive query. The issue is the locking that occurs because we're doing INSERT INTO...SELECT. From everything I've read I have the sense our options available are:

  • Re-factor as @seamuslee001 has done to separate the SELECT out, then generate multiple INSERT statements
  • Separate the SELECT out, use PHP to construct one larger INSERT statement
  • Set the isolation level to READ COMMITTED for this transaction (ref #1, #2)

Performance of the first option might not be fantastic. The second might max out PHP memory or MySQL's maximum query size (however unlikely?).

I won't pretend I fully understand the consequences of the last option. However I do note (from the first ref):

Only row-based binary logging is supported with the READ COMMITTED isolation level. If you use READ COMMITTED with binlog_format=MIXED, the server automatically uses row-based logging.

I believe binary logging is enabled by default in standard MySQL installations. So maybe this isn't a problem.
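The third option could, in principle, be scoped to just this statement. A hedged MySQL sketch (not tested against CiviCRM; changing binlog_format requires elevated privileges, and the joins are elided because they are the same as in the sample query above):

```sql
-- Sketch only: scope READ COMMITTED to the next transaction rather than
-- changing it server-wide, per the MySQL docs quoted earlier.
SET SESSION binlog_format = 'ROW';                -- needs SUPER / SYSTEM_VARIABLES_ADMIN
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;   -- affects only the next transaction
START TRANSACTION;
INSERT INTO civicrm_acl_contact_cache ( user_id, contact_id, operation )
SELECT DISTINCT 153312 as user_id, contact_a.id as contact_id, 'View' as operation
FROM civicrm_contact contact_a
-- ... same joins and WHERE clause as the sample query above ...
;
COMMIT;
```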

@artfulrobot
Contributor

Another issue with multiple inserts was mentioned by @mattwire but I'll expand:

  • If you don't use a transaction, it could fail half way through and leave the cache partially updated.
  • If you do use a transaction, you'll have more deadlocks and I imagine it would be worse than the INSERT ... SELECT as it would be slower, too.

I still think optimising the query and applying locks (may need some experimentation) is the best way to go. The faster the query can run, the less chance of deadlocks with other simultaneous processes. I believe this is why @mattwire has found that the temp table improves things (by the way, Matt, have you tried locking the tables and deleting from the queue table before insertion, to avoid having to use INSERT IGNORE?)

I don't have a site that uses ACLs with enough contacts to be a useful test. N.B. in case it's helpful, with my recent queue patch I tried various locking strategies before settling on one that's pretty full-on but works safely under high load - but only because my queries within the lock are fast.

@JoeMurray
Contributor

FWIW, Monish used a reapplied version of @mattwire's patch https://gist.github.com/monishdeb/99bf7421de5b5bb24edf28abb7907442 and so far our site has been stable for a day, and we no longer see zombie MySQL processes sleeping.

@mattwire
Contributor

@JoeMurray Which civi version did you reapply it against?

@JoeMurray
Contributor

5.14.0

@mattwire mattwire changed the title Improve ACL contact cache building performance by splitting insert an… Improve ACL contact cache building performance by splitting insert and select Nov 1, 2019
@mattwire
Contributor

mattwire commented Nov 1, 2019

@seamuslee001 I think this PR can now be closed as it's superseded?

@yashodha
Contributor

@mattwire do we have a PR that replaces this?

@eileenmcnaughton eileenmcnaughton deleted the improve_acl_insert_performance branch November 13, 2019 08:26
7 participants