[REF] Refactor Smart Group Cache population code to be less intensive #15588
Conversation
@@ -480,30 +480,15 @@ public static function load(&$group, $force = FALSE) {
return;
}

// grab a lock so other processes don't compete and do the same query
$lock = Civi::lockManager()->acquire("data.core.group.{$groupID}");
Why are we removing this? It seems like a good thing.
@eileenmcnaughton it's moved further down
Yes, but why? It seems like a good thing where it was.
We should only lock the database for as short a time as we need, and that is when we are doing the actual operations on civicrm_group_contact_cache. We're using a temp table much more than the previous code did.
We're not locking the database though - we are specifically telling other processes that want to rebuild the cache for this specific group not to, because a process is already doing that.
It's as @seamuslee001 says. We're telling other processes via a static self::$_alreadyLoaded[$groupID] = 1;
at the start of the function that we're already building. But the building can take a significant (in CPU/database terms) amount of time, and we only want the MySQL lock (which is what Civi::lockManager() gives us) for the shortest amount of time, which is why it's moved down.
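To make the ordering concrete, here is a rough sketch of the flow being described; fillTempTable() and copyTempTableToCache() are placeholder names, not the real methods:

```php
public static function load(&$group, $force = FALSE) {
  $groupID = (int) $group->id;
  if (!empty(self::$_alreadyLoaded[$groupID]) && !$force) {
    return;
  }
  // Tell later calls in this process that a rebuild is already under way.
  self::$_alreadyLoaded[$groupID] = 1;

  // Potentially slow: evaluate the smart group criteria into a temp table.
  // fillTempTable() stands in for the real temp-table population code.
  $groupContactsTempTable = self::fillTempTable($groupID);

  // Only now take the MySQL lock, for the short window where we actually
  // touch civicrm_group_contact_cache.
  $lock = Civi::lockManager()->acquire("data.core.group.{$groupID}");
  if ($lock->isAcquired()) {
    self::copyTempTableToCache($groupContactsTempTable, $groupID); // placeholder
    self::updateCacheTime([$groupID], TRUE);
    $lock->release();
  }
  $groupContactsTempTable->drop();
}
```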
I still think the broken lock system should be gotten rid of, rather than allowing it to be configured to work on certain versions of MySQL. It causes weird side effects like this desire to minimize the time that locks are held. A simple database table for scoped locks works fine.
I don't think it IS broken if you have the define set - which @mattwire proposes to do later on.
@mattwire I think you are saying 'we only really need to lock it when we are moving data from the temp table to the live table'. So that means we have two processes:
Process one
- starts potentially slow query to fill temp table
- grabs lock
- moves data from temp table to live table
- releases lock
Process two
- starts potentially slow query to fill temp table
- tries to grab lock & fails
- aborts
Doing the potentially slow query is only useful if it will then be able to do something with the results. The alternative is:
Process one
- starts potentially-slow query to fill temp table
- grabs lock
- moves data from temp table to live table
- releases lock
Process two
- tries to grab lock & fails
- aborts without running potentially-slow query to fill temp table
That seems better to me - especially since it might not be process 2 running the potentially slow query but processes 2,3,4,5,6,7,8,9,10.
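Sketched out (same placeholder method names as in the earlier sketch), that alternative looks like:

```php
// Try the lock *before* running the potentially-slow query, so processes
// 2..n bail out without doing any expensive work.
$lock = Civi::lockManager()->acquire("data.core.group.{$groupID}");
if (!$lock->isAcquired()) {
  // Another process is already rebuilding this group's cache - abort.
  return;
}
$groupContactsTempTable = self::fillTempTable($groupID);        // potentially slow
self::copyTempTableToCache($groupContactsTempTable, $groupID);  // the locked work
self::updateCacheTime([$groupID], TRUE);
$lock->release();
$groupContactsTempTable->drop();
```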
// Don't call clearGroupContactCache as we don't want to clear the cache dates.
// They will get updated by updateCacheTime() below, and not clearing the dates reduces
// the chance that loadAll() will try and rebuild at the same time.
$clearCacheQuery = "
Why is this back in here?
I'm unsure why, but maybe @mattwire can comment. However, we only want the contacts that match the criteria in the cache, so this seems to do it.
Hmm, we should only ever be calling 'load' when the cache has already been emptied - i.e. on demand, check if the cache is valid; if not, purge (if required) and then call load.
The idea here is that actively clearing the cache is expensive and we don't need to do it in advance. Setting the cache to expired is enough, because that will trigger a rebuild the next time it is required. But obviously we do need to clear it before re-populating, so the DELETE query and INSERT query sit next to each other to be as atomic as possible. As soon as the cache table is actually up to date we set the cache time, so that any process accessing the cache knows it is up to date and can be used.
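A sketch of that lifecycle as described (the parameter style follows CRM_Core_DAO::executeQuery, but the "expire" statement, temp-table name and selected columns are illustrative rather than the exact code):

```php
// Invalidation: just mark the group's cache as expired rather than paying for
// a DELETE up front; the expiry triggers a rebuild the next time it's needed.
CRM_Core_DAO::executeQuery(
  "UPDATE civicrm_group SET cache_date = NULL WHERE id = %1",
  [1 => [$groupID, 'Positive']]
);

// Rebuild (inside load()): DELETE and INSERT are kept adjacent so the window
// in which the cache rows for this group are missing is as small as possible.
CRM_Core_DAO::executeQuery(
  "DELETE FROM civicrm_group_contact_cache WHERE group_id = %1",
  [1 => [$groupID, 'Positive']]
);
CRM_Core_DAO::executeQuery(
  "INSERT IGNORE INTO civicrm_group_contact_cache (contact_id, group_id)
   SELECT contact_id, %1 FROM temp_group_contacts",  // placeholder temp table
  [1 => [$groupID, 'Positive']]
);

// Stamp the cache as fresh so other processes know it can be used.
self::updateCacheTime([$groupID], TRUE);
```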
@seamuslee001 my big-picture thoughts here are that the PR template needs a lot more detail about the technical changes made here and why, and also that we'll need to get some more sites to test this 'in the wild' before merging to core.
@bjendres @davejenx @mfb @lucianov88 @pfigel have an interest in performance.
CRM_Core_DAO::executeQuery($clearCacheQuery, $params);

CRM_Core_DAO::executeQuery(
"INSERT IGNORE INTO civicrm_group_contact_cache (contact_id, group_id)
We shouldn't need INSERT IGNORE here
No, we shouldn't, but we do! Without INSERT IGNORE I was still seeing occasional database exceptions because of duplicate entries (every few hours or so). It suggests there is still a hidden contention issue somewhere deep down, but I couldn't find where. It would be good to add a comment to that effect here to be clear why we have INSERT IGNORE, but solving that problem shouldn't hold up this PR and could be a follow-up once we've sorted out the major deadlocks via this PR. It is worth noting that INSERT IGNORE is significantly slower than plain INSERT, so it would be nice to resolve at some point!
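For instance, something like this above the query (the wording is just a suggestion and the SELECT body is illustrative):

```php
// NOTE: INSERT IGNORE (rather than plain INSERT) is deliberate. Even with the
// per-group lock we still saw occasional duplicate-key exceptions in
// production, which suggests a residual contention issue that hasn't been
// traced yet. INSERT IGNORE is noticeably slower than INSERT, so revisit this
// once the major deadlocks are resolved.
CRM_Core_DAO::executeQuery(
  "INSERT IGNORE INTO civicrm_group_contact_cache (contact_id, group_id)
   SELECT contact_id, group_id FROM {$groupContactsTempTable->getName()}"
);
```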
Actually, just looking - this is a copy of the code in self::clearGroupContactCache($groupID), which is called a few lines earlier - so this isn't a behaviour change, just a duplication.
I am happy to get this patch merged because I reviewed this code earlier and deployed it on a large production site. And it's day 11, without any glitch or crash in the cache rebuild process.
$groupContactsTempTable->drop();
self::updateCacheTime([$groupID], TRUE);
I guess it would be nice to swap these lines over, as updating the cache time is more "urgent" than cleaning up.
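i.e. just reordering the two lines from the diff:

```php
// Stamp the cache as current first - that is what other processes care about -
// then clean up the temp table.
self::updateCacheTime([$groupID], TRUE);
$groupContactsTempTable->drop();
```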
@seamuslee001 @monishdeb Thanks so much for picking this up! @eileenmcnaughton Thanks for reviewing.
It would be worth a parallel PR to remove the define.
@mattwire we are running that define in production - if you are too, then I think that is enough to pull it out (I always thought it overly cautious, TBH).
I still think we need a really good PR template on this one. My take on it is that the main change is inserting into the temp table first, so we lock the main table for a much shorter time period.
I'm going to merge this on the basis that:
One of the things this PR does is make it much clearer what the caching code is doing - we get away from things like
Good work! I hoped to be able to test sooner; a new person in place at the relevant site makes that more feasible now. Is the PR expected to improve performance both with the older and newer locking mechanisms (MySQL >= 5.7)?
@davejenx Multiple locks are enabled by default if available now (GET_LOCK() in MySQL). But it will always be broken on older versions of MySQL (< 5.7.5) or MariaDB (< 10.0.2), so you will need to make sure you have at least those versions.
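For anyone checking their setup, the difference is roughly this (the second lock name is an arbitrary example; the calls mirror the Civi::lockManager() usage in the patch):

```php
// On MySQL >= 5.7.5 / MariaDB >= 10.0.2 a single connection can hold several
// named GET_LOCK() locks at once, so a per-group cache lock no longer knocks
// out any other lock the process already holds. On older versions, acquiring a
// second GET_LOCK() silently releases the first, which is why the old
// behaviour is considered broken.
$cacheLock = Civi::lockManager()->acquire("data.core.group.{$groupID}");
$otherLock = Civi::lockManager()->acquire('data.core.example.other'); // example name
if ($cacheLock->isAcquired() && $otherLock->isAcquired()) {
  // ... do work while both locks are held ...
}
$otherLock->release();
$cacheLock->release();
```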
Congrats everyone, especially @mattwire and @seamuslee001, on getting this PR merged. It's a big leap in resolving the intermittent deadlock situation that occurs during smart group cache rebuilds.
@mattwire Thanks, so to clarify, the PR will only improve performance when using the newer locking mechanism (MySQL >= 5.7.5) and won't benefit sites on older MySQL versions?
@davejenx Correct
Overview
Refactor Smart Group Cache population to be less intensive. AUG have deployed this on our system and it appears to be working fine. This is from @mattwire: https://github.com/mattwire/civicrm-groupaclcache/blob/master/patches/master...mattwire:refactor_groupaclcache.diff#L108
Before
Cache building can be intensive.
After
Cache building is less intensive.
ping @mattwire @eileenmcnaughton @monishdeb @JoeMurray