Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRM-19798 fix memory leak #11615

Closed

Conversation

eileenmcnaughton
Copy link
Contributor

@eileenmcnaughton eileenmcnaughton commented Jan 31, 2018

Overview

Fix memory leak that manifests (among other places) when calling api EntityTag.get

Before

Memory in use before & after 100 iterations of EntityTag.get

Start :'49,864,896'
End :'53,503,264'

After

Memory in use before & after 100 iterations of EntityTag.get

Start :'49,851,680'
End:'49,852,168'

Technical Details

The PR adds a class destructor to the DAO that frees the mysql resource. From testing it's fine if it has already been freed it gets most code paths (there is one that it does not - see below)

I dug fairly heavily into the way the DAO cache is currently being used. It builds up queries in the global
$_DB_DATAOBJECT['RESULTS'];
From my investigations these are used for iterating through a result set rather than caching the results of a query. ie. in the case of this function

function demonstrate() {
$dao = new CRM_Core_DAO_Domain();
$dao->find(TRUE)
return $dao->locales;
}
there will be an entry remaining in
$_DB_DATAOBJECT['RESULTS']
representing the mysql results - but it can only be used from that $dao. Once we have left the function the $dao will be destroyed but the object to retrieve the result set will remain in the global, unusable but memory hogging.

If the $dao is freed before it's time then $dao->fetch() will no longer retrieve results - potentially causing unpredictable errors (this is the risk of calling CRM_Core_DAO::freeResult()) as an outer loop may be disbanded. I removed these calls as they are no longer doing any good (& can do harm)

Path still not releasing memory:

   $rule = new CRM_ACL_BAO_ACL();
   $rule->query($query);

I think that is a pretty clumsy construct & we should swap to
CRM_Core_DAO::executeQuery();

Comments

I submitted various versions of this change for CI & noticed no particular difference in the duration of the jenkins job. I think I saw more difference locally but did not do multiple tests on that. Definitely no worse! I was able to replicate the memory management issue in a test & submitted the test without the changes to see if that is replication on jenkins #11616


@eileenmcnaughton eileenmcnaughton force-pushed the memory_more branch 3 times, most recently from 1908957 to 6d98a75 Compare January 31, 2018 23:56
This is an alternate methodology using __destruct on the DAO object.

Note that I have found some places bypass this - ie.

```
   $rule = new CRM_ACL_BAO_ACL();
   $rule->query($query);
```

But I think that is a pretty clumsy construct & we should swap to
CRM_Core_DAO::executeQuery();
@eileenmcnaughton eileenmcnaughton changed the title Memory more CRM-19798 fix memory leak Feb 1, 2018
@eileenmcnaughton
Copy link
Contributor Author

unrelated test fail

@eileenmcnaughton
Copy link
Contributor Author

longer discussion on #10844

@eileenmcnaughton
Copy link
Contributor Author

test this please

@totten
Copy link
Member

totten commented Feb 2, 2018

(CiviCRM Review Template WORD-1.0)

  • (r-jira) Pass
  • (r-test) Pass
  • (r-code) Pass
  • (r-doc) Pass
  • (r-maint) Undecided: My feelings are oscillating a bit... I wish we had more automated coverage of high-volume interactions/performance issues (like this). However, it's not really this PR's fault that we lack a suite for that. In this particular case, it feels like it wouldn't be too hard to write. (You've probably been using some scripts that could be copied into a phpunit test?) Maybe put it this way... I don't want to prod you to spend 10 extra hours, but if your manual test scripts would be easy to adapt, then that might help prevent regressions (future memory leaks).
  • (r-run) Undecided: There's no singular use-case for this, so it's hard to conceive a decisive test scenario.
  • (r-user) Pass: Does not affect user experience.
  • (r-tech) Undecided: Some follow questions/comments....

My old copy of universe shows three more places calling freeResult: civicrm-core/bin/ContributionProcessor.php, civicrm-core/bin/cli.class.php, civihr/hrjobcontract/CRM/Hrjobcontract/Dedupe/Merger.php. Should we be concerned about these?

Would the problem of breaking an outerloop span different functions? Like:

function frobnicateAll() {
   $widgets = new CRM_Widget_DAO_Widget();
   $widgets->find();
   while ($widgets->fetch()) {
    frobnicateById($widgets->id);
  }
}

function frobnicateById($widgetId) {
  $widget = new CRM_Widget_DAO_Widget();
  $widget->id = $widgetId;
  $widget->find();
  ...
  // implicit call to `$widget->__destruct()` and thus `free()`
}

Would we expect frobnicateAll() to fail -- because frobnicateById freed all widget-related caches?

@eileenmcnaughton
Copy link
Contributor Author

@totten so in terms on unit tests - I found that the unit test I enclosed showed a clear memory leak without vs with - ie. the unit test failed without this patch.

I did find that adding a couple of lines into the tear down

global $_DB_DATAOBJECT;
$this->assertEmpty($_DB_DATAOBJECT['RESULTS']);

allowed me to identify where the DAO was not being freed - but after adding a dozen ->free commands I decided the deconstructor was a better approach. Apart from the isMultiLingual function there were none that were truly high volume. Most were in the extensions codebase or the Domain object or related to defaultLocationType. This patch saved adding $dao->free() to all those places - and I'm pretty strongly convinced the resultSet is unusable once the DAO is deconstructed.

However, maybe 15 places remain after this patch that use the

   $rule = new CRM_ACL_BAO_ACL();
   $rule->query($query);

pattern - so I can't add the assert to the tests as yet.

Regarding testing - pretty much 90% of the unit tests test this code path.

RE the CRM_Core_DAO::freeResult()) function. I left the places that I see as being marginally supported. FreeResult does more harm than good in general I would say but those places don't have any known issues & this PR does not make them any better or worse.

$dao->free()
releases the mysql resource stored in $_DB_DATAOBJECT['RESULTS']; that relates to that particular DAO
If it is NOT freed it still exists in the global but I don't believe it is still accessible.

CRM_Core_DAO::freeResult()) ;
releases ALL mysql resources stored in $_DB_DATAOBJECT whether or not the $dao is still in use.

The code you grabbed doesn't touch the problem because it's only freeing the DAOs in use

function frobnicateALL) {
   $widgets = new CRM_Widget_DAO_Widget();
   while ($widgets->fetch()) {
    frobnicateById($widgets->id);
  }
}
function frobnicateById($widgetId) {
  $widget = new CRM_Widget_DAO_Widget();
  $widget->id = $widgetId;
  $widget->find();
  ...
  // implicit call to `$widget->__destruct()` and thus `free()` does not cause any problem 


  //  Calling this releases the results in the outer DAO meaning that the next iteration of ->fetch() will
  // not retrieve a result & the outer loop will end early
   CRM_Core_DAO::freeResult()) ;
}

@eileenmcnaughton
Copy link
Contributor Author

eileenmcnaughton commented Feb 21, 2018

I just did some testing on this - we were using runs of 2000 import tasks (contact + contribution + whatever custom processing we do).

Our speed ranged from 345 per second to 411 (improved from 308 to 333 on 4.6 ) with the speed generally being in the low 400s after the first run.

In my 'real' test I got

Without the patch:
Processed 2000 messages in 295 second(s)
Average performance is 407 per minute.
Memory use 118 MB

With
Processed 2000 messages in 298 second(s)
Average performance is 403 per minute.
Memory use 102.68 MB

The speed difference is well within the normal variation between runs but the memory usage difference was consistent & replicable.

Summary - no performance degradation with this & less memory use

wmfgerrit pushed a commit to wikimedia/wikimedia-fundraising-crm that referenced this pull request Feb 21, 2018
It seems our internal changes have made the invoke method no longer work. I believe the
results are still comparable

On 4.6 we recorded on T127133 between 308 & 333 imports per minute.

I just ran on staging now & got between 345 & 411 per minute. Notably the
345 was the first run & presumably some mysql caching kicked in after that
as subsequent runs were faster.

command is
drush profile_donation 1000 "4.7.31 with dao patch"

I did not see a speed difference with or without civicrm/civicrm-core#11615
(once mysql was 'warmed up') although there was a memory usage difference 77.2 MB used vs 83.97MB
without the destruct patch.

Change-Id: I13228690c7d225963442386e26355c47842ca631
@mlutfy
Copy link
Member

mlutfy commented Mar 14, 2018

@eileenmcnaughton Is it safe to assume that you have been running this patch in production for some time now, so that if it had unintended consequences, it would have been detected by now?

@eileenmcnaughton
Copy link
Contributor Author

@mlutfy we have - but the unintended consequence we hit - #11703 - is not yet merged. I did grep for other instances of that pattern & there was one more in the export class which I'll tackle once that is merged & then grep again

@eileenmcnaughton
Copy link
Contributor Author

I'm going to close this for now as I think I should try updating it with a __clone() function to prevent the #11703 issue even if someone does clone it. I can re-open once that is done (the PR queue is not my to-do list)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants