
Speed up loading of contribution tab on contacts with large number of contributions in a large database #13512

Merged: 2 commits into civicrm:master on Jan 30, 2019

Conversation

eileenmcnaughton
Contributor

@eileenmcnaughton eileenmcnaughton commented Jan 29, 2019

Overview

Alters the query used to load 'annual totals' on the contact contribution tab so that it is faster in some high-volume situations

Before

Query takes 6 seconds on contact with 18k contributions in DB of 30m contributions

After

Query takes 0.05 seconds on contact with 18k contributions in DB of 30m contributions

No discernible difference in a medium-sized DB (400k contributions). Slight (0.02 second) improvement for a minor donor within the large DB

Technical Details

Alter the query used to render the yearly totals on the contact contribution tab to make a better index choice.

This PR restructures the annual query to use a HAVING clause for contribution_status_id in order to
avoid the (edge case) situation where it uses a combined index, resulting in a slow query. When a contribution is entered,
the screen loaded on close is the contribution tab, so when this query is slow the data entry process is much slower.

MySQL's index merge optimisation is enabled by default and can choose to use 'merged indexes' rather than
just one. In this query the best index is contact_id because even when a contact has a relatively large number of
contributions they are still trivial compared to the overall DB. By contrast, contribution_status_id is not a good
choice as it has fairly low variability, and when used in combination with contact_id it adds very little as a trade-off for the time
it takes to construct a merge index.
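As a back-of-the-envelope illustration of that selectivity argument (the 18k and 30m figures come from the test database described below; the count of five distinct status values is an assumption for the sketch):

```python
# Figures from the PR's test database; the distinct-status count is illustrative.
total_rows = 30_000_000
rows_for_contact = 18_000                      # matched via FK_civicrm_contribution_contact_id
distinct_statuses = 5                          # contribution_status_id has low variability
rows_for_status = total_rows // distinct_statuses  # ~6M rows matched by the status index

contact_fraction = rows_for_contact / total_rows   # ~0.06% of the table
status_fraction = rows_for_status / total_rows     # 20% of the table

# Intersecting the two index scans can't beat just using contact_id:
# the status scan alone touches hundreds of times more rows.
assert rows_for_status / rows_for_contact > 300
```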

What we are seeing is that the index choice on this query is
index_contribution_status,received_date,FK_civicrm_contribution_contact_id,contact_id_contribution_status | FK_civicrm_contribution_contact_id,index_contribution_status

which results in a query that takes around 6 seconds on a contact with 18k contributions in a DB of around 30 million contributions.

If we restructure the query so that it looks like

```
SELECT COUNT(*) as count,
       SUM(total_amount) as amount,
       AVG(total_amount) as average,
       currency
FROM civicrm_contribution b
WHERE b.contact_id IN (72) AND b.receive_date >= 20180701 AND b.receive_date < 20190701
GROUP BY currency, contribution_status_id
HAVING contribution_status_id = 1;
```

the selected index is FK_civicrm_contribution_contact_id and the query takes 0.05 seconds.

This restructure achieves much the same as using an index hint, but as there is some index naming inconsistency
'in the wild', changing the query structure like this makes more sense.
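The restructure only works because filtering a grouped column in HAVING is equivalent to filtering it in WHERE. That equivalence can be sanity-checked outside MySQL; a minimal sqlite3 sketch (hypothetical data, not CiviCRM's real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE civicrm_contribution (
    contact_id INTEGER, contribution_status_id INTEGER,
    total_amount REAL, currency TEXT)""")
conn.executemany("INSERT INTO civicrm_contribution VALUES (?,?,?,?)", [
    (72, 1, 100.0, "USD"),
    (72, 1, 50.0, "USD"),
    (72, 2, 75.0, "USD"),   # non-matching status, must be excluded either way
    (72, 1, 30.0, "EUR"),
])

# Original shape: filter on status in WHERE.
where_version = conn.execute("""
    SELECT COUNT(*), SUM(total_amount), AVG(total_amount), currency
    FROM civicrm_contribution
    WHERE contact_id = 72 AND contribution_status_id = 1
    GROUP BY currency""").fetchall()

# Restructured shape: status moved to GROUP BY + HAVING.
having_version = conn.execute("""
    SELECT COUNT(*), SUM(total_amount), AVG(total_amount), currency
    FROM civicrm_contribution
    WHERE contact_id = 72
    GROUP BY currency, contribution_status_id
    HAVING contribution_status_id = 1""").fetchall()

assert sorted(where_version) == sorted(having_version)
```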

Note that I ALSO tested this

  1. on the same database on a contact with a negligible number of contributions
  2. on a much smaller database (~400k contributions) on a contact with 122 contributions.

In both cases it made no difference to which key was chosen, as the merge index appears to only be chosen by MySQL
when more than a certain number of contributions belong to a single contact.

In the latter case the query time for both versions was recorded as 0.00 so I couldn't see any difference.

In the former case the query change sped up the query from 0.02 to 0.00 seconds, so there is some additional efficiency
in the new query beyond the index choice.

Links on index merge optimisation
https://www.percona.com/blog/2012/12/14/the-optimization-that-often-isnt-index-merge-intersection/
http://gobitcan.com/blog/2014-12-03-what-to-do-when-mysql-ignores-your-index
http://mysqlopt.blogspot.com/2013/05/mysql-slow-query-example-of-index-merge.html
https://stackoverflow.com/questions/16283472/why-is-mysql-showing-index-merge-on-this-query
https://mariadb.com/kb/en/library/fair-choice-between-range-and-index_merge-optimizations/
https://dba.stackexchange.com/questions/178286/mysql-on-rds-avoiding-index-merge
https://mariadb.com/kb/en/library/index_merge-sort_intersection/
https://stackoverflow.com/questions/47943755/mysql-optimizing-subquery-with-index-merge
https://dev.mysql.com/doc/refman/8.0/en/index-merge-optimization.html

Comments

@pfigel you might want to see if you have any donors where this seems to make a difference for you - you should be able to test the query change without necessarily the patch

… join gives the wrong results

    As demonstrated in the test, the left join does not do any filtering, but it DOES join in as many line items as match the
    permitted types - resulting in too many rows & the sum calculated on them being incorrect.

    As this doesn't work it should be removed - I would anticipate that sites with the financialreportacls extension
    would get this adequately added through CRM_Financial_BAO_FinancialType::addACLClausesToWhereClauses,
    but regardless this should go as it causes a bug without achieving its intended result, and adding a test prevents
    a later fix from re-breaking it.
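The over-counting behaviour described in this commit message is easy to reproduce in miniature; a hypothetical sqlite3 sketch (invented two-table schema, not CiviCRM's) showing a left join to line items duplicating a contribution in the tally:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contribution (id INTEGER, total_amount REAL)")
conn.execute("CREATE TABLE line_item (contribution_id INTEGER, financial_type_id INTEGER)")
conn.execute("INSERT INTO contribution VALUES (1, 100.0)")
# One contribution with two line items, both of 'permitted' financial types.
conn.executemany("INSERT INTO line_item VALUES (?, ?)", [(1, 1), (1, 2)])

correct_sum = conn.execute(
    "SELECT SUM(total_amount) FROM contribution").fetchone()[0]

joined_sum = conn.execute("""
    SELECT SUM(c.total_amount)
    FROM contribution c
    LEFT JOIN line_item li ON li.contribution_id = c.id
       AND li.financial_type_id IN (1, 2)""").fetchone()[0]

# The left join filters nothing out, but it does duplicate the row,
# so the sum is inflated.
assert correct_sum == 100.0
assert joined_sum == 200.0
```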
@civibot

civibot bot commented Jan 29, 2019

(Standard links)

@civibot civibot bot added the master label Jan 29, 2019
…ibution tab to make a better index choice.

@eileenmcnaughton
Contributor Author

test this please

@JoeMurray
Contributor

On code review, new query looks good.

@pfigel
Contributor

pfigel commented Jan 30, 2019

@eileenmcnaughton seeing 0.07s with the new query vs 3.5s for a contact with 30k contributions, so that looks great!

Was the removal of that line item/financial type code intentional? Not sure what that does, to be honest. 😄

@eileenmcnaughton
Contributor Author

@pfigel yes it was, but it was in this preliminary PR #13469

  • the line item join is intended to implement ACLs but it actually doesn't filter anything out; what it DOES do is cause some contributions to be duplicated in the tally

@JoeMurray
Contributor

Noting positive performance test by @pfigel I'm merging.

@JoeMurray JoeMurray merged commit 58ab7ba into civicrm:master Jan 30, 2019
@eileenmcnaughton eileenmcnaughton deleted the cont_annual_speed branch January 30, 2019 21:37
@eileenmcnaughton
Contributor Author

Thanks @JoeMurray
