Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove ORDER BY and GROUP BY from alphabetQuery to improve performance #16877

Merged
merged 1 commit into from
Aug 3, 2020

Conversation

agileware-justin
Copy link
Contributor

Overview

Copied from https://lab.civicrm.org/dev/core/issues/1665

MySQL uses filesort index when building the query which can cause performance issues. This can be solved by simply removing the GROUP BY and ORDER BY options from query in function alphabetQuery, CRM/Contact/BAO/Query.php see https://github.com/civicrm/civicrm-core/blob/master/CRM/Contact/BAO/Query.php#L5012

This change has no visible impact on the search results pager or listing.

Before

function alphabetQuery, CRM/Contact/BAO/Query.php see https://github.com/civicrm/civicrm-core/blob/master/CRM/Contact/BAO/Query.php#L5012

Original query:

    $query = "SELECT DISTINCT LEFT(contact_a.sort_name, 1) as sort_name
      {$sqlParts['from']}
      {$sqlParts['where']}
      {$sqlParts['having']}
      GROUP BY sort_name
      ORDER BY sort_name asc";

The query was generated by CiviCRM when doing an Advanced Search and selecting 6 CiviCRM Groups (not Smart Groups) - displaying results as contacts.

Before this change, note the "Using filesort" on 24754 rows. Query takes longer than 90 seconds to complete - can trigger a PHP timeout.

MariaDB [ajpulmse_crm]> explain  SELECT DISTINCT LEFT(contact_a.sort_name, 1) as sort_name FROM civicrm_contact contact_a LEFT JOIN civicrm_group_contact `civicrm_group_contact-5e742600a44d
7` ON (contact_a.id = `civicrm_group_contact-5e742600a44d7`.contact_id AND `civicrm_group_contact-5e742600a44d7`.status IN ('Added')) LEFT JOIN civicrm_group_contact `civicrm_group_contact-
5e742600a47a7` ON (contact_a.id = `civicrm_group_contact-5e742600a47a7`.contact_id AND `civicrm_group_contact-5e742600a47a7`.status IN ('Added')) LEFT JOIN civicrm_group_contact `civicrm_gr
oup_contact-5e742600a4a78` ON (contact_a.id = `civicrm_group_contact-5e742600a4a78`.contact_id AND `civicrm_group_contact-5e742600a4a78`.status IN ('Added')) LEFT JOIN civicrm_group_contact
 `civicrm_group_contact-5e742600a4d39` ON (contact_a.id = `civicrm_group_contact-5e742600a4d39`.contact_id AND `civicrm_group_contact-5e742600a4d39`.status IN ('Added')) LEFT JOIN civicrm_g
roup_contact `civicrm_group_contact-5e742600a4fde` ON (contact_a.id = `civicrm_group_contact-5e742600a4fde`.contact_id AND `civicrm_group_contact-5e742600a4fde`.status IN ('Added')) LEFT JO
IN civicrm_group_contact `civicrm_group_contact-5e742600a529d` ON (contact_a.id = `civicrm_group_contact-5e742600a529d`.contact_id AND `civicrm_group_contact-5e742600a529d`.status IN ('Adde
d')) WHERE ( ( ( ( `civicrm_group_contact-5e742600a44d7`.group_id IN ("498") ) ) ) OR ( ( ( `civicrm_group_contact-5e742600a47a7`.group_id IN ("499") ) ) ) OR ( ( ( `civicrm_group_contact-5
e742600a4a78`.group_id IN ("505") ) ) ) OR ( ( ( `civicrm_group_contact-5e742600a4d39`.group_id IN ("504") ) ) ) OR ( ( ( `civicrm_group_contact-5e742600a4fde`.group_id IN ("503") ) ) ) OR
( ( ( `civicrm_group_contact-5e742600a529d`.group_id IN ("502") ) ) ) ) AND (contact_a.is_deleted = 0) GROUP BY sort_name;
+------+-------------+-------------------------------------+------+----------------------------+----------------------------+---------+---------------------------+-------+------------------
-----------------------------------------+
| id   | select_type | table                               | type | possible_keys              | key                        | key_len | ref                       | rows  | Extra
                                         |
+------+-------------+-------------------------------------+------+----------------------------+----------------------------+---------+---------------------------+-------+-----------------------------------------------------------+
|    1 | SIMPLE      | contact_a                           | ref  | index_is_deleted_sort_name | index_is_deleted_sort_name | 1       | const                     | 24754 | Using where; Using index; Using temporary; Using filesort |
|    1 | SIMPLE      | civicrm_group_contact-5e742600a44d7 | ref  | UI_contact_group           | UI_contact_group           | 4       | ajpulmse_crm.contact_a.id |     1 | Using where; Distinct                                     |
|    1 | SIMPLE      | civicrm_group_contact-5e742600a47a7 | ref  | UI_contact_group           | UI_contact_group           | 4       | ajpulmse_crm.contact_a.id |     1 | Using where; Distinct                                     |
|    1 | SIMPLE      | civicrm_group_contact-5e742600a4a78 | ref  | UI_contact_group           | UI_contact_group           | 4       | ajpulmse_crm.contact_a.id |     1 | Using where; Distinct                                     |
|    1 | SIMPLE      | civicrm_group_contact-5e742600a4d39 | ref  | UI_contact_group           | UI_contact_group           | 4       | ajpulmse_crm.contact_a.id |     1 | Using where; Distinct                                     |
|    1 | SIMPLE      | civicrm_group_contact-5e742600a4fde | ref  | UI_contact_group           | UI_contact_group           | 4       | ajpulmse_crm.contact_a.id |     1 | Using where; Distinct                                     |
|    1 | SIMPLE      | civicrm_group_contact-5e742600a529d | ref  | UI_contact_group           | UI_contact_group           | 4       | ajpulmse_crm.contact_a.id |     1 | Using where; Distinct                                     |
+------+-------------+-------------------------------------+------+----------------------------+----------------------------+---------+---------------------------+-------+-----------------------------------------------------------+
7 rows in set, 1 warning (0.00 sec)

After

Remove the GROUP BY and ORDER BY options from query in function alphabetQuery, CRM/Contact/BAO/Query.php see https://github.com/civicrm/civicrm-core/blob/master/CRM/Contact/BAO/Query.php#L5012

Change query to:

    $query = "SELECT DISTINCT LEFT(contact_a.sort_name, 1) as sort_name
      {$sqlParts['from']}
      {$sqlParts['where']}
      {$sqlParts['having']}";

After this change, note absence of the "Using filesort". Query completes in 5 seconds or less.

This change has no visible impact on the search results pager or listing.

MariaDB [ajpulmse_crm]> explain
    -> SELECT DISTINCT LEFT(contact_a.sort_name, 1) as sort_name FROM civicrm_contact contact_a LEFT JOIN civicrm_group_contact `civicrm_group_contact-5e742600a44d7` ON (contact_a.id = `civ
icrm_group_contact-5e742600a44d7`.contact_id AND `civicrm_group_contact-5e742600a44d7`.status IN ('Added')) LEFT JOIN civicrm_group_contact `civicrm_group_contact-5e742600a47a7` ON (contact
_a.id = `civicrm_group_contact-5e742600a47a7`.contact_id AND `civicrm_group_contact-5e742600a47a7`.status IN ('Added')) LEFT JOIN civicrm_group_contact `civicrm_group_contact-5e742600a4a78`
 ON (contact_a.id = `civicrm_group_contact-5e742600a4a78`.contact_id AND `civicrm_group_contact-5e742600a4a78`.status IN ('Added')) LEFT JOIN civicrm_group_contact `civicrm_group_contact-5e
742600a4d39` ON (contact_a.id = `civicrm_group_contact-5e742600a4d39`.contact_id AND `civicrm_group_contact-5e742600a4d39`.status IN ('Added')) LEFT JOIN civicrm_group_contact `civicrm_grou
p_contact-5e742600a4fde` ON (contact_a.id = `civicrm_group_contact-5e742600a4fde`.contact_id AND `civicrm_group_contact-5e742600a4fde`.status IN ('Added')) LEFT JOIN civicrm_group_contact `
civicrm_group_contact-5e742600a529d` ON (contact_a.id = `civicrm_group_contact-5e742600a529d`.contact_id AND `civicrm_group_contact-5e742600a529d`.status IN ('Added')) WHERE ( ( ( ( `civicr
m_group_contact-5e742600a44d7`.group_id IN ("498") ) ) ) OR ( ( ( `civicrm_group_contact-5e742600a47a7`.group_id IN ("499") ) ) ) OR ( ( ( `civicrm_group_contact-5e742600a4a78`.group_id IN
("505") ) ) ) OR ( ( ( `civicrm_group_contact-5e742600a4d39`.group_id IN ("504") ) ) ) OR ( ( ( `civicrm_group_contact-5e742600a4fde`.group_id IN ("503") ) ) ) OR ( ( ( `civicrm_group_conta
ct-5e742600a529d`.group_id IN ("502") ) ) ) ) AND (contact_a.is_deleted = 0);
+------+-------------+-------------------------------------+------+----------------------------+----------------------------+---------+---------------------------+-------+------------------
------------+
| id   | select_type | table                               | type | possible_keys              | key                        | key_len | ref                       | rows  | Extra
            |
+------+-------------+-------------------------------------+------+----------------------------+----------------------------+---------+---------------------------+-------+------------------
------------+
|    1 | SIMPLE      | contact_a                           | ref  | index_is_deleted_sort_name | index_is_deleted_sort_name | 1       | const                     | 24754 | Using index; Usin
g temporary |
|    1 | SIMPLE      | civicrm_group_contact-5e742600a44d7 | ref  | UI_contact_group           | UI_contact_group           | 4       | ajpulmse_crm.contact_a.id |     1 | Using where; Dist
inct        |
|    1 | SIMPLE      | civicrm_group_contact-5e742600a47a7 | ref  | UI_contact_group           | UI_contact_group           | 4       | ajpulmse_crm.contact_a.id |     1 | Using where; Dist
inct        |
|    1 | SIMPLE      | civicrm_group_contact-5e742600a4a78 | ref  | UI_contact_group           | UI_contact_group           | 4       | ajpulmse_crm.contact_a.id |     1 | Using where; Dist
inct        |
|    1 | SIMPLE      | civicrm_group_contact-5e742600a4d39 | ref  | UI_contact_group           | UI_contact_group           | 4       | ajpulmse_crm.contact_a.id |     1 | Using where; Dist
inct        |
|    1 | SIMPLE      | civicrm_group_contact-5e742600a4fde | ref  | UI_contact_group           | UI_contact_group           | 4       | ajpulmse_crm.contact_a.id |     1 | Using where; Dist
inct        |
|    1 | SIMPLE      | civicrm_group_contact-5e742600a529d | ref  | UI_contact_group           | UI_contact_group           | 4       | ajpulmse_crm.contact_a.id |     1 | Using where; Dist
inct        |
+------+-------------+-------------------------------------+------+----------------------------+----------------------------+---------+---------------------------+-------+------------------
------------+
7 rows in set (0.00 sec)

Technical Details

None.

Comments

See Gitlab, https://lab.civicrm.org/dev/core/issues/1665

Agileware ref: CIVICRM-1457

@civibot
Copy link

civibot bot commented Mar 22, 2020

(Standard links)

@civibot civibot bot added the master label Mar 22, 2020
@mattwire
Copy link
Contributor

Does this work with fullgroupby enables on mysql?

@seamuslee001
Copy link
Contributor

Jenkins test this please

@seamuslee001
Copy link
Contributor

@mattwire the test system uses the MySQL 5.7 Only Full Group By sql mode

@mattwire
Copy link
Contributor

@agileware-justin @seamuslee001 Please see #13772 where this code had a major refactor and switched to newer methods. I think the reason those group by/order by clauses are there are just because that's what the old code did. Please have a quick read through that PR to be happy that this change won't break anything.

This code is responsible for generating the "alphabet" pager along the top of search results. The letters should appear in the right order (including non-english characters eg. á etc.).

If you test and confirm this is the case then happy for it to be merged.

{$sqlParts['having']}
GROUP BY sort_name
ORDER BY sort_name asc";
{$sqlParts['having']}";
Copy link
Contributor

@eileenmcnaughton eileenmcnaughton Mar 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note it's invalid to have a having without a group by - however, it's also invalid to have one BEFORE the group by I think so ??? (WTF)

@jaapjansma
Copy link
Contributor

Betty and I are reviewing PRs and we came across this one. No work has happened since March 23. @agileware-justin are you planning to do any work on this? If not we will advice to close this PR.

@seamuslee001
Copy link
Contributor

Jenkins re test this please

@seamuslee001
Copy link
Contributor

I started testing this last week but still need more testing to do will report back

@jaapjansma jaapjansma added the needs-work-not-review-ready Submitter required to take action, will be closed after 1-2 weeks in this state label Jun 8, 2020
@bastienho
Copy link
Contributor

Hi,

As explained in a question on Stack Exchange, usage of HAVING in alphabetQuery also generate an exception becasue of SELECT alteration.

The SELECT should take care of HAVING dependencies

@eileenmcnaughton
Copy link
Contributor

Ok - so here is my take on this

  1. HAVING must be broken prior to this change & remains broken with this change
  2. If we take having out of the picture group by is unnecessary - since we are doing an implicit group by on sort name & the php is doing an order by
  3. the alphabet query is often turned of on larger sites (as is the wildcard prefixing) to improve search performance so we can't read too much into the lack of other reports

So I think it's OK to remove the orderBy & GroupBy as long as we ALSO remove HAVING - since having without group by makes no sense.

@agileware-justin are you OK to make that change?

@eileenmcnaughton eileenmcnaughton removed the needs-work-not-review-ready Submitter required to take action, will be closed after 1-2 weeks in this state label Jul 27, 2020
@agileware-justin
Copy link
Contributor Author

OK I'll review

@agileware-justin
Copy link
Contributor Author

the alphabet query is often turned of on larger sites (as is the wildcard prefixing) to improve search performance so we can't read too much into the lack of other reports

@eileenmcnaughton why do only "larger sites" get the benefit of this performance tweak? Surely, the "smaller sites" would have a much better user experience if they could also have a more responsive site and disabling these options as you mentioned should really be the default, not some obsure "large site only hack". Don't you agree?

@eileenmcnaughton
Copy link
Contributor

@agileware-justin lots of things that perform badly on large data sets don't have noticeable issues on small datasets

@bastienho
Copy link
Contributor

It seems that a similar error occurs in CRM_Contact_BAO_Query::query() near:

$select = 'SELECT count(DISTINCT contact_a.id) as rowCount';

HAVING clauses become orphan, as SELECT is limited to rowCount

@eileenmcnaughton
Copy link
Contributor

@bastienho what query generates a HAVING clause?

@bastienho
Copy link
Contributor

None in the core, but a one added by an extension.
For example, I'm working on an search by sum of contribution_line_item, which will not be possible without patching the core.

@seamuslee001
Copy link
Contributor

I'm going to merge this and follow up with a subsequent PR to remove the having clause

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants