Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(WIP) (dev/core#217) Allow replacement of PrevNextCache implementation for search screens #12377

Closed
wants to merge 25 commits into from

Conversation

totten
Copy link
Member

@totten totten commented Jun 30, 2018

Overview

A brief description of the pull request. Try to keep it non-technical.

Before

The current status. Please provide screenshots or gifs (LICEcap, SilentCast) where appropriate.

After

What has been changed. Please provide screenshots or gifs (LICEcap, SilentCast) where appropriate.

Technical Details

If the PR introduces noteworthy technical changes, please describe them here. Provide code snippets if necessary

Comments

Anything else you would like the reviewer to note

@civibot
Copy link

civibot bot commented Jun 30, 2018

(Standard links)

totten added 17 commits July 20, 2018 13:21
…s of contact-search

The `getPositions()` function is used by both contact-search and dedupe-merge use-cases.

```
CRM/Contact/Form/Merge.php:      $pos = CRM_Core_BAO_PrevNextCache::getPositions($cacheKey, $this->_cid, $this->_oid, $this->_mergeId, $join, $where, $flip);
CRM/Contact/Form/Merge.php:      $pos = CRM_Core_BAO_PrevNextCache::getPositions($cacheKey, NULL, NULL, $this->_mergeId, $join, $where);
CRM/Contact/Page/View.php:       $pos = CRM_Core_BAO_PrevNextCache::getPositions("civicrm search $qfKey", $this->_contactId, $this->_contactId);
```

Our aim in developing `CRM_Core_PrevNextCache_Interface` is to allow contact-search to swap a MySQL
backend with a Redis backend -- and dedupe-merge should continue as-is (whether or not Redis is
available).  This basically means:

* Contact-search switches to using `Civi::service('prevnext')->getPositions()`
* Dedupe-merge continues using `CRM_Core_BAO_PrevNextCache::getPositions()`

Note that the `Interface::getPositions()` is simpler than the BAO's variant. This is good because:

* Contact-search doesn't need as many parameters.
* Dedupe-merge still needs all the parameters.
* Adding all parameters would make it hard to implement on other backends. (This is esp true of SQL-style options `$join` and `$where`.)
…of contact-search

The `deleteItem()` function is used by both contact-search and dedupe-merge use-cases. We can classify
several of these just based on the files:

* Contact-search use-cases. (These should be updated to use the interface.)
    * `CRM/Campaign/Selector/Search.php:     CRM_Core_BAO_PrevNextCache::deleteItem(NULL, $cacheKey, 'civicrm_contact');`
    * `CRM/Contact/Form/Search.php:          CRM_Core_BAO_PrevNextCache::deleteItem(NULL, $cacheKey);`
    * `CRM/Contact/Selector.php:             CRM_Core_BAO_PrevNextCache::deleteItem(NULL, $cacheKey, 'civicrm_contact');`
* Dedupe-merge use-cases. (These should contiue using the BAO.)
    * `CRM/Contact/Form/DedupeRules.php:     CRM_Core_BAO_PrevNextCache::deleteItem(NULL, $cacheKey);`
    * `CRM/Contact/Page/DedupeFind.php:      CRM_Core_BAO_PrevNextCache::deleteItem(NULL, CRM_Dedupe_Merger::getMergeCacheKeyString($rgid, $gid, $criteria))`
    * `CRM/Dedupe/Merger.php:                CRM_Core_BAO_PrevNextCache::deleteItem(NULL, "{$cacheKeyString}_stats");`

Additionally, there are two oddballs which are harder to categorize.

* `CRM_Contact_BAO_Contact_Utils::clearContactCaches($isEmptyPrevNextTable = FALSE)` deletes *all*
  prev-next cache-records (`CRM_Core_BAO_PrevNextCache::deleteItem();`).  It only does so in one
  scenario (as part of `CRM/Contact/Import/Form/Preview.php`), which has this explanatory comment:
  "Clear all caches, forcing any searches to recheck the ACLs or group membership as the import may
  have changed it."
* `CRM_Contact_BAO_Contact::deleteContact(...)` deletes any prev-next cache-records which
  reference a specific contact (`CRM_Core_BAO_PrevNextCache::deleteItem($id)`).
  I suppose this provides a low-grade form of referential integrity.

Part of me thinks those should be re-considered (e.g.  to use a hook/event -- and reduce the
coupling between `Contact` and `PrevNext` subsystems).  However, for purposes of dev/core#217, it
seems OK to send `deleteItem(...)` to both BAO (SQL-only) and service (SQL-or-memory) variants.
This function is used one time -- when you run a search, select some
contacts, and perform a task (like "Delete contacts"), the
`CRM/Contact/Form/Task.php` displays a table with the names of the selected
contacts.

This patch makes the logic portable -- so that it can work regardless of
whether selections are stored in MySQL or Redis.

Before
------

* The contacts are selected `FROM civicrm_prevnext_cache WHERE cacheKey LIKE {$key}% AND cacheKey NOT LIKE {$key}_alphabet%`.
* The contact names come from `civicrm_prevnext_cache.data`, which has been
  pre-filled with either `civicrm_contact.sort_name` (for most contact
  searches) or `civicrm_contact.display_name` (for campaign respondent
  searches).

After
-----

* The selections are chosen with `FROM civicrm_prevnext_cache WHERE cacheKey LIKE {$key}%`.
* The contact names are loaded directly from `civicrm_contact.sort_name`.

Comments
--------

* The use of wildcards with `cacheKey` seems like a code-smell suggest a
  somewhat deeper problem in how `cacheKey` is understood.  However, for our
  purposes, it shouldn't matter.
* The before and after queries are very similar in how they use
  `cacheKey`...  and slightly different.  (Ugh.) Is the new one better or
  worse?  Well, look at how `CRM_Contact_Form_Task` uses `getSelection()`
  (for finding contact IDs, which feed into the task logic) and
  `getSelectedContacts()` (for finding contact names, which are displayed to
  user).  If the subtle difference in `cacheKey` filtering matters, then our
  UX is buggy because we're showing the user one contact-list (based on old
  `getSelectedContacts()`) and we're executing on a different contact-list
  (based on the old `getSelection()`).
* Is the old technique for getting names (querying
  `civicrm_prevnext_cache.data`) better than the new technique (querying
  civicrm_contact.sort_name)?  I haven't benchmarked, but I'm skepitcal.
  Both techniques transfer the full `O(n)` list from mysql to php.
  In typical usage, the size of `n` is limited by what an admin is
  willing to click through in the UI (which is probably a few hundred
  IDs). The contact tables is indexed by ID. Maybe... if you manually
  check several thousand records, it might make a small difference. But
  if you're clicking that many, then other things are also getting more
  expensive (like the actual task). In this case, it feels like an
  unnecessary optimization.
… contact-search

The `getCount()` function is used by both contact-search and dedupe-merge use-cases.

* Contact-search
    * `CRM/Contact/Selector.php:            $countRow = CRM_Core_BAO_PrevNextCache::getCount($cacheKey, NULL, "entity_table = 'civicrm_contact'");`
* Dedupe-merge
    * `CRM/Contact/Page/AJAX.php:           $iTotal   = CRM_Core_BAO_PrevNextCache::getCount($cacheKeyString, $join, $whereClause, '=', $queryParams);`
    * `CRM/Contact/Page/DedupeMerge.php:    $total    = CRM_Core_BAO_PrevNextCache::getCount($cacheKeyString, NULL, ($onlyProcessSelected ? "pn.is_selected = 1" : NULL));`

Our aim in developing `CRM_Core_PrevNextCache_Interface` is to allow contact-search to swap a MySQL
backend with a Redis backend -- and dedupe-merge should continue as-is (whether or not Redis is
available).  This basically means:

* Contact-search switches to using `Civi::service('prevnext')->getCount()`
* Dedupe-merge continues using `CRM_Core_BAO_PrevNextCache::getCount()`

Note that the `Interface::getCount()` is simpler than the BAO's variant. This is good because:

* Contact-search doesn't need as many parameters.
* Dedupe-merge still needs all the parameters.
* Adding all parameters would make it hard to implement on other backends. (This is esp true of SQL-style options `$join` and `$where`.)
The contract feels quirky -- e.g.

* Who would guess that `markSelection()` defaults to `$action == 'unselect'`?
* What's the point of accepting `$entity_table` if it's never used?

Fortunately, this function is only called from `CRM_Contact_Page_AJAX::selectUnselectContacts`,
so it's fairly easy to audit and see that:

* The `$action` is always passed in -- it never relies on the default value.
* The `$entity_table` is never specified explicitly -- it always relies on the default value.
1. Improve docblock formatting
2. The `$entity_table` is never passed in. You can see this by grepping universe for `getSelection`.
…ing filters

Suppose you run a search ("Find Contact", "Advanced Search", "Custom Search", etc). The result screen includes:

1. Standard pagination (Previous/Next; First/Last; Jump-To)
2. Numerical option for page-size
3. Sortable columns
4. An alphabetical filter
5. Checkboxes

As you work with these options, the content of the `civicrm_prevnext_cache` table may change. This patch does
not substantively change what's in that cache, but makes of the column `cacheKey` simpler and more consistent.

Both Before and After (Unchanged)
---------------------------------
* The form's qfKey identifies the current screen/filters/cache.
* If you navigate to the next/previous page (`#1`) or adjust the page-size (`#2`), the content in `civicrm_prevnext_cache` remains the same (for the given qfKey).
* If you change the sort column (`civicrm#3`) or alphabetic filter (`civicrm#4`), the content in `civicrm_prevnext_cache` is deleted and repopulated (for the given qfKey).
* If you toggle a checkbox, the `civicrm_prevnext_cache.is_selected` property updates accordingly. These selections are retained when changing pages (`#1`/`#2`),
  but they're reset if you use sort or alphabet options (`civicrm#3`/`civicrm#4`).

Before
------
* The content of `civicrm_prevnext_cache.cacheKey` takes one of two forms, depending on whether you're using an alphabetic filter (`civicrm#4`).
    * `civicrm search {qfKey}` (typical)
    * `civicrm search {qfKey}_alphabet` (less common)
* The queries which read or delete the query-cache use a prefix+wildcard, i.e. `WHERE cacheKey LIKE 'civicrm search {qfKey}%'`.

After
-----
* The content of `civicrm_prevnext_cache.cacheKey` takes only one form
    * `civicrm search {qfKey}`
* The queries which read or delete the query-cache use an exact match, i.e. `WHERE cacheKey = 'civicrm search {qfKey}'`.`
* The text `_alphabet` does not appear in the PHP source folders (CRM, Civi, bin, api, extern, tests).

Comments
--------
In theory, one can imagine that it's desireable to keep the cached results for each of the sorted/filtered variants of the query.
That might allow the user to quickly switch among different sortings and different alphabetic-filters, or it might
allow some kind of clever management of the selections. But this is not so. As we see (both before and after), the substance
of the cache is deleted whenever the user changes `civicrm#3`/`civicrm#4`. In reality, one user browsing a search screen corresponds to exactly
one query-cache. As near as I can tell, the old code made the names change for now real reason at all.

To observe the behavior empirically, I would twiddle the UI widgets and concurrently inspect the content of the cache tables:

```
mysql> select group_name, path, FROM_BASE64(data), expired_date  from civicrm_cache where path like 'civicrm search%';
select 'Total records in' as label, cacheKey, count(*), min(id), max(id) from civicrm_prevnext_cache group by cacheKey
union select 'Selected records in ', cacheKey, count(*), min(id), max(id) from civicrm_prevnext_cache where is_selected=1 group by cacheKey;
+------------------------------+------------------------------------------------------+--------------------------------------------------------------+--------------+
| group_name                   | path                                                 | FROM_BASE64(data)                                            | expired_date |
+------------------------------+------------------------------------------------------+--------------------------------------------------------------+--------------+
| CiviCRM Search PrevNextCache | civicrm search a8ed1e2039241c41457a88f65aa8a8ee_7845 | s:52:"civicrm search a8ed1e2039241c41457a88f65aa8a8ee_7845"; | NULL         |
+------------------------------+------------------------------------------------------+--------------------------------------------------------------+--------------+
1 row in set (0.00 sec)

+----------------------+------------------------------------------------------+----------+---------+---------+
| label                | cacheKey                                             | count(*) | min(id) | max(id) |
+----------------------+------------------------------------------------------+----------+---------+---------+
| Total records in     | civicrm search a8ed1e2039241c41457a88f65aa8a8ee_7845 |        6 |     787 |     792 |
| Selected records in  | civicrm search a8ed1e2039241c41457a88f65aa8a8ee_7845 |        1 |     789 |     789 |
+----------------------+------------------------------------------------------+----------+---------+---------+
2 rows in set (0.01 sec)
```
…y_id2 from service

The PrevNext service/interface aims to be a replaceable component for use
the search-caching (but not deduping).

The interface that we produced from refactoring includes several references
to `entity_table` and `entity_id2` -- these values are part of the SQL
table, and they're needed for dedupe, but they don't seem to convey anything
meaningful for search-caching.  Including these fields makes the interface
more complicated -- which will make it hard to implement other variants.

The general gist of this commit is that we no longer fill those two columns,
and we no longer read them.

In a couple functions, we split the new OOP implementation
(`CRM_Core_PrevNextCache_Sql`) from the traditional static BAO
implementation (`CRM_Core_BAO_PrevNext`) so that we can omit these fields.
totten added 3 commits July 20, 2018 13:21
…ctions

Before
------

* Every time you create an instance of `CRM_Utils_Cache_Redis`, it makes
  a new instance of the driver class `Redis`.

After
-----

* Every time you create an instance of `CRM_Utils_Cache_Redis`, it uses
  a shared instance of the driver class `Redis` (assuming the host/port
  are the same).
* If you're writing a driver somewhere else (e.g. the upcomgin
  `CRM_Core_PrevNextCache_Redis`), you can use the shared instance
  of `Redis`.
…mory-backed or SQL-backed)

Before
------

The `prevnext` service is always an instance of `CRM_Core_PrevNextCache_Sql`.

After
-----

If you define `CIVICRM_DB_CACHE_CLASS`, and if there's a corresponding
service `prevnext.driver.{mydriver}`, then it will use that service.

Otherwise, it will use `prevnext.driver.sql` (`CRM_Core_PrevNextCache_Sql`).
@totten totten force-pushed the master-prevnext branch 2 times, most recently from bbbeac6 to 713bdad Compare July 20, 2018 20:45
@totten totten force-pushed the master-prevnext branch from 713bdad to 718fa80 Compare July 20, 2018 23:02
totten added 3 commits July 20, 2018 20:18
…k to its old place.

The general context of this code is roughly as follows: "We've already
filled up the prevnext cache with a bunch of contact-IDs.  The user wants to
view a page of 50 contacts. The prevnext cache has all the contact-IDs,
but we want to lookup full information about the specific 50 contacts
on this page."

The `CRM_Contact_BAO_Query` is good for looking up the "full information"
some contacts, but it doesn't know how to filter based on the prevnext
cache. One needs a way to add that bit of information.

Before
------

* The contract for `CRM_Core_PrevNextCache_Interface::fetch()` was extremely
  nuanced -- it basically requires you to call the Query BAO and hack
  its SQL output.
* `CRM_Core_PrevNextCache_Sql::fetch()` (and its equivalent predecessor
  `CRM_Contact_BAO_Query::getCachedContacts()`) achieved this by
   splicing some extra clause into the SQL query -- specifically,
   a `JOIN` on the `civicrm_prevnext_cache` and some
   `GROUP BY`/`ORDER BY`/`LIMIT`.
* `CRM_Core_PrevNextCache_Redis::fetch()` achieved this with two
  steps -- first, reading the contact-list from the cache; and then
  splicing an extra `WHERE` into the SQL query.
* There is not unit test coverage for `::fetch()`.

After
-----

* The contract for `CRM_Core_PrevNextCache_Interface::fetch()` is
  much simpler - you don't need to know anything about the query BAO.
  You just return one page worth of ID's from the cache.
* The SQL manipulation has moved back to its original place in
  `CRM_Contact_BAO_Query::getCachedContacts()`. However, we
  do the same manipulation regardless of cache-backend -- ie
  we always splice in the `WHERE` (and not the `JOIN`).
* There is unit test coverage for `::fetch()`.
@totten
Copy link
Member Author

totten commented Jul 24, 2018

Note: With the overall patch-set, I did a lot r-run QA for the contact-search screen following this manual test-matrix: https://docs.google.com/spreadsheets/d/1baxOKv9N55UpqBZs6ySULe6SVh3aYkfsMj_stSBV0RA/edit?usp=sharing

I mention this because we're now breaking it up into smaller chunks for review purposes. It may be useful in reviewing the cherry-picked pieces to know that the contact-search works in the overall patch-set.

However, I have not done such careful r-run QA on each individual patch, nor have I r-run the other use-cases that might reference the prevnext code (dedupe/merge, custom search, or campaign/survey/respondent).

The new name is prettier and matches the names in `CRM_Core_PrevNextCache_{Interface,Redis}`.
@eileenmcnaughton
Copy link
Contributor

@totten closing as this is just a tracking branch now I think

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants