(dev/core#635) Implement local array-cache for use with Redis/Memcache #13496
3455831 to e0d7aa4
Before
------
* No way to daisy-chain caches to form a cache hierarchy
* ArrayCache::reobjectify() would fail to reobjectify objects in an array, which seems against the spirit of PSR-16

After
-----
* You can create a cache hierarchy with `new CRM_Utils_Cache_Tiered([$fastCache, $mediumCache, $slowCache])`
* ArrayCache::reobjectify() will reobjectify if it detects an object directly in an array

Note: To ensure that TTL data is respected consistently regardless of how the tiers behave and the order in which they are used, the TTL/expiration must be stored extra times.
This allows you to put a static array in front of another cache. It is the same basic idea as CRM_Utils_Cache_Tiered, but it's optimized for a typical case where you only want one front-cache.

Based on some naive benchmarking (performing several trials with a few thousand duplicate reads over the same cached data), this basically cut the read-time in half. The following is pretty representative of the results:

```
Redis-only cache           write=0.1044s read=1.3266s
2-Tier (ArrayCache+Redis)  write=0.1189s read=0.3765s
Decorated-Redis cache      write=0.1105s read=0.1505s
```

See also: https://gist.github.com/totten/6d6524be115c193e0704ff3cf250336d

Note: To ensure that TTL data is respected consistently regardless of how the tiers behave and the order in which they are used, the TTL/expiration must be stored extra times.
This adds and documents a new config option which can be passed into the cache factory. The option, `withArray`, indicates that we prefer to have a thread-local array acting as an extra cache-tier.
e0d7aa4 to 84413ec
I've done some testing on this. Using Redis, with or without this PR, a 'normal' page load on a site with Shoreditch installed hits the Redis cache 26 times to get the key for the extension container - in this case

With this PR I was, however, able to make the below one-liner to get it to use the ArrayDecorator and, as expected, it only hit Redis once, getting the remaining 25 instances from memory.

In previous testing, switching to Redis didn't give a speed improvement until I addressed a bunch of places which didn't have array caching and were falling back on Redis all the time (but had been using array caching in conjunction with MySQL, per Tim's comments above). At the time I fixed this through Civi::statics; however, this PR makes it much more possible to replace that with something more correct.

I got similar results with the fast array caching but didn't dig into the tiered caching, as it's not currently in use and felt a bit obscure.
Note that I think we should follow this up with a change to caching on the Extension System cache!
Agree! There's ~4 that I think we should change.
Overview
Consumers of Redis or Memcached currently trigger I/O anytime they read a value from the cache -- even if the value has been read before. This is OK if the caller is organized to do one read. It can even be good in some edge-cases of inter-process communication. However, if the cache is read frequently, this can lead to a lot of redundant reads; and there are certainly use-cases where we have redundant reads.
This PR allows one to define a cache-hierarchy -- e.g. combining a thread-local array with an external cache (like Redis, Memcache, SQL, or an on-disk file).
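To make the redundant-read problem concrete, here is a minimal, self-contained sketch. `CountingBackend` and `LocalArrayFront` are hypothetical names invented for this illustration; they are not the classes added by this PR, and the backend is only a stand-in for an external service such as Redis. It ignores TTLs and object copying entirely.

```php
<?php
// Hypothetical stand-in for an external cache (e.g. Redis); counts every read.
class CountingBackend {
  public $reads = 0;
  private $data = [];

  public function set($key, $value) {
    $this->data[$key] = $value;
  }

  public function get($key) {
    $this->reads++;  // Each call models one round-trip to the external service.
    return array_key_exists($key, $this->data) ? $this->data[$key] : NULL;
  }
}

// Hypothetical two-tier wrapper: a per-process array in front of the backend.
class LocalArrayFront {
  private $local = [];
  private $backend;

  public function __construct($backend) {
    $this->backend = $backend;
  }

  public function set($key, $value) {
    $this->local[$key] = $value;
    $this->backend->set($key, $value);
  }

  public function get($key, $default = NULL) {
    if (array_key_exists($key, $this->local)) {
      return $this->local[$key];  // Served from memory, no I/O.
    }
    $value = $this->backend->get($key);
    if ($value === NULL) {
      return $default;  // Miss: nothing is memoized.
    }
    $this->local[$key] = $value;
    return $value;
  }
}

// Another process already populated the external cache.
$backend = new CountingBackend();
$backend->set('ext-container', ['shoreditch' => TRUE]);

// This process reads the same key 26 times through the front-cache.
$cache = new LocalArrayFront($backend);
for ($i = 0; $i < 26; $i++) {
  $cache->get('ext-container');
}
echo $backend->reads, "\n";  // prints 1
```

Without the wrapper, the same loop would hit the backend 26 times. The trade-off is that the local array can go stale if another process updates the external cache mid-request.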
Context
This is an off-shoot/subset of dev/core#635 and #13489. For dev/core#635, we want to minimize unnecessary SQL writes (e.g. by directing all caches to a non-SQL cache-service). The examples I've got all use `CRM_Core_BAO_Cache`, which is hard-coded with a 2-tier cache-hierarchy (thread-local array-cache plus SQL-cache).

Concurrently, there've been critiques that some of the existing Redis/Memcache consumers should be using a 2-tier cache-hierarchy.

In subsequent PRs, we'll want to create a "cache-hierarchy". With these utilities, it can be done with only 1-2 lines of extra code, and it will be PSR-16 compliant.
This depends on #13500.
Before
* `CRM_Utils_Cache::create(...type => *memory*...)` returns an instance for direct-access to the underlying cache.

After
* `CRM_Utils_Cache::create()` accepts an option `withArray => FALSE|TRUE|fast`, which allows you to request a Redis/Memcache instance which uses a thread-local array.
* `CRM_Utils_Cache_ArrayDecorator` implements a PSR-16 compliant wrapper. It takes any cache and puts a local-array in front of it.
* `CRM_Utils_Cache_FastArrayDecorator` does the same thing, but it's not PSR-16 compliant. It sacrifices some correctness in order to improve performance.
* `CRM_Utils_Cache_Tiered` allows construction of arbitrary N-tier cache hierarchies. Whereas `ArrayDecorator` allows two specific tiers (e.g. `array => $delegate`), `Tiered` can be used with stacks (like `array => redis => sql` or `redis => sql => file`).

Technical Details / Comments
I originally drafted the implementation of `Tiered` because it was a more flexible design. Then I ran a naive benchmark to compare the 2-tier hierarchy (`array=>redis`) against direct `redis` and found... it was about the same, and sometimes slower!

There were two basic reasons for this:
* Usage patterns: Tiering isn't a universal good; it depends on usage patterns. If you just write a record, read that record one time, and then repeat with different records... then tiers suck. There's no gain from avoiding reads, and writes are more expensive. But if you re-read data several times, then it helps a lot. I needed to change the benchmark to compare performance in that usage pattern.
* PSR-16 Compliance: PSR-16 standardizes certain edge-cases, esp: (a) TTL/expiration, which leads to an extra cost for enforcing consistent expiration times and (b) mutability of `object`s, which leads to an extra cost for copying in-memory objects (`$copy=unserialize(serialize($original))`).

The revised benchmark script lets us specify the number of read operations. The `ArrayDecorator` and `FastArrayDecorator` were attempts to squeeze out more performance. And the benchmarks improved. In the figures below, we measure the write-time (writing 1000 items) and read-time (reading 200 items). Note how it checks performance with different `readPerItem` values (i.e. each item is read once; or read 10 times; or read 150 times).

If there's only 1 read (no re-reading), then direct-access is best; all others have overhead. The overhead diminishes as you go from `Tiered` to `ArrayDecorator` to `FastArrayDecorator`.

Why include all three -- why not just do one? Well, (1) they're already written, and the parent unit-test goes a long way to showing the correctness of each, so it's not much difference cost-wise. (2) The best one actually depends on the situation -- why you're using a cache, how big the data-records are, how frequently the records are read. That means the decision about which mechanism to use (in which use-cases) should not be in this PR (which just provides the utility). The decision should be made when we use the cache-hierarchy (in subsequent PRs).
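The object-mutability cost in point (b) can be illustrated with a small sketch. `SharedArrayFront` and `CopyingArrayFront` are hypothetical classes written only for this comparison; they do not claim to reproduce what `ArrayDecorator` or `FastArrayDecorator` actually do internally, and which shortcuts the fast variant takes is not stated here.

```php
<?php
// Hypothetical front-cache that hands out the stored instance itself.
// Cheap, but every caller shares one mutable object.
class SharedArrayFront {
  private $items = [];

  public function set($key, $value) {
    $this->items[$key] = $value;
  }

  public function get($key) {
    return $this->items[$key] ?? NULL;
  }
}

// Hypothetical front-cache that deep-copies on the way in and out,
// so callers cannot alter the cached value by accident.
class CopyingArrayFront {
  private $items = [];

  private function copy($value) {
    return unserialize(serialize($value));  // The extra per-access cost.
  }

  public function set($key, $value) {
    $this->items[$key] = $this->copy($value);
  }

  public function get($key) {
    return array_key_exists($key, $this->items) ? $this->copy($this->items[$key]) : NULL;
  }
}

$shared = new SharedArrayFront();
$shared->set('contact', (object) ['name' => 'Alice']);
$c1 = $shared->get('contact');
$c1->name = 'Mallory';                      // Mutates the cached object itself.
echo $shared->get('contact')->name, "\n";   // prints Mallory

$copying = new CopyingArrayFront();
$copying->set('contact', (object) ['name' => 'Alice']);
$c2 = $copying->get('contact');
$c2->name = 'Mallory';                      // Mutates only a private copy.
echo $copying->get('contact')->name, "\n";  // prints Alice
```

The copying variant pays for a serialize/unserialize round-trip on every access, which is the kind of overhead that only pays off when the same items are re-read many times.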
Reviewer Tips
Some things that might help in evaluating this:
* ... (`E2E_Cache_TieredTest`), and the test classes are derived from `E2E_Cache_CacheTestCase` and `\Cache\IntegrationTests\LegacySimpleCacheTest`. That class originates with the `php-cache` project and provides pretty good test-coverage for functions like `get($key, $default=NULL)`, `set($key,$value,$ttl=NULL)`, and `has($key)`.
* ... `cv cli`. You may not have Redis or Memcache locally, but you can pick any cache-driver (`CRM_Utils_Cache_SqlGroup`, `CRM_Utils_Cache_ArrayCache`).
* ... (`$a, $b, ...`) and an overarching cache (`$z`), like one of these:
    * `$a=new CRM_Utils_Cache_ArrayCache([]); $z=new CRM_Utils_Cache_ArrayDecorator($a);`
    * `$a=new CRM_Utils_Cache_ArrayCache([]); $b=new CRM_Utils_Cache_ArrayCache([]); $z = new CRM_Utils_Cache_Tiered([$a, $b]);`
    * `$a=CRM_Utils_Cache::create(['name'=>'foo-mem', 'type'=>['*memory*']]); $b=CRM_Utils_Cache::create(['name'=>'foo-sql', 'type'=>['SqlGroup']]); $z=new CRM_Utils_Cache_Tiered([$a,$b]);`
$z->set('foo',123);
$z->get('foo')
$a->get('foo')
print_r($z);
print_r(['a'=>$a, 'b'=>$b, 'z'=>$z])
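For readers without a CiviCRM install, the same kind of poking at `$a`, `$b`, and `$z` can be tried against a stand-in. The sketch below is an assumption-laden illustration: `SketchTier` and `SketchTiered` are hypothetical classes, not `CRM_Utils_Cache_Tiered`, and the record layout (`expires` plus `value`) is just one way to store the expiration alongside the data in every tier.

```php
<?php
// Hypothetical tier: a plain array store that keeps whatever it is given.
class SketchTier {
  public $items = [];

  public function set($key, $value) {
    $this->items[$key] = $value;
  }

  public function get($key) {
    return $this->items[$key] ?? NULL;
  }
}

// Hypothetical N-tier cache. Tiers are ordered fastest first.
class SketchTiered {
  private $tiers;

  public function __construct(array $tiers) {
    $this->tiers = array_values($tiers);
  }

  public function set($key, $value, $ttl = 3600) {
    // The absolute expiration travels with the value into every tier.
    $record = ['expires' => time() + $ttl, 'value' => $value];
    foreach ($this->tiers as $tier) {
      $tier->set($key, $record);
    }
  }

  public function get($key, $default = NULL) {
    foreach ($this->tiers as $i => $tier) {
      $record = $tier->get($key);
      if (is_array($record) && $record['expires'] > time()) {
        // Backfill the faster tiers that missed, keeping the original expiration.
        for ($j = 0; $j < $i; $j++) {
          $this->tiers[$j]->set($key, $record);
        }
        return $record['value'];
      }
    }
    return $default;
  }
}

$a = new SketchTier();
$b = new SketchTier();
$z = new SketchTiered([$a, $b]);

$z->set('foo', 123);
echo $z->get('foo'), "\n";  // prints 123

$a->items = [];             // Simulate a fresh process: the fast tier starts empty.
echo $z->get('foo'), "\n";  // prints 123, served from $b
echo isset($a->items['foo']) ? "backfilled\n" : "not backfilled\n";  // prints backfilled
```

Because the expiration is stored inside the record, a value backfilled into a faster tier expires at the same moment as the copy in the slower tier, regardless of how each tier handles TTLs on its own.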