Concurrency issue between get(K key, Callable<? extends V> valueLoader) and invalidate(Object key) #1881

gmaes · 2014-11-05T14:40:30Z

Hi,

I encountered a concurrency issue between the "get(K key, Callable<? extends V> valueLoader)" and "invalidate(Object key)" methods with a basic (and i suppose a common) usage of the cache which let a stale value in the cache.

Here is the use case:

The cache is initially empty
1 thread is getting a value in the callable with a given key while an other thread is invalidating the same given key

Thread 1

Bean myBean = cache.get(ID, new Callable<Bean>() {
    @Override
    public Bean call() throws Exception {
        Bean bean = loadFromDB(ID); // (1)
        return bean; // (4)
    }
});

Thread 2

// Update just one property of the bean in DB
updatePartialDataInDB(ID, "newValue1"); // (2)
// Then, we need to invalidate the cache
cache.invalidate(ID); // (3)

The execution sequence order is marked with // (number)
After the point // 4, I have the old object in myBean variable which is fine.
However, If i try to get again the bean with the identifier ID, I expect to have no value in the cache but a fresh bean reloaded from the DB.
Unfortunately, that is not the case and a stale value is kept in the cache.

Is that a wanted behavior?

In LocalCache$Segment#remove(Object key, int hash) line 3081 to 3087 (guava 17.0):

RemovalCause cause;
if (entryValue != null) {
    cause = RemovalCause.EXPLICIT;
} else if (valueReference.isActive()) {
    cause = RemovalCause.COLLECTED;
} else {
    // currently loading
    return null;
}

Shouldn't a particular process be done before return null ?

Thanks

gmaes · 2014-12-04T09:02:10Z

Hi,

Do you plan to fix this issue for the next release?
If so, do you have an ETA?
This is a blocking issue for us.
Else, should we patch the code meanwhile?

Thanks

lowasser · 2014-12-15T21:12:57Z

My first reaction is that the current semantics make some sense: an entry still being computed isn't really in the cache; it's not really there to be invalidated?

sereda · 2014-12-15T21:38:54Z

Hi,

Speaking of semantics, it's not value that is being invalidated, it's the key. The key is certainly there - and the value is being calculated.

Given that value V(K) is calculated from some other value X(K) (a row in DB, in this sample case), invalidate(K) should mean that any values based on values of X(K) retrieved up to the moment of invalidation are subject for recalculation.

Otherwise, the cache is pretty much useless in a simple case of caching a database row.

lowasser · 2014-12-15T21:46:21Z

Let's assume that we did what you're saying: we invalidated the pending fetch. What exactly should the ongoing get() call from the other thread return? Restarting the load() call seems awkward, if it's even possible.

gmaes · 2014-12-15T22:00:17Z

The problem does not come from the current pending fetch.
As i said, this is fine to return the not up to date bean at step 4 but this one must not be kept in the cache entry for next fetches.

lowasser · 2014-12-15T22:02:12Z

Okay. If I understand correctly, that stale value will never be stored, but will be returned from the get?

baltiyskiy · 2014-12-15T22:03:39Z

I'd say this would be the expected behaviour.

lowasser · 2014-12-15T22:15:43Z

I don't think that's 100% obvious -- I suspect there are cases where you don't want the value-in-flight to be invalidated -- and I'd also expect this to require a fairly significant rewrite to the internals of Cache, but we can research this further.

cpovirk · 2014-12-15T22:20:31Z

I don't really have anything concrete to say here, but I remember that, when it came to CacheBuilder/MapMaker, fry@ used to talk about "linearizability" (e.g., in internal bug 1774366). That might be the property that we want. But I don't know what that would imply for the behavior here, let alone whether implementing that behavior is feasible (if in fact it differs from the current behavior).

baltiyskiy · 2014-12-15T22:23:21Z

I don't think that's 100% obvious -- I suspect there are cases where you don't want the value-in-flight to be invalidated

@lowasser I'm trying out the implementation in my private fork. I had to revisit some tests, and from what I see, there is no good reason for the current behaviour. Some tests actually assume that the value does not change when the invalidation happens, which contradicts the purpose of invalidation.

All existing tests are currently green, and ones I added too; but I need to add more tests to check subtler consequences of my changes. I can't guarantee that I will do it fast, but if you have time, let me have a try on this one.

lowasser · 2014-12-15T22:40:06Z

Please do; I think we'd welcome contributions on this.

txshtkckr · 2015-04-12T00:27:47Z

This is causing some very serious issues for us. I would argue that anyone using a lazy-loading cache would expect a happens-before relationship between invalidate and a following call to get, but since it can piggy-back on an existing load operation that fetched data before invalidate was called, that relationship does not exist.

I started poking around in here to try to fix it, but as @lowasser hinted, it isn't trivial. The main problems are

1. Whether to remove the LoadingValueReference, block on it, or somehow mark it as stale; and
2. How to rework the removal notifications

This is sufficiently painful that at this point I'm inclined to switch to another library, instead.

baltiyskiy · 2015-04-12T13:58:06Z

@txshtkckr Please see https://github.com/baltiyskiy/guava -- I've went with the "remove LoadingValueReference" approach, firing removal notification for the value when it is loaded. It's generally done, but I'd like to add some tests cases there. There's one known problem (regaring modCount) which I need to poke more.

I've put it on a pause now because I don't have much time due to other commitments, but I intend to finish it.

protogenes · 2017-05-29T12:06:05Z

I second @sereda expectation of causality, in that a loaded value V(K) must be updated when the key is invalidated.
Has there been a consensus how to work around this particular problem? If so, it should be included in https://github.com/google/guava/wiki/CachesExplained

The asMap().compute implementation did not take into account that the present value may be loading. A load does not block other writes to that entry and takes into account that it may be clobbered, causing it to automatically discard itself. This is a known design choice that breaks linearizability assumptions (google#1881). The compute should check if a load is in progress and call the appropriate internal removal method. Because a zombie entry remained in the cache and still is marked as loading, the loader could discover entry and try to wait for it to materialize. When the computation is a removal, indicated by a null value, the loader would see this as the zombie's result. Since a cache loader may not return null it would throw an exception to indicate a user bug. A new ComputingValueReference resolves both issues by indicating that the load has completed. The compute's removeEntry will then actually remove this entry and the loader will not wait on the zombie. Instead if it observes the entry, it will neither receive a non-null value or wait for it to load, but rather try to load anew under the lock. This piggybacks on the reference collection support where an entry is present but its value was garbage collected, causing the load to proceed. By the time the lock is obtained the compute method's entry was removed and the load proceeds as normal (so no unnecessary notification is produced). fixes google#5342 fixes google#2827 resolves underlying cause of google#2108

findepi · 2022-01-10T09:26:08Z

@cpovirk sorry for refreshing an old issue, it seems still relevant.
Did you come to agreement on what's the expected behavior?

For me, "get() after invalidate() should return fresh state" is the expected behavior.

ben-manes · 2022-01-10T11:36:00Z

Caffeine resolves this for explicit keyed operations, but not for clearing out the whole cache. That is because it decorates, rather than forks, ConcurrentHashMap which suppresses in-flight loads from appearing in its iterators. A trivial workaround to that is to append a generational id to the key so that existing entries cannot be retrieved and a stale in-flight load detected by the reader to possibly retry.

findepi · 2022-01-10T12:13:03Z

@ben-manes thanks for the update about Caffeine.

I am still looking for the Guava team's perspective on the problem.
In particular, would a patch be accepted? what solution categories have been discarded already?
What are acceptable drawbacks, if any, for a potential fix?

@lowasser 's comment from 2014 (#1881 (comment)) suggests a patch would be welcome, but there was also a concern about value-in-flight invalidation (#1881 (comment)). Of course, both are couple years old, so they not necessarily describe current state.

Edit: a Guava Cache wrapper that doesn't suffer from invalidation race was extracted from Trino project as a standalone library: https://github.com/findepi/evictable-cache

Due to google/guava#1881, invalidation does not invalidate any loading that is in progress. In order to overcome this a generationId field was added to the cache keys. However it is partial fix, the issue still may exists when entries are invalidated per key.

The `ContentCache` uses Caffeine. Caffeine offers partial solution to google/guava#1881 race condition problem present in Guava Cache implementation, but not when `invalidateAll` is used. To avoid accidental incorrect use of ContentCache, remove the `invalidateAll` method, which can be deceptive for the caller.

This method does only best-effort invalidation and is susceptible to a race condition. If the caller changed the state that could be cached (perhaps files on the storage) and calls this method, there is no guarantee that the cache will not contain stale entries some time after this method returns. This is a similar problem as the one described at google/guava#1881. `ContentCache` doesn't use a Guava Cache, it uses Caffeine. Caffeine offers partial solution to this issue, but not for `invalidateAll` call. To avoid accidental incorrect use of ContentCache, deprecate the `invalidateAll` method, which can be deceptive for the caller and remove it later.

* Deprecate ContentCache.invalidateAll This method does only best-effort invalidation and is susceptible to a race condition. If the caller changed the state that could be cached (perhaps files on the storage) and calls this method, there is no guarantee that the cache will not contain stale entries some time after this method returns. This is a similar problem as the one described at google/guava#1881. `ContentCache` doesn't use a Guava Cache, it uses Caffeine. Caffeine offers partial solution to this issue, but not for `invalidateAll` call. To avoid accidental incorrect use of ContentCache, deprecate the `invalidateAll` method, which can be deceptive for the caller and remove it later. * Document ContentCache.invalidate blocking nature * empty

cgdecker added package=cache type=defect Bug, not working as expected labels Nov 5, 2014

cgdecker assigned lowasser Nov 5, 2014

ben-manes mentioned this issue Apr 19, 2015

invalidateAll does not remove loading values if there is a removal listener ben-manes/caffeine#13

Closed

fdlk mentioned this issue Nov 30, 2016

L2 Cache contains second-last state of JobExecution entity molgenis/molgenis#5655

Closed

ben-manes mentioned this issue Mar 29, 2017

Feature request: non-blocking invalidate() in AsyncLoadingCache ben-manes/caffeine#148

Closed

protogenes mentioned this issue May 29, 2017

Recursive cache loading never terminates ben-manes/caffeine#89

Closed

ben-manes mentioned this issue Feb 28, 2018

Synchronous Cache might block on put() ben-manes/caffeine#224

Closed

kluever added the P3 no SLO label Aug 8, 2019

shixuan-fan mentioned this issue Aug 19, 2020

Add support for versioning metastore API prestodb/presto#15044

Merged

ben-manes mentioned this issue Dec 5, 2020

Fix compatibility between the cache compute methods and a load #5348

Closed

kokosing mentioned this issue Jan 10, 2022

Metadata caching may return incorrect data trinodb/trino#10512

Closed

findepi mentioned this issue Jan 12, 2022

retry to connect to hive metastore with new delegation token. trinodb/trino#10561

Merged

ben-manes mentioned this issue Jan 21, 2022

Use Caffeine for request aggregation cryptomator/cloud-access-java#6

Merged

findinpath mentioned this issue Feb 2, 2022

Invalidate CachingDirectoryLister when writing through Trino trinodb/trino#10915

Merged

findepi mentioned this issue Jun 14, 2024

Core: Deprecate ContentCache.invalidateAll apache/iceberg#10494

Merged

ben-manes mentioned this issue Aug 26, 2024

v2 contracts maypok86/otter#74

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Concurrency issue between get(K key, Callable<? extends V> valueLoader) and invalidate(Object key) #1881

Concurrency issue between get(K key, Callable<? extends V> valueLoader) and invalidate(Object key) #1881

gmaes commented Nov 5, 2014

gmaes commented Dec 4, 2014

lowasser commented Dec 15, 2014

sereda commented Dec 15, 2014

lowasser commented Dec 15, 2014

gmaes commented Dec 15, 2014

lowasser commented Dec 15, 2014

baltiyskiy commented Dec 15, 2014

lowasser commented Dec 15, 2014

cpovirk commented Dec 15, 2014

baltiyskiy commented Dec 15, 2014

lowasser commented Dec 15, 2014

txshtkckr commented Apr 12, 2015

baltiyskiy commented Apr 12, 2015

protogenes commented May 29, 2017

findepi commented Jan 10, 2022

ben-manes commented Jan 10, 2022

findepi commented Jan 10, 2022 •

edited

Loading

Concurrency issue between get(K key, Callable<? extends V> valueLoader) and invalidate(Object key) #1881

Concurrency issue between get(K key, Callable<? extends V> valueLoader) and invalidate(Object key) #1881

Comments

gmaes commented Nov 5, 2014

gmaes commented Dec 4, 2014

lowasser commented Dec 15, 2014

sereda commented Dec 15, 2014

lowasser commented Dec 15, 2014

gmaes commented Dec 15, 2014

lowasser commented Dec 15, 2014

baltiyskiy commented Dec 15, 2014

lowasser commented Dec 15, 2014

cpovirk commented Dec 15, 2014

baltiyskiy commented Dec 15, 2014

lowasser commented Dec 15, 2014

txshtkckr commented Apr 12, 2015

baltiyskiy commented Apr 12, 2015

protogenes commented May 29, 2017

findepi commented Jan 10, 2022

ben-manes commented Jan 10, 2022

findepi commented Jan 10, 2022 • edited Loading

findepi commented Jan 10, 2022 •

edited

Loading