
Unable to activate FAIL-SAFE (no entries in memory or distributed) on multiple requests #38

Closed
arnoldsi-vii opened this issue Jan 27, 2022 · 13 comments

@arnoldsi-vii

Hi,
We are testing your cache implementation, and well done on the great work. During load testing of a single API call we started to get the Unable to activate FAIL-SAFE (no entries in memory or distributed) warning.
It managed to set the data under a high volume of requests, but only after many, many misses.
Here is my code:

public async Task<Data> GetDataAsync(string identifier, IdentifyBy identifyBy)
{
    var cacheKey = $"Info:{identifier}:{identifyBy}";
    var data = await _cache.GetAsync<Data>(cacheKey);

    if (data == null)
    {
        data = new Data { Id = 1 };
        await _cache.SetAsync(cacheKey, data, TimeSpan.FromMinutes(10));
    }
    else
    {
        _logger.LogInformation($"Successfully got Data Information from Cache. identifier: {identifier} by {identifyBy}");
    }

    return data;
}

@jodydonetti
Collaborator

jodydonetti commented Jan 28, 2022

Hi @arnoldsi-vii, thanks for trying out FusionCache!

we started to get the Unable to activate FAIL-SAFE (no entries in memory or distributed) warning.

Thanks for pointing this out, I should probably change that logging.
Let me explain.
What looks like a real warning-level log entry is in reality more like a trace-level one: basically it is saying "hey, I tried to see if there was a valid cache entry to eventually use as a fallback, but I didn't find any". The thing is, this is totally fine! For example, the very first time you ask for a specific cache entry this will always happen, since there cannot yet be any stale data to use.

So what I'm saying is that there's nothing to worry about.

The "problem" is that I'm using the same log level used for fail-safe activation (which is warning by default, see here) for this message too (see here), whereas I should have used something different, like trace for example.

I'll change this in the next release, thanks!
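
In the meantime, if the warning is too noisy for you, you can already lower the level used for it yourself via the FailSafeActivationLogLevel option. A minimal sketch (note this is the same log level mentioned above, so it also affects real fail-safe activations):

services.AddFusionCache(options =>
{
    // Sketch only: this lowers the level used for fail-safe related messages, including
    // the "Unable to activate FAIL-SAFE" one discussed here, at the cost of also lowering
    // the level of real fail-safe activations.
    options.FailSafeActivationLogLevel = LogLevel.Debug;
});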

Here is my code:

Looking at your code, I'm wondering if you have a reason to use 2 calls (a "get" + a "set") instead of a single GetOrSet call, which is highly optimized and prevents problems like cache stampede (see here for more info).
Also, which "get" method are you actually using? I'm asking because you wrote GetAsync, but the only "get" methods available are:

  • TryGetAsync
  • GetOrDefaultAsync
  • GetOrSetAsync

Which one did you use?

It managed to set the data under a high volume of requests, but only after many, many misses.

I'd like to understand more about this: it may be because you are not using the GetOrSet method, as I said above, or maybe something else. Without GetOrSet, what happens under load is that all the concurrent "get"s finish without finding any data (so you get a lot of MISSes), followed by a lot of "set" calls (whereas only 1 per key would be needed): that is basically what the cache stampede problem is.

Try with this code:

public async Task<Data> GetDataAsync(string identifier, IdentifyBy identifyBy)
{
	var cacheKey = $"Info:{identifier}:{identifyBy}";
	var data = await _cache.GetOrSetAsync<Data>(
		cacheKey,
		_ => {
			_logger.LogInformation($"Data Information was not in the cache. identifier: {identifier} by { identifyBy }");
			return new Data { Id = 1 };
		},
		TimeSpan.FromMinutes(10)
	);
	return data;
}

Or, if you want an even shorter version without the logging line:

public async Task<Data> GetDataAsync(string identifier, IdentifyBy identifyBy)
{
	return await _cache.GetOrSetAsync(
		$"Info:{identifier}:{identifyBy}",
		new Data { Id = 1 },
		TimeSpan.FromMinutes(10)
	);
}

Please try one of these versions and let me know.

@arnoldsi-vii
Author

Hi @jodydonetti,
Thank you for your response. We are using GetOrDefaultAsync, and the reason for the separate Get and Set is due to legacy issues that we cannot change at this point.

@jodydonetti
Collaborator

jodydonetti commented Jan 28, 2022

Ok, so to recap: because of legacy issues you cannot switch from 2 calls (a get + a set) to 1 call (GetOrSet). But by sticking to separate get and set calls you cannot take advantage of the cache stampede prevention, so you will keep seeing a lot of cache MISSes plus some cache SETs. This is true for FusionCache, EasyCaching and every other solution out there: without a single combined method there's simply no way to optimize that flow.
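
Just to make the difference concrete, here is a rough sketch of the two flows under concurrent load (LoadDataFromDatabaseAsync is only a placeholder for whatever expensive work produces the data, it is not part of your code):

// Flow 1 - separate get + set: N concurrent requests for the same missing key all see
// a MISS, all run the expensive work, and all call SetAsync.
var data = await _cache.GetOrDefaultAsync<Data>(cacheKey);
if (data == null)
{
    data = await LoadDataFromDatabaseAsync(identifier); // runs N times under load
    await _cache.SetAsync(cacheKey, data, TimeSpan.FromMinutes(10));
}

// Flow 2 - a single GetOrSetAsync: the concurrent callers are coordinated, so the factory
// below runs only once per key and the other callers wait for (and reuse) its result.
var data = await _cache.GetOrSetAsync<Data>(
    cacheKey,
    async _ => await LoadDataFromDatabaseAsync(identifier), // runs once
    TimeSpan.FromMinutes(10)
);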

Having said this, is there anything else I can help you with?

PS: of course I will still update the logging message about being Unable to activate FAIL-SAFE (no entries in memory or distributed), since that should be trace-level and not warning-level 👍

@arnoldsi-vii
Author

@jodydonetti I will keep testing the library, and will probably even put it on one of our services that runs in production.

Keep up the great work!

@jodydonetti
Collaborator

Meanwhile I published an alpha2 release, containing the change in the log level we discussed here.

@jodydonetti
Collaborator

Hi @arnoldsi-vii can I close this issue or is there anything else I can do?

Btw, I'm still intrigued to understand why you can use 2 calls (Get + Set) but not 1 single call (GetOrSet) in your GetDataAsync(string identifier, IdentifyBy identifyBy) method. If it's for private code/reasons and you don't want to share, I understand; it's just that now I'm curious 😬

@arnoldsi-vii
Author

@jodydonetti a long time ago we had a deadlock situation when we used GetOrSet with one item that required another. It was a design flaw, but back then the best solution (time-wise) was to separate them. Changing everything back to GetOrSet now would create a lot of regression testing for us. New methods are using GetOrSet :)

@jodydonetti
Collaborator

@jodydonetti a long time ago we had a deadlock situation when we used GetOrSet with one item that required another. It was a design flaw, but back then the best solution (time-wise) was to separate them. Changing everything back to GetOrSet now would create a lot of regression testing for us. New methods are using GetOrSet :)

Got it, makes sense! And thanks for the explanation 👍

arnoldsi-vii reopened this Jan 31, 2022
@arnoldsi-vii
Author

@jodydonetti I kept testing the library on 2 projects: one updates the data, the other is an API for reading the data.
Both are connected to the same Redis server.
When both applications are loaded, both read the data properly.
Now, the "save data" service removes the old Redis entry and creates a new one.
After the save process that app gets unable to activate FAIL-SAFE (no entries in memory or distributed), but that's not all: the Redis record is updated properly, but when the second application makes an API call, it gets the data saved in its memory cache and not the new data that Redis contains.

Do you have an idea what might be an issue?

This is the configuration:

services.AddFusionCache(options =>
{
    options.DefaultEntryOptions = new FusionCacheEntryOptions
    {
        // CACHE DURATION
        Duration = TimeSpan.FromMinutes(15),

        // FAIL-SAFE OPTIONS
        IsFailSafeEnabled = true,
        FailSafeMaxDuration = TimeSpan.FromHours(2),
        FailSafeThrottleDuration = TimeSpan.FromSeconds(30),

        // FACTORY TIMEOUTS
        FactorySoftTimeout = TimeSpan.FromMilliseconds(200),
        FactoryHardTimeout = TimeSpan.FromMilliseconds(1500),

        // DISTRIBUTED CACHE
        DistributedCacheSoftTimeout = TimeSpan.FromSeconds(1),
        DistributedCacheHardTimeout = TimeSpan.FromSeconds(2),
        AllowBackgroundDistributedCacheOperations = false,

        // JITTERING
        JitterMaxDuration = TimeSpan.FromSeconds(2)
    };

    // DISTRIBUTED CACHE CIRCUIT-BREAKER
    options.DistributedCacheCircuitBreakerDuration = TimeSpan.FromSeconds(2);

    // CUSTOM LOG LEVELS
    options.FailSafeActivationLogLevel = LogLevel.Warning;
    options.SerializationErrorsLogLevel = LogLevel.Warning;
    options.DistributedCacheSyntheticTimeoutsLogLevel = LogLevel.Warning;
    options.DistributedCacheErrorsLogLevel = LogLevel.Error;
    options.FactorySyntheticTimeoutsLogLevel = LogLevel.Debug;
    options.FactoryErrorsLogLevel = LogLevel.Error;
});

I would also like to know if anyone in the community has run into the same issue.

@wileyveteran

@arnoldsi-vii I think you will need the backplane (it is in alpha release right at this moment): #11

You can probably reduce the impact somewhat by decreasing the cache duration on the read API so that it picks up any changed values much sooner.
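
As a sketch of what that could look like, based on the configuration posted above (the 1 minute value is only an example, not a recommendation):

// In the read API only: a shorter default Duration means in-memory entries expire and
// are re-fetched from Redis sooner, shrinking the window where old data is served.
options.DefaultEntryOptions = new FusionCacheEntryOptions
{
    Duration = TimeSpan.FromMinutes(1),
    // ...keep the rest of your options...
};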

@arnoldsi-vii
Author

@wileyveteran from my understanding, the current version is also supposed to keep the cache in sync.

@jodydonetti
Collaborator

Hi @arnoldsi-vii what version are you using?

Anyway, @wileyveteran is correct in saying that to keep data in sync between 2 nodes (or, in general, between 2 app instances, even on the same node) you need the backplane #11, which has just been released in alpha (see here).

To be clear, the backplane is the mechanism that lets the different memory cache instances talk to each other to keep cache entries in sync, so that when an entry is removed on one instance it is also removed on all the others.
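
For when you try it, here is a rough sketch of the wiring: the exact registration API may still change while the backplane is in alpha, so please treat this as indicative and double-check the docs linked above:

// Sketch only (alpha API, names may change): register the Redis backplane next to the
// existing setup, pointing it at the same Redis used as the distributed cache, so an
// eviction/update on one app instance is propagated to the memory caches of the others.
services.AddFusionCacheStackExchangeRedisBackplane(options =>
{
    options.Configuration = "localhost:6379"; // placeholder connection string
});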

Let me know if there's something I can help you with.

@jodydonetti
Collaborator

Hi all, just wanted to update you on the next version: I've just released the BETA2 of the next big release, which includes, among other small things, a big fix for the DI setup part.

This will probably be the last release before the official one.

Closing this issue since as of now I don't see anything to do here. Happy to re-open in case I'm missing something.

Thanks all.
