
feat(redis): enable tuning of caching parameters #129

Merged: 1 commit merged into main on Feb 13, 2025

Conversation

hannahhoward (Member)

Goals

Currently, our cache hit rate for some of our redis instances is really low, because it turns out 1 hour really isn't long enough to hold some of this data. There's no reason not to hold it longer -- we've allocated almost 10GB to these redis instances and we're currently using all of a couple MB.

[Screenshot: cache metrics, 2025-02-13 at 1:30:19 PM]

Implementation

  • Enable passing a specific expiration parameter to each redis store (a rough sketch of the store-level change follows this list)
  • Pipe it all the way out to an AWS env var
  • Up expiration times to:
    • 1 month on prod, 1 week on staging for providers (very small data)
    • 1 week on prod, 1 day on staging for claims
    • 1 day on prod, 1 hour on staging for indexes (larger data)
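
A minimal sketch of what that store-level change might look like, assuming an options pattern; the actual constructor and option names in pkg/redis/redisstore.go may differ:

// Sketch only: assumes go-redis v9 and an options-style constructor.
package redis

import (
	"time"

	goredis "github.com/redis/go-redis/v9"
)

// Store wraps a redis client and applies a configurable TTL to the keys it writes.
type Store struct {
	client     *goredis.Client
	expiration time.Duration
}

// Option configures a Store.
type Option func(*Store)

// WithExpiration overrides the default TTL applied to cached keys.
func WithExpiration(d time.Duration) Option {
	return func(s *Store) { s.expiration = d }
}

// NewStore builds a Store with a 1-hour default TTL (the previously hard-coded value).
func NewStore(client *goredis.Client, opts ...Option) *Store {
	s := &Store{client: client, expiration: time.Hour}
	for _, o := range opts {
		o(s)
	}
	return s
}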


codecov bot commented Feb 13, 2025

Codecov Report

Attention: Patch coverage is 18.66667% with 61 lines in your changes missing coverage. Please review.

Files with missing lines         | Patch % | Lines
pkg/aws/service.go               | 0.00%   | 36 Missing ⚠️
pkg/construct/construct.go       | 0.00%   | 18 Missing ⚠️
pkg/redis/redisstore.go          | 53.33%  | 6 Missing and 1 partial ⚠️

Files with missing lines         | Coverage Δ
pkg/redis/contentclaimsstore.go  | 76.92% <100.00%> (ø)
pkg/redis/providerstore.go       | 100.00% <100.00%> (ø)
pkg/redis/shareddagindexstore.go | 62.50% <100.00%> (ø)
pkg/redis/redisstore.go          | 79.48% <53.33%> (-6.88%) ⬇️
pkg/construct/construct.go       | 0.00% <0.00%> (ø)
pkg/aws/service.go               | 0.00% <0.00%> (ø)

... and 1 file with indirect coverage changes

@frrist (Member) left a comment


One blocking comment wrt env var misspelling, LGTM otherwise.

stringValue := mustGetEnv(envVar)
value, err := strconv.ParseInt(stringValue, 10, 64)
if err != nil {
	panic(fmt.Errorf("parsing env var %s to int: %w", envVar, err))
}

suggestion: we may want to include the stringValue that was expected to be an int in the panic message, although I suspect this is a very rare panic.
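
A possible shape for that, reusing the names from the snippet above (a sketch, not the committed code):

value, err := strconv.ParseInt(stringValue, 10, 64)
if err != nil {
	// include the raw value so the panic points directly at the bad input
	panic(fmt.Errorf("parsing env var %s (value %q) to int: %w", envVar, stringValue, err))
}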

Comment on lines +73 to +79
PROVIDERS_CACHE_EXPIRATION_SECONDS = "${terraform.workspace == "prod" ? 30 * 24 * 60 * 60 : 24 * 60 * 60}"
INDEXES_REDIS_URL = aws_elasticache_serverless_cache.cache["indexes"].endpoint[0].address
INDEXES_REDIS_CACHE = aws_elasticache_serverless_cache.cache["indexes"].name
INDEXES_CACHE_EXPIRATION_SECONDS = "${terraform.workspace == "prod" ? 24 * 60 * 60 : 60 * 60}"
CLAIMS_REDIS_URL = aws_elasticache_serverless_cache.cache["claims"].endpoint[0].address
CLAIMS_REDIS_CACHE = aws_elasticache_serverless_cache.cache["claims"].name
CLAIMS_CACHE_EXPIRATION_SECONDS = "${terraform.workspace == "prod" ? 7 * 24 * 60 * 60 : 24 * 60 * 60}"

✔️ math is math-ing
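
For reference, the TTL expressions above work out to: 30 * 24 * 60 * 60 = 2,592,000 s (30 days), 7 * 24 * 60 * 60 = 604,800 s (7 days), 24 * 60 * 60 = 86,400 s (1 day), and 60 * 60 = 3,600 s (1 hour).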

LegacyDataBucketURL: mustGetEnv("LEGACY_DATA_BUCKET_URL"),
HoneycombAPIKey: os.Getenv("HONEYCOMB_API_KEY"),
PrincipalMapping: principalMapping,
ProvidersCacheExpirationSeconds: mustGetInt("PROVIDER_CACHE_EXPIRATION_SECONDS"),

blocking: this should be PROVIDERS_CACHE_EXPIRATION_SECONDS, to match the env var name set in the Terraform config above.
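
Presumably the fix is just the plural name (hypothetical one-line change):

ProvidersCacheExpirationSeconds: mustGetInt("PROVIDERS_CACHE_EXPIRATION_SECONDS"),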

@hannahhoward force-pushed the feat/tune-caching-params branch from fd6a513 to 875744b on February 13, 2025 at 22:13
@frrist (Member) commented Feb 13, 2025

Question: What's our motivation for using Time-Based Expiry rather than Capacity-Based Expiry (LRU or LFU) for this cache? (Or maybe we are using both already?)

@hannahhoward (Member, Author)

Capacity-based expiry is already built into AWS. :) (and it will expire keys with expiries first -- goodness I hope I configured that right though -- should probably go back and review :P)

@hannahhoward merged commit dd0b64e into main on Feb 13, 2025 (8 of 9 checks passed)
@hannahhoward (Member, Author)

ah yes -- https://docs.aws.amazon.com/AmazonElastiCache/latest/dg/Scaling.html#:~:text=ElastiCache%20Serverless%20also%20supports%20a,from%20Replica%20using%20READONLY%20connections).

"When you set a cache data storage maximum and your cache data storage hits the maximum, ElastiCache will begin evicting data in your cache that has a Time-To-Live (TTL) set, using the LRU logic."

The reason for setting a TTL at all is that we want to distinguish between stuff that's already on IPNI and stuff we pre-cache on the indexer while it makes its way to IPNI. We want the pre-cached stuff to have no expiration until it reaches IPNI, so it doesn't get deleted.
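
As a rough illustration of that split (a go-redis v9 sketch with made-up key prefixes and variable names, not the repo's actual store code), a zero expiration keeps a key until it's explicitly deleted, while a positive duration sets a TTL that the ElastiCache LRU eviction quoted above targets:

// claim pre-cached while it propagates to IPNI: no TTL, so the TTL-targeting
// LRU eviction described in the AWS docs leaves it alone
if err := client.Set(ctx, "precache:"+key, value, 0).Err(); err != nil {
	return err
}

// claim already published to IPNI: cache it with the configured expiration
return client.Set(ctx, "cache:"+key, value, time.Duration(expirationSeconds)*time.Second).Err()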

Also, our capacity is 10GB but we get charged by the GB for what we actually use, so if our hit ratio is high even with expiry, it's better not to use all 10GB most of the time.
