
feat(redis): enable tuning of caching parameters #129

Merged: 1 commit merged into main on Feb 13, 2025

Conversation

hannahhoward (Member)

Goals

Currently, our cache hit rate for some of our redis instances is really low, because it turns out 1 hour really isn't long enough to hold some of this data. There's no reason not to hold it longer -- we've allocated almost 10GB to these redis instances and we're currently using all of a couple MB.

[Screenshot: cache metrics, 2025-02-13 at 1:30:19 PM]

Implementation

  • Enable passing a specific expiration parameter to each redis store (a rough sketch of the store-level change follows this list)
  • Pipe it all the way out to an AWS env var
  • Up expiration times to:
    • 1 month on prod, 1 week on staging for providers (very small data)
    • 1 week on prod, 1 day on staging for claims
    • 1 day on prod, 1 hour on staging for indexes (larger data)
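
A minimal sketch of what that store-level change might look like, assuming an options pattern; the actual constructor and option names in pkg/redis/redisstore.go may differ:

// Sketch only: assumes go-redis v9 and an options-style constructor.
package redis

import (
	"time"

	goredis "github.com/redis/go-redis/v9"
)

// Store wraps a redis client and applies a configurable TTL to the keys it writes.
type Store struct {
	client     *goredis.Client
	expiration time.Duration
}

// Option configures a Store.
type Option func(*Store)

// WithExpiration overrides the default TTL applied to cached keys.
func WithExpiration(d time.Duration) Option {
	return func(s *Store) { s.expiration = d }
}

// NewStore builds a Store with a 1-hour default TTL (the previously hard-coded value).
func NewStore(client *goredis.Client, opts ...Option) *Store {
	s := &Store{client: client, expiration: time.Hour}
	for _, o := range opts {
		o(s)
	}
	return s
}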


codecov bot commented Feb 13, 2025

Codecov Report

Attention: Patch coverage is 18.66667% with 61 lines in your changes missing coverage. Please review.

Files with missing lines         | Patch % | Lines
pkg/aws/service.go               | 0.00%   | 36 Missing ⚠️
pkg/construct/construct.go       | 0.00%   | 18 Missing ⚠️
pkg/redis/redisstore.go          | 53.33%  | 6 Missing and 1 partial ⚠️

Files with missing lines         | Coverage Δ
pkg/redis/contentclaimsstore.go  | 76.92% <100.00%> (ø)
pkg/redis/providerstore.go       | 100.00% <100.00%> (ø)
pkg/redis/shareddagindexstore.go | 62.50% <100.00%> (ø)
pkg/redis/redisstore.go          | 79.48% <53.33%> (-6.88%) ⬇️
pkg/construct/construct.go       | 0.00% <0.00%> (ø)
pkg/aws/service.go               | 0.00% <0.00%> (ø)

... and 1 file with indirect coverage changes

@frrist (Member) left a comment


One blocking comment wrt env var misspelling, LGTM otherwise.

stringValue := mustGetEnv(envVar)
value, err := strconv.ParseInt(stringValue, 10, 64)
if err != nil {
	panic(fmt.Errorf("parsing env var %s to int: %w", envVar, err))
}

suggestion: we may want to include the stringValue that was expected to be an int in the panic message, although I suspect this is a very rare panic.
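
A possible shape for that, reusing the names from the snippet above (a sketch, not the committed code):

value, err := strconv.ParseInt(stringValue, 10, 64)
if err != nil {
	// include the raw value so the panic points directly at the bad input
	panic(fmt.Errorf("parsing env var %s (value %q) to int: %w", envVar, stringValue, err))
}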

Comment on lines +73 to +79
PROVIDERS_CACHE_EXPIRATION_SECONDS = "${terraform.workspace == "prod" ? 30 * 24 * 60 * 60 : 24 * 60 * 60}"
INDEXES_REDIS_URL = aws_elasticache_serverless_cache.cache["indexes"].endpoint[0].address
INDEXES_REDIS_CACHE = aws_elasticache_serverless_cache.cache["indexes"].name
INDEXES_CACHE_EXPIRATION_SECONDS = "${terraform.workspace == "prod" ? 24 * 60 * 60 : 60 * 60}"
CLAIMS_REDIS_URL = aws_elasticache_serverless_cache.cache["claims"].endpoint[0].address
CLAIMS_REDIS_CACHE = aws_elasticache_serverless_cache.cache["claims"].name
CLAIMS_CACHE_EXPIRATION_SECONDS = "${terraform.workspace == "prod" ? 7 * 24 * 60 * 60 : 24 * 60 * 60}"

✔️ math is math-ing
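
For reference, the TTL expressions above work out to: 30 * 24 * 60 * 60 = 2,592,000 s (30 days), 7 * 24 * 60 * 60 = 604,800 s (7 days), 24 * 60 * 60 = 86,400 s (1 day), and 60 * 60 = 3,600 s (1 hour).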

LegacyDataBucketURL: mustGetEnv("LEGACY_DATA_BUCKET_URL"),
HoneycombAPIKey: os.Getenv("HONEYCOMB_API_KEY"),
PrincipalMapping: principalMapping,
ProvidersCacheExpirationSeconds: mustGetInt("PROVIDER_CACHE_EXPIRATION_SECONDS"),

blocking: this should be PROVIDERS_CACHE_EXPIRATION_SECONDS, to match the env var name set in the Terraform config above.
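
Presumably the fix is just the plural name (hypothetical one-line change):

ProvidersCacheExpirationSeconds: mustGetInt("PROVIDERS_CACHE_EXPIRATION_SECONDS"),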

@hannahhoward force-pushed the feat/tune-caching-params branch from fd6a513 to 875744b on February 13, 2025 at 22:13
@frrist (Member) commented Feb 13, 2025

Question: What's our motivation for using Time-Based Expiry rather than Capacity-Based Expiry (LRU or LFU) for this cache? (Or maybe we are using both already?)

@hannahhoward (Member, Author)

Capacity-based expiry is already built into AWS. :) (and it will expire keys with expiries first -- goodness I hope I configured that right though -- should probably go back and review :P)

@hannahhoward merged commit dd0b64e into main on Feb 13, 2025 (8 of 9 checks passed)
@hannahhoward (Member, Author)

ah yes -- https://docs.aws.amazon.com/AmazonElastiCache/latest/dg/Scaling.html#:~:text=ElastiCache%20Serverless%20also%20supports%20a,from%20Replica%20using%20READONLY%20connections).

"When you set a cache data storage maximum and your cache data storage hits the maximum, ElastiCache will begin evicting data in your cache that has a Time-To-Live (TTL) set, using the LRU logic."

The reason for setting a TTL at all is that we want to distinguish between stuff that's already on IPNI and stuff we pre-cache on the indexer while it makes its way to IPNI. We want the pre-cached stuff to have no expiration until it reaches IPNI, so it doesn't get deleted.
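
As a rough illustration of that split (a go-redis v9 sketch with made-up key prefixes and variable names, not the repo's actual store code), a zero expiration keeps a key until it's explicitly deleted, while a positive duration sets a TTL that the ElastiCache LRU eviction quoted above targets:

// claim pre-cached while it propagates to IPNI: no TTL, so the TTL-targeting
// LRU eviction described in the AWS docs leaves it alone
if err := client.Set(ctx, "precache:"+key, value, 0).Err(); err != nil {
	return err
}

// claim already published to IPNI: cache it with the configured expiration
return client.Set(ctx, "cache:"+key, value, time.Duration(expirationSeconds)*time.Second).Err()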

Also, our capacity is 10GB but we get charged by the GB for what we actually use, so if our hit ratio is high even with expiry, it's better not to use all 10GB most of the time.
