Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce CPU & network consumption of Facia JSON download #26338

Merged
merged 2 commits into from
Aug 9, 2023

Commits on Aug 9, 2023

  1. Switch to async AWS SDK v2 for Facia JSON download

    Context: #26335
    
    In this change:
    
    * Use AWS SDK v2 and non-blocking async code to avoid blocking a thread
      while downloading the Facia JSON data from S3
    * Directly transform bytes into JsValue, without constructing a `String`
      first
    
    Note that the work done between the two metrics `FrontDownloadLatency`
    & `FrontDecodingLatency` has changed slightly - the conversion to the
    basic JSON model (JsValue) now occurs in `FrontDownloadLatency`, rather
    than the 'decoding' step.
    
    We could get rid of the futureSemaphore and stop using the dedicated
    blocking-operations pool here, but we'll leave that to another PR.
    
    Co-authored-by: Ravi <7014230+arelra@users.noreply.github.com>
    Co-authored-by: Ioanna Kokkini <ioanna.kokkini@guardian.co.uk>
    3 people authored and rtyley committed Aug 9, 2023
    Configuration menu
    Copy the full SHA
    ebfc2a1 View commit details
    Browse the repository at this point in the history
  2. Reduce CPU & network consumption of Facia JSON download

    This change introduces ETag-based caching with the new
    https://github.com/guardian/etag-caching library, yielding
    these benefits:
    
    * Significant savings in terms of resources:
      * **Network**: `PressedPage`s are only re-downloaded if the S3 content has _changed_
      * **CPU**: Cached `PressedPage`s are only re-parsed if the S3 content has _changed_
    * Retains the 'stay current' behaviour of the old, non-caching solution: S3 queried with every request to ensure the ETag is up-to-date.
    * Also, given the cache is based on the ([S](https://github.com/blemale/scaffeine))[Caffeine](https://github.com/ben-manes/caffeine) caching library:
      * In-flight requests for a given key are unified, so 100 simultaneous requests for a key make just **1** fetch-and-parse
    
    Currently `facia` runs on [`c6g.2xlarge`](https://instances.vantage.sh/aws/ec2/c6g.2xlarge) instances (16GB RAM), with a JVM heap of 12GB. I'm going to suggest that it's reasonable for the `ETagCache` to consume up to 4GB of the 12GB RAM.
    
    There are ~300 fronts, each of which can have [4 different variants](https://github.com/guardian/frontend/blob/f64c5b681f53a9c87ae1e76c529d08b3ad16ef6b/common/app/model/PressedPage.scala#L68-L86), so currently there could be up to 1200 different `PressedPage` objects. From [heapdump](https://drive.google.com/file/d/1yjfqLMpWqww6-3L8RBN0yrAf9hQHHpSC/view?usp=sharing) analysis, the largest `PressedPage` retains ~22MB of memory (most instances are smaller, averaging at 4MB). With the 4GB budget, and assuming a worse case of 22MB per `PressedPage`, we can afford to set a max size on the cache of ~180 entries. Although it's disappointing we can't hold _all_ of the `PressedPage`s, the [priorities of the eviction policy used by the Caffeine caching library](https://github.com/ben-manes/caffeine/wiki/Efficiency) should ensure that we get a good hit rate on the most in-demand fronts.
    
    _Incidentally, heapdump analysis also shows that some structures within `PressedPage` are memory inefficient when considered from the perspective of holding many `PressedPage` objects in memory at once - using object pooling on `model.Tag` for instance, would probably lead to a 80% reduction of the total retained memory._
    rtyley committed Aug 9, 2023
    Configuration menu
    Copy the full SHA
    3611570 View commit details
    Browse the repository at this point in the history