Skip to content

Commit

Permalink
Lower memory requirements for cached pressed pages
Browse files Browse the repository at this point in the history
ETag Caching was introduced for Facia `PressedPage` JSON downloading
with #26338 in order to improve
scalability and address #26335,
but a limiting factor was the number of `PressedPage` objects that could
be stored in the cache.

With a max `PressedPage` size of 22MB and a memory budget of 4GB, a
cautious max cache size limit of only 180 `PressedPage` objects was set.
As a result, the cache hit rate was relatively low, and we saw elevated GC,
probably because of object continually being evicted out of the small cache:

#26338 (comment)

The change in this new commit dramatically reduces the combined size of
the `PressedPage` objects held in memory, taking the average retained size
per `PressedPage` from 4MB to 0.5MB (based on a sample of 125 `PressedPage`
objects held in memory at the same time).

It does this by deduplicating the `Tag` objects held by the `PressedPage`s.
Previously, as the `Tag`s for different `PressedPage`s were deserialised
from JSON, many identical tags would created over and over again, and held
in memory. After dedeuplication, those different `PressedPage`s will all
reference the _same_ `Tag` object for a given tag.

The deduplication is done as the `Tag`s are deserialised - a new cache
(gotta love caches!) holds `Tag`s keyed by their hashcode and tag id,
and if a new `Tag` is created with a matching key, it's thrown away, and
the old one is used instead. Thus we end up with just one instance of
that `Tag`, instead of many duplicated ones.

See also:

* https://en.wikipedia.org/wiki/String_interning - a similar technique
  used by Java for Strings: https://www.geeksforgeeks.org/interning-of-string/
  • Loading branch information
rtyley committed Sep 5, 2023
1 parent f43cb34 commit 984f9f8
Show file tree
Hide file tree
Showing 3 changed files with 49 additions and 2 deletions.
38 changes: 38 additions & 0 deletions common/app/json/ObjectDeduplication.scala
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
package json

import com.github.blemale.scaffeine.Scaffeine
import com.gu.etagcaching.ConfigCache
import play.api.libs.json.Format

/**
* This class is a memory optimisation for when many identical objects (eg `Tag`s) are being
* read from JSON, and then held in memory. If the objects are immutable & identical,
* we don't want to hold multiple copies of them - we can just hold one in a cache, and have all
* references to the object point to that same object.
*
* In practice, this means when we deserialize Json to some Scala object, we _do_ create a
* new instance of that Scala object, but we immediately throw it away if we find we already
* have a matching one in our cache - we use the cached one in preference, and we get the desired
* situation where all the instances of that unique value in memory are represented by one single
* instance.
*
* We key our cached instances by hashcode (which is mostly unique, apart from
* https://en.wikipedia.org/wiki/Birthday_problem collisions) combined with a user-defined id function
* (which is a only precaution, to eliminate Birthday Problem collisions). So in our cache, we
* might have several instances of the Tag `sport/cycling`, if the tag has changed - but the hashcode
* ensures that we do not substitute old data in place of new data. Old data can be configured to
* eventually expire out of the cache (eg `.expireAfterAccess(1.hour)`).
*/
object ObjectDeduplication {

/**
* @param id as precaution, this function provides an object distinguisher beyond the object hashcode. For a Tag,
* you might just provide the tag id.
*/
def deduplicate[A](f: Format[A], id: A => Any, configCache: ConfigCache): Format[A] = {
val cache = configCache(Scaffeine()).build[(Int, Any), A]().asMap()
def key(a: A): (Int, Any) = (a.hashCode(), id(a))

f.bimap(a => cache.getOrElseUpdate(key(a), a), identity)
}
}
11 changes: 10 additions & 1 deletion common/app/model/Formats.scala
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,15 @@ package model

import com.gu.contentapi.client.utils.DesignType
import common.Pagination
import json.ObjectDeduplication.deduplicate
import model.content._
import model.facia.PressedCollection
import model.pressed._
import play.api.libs.json._
import play.api.libs.json.JodaReads._

import scala.concurrent.duration.DurationInt

object GenericThriftAtomFormat extends Format[com.gu.contentatom.thrift.Atom] {
def reads(json: JsValue): JsError = JsError("Converting from Json is not supported by intent!")
def writes(atom: com.gu.contentatom.thrift.Atom): JsObject = JsObject(Seq.empty)
Expand Down Expand Up @@ -216,7 +219,13 @@ object PressedContentFormat {
implicit val podcastFormat = Json.format[Podcast]
implicit val referenceFormat = Json.format[Reference]
implicit val tagPropertiesFormat = Json.format[TagProperties]
implicit val tagFormat = Json.format[Tag]
implicit val tagFormat: Format[Tag] =
deduplicate(
Json.format[Tag],
_.id,
_.maximumSize(20000) // average Tag retains ~8KB in memory, so 20000 cached Tags retain only 160MB
.expireAfterAccess(1.hour),
)
implicit val tagsFormat = Json.format[Tags]
implicit val elementPropertiesFormat = Json.format[ElementProperties]
implicit val imageAssetFormat = Json.format[ImageAsset]
Expand Down
2 changes: 1 addition & 1 deletion common/app/services/fronts/FrontJsonFapi.scala
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@ trait FrontJsonFapi extends GuLogging {
}
},
AlwaysWaitForRefreshedValue,
_.maximumSize(180).expireAfterAccess(1.hour),
_.maximumSize(2000).expireAfterAccess(1.hour),
)

def get(path: String, pageType: PressedPageType): Future[Option[PressedPage]] =
Expand Down

0 comments on commit 984f9f8

Please sign in to comment.