Adding a new variations of the "per()" method to allow arbitrary keys…

… to be used to aggregate logs. This isn't as simple as it sounds due to the need to manage the number of unique keys stored per log site to avoid unbounded memory leaks. The new "bucketing strategy" class exists to map unbounded keys to a bounded set. RELNOTES=Added a new "per()" method to support arbitrary aggregation keys. PiperOrigin-RevId: 520063909
google · Mar 28, 2023 · bddcc1d · bddcc1d
1 parent c9f329c
commit bddcc1d
Show file tree

Hide file tree

Showing 5 changed files with 460 additions and 29 deletions.
diff --git a/api/src/main/java/com/google/common/flogger/LogContext.java b/api/src/main/java/com/google/common/flogger/LogContext.java
@@ -492,17 +492,17 @@ protected final void removeMetadata(MetadataKey<?> key) {
   /**
    * A callback which can be overridden to implement post processing of logging contexts prior to
    * passing them to the backend.
-   * 
+   *
    * <h2>Basic Responsibilities</h2>
-   * 
+   *
    * <p>This method is responsible for:
    * <ol>
    * <li>Performing any rate limiting operations specific to the extended API.
    * <li>Updating per log-site information (e.g. for debug metrics).
    * <li>Adding any additional metadata to this context.
    * <li>Returning whether logging should be attempted.
    * </ol>
-   * 
+   *
    * <p>Implementations of this method must always call {@code super.postProcess()} first with the
    * given log site key:
    *
@@ -514,15 +514,15 @@ protected final void removeMetadata(MetadataKey<?> key) {
    * }}</pre>
    *
    * <h2>Log Site Keys</h2>
-   * 
+   *
    * <p>If per log-site information is needed during post-processing, it should be stored using a
    * {@link LogSiteMap}. This will correctly handle "specialized" log-site keys and remove the risk
    * of memory leaks due to retaining unused log site data indefinitely.
-   * 
+   *
    * <p>Note that the given {@code logSiteKey} can be more specific than the {@link LogSite} of a
    * log statement (i.e. a single log statement can have multiple distinct versions of its state).
    * See {@link #per(Enum)} for more information.
-   * 
+   *
    * <p>If a log statement cannot be identified uniquely, then {@code logSiteKey} will be {@code
    * null}, and this method must behave exactly as if the corresponding fluent method had not been
    * invoked. On a system in which log site information is <em>unavailable</em>:
@@ -534,32 +534,32 @@ protected final void removeMetadata(MetadataKey<?> key) {
    * <pre>{@code logger.atInfo().withCause(e).log("Some message"); }</pre>
    *
    * <h2>Rate Limiting and Skipped Logs</h2>
-   * 
+   *
    * <p>When handling rate limiting, {@link #updateRateLimiterStatus(RateLimitStatus)} should be
    * called for each active rate limiter. This ensures that even if logging does not occur, the
    * number of "skipped" log statements is recorded correctly and emitted for the next allowed log.
-   * 
+   *
    * <p>If {@code postProcess()} returns {@code false} without updating the rate limit status, the
    * log statement may not be counted as skipped. In some situations this is desired, but either way
    * the extended logging API should make it clear to the user (via documentation) what will happen.
    * However in most cases {@code postProcess()} is only expected to return {@code false} due to
    * rate limiting.
-   * 
+   *
    * <p>If rate limiters are used there are still situations in which {@code postProcess()} can
    * return {@code true}, but logging will not occur. This is due to race conditions around the
    * resetting of rate limiter state. A {@code postProcess()} method can "early exit" as soon as
    * {@code shouldLog} is false, but should assume logging will occur while it remains {@code true}.
-   * 
+   *
    * <p>If a method in the logging chain determines that logging should definitely not occur, it may
    * choose to return the {@code NoOp} logging API at that point. However this will bypass any
    * post-processing, and no rate limiter state will be updated. This is sometimes desirable, but
    * the API documentation should make it clear to the user as to which behaviour occurs.
-   * 
+   *
    * For example, level selector methods (such as {@code atInfo()}) return the {@code NoOp} API for
    * "disabled" log statements, and these have no effect on rate limiter state, and will not update
    * the "skipped" count. This is fine because controlling logging via log level selection is not
    * conceptually a form of "rate limiting".
-   * 
+   *
    * <p>The default implementation of this method enforces the rate limits as set by {@link
    * #every(int)} and {@link #atMostEvery(int, TimeUnit)}.
    *
@@ -619,7 +619,7 @@ protected boolean postProcess(@NullableDecl LogSiteKey logSiteKey) {
    * Callback to allow custom log contexts to apply additional rate limiting behaviour. This should
    * be called from within an overriden {@code postProcess()} method. Typically this is invoked
    * after calling {@code super.postProcess(logSiteKey)}, such as:
-   * 
+   *
    * <pre>{@code protected boolean postProcess(@NullableDecl LogSiteKey logSiteKey) {
    *   boolean shouldLog = super.postProcess(logSiteKey);
    *   // Even if `shouldLog` is false, we still call the rate limiter to update its state.
@@ -629,10 +629,10 @@ protected boolean postProcess(@NullableDecl LogSiteKey logSiteKey) {
    *   }
    *   return shouldLog;
    * }}</pre>
-   * 
+   *
    * <p>See {@link RateLimitStatus} for more information on how to implement custom rate limiting in
    * Flogger.
-   * 
+   *
    * @param status a rate limiting status, or {@code null} if the rate limiter was not active.
    * @return whether logging will occur based on the current combined state of active rate limiters.
    */
@@ -802,6 +802,12 @@ public final <T> API with(MetadataKey<Boolean> key) {
     return with(key, Boolean.TRUE);
   }
 
+  @Override
+  public <T> API per(@NullableDecl T key, LogPerBucketingStrategy<? super T> strategy) {
+    // Skip calling the bucketer for null so implementations don't need to check.
+    return key != null ? with(Key.LOG_SITE_GROUPING_KEY, strategy.apply(key)) : api();
+  }
+
   @Override
   public final API per(Enum<?> key) {
     return with(Key.LOG_SITE_GROUPING_KEY, key);

diff --git a/api/src/main/java/com/google/common/flogger/LogPerBucketingStrategy.java b/api/src/main/java/com/google/common/flogger/LogPerBucketingStrategy.java
@@ -0,0 +1,232 @@
+/*
+ * Copyright (C) 2023 The Flogger Authors.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package com.google.common.flogger;
+
+import static com.google.common.flogger.util.Checks.checkArgument;
+import static com.google.common.flogger.util.Checks.checkNotNull;
+
+import java.util.HashMap;
+import org.checkerframework.checker.nullness.compatqual.NullableDecl;
+
+/**
+ * Provides a strategy for "bucketing" a potentially unbounded set of log aggregation keys used by
+ * the {@link LoggingApi#per(T, LogPerBucketingStrategy<T>)} method.
+ *
+ * <p>When implementing new strategies not provided by this class, it is important to ensure that
+ * the {@code apply()} method returns values from a bounded set of instances wherever possible.
+ *
+ * <p>This is important because the returned values are held persistently for potentially many
+ * different log sites. If a different instance is returned each time {@code apply()} is called, a
+ * different instance will be held in each log site. This multiplies the amount of memory that is
+ * retained indefinitely by any use of {@link LoggingApi#per(T,LogPerBucketingStrategy<T>)}.
+ * 
+ * <p>One way to handle arbitrary key types would be to create a strategy which "interns" instances
+ * in some way, to produce singleton identifiers. Unfortunately interning can itself be a cause of
+ * unbounded memory leaks, so a bucketing strategy wishing to perform interning should probably
+ * support a user defined maximum capacity to limit the overall risk. If too many instances are
+ * seen, the strategy should begin to return {@code null} (and log an appropriate warning).
+ * 
+ * <p>The additional complexity created by this approach really tells us that types which require
+ * interning in order to be used as aggregation keys should be considered unsuitable, and callers
+ * should seek alternatives.
+ */
+public abstract class LogPerBucketingStrategy<T> {
+  private static final LogPerBucketingStrategy<Object> KNOWN_BOUNDED =
+      new LogPerBucketingStrategy<Object>("KnownBounded") {
+        @Override
+        protected Object apply(Object key) {
+          return key;
+        }
+      };
+
+  // The is a "safe" strategy as far as memory use is concerned since class objects are effectively
+  // singletons.
+  private static final LogPerBucketingStrategy<Object> BY_CLASS =
+      new LogPerBucketingStrategy<Object>("ByClass") {
+        @Override
+        protected Object apply(Object key) {
+          return key.getClass();
+        }
+      };
+
+  // The is a "safe" strategy as far as memory use is concerned, because a class object returns the
+  // same string instance every time its called, and class objects are effectively singletons.
+  private static final LogPerBucketingStrategy<Object> BY_CLASS_NAME =
+      new LogPerBucketingStrategy<Object>("ByClassName") {
+        @Override
+        protected Object apply(Object key) {
+          // This is a naturally interned value, so no need to call intern().
+          return key.getClass().getName();
+        }
+      };
+
+  /**
+   * A strategy to use only if the set of log aggregation keys is known to be a strictly bounded set
+   * of instances with singleton semantics.
+   *
+   * <p><em>WARNING</em>: When using this strategy, keys passed to {@link
+   * LoggingApi#per(T,LogPerBucketingStrategy<T>)} are used as-is by the log aggregation code, and
+   * held indefinitely by internal static data structures. As such it is vital that key instances
+   * used with this strategy have singleton semantics (i.e. if {@code k1.equals(k2)} then {@code k1
+   * == k2}). Failure to adhere to this requirement is likely to result in hard to detect memory
+   * leaks.
+   *
+   * <p>If keys do not have singleton semantics then you should use a different strategy, such as
+   * {@link #byHashCode(int)} or {@link #byClass()}.
+   */
+  public static final LogPerBucketingStrategy<Object> knownBounded() {
+    return KNOWN_BOUNDED;
+  }
+
+  /**
+   * A strategy which uses the {@code Class} of the given key for log aggregation. This is useful
+   * when you need to aggregate over specific exceptions or similar type-distinguished instances.
+   *
+   * <p>Note that using this strategy will result in a reference to the {@code Class} object of the
+   * key being retained indefinitely. This will prevent class unloading from occurring for affected
+   * classes, and it is up to the caller to decide if this is acceptable or not.
+   */
+  public static final LogPerBucketingStrategy<Object> byClass() {
+    return BY_CLASS;
+  }
+
+  /**
+   * A strategy which uses the {@code Class} name of the given key for log aggregation. This is
+   * useful when you need to aggregate over specific exceptions or similar type-distinguished
+   * instances.
+   *
+   * <p>This is an alternative strategy to {@link #byClass()} which avoids holding onto the class
+   * instance and avoids any issues with class unloading. However it may conflate classes if
+   * applications use complex arrangements of custom of class-loaders, but this should be extremely
+   * rare.
+   */
+  public static final LogPerBucketingStrategy<Object> byClassName() {
+    return BY_CLASS_NAME;
+  }
+
+  /**
+   * A strategy defined for some given set of known keys.
+   *
+   * <p>Unlike {@link #knownBounded()}, this strategy maps keys a bounded set of identifiers, and
+   * permits the use of non-singleton keys in {@link LoggingApi#per(T,LogPerBucketingStrategy<T>)}.
+   *
+   * <p>If keys outside this set are used this strategy returns {@code null}, and log aggregation
+   * will not occur. Duplicates in {@code knownKeys} are ignored.
+   */
+  public static final LogPerBucketingStrategy<Object> forKnownKeys(Iterable<?> knownKeys) {
+    final HashMap<Object, Integer> keyMap = new HashMap<Object, Integer>();
+    StringBuilder name = new StringBuilder("ForKnownKeys(");
+    int index = 0;
+    for (Object key : knownKeys) {
+      checkNotNull(key, "key");
+      if (!keyMap.containsKey(key)) {
+        name.append(index > 0 ? ", " : "").append(key);
+        keyMap.put(key, index++);
+      }
+    }
+    checkArgument(!keyMap.isEmpty(), "knownKeys must not be empty");
+    name.append(")");
+    return new LogPerBucketingStrategy<Object>(name.toString()) {
+      @Override
+      @NullableDecl
+      protected Object apply(Object key) {
+        return keyMap.get(key);
+      }
+    };
+  }
+
+  /**
+   * A strategy which uses the {@code hashCode()} of a given key, modulo {@code maxBuckets}, for log
+   * aggregation.
+   *
+   * <p>This is a fallback strategy for cases where the set of possible values is not known in
+   * advance, or could be arbirarily large in unusual circumstances.
+   * 
+   * <p>When using this method it is obviously important that the {@code hashCode()} method of the
+   * expected keys is well distributed, since duplicate hash codes, or hash codes congruent to
+   * {@code maxBuckets} will cause keys to be conflated.
+   *
+   * <p>The caller is responsible for deciding the number of unique log aggregation keys this
+   * strategy can return. This choice is a trade-off between memory usage and the risk of conflating
+   * keys when performing log aggregation. Each log site using this strategy will hold up to {@code
+   * maxBuckets} distinct versions of log site information to allow rate limiting and other stateful
+   * operations to be applied separately per bucket. The overall allocation cost depends on the type
+   * of rate limiting used alongside this method, but it scales linearly with {@code maxBuckets}.
+   * 
+   * <p>It is recommended to keep the value of {@code maxBuckets} below 250, since this guarantees
+   * no additional allocations will occur when using this strategy, however the value chosen should
+   * be as small as practically possible for the typical expected number of unique keys.
+   *
+   * <p>To avoid unwanted allocation at log sites, users are strongly encouraged to assign the
+   * returned value to a static field, and pass that to any log statements which need it.
+   */
+  public static final LogPerBucketingStrategy<Object> byHashCode(final int maxBuckets) {
+    checkArgument(maxBuckets > 0, "maxBuckets must be positive");
+    return new LogPerBucketingStrategy<Object>("ByHashCode(" + maxBuckets + ")") {
+      @Override
+      protected Object apply(Object key) {
+        // Modulo can return -ve values and we want a value in the range (0 <= modulo < maxBuckets).
+        // Note: Math.floorMod() is Java 8, so cannot be used here (yet) otherwise we would just do:
+        // return Math.floorMod(key.hashCode(), maxBuckets) - 128;
+        int modulo = key.hashCode() % maxBuckets;
+        // Can only be -ve if the hashcode was negative, and if so (-maxBuckets < modulo < 0).
+        // The following adds maxBuckets if modulo was negative, or zero (saves a branch).
+        modulo += (modulo >> 31) & maxBuckets;
+        // Subtract 128 from the modulo in order to take full advantage of the promised Integer
+        // cache in the JVM (ensuring up to 256 cached values). From java.lang.Integer#valueOf():
+        // ""This method will always cache values in the range -128 to 127 ...""
+        return modulo - 128;
+      }
+    };
+  }
+
+  private final String name;
+
+  /** Instantiates a strategy with the specified name (used for debugging). */
+  protected LogPerBucketingStrategy(String name) {
+    this.name = checkNotNull(name, "name");
+  }
+
+  /**
+   * Maps a log aggregation key from a potentially unbounded set of key values to a bounded set of
+   * instances.
+   *
+   * <p>Implementations of this method should be efficient, and avoid allocating memory wherever
+   * possible. The returned value must be an immutable identifier with minimal additional allocation
+   * requirements and ideally have singleton semantics (e.g. an {@code Enum} or {@code Integer}
+   * value).
+   *
+   * <p><em>Warning</em>: If keys are not known to have natural singleton semantics (e.g. {@code
+   * String}) then returning the given key instance is generally a bad idea. Even if the set of key
+   * values is small, the set of distinct allocated instances passed to {@link
+   * LoggingApi#per(T,LogPerBucketingStrategy<T>)} can be unbounded, and that's what matters. As
+   * such it is always better to map keys to some singleton identifier or intern the keys in some
+   * way.
+   *
+   * @param key a non-null key from a potentially unbounded set of log aggregation keys.
+   * @return an immutable value from some known bounded set, which will be held persistently by
+   *     internal Flogger data structures as part of the log aggregation feature. If {@code null} is
+   *     returned, the corresponding call to {@code per(key, STRATEGY)} has no effect.
+   */
+  @NullableDecl
+  protected abstract Object apply(T key);
+
+  @Override
+  public final String toString() {
+    return LogPerBucketingStrategy.class.getSimpleName() + "[" + name + "]";
+  }
+}