Skip to content

ContinuousProfiling

Zoltan Farkas edited this page Jan 28, 2023 · 16 revisions

Continuous profiling + Execution Contexts

History and evolution

Continuous profiling and distributed tracing are common techniques being used in modern systems to identify performance issues.

The original purpose for the spf4j library was continuous profiling with flame graph visualization. Additionally the spf4j library grew to include low overhead metrics api and various other components needed to make systems more realiable and more performant: test log backend, thread pool, jdbc semaphore, jmx, csv, etc...

One of the abstractions added to the library is the ExecutionContext. The first requirement that the ExecutionContext resolved, was timeout propagation. Next the execution context evolved into a bag of various execution related "things", like logs, tracing information, stack samples....

The addition of stack samples to the execution context came from the desire to answer the question "why did this outlier take 10s to execute?".

When a request executes outside of its SLA (let's say 50% of the timeout value) the request execution framework can materialize to a log file the stack samples attributed to this particular request.

Example in action

To show an example, I created a simple demo project running in GKE and a simulated slow request(follow the link to simulate it and have some data to look at bellow).

After invoking this Endpoint and have at least a slow request, you can go and retrieve the profile data from the profiles endpoint.

And visualize where the time is being spent via:

Actuator endpoint

see live flame graph visualization based on d3-flame-graph (do not forget to simulate the slow request first)

images

spf4j-ui

images

images

As you can see, it is easy to see the sleep call that I used to simulate this slow request.

This is how the profiling works:

images

For REST invocations, the servlet filter implementing ExecutionContext start/stop + logging is at

To enable ExecutionContext corelated profiling all you need to do is:

    // Enable Continuous profiling aware ExecutionContext
    System.setProperty("spf4j.execContext.tlAttacherClass", ProfilingTLAttacher.class.getName());
    System.setProperty("spf4j.execContext.factoryClass", ProfiledExecutionContextFactory.class.getName());
    Sampler sampler = new Sampler(Integer.getInteger("app.profiler.sampleTimeMillis", 10),
            (t) -> new TracingExecutionContexSampler(contextFactory::getCurrentThreadContexts,
                    (ctx) -> {
                      // group by for the sample aggregations.
                      // profiles will be aggregated across the HTTP method of the request.
                      // There will be a ssdump3 file with samples broken down by http method.
                      String name = ctx.getName();
                      if (name.startsWith("GET")) {
                        return "GET";
                      } else if (ctx.getName().startsWith("POST")) {
                        return "POST";
                      } else {
                        return "OTHER";
                      }
                    }));
    sampler.start();

Profiling fast requests

For a detailed description on how can you do that see

Notes:

  • The profiling exhibits safe point bias

  • Profiling overhead can be controled by configuring the sample time + thread sample limit.

  • Each request will incur extra overhead of 2 concurrent map accesses (put + remove).

  • No visibility into native stack.

  • Time corelated GC activity should be inspected to reason about GC impact.

Join the discussion on: Gitter chat

Enjoy! Looking forward for any feedback!