-
Notifications
You must be signed in to change notification settings - Fork 2
ContinuousProfiling
Continuous profiling and distributed tracing are common techniques being used in modern systems to identify performance issues.
The original purpose for the spf4j library was continuous profiling with flame graph visualization. Additionally the spf4j library grew to include low overhead metrics api and various other components needed to make systems more realiable and more performant: test log backend, thread pool, jdbc semaphore, jmx, csv, etc...
One of the abstractions added to the library is the ExecutionContext. The first requirement that the ExecutionContext resolved, was timeout propagation. Next the execution context evolved into a bag of various execution related "things", like logs, tracing information, stack samples....
The addition of stack samples to the execution context came from the desire to answer the question "why did this outlier take 10s to execute?".
When a request executes outside of its SLA (let's say 50% of the timeout value) the request execution framework can materialize to a log file the stack samples attributed to this particular request.
To show an example, I created a simple demo project running in GKE and a simulated slow request(follow the link to simulate it and have some data to look at bellow).
After invoking this Endpoint and have at least a slow request, you can go and retrieve the profile data from the profiles endpoint.
And visualize where the time is being spent via:
see live flame graph visualization based on d3-flame-graph (do not forget to simulate the slow request first)
As you can see, it is easy to see the sleep call that I used to simulate this slow request.
This is how the profiling works:
For REST invocations, the servlet filter implementing ExecutionContext start/stop + logging is at
To enable ExecutionContext corelated profiling all you need to do is:
// Enable Continuous profiling aware ExecutionContext
System.setProperty("spf4j.execContext.tlAttacherClass", ProfilingTLAttacher.class.getName());
System.setProperty("spf4j.execContext.factoryClass", ProfiledExecutionContextFactory.class.getName());
Sampler sampler = new Sampler(Integer.getInteger("app.profiler.sampleTimeMillis", 10),
(t) -> new TracingExecutionContexSampler(contextFactory::getCurrentThreadContexts,
(ctx) -> {
// group by for the sample aggregations.
// profiles will be aggregated across the HTTP method of the request.
// There will be a ssdump3 file with samples broken down by http method.
String name = ctx.getName();
if (name.startsWith("GET")) {
return "GET";
} else if (ctx.getName().startsWith("POST")) {
return "POST";
} else {
return "OTHER";
}
}));
sampler.start();
For a detailed description on how can you do that see
Notes:
-
The profiling exhibits safe point bias
-
Profiling overhead can be controled by configuring the sample time + thread sample limit.
-
Each request will incur extra overhead of 2 concurrent map accesses (put + remove).
-
No visibility into native stack.
-
Time corelated GC activity should be inspected to reason about GC impact.
Enjoy! Looking forward for any feedback!