-
Notifications
You must be signed in to change notification settings - Fork 324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reimplementing log correlation #2428
Conversation
💚 CLA has been signed |
...-plugin/src/main/java/co/elastic/apm/agent/log4j2/Log4j2TraceCorrelationInstrumentation.java
Outdated
Show resolved
Hide resolved
6cb20fe
to
6e2ae2b
Compare
run integration tests |
The current mechanism is not more efficient in all cases. In fact, it’s often less efficient. Generally, the more the application logs, the more efficient the current strategy with the MdcActivationListener is. The less an application logs, the more efficient it is to only add the correlation ids on demand. An extreme case would be a bad n+1 query that does 10k calls to Redis. With the current mechanism, the hook would be executed for every span activation/deactivation, even if there are no log events in the scope of these spans. Due to that, I think we can even enable log correlation by default now and maybe also add the Also, I think adding to However, I’d like to challenge the approach to instrument all logger methods, such as I think there are good alternative methods we can instrument that are executed after checking whether the log level is enabled. It looks like these methods are always executed and are within a few frames of the logger method calls:
|
I just wanted to mention that log correlation is (still) working for me with this changes. |
That's absolutely right, but it's different when it's the user's choice to add this overhead. Either way, I don't think this aspect of overhead would be an issue, it's more points to consider and possibly optimize.
No need to challenge, the description of this PR explains why event creation is ideal in terms of JIT-timing. I instrumented the API in order to test the general approach and overcome the concerns described above. It's still in draft state because we are still exploring our options. One big advantage of instrumenting based on the APIs, is the instrumentation of slf4j. |
I just came back to this for a brief review, adding comments for discussion: I do like the simplicity and robustness (e.g. stability across versions and independency of actual implementation) of the API-based instrumentation, but I agree overhead is a major factor, so let's review the separate concerns:
That's a valid concern indeed. Since our advices are not inlined anymore, we add less bytecode today. It's hard to say what the effect would be, but we can do some rough estimation. I just compared
Reentrance guard is already implemented in the proposed
Whether log level is enabled is nice to test, although adding to thread-local maps is quite cheap, so not sure it worth the effort. If we decide it is, then we can use the related APIs for that.
True, but we apply method matching only for a limited number of classes. From what I could see, even if it's 10X comparing to single-method matching, it's still a matter of few milliseconds for the entire library. |
Summarizing offline discussion:
|
562f72c
to
a806af3
Compare
run integration tests |
...lugin-common/src/main/java/co/elastic/apm/agent/log/shader/AbstractLogCorrelationHelper.java
Outdated
Show resolved
Hide resolved
...lugin-common/src/main/java/co/elastic/apm/agent/log/shader/AbstractLogCorrelationHelper.java
Outdated
Show resolved
Hide resolved
See eyalkoren#5 |
JBoss LogManger is a logging implementation bases on JUL. Since there is no support for JUL at the moment, JBoss LogManager needs its one implementation for log correlation. |
Hmm, I assumed a logger is obtained through |
...j2-plugin/src/main/java/co/elastic/apm/agent/log4j2/correlation/CorrelationIdMapAdapter.java
Outdated
Show resolved
Hide resolved
…tion Log correlation for JBoss LogManager
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't finished the review yet, I only have a few comments/questions so far.
That's definitely an impressive refactor and the end result is definitely worth the effort as everything now looks better and easier to extend and maintain.
Some tests are disabled with @Disabled
then executed only through Junit4 integration tests, I wonder if we could have a better option for this in the future as 1) it may result in having lots of tests that appear to be disabled in CI but are not in practice as they are also integration tests, and 2) it creates a bit of duplication so having a generic Junit4-wrapper would be nice if such thing exists.
apm-agent-core/src/main/java/co/elastic/apm/agent/impl/ActivationListener.java
Show resolved
Hide resolved
@@ -44,6 +44,8 @@ | |||
|
|||
private static final Logger logger = LoggerFactory.getLogger(ErrorCapture.class); | |||
|
|||
private static final ThreadLocal<ErrorCapture> activeError = new ThreadLocal<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question] Would it be relevant to use co.elastic.apm.agent.sdk.weakconcurrent.WeakConcurrent#buildThreadLocal
instead of a plain ThreadLocal
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You reminded me that this one made me think I need to take another look into our thread local usages in general.
In this specific case, I don't think it is required, as it is set and removed atomically within a single method execution. Once a method bytecode starts to execute, I don't think any of its dependencies may change before it ends.
But it may be something we want to set general rules for and make sure to follow all across.
apm-agent-core/src/main/java/co/elastic/apm/agent/logging/LoggingConfiguration.java
Outdated
Show resolved
Hide resolved
private final Tracer tracer = GlobalTracer.get(); | ||
|
||
@Override | ||
protected boolean addToMdc() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[minor] could be made private as no sub-class is overriding addToMdc
or removeFromMdc
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It implements a protected abstract
method that needs to have protected
access at least.
public ElementMatcher.Junction<ClassLoader> getClassLoaderMatcher() { | ||
return not(ofType(nameStartsWith("co.elastic.apm."))); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question] do you have further details on why this exclusion was previously required ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was required before and now as well for log4j2, because we load it within the agent. However, while it didn't interfere with tests of other frameworks before, it does now, so I removed it from the abstract and implemented it specifically for log4j2, where we test with TestClassWithDependencyRunner
, which uses co.elastic.test.ChildFirstURLClassLoader
especially to escape these matchers.
...pm-logging-plugin-common/src/main/java/co/elastic/apm/agent/loginstr/reformatting/Utils.java
Show resolved
Hide resolved
...ng-plugin-common/src/test/java/co/elastic/apm/agent/loginstr/LoggingInstrumentationTest.java
Outdated
Show resolved
Hide resolved
Yes, we have to improve on this front, but I think it's out of the scope of this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, this is really nice and I only have a few minor comments.
Another minor thing that might be nice to add if you have time is to make sure to properly document n javadoc which methods are instrumented in *Instrumentation
classes to make it easier to navigate to the instrumented classes.
...c/main/java/co/elastic/apm/agent/log4j2/error/Log4j2LoggerErrorCapturingInstrumentation.java
Show resolved
Hide resolved
...lugin/src/test/java/co/elastic/apm/agent/log4j2/correlation/CorrelationIdMapAdapterTest.java
Show resolved
Hide resolved
} | ||
|
||
@Test | ||
void testShadePathComputation() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[minor] seems a good candidate for changing the computeReformattedLogFilePathWithConfiguredDir
entry point and test to use Path
instead of a raw file path in a String
that requires to rewrite path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get this one. IIUC it's only something to change within the test, so I'll leave as is.
Let me know if you meant something within the actual implementation that you think should be changed.
...ng-plugin-common/src/test/java/co/elastic/apm/agent/loginstr/LoggingInstrumentationTest.java
Outdated
Show resolved
Hide resolved
Awesome, looking forward to the next release of the agent! Thank you for all your incredible work ❤️ |
Maybe this change in 1.30.0 has to do with my problem here? : |
What does this PR do?
Closes #2412
Closes #2374
Closes #2369
Closes #2387
Implementing a new approach for log correlation (which is about adding active
trace.id
,transaction.id
anderror.id
to log events automatically). The current mechanism makes MDC updates when trace events are being activated or deactivated. This is efficient, but there are cases where the logging library is not available within the context of event activation, or there is access to the wrong one.Instead, the new approach is to do JIT-MDC-updates, within the logging library, based on the active tracing events, through instrumentation.
General concept
Conceptually, the most robust way is to add the tracing IDs to the MDC right before the logging event is created and remove it right after for the following reasons:
Related disadvantages are:
By instrumenting the public logging API themselves, we can benefit from all listed advantages and avoid the first disadvantage.
As for performance- since we use the actual logging framework best practice to decorate logging events with thread-contextual data, we can rely on optimizations done by the framework, so if there is a performance hit, it should be minimal. However, below there are some framework-specific pointers in case we want to optimize.
Prior research (documenting for later reference)
The current approach is to instrument public logging APIs. If this proves to be too impactful in terms of performance, following are some library-specific options for optimization.
log4j2
Trying to capture the
LoggingEvent
creation is not trivial because of API instability, multiple creation options (constructor, static factory method and builder at least) and because events may be reusable.One option is to try and instrument at least two
org.apache.logging.log4j.core.impl.ContextDataFactory#createContextData()
overloads, but it can get complicated as the returnedStringMap
s may be either mutable or immutable (seeorg.apache.logging.log4j.util.StringMap#freeze()
).It is difficult to find a single method that is also stable across versions through which all logging APIs go through, and not sure this would improve anything.
logback
The Logback MDC implementation tracks map updates and reads, and creates a copy of the map every time there is an update after a read (see
ch.qos.logback.classic.util.LogbackMDCAdapter
). This means that our approach will incur more map copy actions.The logging event's immutable context map gets populated either when the event is logged, or when it is decided to defer its logging.
ch.qos.logback.classic.spi.LoggingEvent#getMDCPropertyMap()
may be a good instrumentation target as it seems stable across versions I believe it returns a mutable collection.By doing so we can avoid updates of the MDC-underlying map.
To-do list
apm-log-correlation-plugin
apm-error-logging-plugin
into one plugin with the shading pluginco.elastic.apm.agent.impl.transaction.TraceContext#applicationClassLoader
fatal
andcatching
APIs, and log4j1'sfatal
APIlog_format_file=JSON
- will be handled through Add trace correlation to agent logs #2495enable_log_correlation
configChecklist
I added tests for log correlation within agent logs- will be handled through Add trace correlation to agent logs #2495