Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slf4j MDC with java.lang.LinkageError, within the transaction created by external plugin and log correlation enabled #2350

Closed
CoderLan0668 opened this issue Dec 16, 2021 · 16 comments · Fixed by #2376 or #2390
Assignees
Labels
agent-java bug Bugs community Issues and PRs created by the community
Milestone

Comments

@CoderLan0668
Copy link
Contributor

CoderLan0668 commented Dec 16, 2021

Using agent 1.27.0, and enable log correlation

Within the transaction created by external plugin, the application will throw java.lang.LinkageError, the stack trace is :

java.lang.LinkageError: loader constraint violation: when resolving method "org.slf4j.impl.StaticMDCBinder.getMDCA()Lorg/slf4j/spi/MDCAdapter;" the class loader co.elastic.apm.agent.bci.classloading.IndyPluginClassLoader @64d6d42b (instance of co.elastic.apm.agent.bci.classloading.IndyPluginClassLoader, child of co.elastic.apm.agent.bci.classloading.DiscriminatingMultiParentClassLoader @737bfc4e co.elastic.apm.agent.bci.classloading.DiscriminatingMultiParentClassLoader) of the current class, org/slf4j/MDC, and the class loader co.elastic.apm.agent.premain.ShadedClassLoader @32709393 (instance of co.elastic.apm.agent.premain.ShadedClassLoader, child of co.elastic.apm.agent.premain.ShadedClassLoader @35d176f7 co.elastic.apm.agent.premain.ShadedClassLoader) for the method's defining class, org/slf4j/impl/StaticMDCBinder, have different Class objects for the type org/slf4j/spi/MDCAdapter used in the signature
	at org.slf4j.MDC.bwCompatibleGetMDCAdapterFromBinder(MDC.java:102)
	at org.slf4j.MDC.<clinit>(MDC.java:108)
	at java.base/jdk.internal.misc.Unsafe.ensureClassInitialized0(Native Method)
	at java.base/jdk.internal.misc.Unsafe.ensureClassInitialized(Unsafe.java:1044)
	at java.base/java.lang.invoke.DirectMethodHandle$EnsureInitialized.computeValue(DirectMethodHandle.java:357)
	at java.base/java.lang.invoke.DirectMethodHandle$EnsureInitialized.computeValue(DirectMethodHandle.java:354)
	at java.base/java.lang.ClassValue.getFromHashMap(ClassValue.java:228)
	at java.base/java.lang.ClassValue.getFromBackup(ClassValue.java:210)
	at java.base/java.lang.ClassValue.get(ClassValue.java:116)
	at java.base/java.lang.invoke.DirectMethodHandle.checkInitialized(DirectMethodHandle.java:378)
	at java.base/java.lang.invoke.DirectMethodHandle.ensureInitialized(DirectMethodHandle.java:368)
	at java.base/java.lang.invoke.DirectMethodHandle.internalMemberNameEnsureInit(DirectMethodHandle.java:318)
	at co.elastic.apm.agent.mdc.MdcActivationListener.before(MdcActivationListener.java:189)
	at co.elastic.apm.agent.mdc.MdcActivationListener.beforeActivate(MdcActivationListener.java:173)
	at co.elastic.apm.agent.impl.ElasticApmTracer.activate(ElasticApmTracer.java:690)
	at co.elastic.apm.agent.impl.transaction.AbstractSpan.activate(AbstractSpan.java:495)
	at co.elastic.apm.agent.pluginapi.AbstractSpanInstrumentation$ActivateInstrumentation$AdviceClass.activate(AbstractSpanInstrumentation.java:392)
	at co.elastic.apm.api.AbstractSpanImpl.activate(AbstractSpanImpl.java:156)

According to the stack trace above, i guess:

1、the transaction was created by elastic apm agent api in external plugin , so the application classloader of the trace context will be IndyPluginClassLoader

2、As the transaction activating, the MdcActivationListener will be called,and it will try to load slf4j MDC class, using IndyPluginClassLoader, but some oher class has already been loaded by ShadedClassLoader, and throw java.lang.LinkageError

@github-actions github-actions bot added agent-java community Issues and PRs created by the community triage labels Dec 16, 2021
@jackshirazi jackshirazi self-assigned this Dec 20, 2021
@jackshirazi
Copy link
Contributor

To confirm, you are using the apm-agent-plugin-sdk external plugin mechanism where the agent loads the plugin from the experimental elastic.apm.plugins_dir ?

@CoderLan0668
Copy link
Contributor Author

To confirm, you are using the apm-agent-plugin-sdk external plugin mechanism where the agent loads the plugin from the experimental elastic.apm.plugins_dir ?

Yes, the plugin also use the apm-agent-api to create transaction or span

@wolframhaussig
Copy link
Contributor

I see the same using elastic-apm-agent-1.28.1.jar attached in tomcat using -javaagent:

java.lang.LinkageError: loader constraint violation: loader co.elastic.apm.agent.bci.classloading.IndyPluginClassLoader @489488de wants to load interface org.slf4j.Marker. A different interface with the same name was previously loaded by org.apache.catalina.loader.ParallelWebappClassLoader @193a4a55. (org.slf4j.Marker is in unnamed module of loader org.apache.catalina.loader.ParallelWebappClassLoader @193a4a55, parent loader java.net.URLClassLoader @6807989e)
        at co.elastic.logging.logback.EcsEncoder.serializeMarker(EcsEncoder.java:139)
        at co.elastic.logging.logback.EcsEncoder.serializeMarkers(EcsEncoder.java:132)
        at co.elastic.logging.logback.EcsEncoder.encode(EcsEncoder.java:103)
        at co.elastic.logging.logback.EcsEncoder.encode(EcsEncoder.java:43)
        at ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:230)
        at ch.qos.logback.core.rolling.RollingFileAppender.subAppend(RollingFileAppender.java:235)
        at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:102)
        at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
        at co.elastic.apm.agent.logback.LogbackAppenderAppendAdvice.shadeLoggingEvent(LogbackAppenderAppendAdvice.java:49)
        at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:103)
        at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
        at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
        at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
        at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
        at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
        at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:383)
        at ch.qos.logback.classic.Logger.info(Logger.java:599)
        ...
        at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
        at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:668)
        ...
        at org.springframework.security.authentication.ProviderManager.authenticate(ProviderManager.java:174)
        at org.springframework.security.authentication.ProviderManager.authenticate(ProviderManager.java:199)
        at org.springframework.security.web.authentication.UsernamePasswordAuthenticationFilter.attemptAuthentication(UsernamePasswordAuthenticationFilter.java:94)
        at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:212)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
        at org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:116)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
        at org.springframework.security.web.csrf.CsrfFilter.doFilterInternal(CsrfFilter.java:124)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
        at org.springframework.security.web.header.HeaderWriterFilter.doFilterInternal(HeaderWriterFilter.java:66)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
        at org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter.doFilterInternal(WebAsyncManagerIntegrationFilter.java:56)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
        at org.springframework.security.web.session.ConcurrentSessionFilter.doFilter(ConcurrentSessionFilter.java:155)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
        at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:105)
        at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
        at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:214)
        at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:177)
        at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:347)
        at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:263)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.springframework.web.multipart.support.MultipartFilter.doFilterInternal(MultipartFilter.java:122)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:197)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        ...
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:645)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
        at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:609)
        at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
        at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:810)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1506)
        at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.base/java.lang.Thread.run(Thread.java:834)

@jackshirazi
Copy link
Contributor

have you got a simple example of how to create that please, would help enormously

@CoderLan0668
Copy link
Contributor Author

CoderLan0668 commented Dec 22, 2021

have you got a simple example of how to create that please, would help enormously

No simple example now, but the problem can be reproduced easily

1、 create an external plugin,the plugin instrument some api and create a root transaction using apm-agent-api

2、create a simple application using slf4j, and launched with the es apm agent, which load the external plugin above,and also you should enable the log correlation

3、try to invoke the api which instrumented by your external plugin , then you will get the error log.

@tobiasstadler
Copy link
Contributor

tobiasstadler commented Dec 22, 2021

I tried 1.28.2 with https://github.com/tobiasstadler/apm-wildfly-remote-ejb-plugin and log correlation is not working for me either.

I can see

2021-12-22 08:02:57,230 [default task-1] DEBUG co.elastic.apm.agent.mdc.MdcActivationListener - Class loader co.elastic.apm.agent.bci.classloading.IndyPluginClassLoader@1cf9b8f cannot load log4j2 API
java.lang.ClassNotFoundException: org.apache.logging.log4j.ThreadContext
	at net.bytebuddy.dynamic.loading.ByteArrayClassLoader.findClass(ByteArrayClassLoader.java:397) ~[elastic-apm-agent-1.28.3-SNAPSHOT.jar:?]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_312]
	at net.bytebuddy.dynamic.loading.ByteArrayClassLoader$ChildFirst.loadClass(ByteArrayClassLoader.java:1252) ~[elastic-apm-agent-1.28.3-SNAPSHOT.jar:?]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_312]
	at co.elastic.apm.agent.mdc.MdcActivationListener$3.get(MdcActivationListener.java:87) ~[elastic-apm-agent-1.28.3-SNAPSHOT.jar:?]
	at co.elastic.apm.agent.mdc.MdcActivationListener$3.get(MdcActivationListener.java:81) ~[elastic-apm-agent-1.28.3-SNAPSHOT.jar:?]
	at co.elastic.apm.agent.cache.WeakKeySoftValueLoadingCache$CacheValue.<init>(WeakKeySoftValueLoadingCache.java:119) ~[elastic-apm-agent-1.28.3-SNAPSHOT.jar:?]
	at co.elastic.apm.agent.cache.WeakKeySoftValueLoadingCache$CacheValue.<init>(WeakKeySoftValueLoadingCache.java:111) ~[elastic-apm-agent-1.28.3-SNAPSHOT.jar:?]
	at co.elastic.apm.agent.cache.WeakKeySoftValueLoadingCache.get(WeakKeySoftValueLoadingCache.java:76) ~[elastic-apm-agent-1.28.3-SNAPSHOT.jar:?]
	at co.elastic.apm.agent.mdc.MdcActivationListener.before(MdcActivationListener.java:184) ~[elastic-apm-agent-1.28.3-SNAPSHOT.jar:?]
	at co.elastic.apm.agent.mdc.MdcActivationListener.beforeActivate(MdcActivationListener.java:173) ~[elastic-apm-agent-1.28.3-SNAPSHOT.jar:?]
	at co.elastic.apm.agent.impl.ElasticApmTracer.activate(ElasticApmTracer.java:691) ~[elastic-apm-agent-1.28.3-SNAPSHOT.jar:?]
	at co.elastic.apm.agent.impl.transaction.AbstractSpan.activate(AbstractSpan.java:495) ~[elastic-apm-agent-1.28.3-SNAPSHOT.jar:?]
	at co.elastic.apm.agent.pluginapi.AbstractSpanInstrumentation$ActivateInstrumentation$AdviceClass.activate(AbstractSpanInstrumentation.java:392) ~[elastic-apm-agent-1.28.3-SNAPSHOT.jar:?]
	at co.elastic.apm.api.AbstractSpanImpl.activate(AbstractSpanImpl.java:156) ~[?:?]
	at co.elastic.apm.agent.wildfly_remote_ejb.RemoteEJBServerAdvice.onEnterInvokeMethod(RemoteEJBServerAdvice.java:37) ~[?:?]
	at org.jboss.as.ejb3.remote.AssociationImpl.invokeMethod(AssociationImpl.java:535) ~[?:?]

in the apm logs.

LoggerFactory.getLogger("foo").info(ElasticApm.currentTransaction().getId());
LogManager.getLogger("bar").info(ElasticApm.currentTransaction().getId());

shows up as

2021-12-22 08:16:23,867 INFO  [default task-14] [{}] foo - 38d39717a020b5ed
2021-12-22 08:16:23,868 INFO  [default task-14] [{}] bar - 38d39717a020b5ed

in the application logs.

@tobiasstadler
Copy link
Contributor

Maybe the problem is that for internal plugins the applicationClassLoader in TraceContext is (usually) the classLoader that loaded the instrumented class while for external plugins it is the IndyPluginClassLoader with the classLoader that loaded the instrumented class as a parent.

@CoderLan0668 Does your external plugin use SLF4J?

@tobiasstadler
Copy link
Contributor

My problem will by fixed by #2369

@eyalkoren
Copy link
Contributor

Here's my analysis related to external plugins and our log correlation in general: each TraceContext holds a link to what it thinks is the application class loader. This CL is decided when the transaction is created and then permeates to child spans.
Transactions that are created by the API (and their child spans) will use the CL that loaded co.elastic.apm.api.ElasticApm. In cases where the public API is added as a library to the application, this makes sense.
The problem with external plugins is that, as you noted, we use special class loaders for those, that have the agent Class Loader up in their delegation hierarchy. In such cases, using the co.elastic.apm.api.ElasticApm CL for the transaction trace context is conceptually wrong, as it misses the intention of being able to find the right MDC in the context of the application.

What happened to @CoderLan0668 is probably what @tobiasstadler suspects- since the plugin contain slf4j and since IndyPluginClassLoader is a child-first, usages of slf4j within the plugin classes will load the plugin slf4j, whereas usages within the agent (e.g. MdcActivationListener) will load the agent slf4j, hence the clash.

What happens to @tobiasstadler is the result of using the wrong class loader in the trace context. I will add a comment to #2369 and a proposed fix that hopefully will fix both issues.

As for the issue reported by @wolframhaussig , this seems related to the mechanism added in #2167 - now the ECS logging classes have visibility to the logging framework, but they may also have duplicated visibility to slf4j...
Can you please open an issue so that we can handle that separately?

@eyalkoren
Copy link
Contributor

eyalkoren commented Dec 23, 2021

Also important for all here: as described in the official log4j vulnerability advisory, we released 1.26.1 yesterday, which uses log4j 2.12.3 that contains the required fixes for all known vulnerabilities to date.
Since we did a pretty major architectural change in our agent in 1.27.0, I believe that upgrading to 1.26.1 will both eliminate the issues you are all experiencing and scan clear for known vulnerabilities.
Regardless, we will try to address the reported issues as soon as we get available for that.

@CoderLan0668, @tobiasstadler, @wolframhaussig - If you do get the chance to try them out, please provide your feedback.
Thanks!

@eyalkoren eyalkoren added the bug Bugs label Dec 23, 2021
@eyalkoren eyalkoren added this to the 8.1 milestone Dec 23, 2021
@eyalkoren eyalkoren removed the triage label Dec 23, 2021
@wolframhaussig
Copy link
Contributor

@eyalkoren I am already on 1.28.1 so I think your comment does not apply to me

@tobiasstadler
Copy link
Contributor

I tried 1.26.1, but log correlation does not work for me. Infact, I think it never worked for me. But this is due to how class loading works in WildFly, #2369 will solve it for me.

@eyalkoren
Copy link
Contributor

@eyalkoren I am already on 1.28.1 so I think your comment does not apply to me

I think it does apply for you - I don't expect you having this problem in 1.26.1. We will propose a fix once we get to it and then you can try it out and upgrade again.

I tried 1.26.1, but log correlation does not work for me. Infact, I think it never worked for me. But this is due to how class loading works in WildFly, #2369 will solve it for me.

Just saw that, I guess this answers my last question in #2369 (comment) 🙂
Another option I can think of that is generic to all scenarios is to add an Elastic-MDC, so that the agent and ECS logging will use the same ThreadLocal keys to communicate trace/transaction/error IDs between them. But this means you will need to use ECS-formatted logs. Would that be a problem? My assumption is that the vast majority of use cases to have these IDs in logs is in order to utilize our log correlation feature.

@eyalkoren
Copy link
Contributor

@CoderLan0668 you are welcome to try out the proposed fix build. Your feedback would be much appreciated.

@wolframhaussig
Copy link
Contributor

@CoderLan0668, @tobiasstadler, @wolframhaussig - If you do get the chance to try them out, please provide your feedback.
Thanks!

You are right, version 1.26.2 does not produce those errors

@zubairbasha
Copy link

I was facing this issue from two days.
java.lang.LinkageError: loader constraint violation: when resolving method 'org.slf4j.spi.MDCAdapter org.slf4j.MDC.getMDCAdapter()' the class loader java.net.FactoryURLClassLoader @74e6fdef of the current class, ch/qos/logback/classic/spi/LoggingEvent, and the class loader org.eclipse.osgi.internal.loader.EquinoxClassLoader @37b88402 for the method's defining class, org/slf4j/MDC, have different Class objects for the type org/slf4j/spi/MDCAdapter used in the signature (ch.qos.logback.classic.spi.LoggingEvent is in unnamed module of loader java.net.FactoryURLClassLoader @74e6fdef, parent loader org.eclipse.osgi.internal.framework.ContextFinder @37df14d1; org.slf4j.MDC is in unnamed module of loader org.eclipse.osgi.internal.loader.EquinoxClassLoader @37b88402, parent loader 'platform').

Not able to open the jasper report and compile and print

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-java bug Bugs community Issues and PRs created by the community
Projects
None yet
6 participants