Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for setting the application class loader when starting a transaction via the public api #2369

Conversation

tobiasstadler
Copy link
Contributor

What does this PR do?

Adds support for setting the application class lower when starting a transaction via the public api (e.g. external plugins).

My external plugin instruments a class which is not loaded by the class loader of my application and does not have access to application classes (but it does know what the correct class loader is). For external plugins the initiating class loader of a transaction is a IndyPluginClassLoader with the class loader of the instrumented class as its parent. So when the MdcActivationListener tries to lookup the logging implementations it cannot find any, as they are not part of the class loader of the instrumented class.

By specifying the initiating class loader (the one with the application and logging classes) when starting a transaction the log correlation is then able to find the used logging implementation.

Checklist

  • This is an enhancement of existing features, or a new feature in existing plugins
    • I have updated CHANGELOG.asciidoc
    • I have added tests that prove my fix is effective or that my feature works
    • Added an API method or config option? Document in which version this will be introduced
    • I have made corresponding changes to the documentation

@github-actions
Copy link

👋 @tobiasstadler Thanks a lot for your contribution!

It may take some time before we review a PR, so even if you don’t see activity for some time, it does not mean that we have forgotten about it.

Every once in a while we go through a process of prioritization, after which we are focussing on the tasks that were planned for the upcoming milestone. The prioritization status is typically reflected through the PR labels. It could be pending triage, a candidate for a future milestone, or have a target milestone set to it.

@apmmachine
Copy link
Contributor

apmmachine commented Dec 22, 2021

❕ Build Aborted

The PR is not allowed to run in the CI yet

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Reason: The PR is not allowed to run in the CI yet

  • Start Time: 2022-01-28T13:39:22.108+0000

  • Duration: 8 min 8 sec

  • Commit: 2fc0d8b

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • run benchmark tests : Run the benchmark tests.

  • run jdk compatibility tests : Run the JDK Compatibility tests.

  • run integration tests : Run the Agent Integration tests.

  • run end-to-end tests : Run the APM-ITs.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@eyalkoren
Copy link
Contributor

Thanks for the proposed fix @tobiasstadler ! ❤️

While this solves your problem and it worths considering, it can only work when the plugin class knows the application class loader, so maybe it's not a very generic solution. In addition, it extends the API in a way that raises the proficiency bar and our aim is to keep it as simple as possible.

The general problem is that we use an incorrect CL, as explained in #2350 .
An easy fix for that is to avoid using agent class loaders as trace context CLs. If we just set null, the MDC listener will fallback to try the context CL or the System CL, which may be sufficient. Another option is to use the actual loader of the class instrumented by the plugin, as it can be available for us through the IndyPluginClassLoader. And maybe the best is try both - use the instrumentation library CL and in the MDC listener try that AND the fallback if it fails.

@tobiasstadler
Copy link
Contributor Author

tobiasstadler commented Dec 23, 2021

Thank you for your feedback!

Yes, this is by no means a generic solution. But I think I still need this, even if a generic solution for #2350 is available. As I said neither the class loader of the instrumented class nor the context class loader at that time have access to the the log4j classes. So in my opinion the MdcActivationListener will still not be able to load the log4j classes and therefore should not be able to add the ids to the trace context.
Yes, this problem is specific to my plugin and of course I could manually add the ids to the trace context. But I prefer to use the existing solution.

@eyalkoren
Copy link
Contributor

As I said neither the class loader of the instrumented class nor the context class loader at that time have access to the the log4j classes.

Sorry, I didn't see anywhere above that you mentioned that the context CL doesn't have access to the MDC. Did you try to set the CL to null and verify that Context CL indeed doesn't have access to it? What's being set as the context CL of the request executing thread? Is this occurring within a JBoss-managed thread pool, or some custom thread pool?

The general concept of MDC (or ThreadContext) assumes that either the logging library (or facade) is globally available, or that it is available through the context CL to any code that want to update it. I'd look for alternatives before providing the agent with a CL that can load classes out of context. It may not be, but it sounds like a potential for trouble :) .

You report this doesn't work when used in an external plugins, as opposed to internal plugin. Assuming that your internal plugin instrumentation would have been the same, what would you set there as "initiating Class Loader" if you instrumented the same class as an internal plugin?

@tobiasstadler
Copy link
Contributor Author

I don‘t think it will work if my plugin was an internal plugin, because the context class loader is the ejb module class loader (which does not have access the the application/logging classes) when I start/activate the transaction. So the MdcActivationListener shouldn‘t be able to find my logging class.

But I think I could set the context class loader to my application class loader (with access to the logging classes) before starting/activating the transaction and restore the old class loader after the activation. So if the generic solution falls back to the context class loader it could also work for me.

@tobiasstadler
Copy link
Contributor Author

I changed my plugin to change the context class loader before starting/activating the transaction, set the applicationClassLoader of the TraceContext to null (using the debugger) and log correlation works for me (with 1.26.1, but the same approach should also work for me with 1.28.3)

@tobiasstadler
Copy link
Contributor Author

I implemented "try the application class loader AND fallback to the cortex class loader" in #2374 (works for me)

@eyalkoren
Copy link
Contributor

I changed my plugin to change the context class loader before starting/activating the transaction, set the applicationClassLoader of the TraceContext to null (using the debugger) and log correlation works for me (with 1.26.1, but the same approach should also work for me with 1.28.3)

I want to provide a wholistic fix for this external plugins CL issue, so I would be grateful if you provide some more concrete details:

  • What Servlet container are you using? Is it OSGi?
  • What is set by default as the context CL when your instrumentation is invoked?
  • What is your library class loader?
  • What is your applicationClassLoader?
  • How can you have access to the applicationClassLoader from your plugin?
  • Is it possible that the instrumented library is loaded more than once in your system (i.e. by multiple class loaders in different contexts)?

@tobiasstadler
Copy link
Contributor Author

@eyalkoren The problematic plugin is: https://github.com/tobiasstadler/apm-wildfly-remote-ejb-plugin. The integration test is close enough to our actual production use case.

Regarding you questions:

@tobiasstadler
Copy link
Contributor Author

e.g. in co.elastic.apm.agent.wildfly_remote_ejb.RemoteEJBServerAdvice#onEnterInvokeMethod:

Context Class Loader (Thread.currentThread().getContextClassLoader()): ModuleClassLoader for Module "org.jboss.ejb-client" version 4.0.44.Final from local module loader @6057aebb (finder: local module finder @63eef88a (roots: /opt/jboss/wildfly/modules,/opt/jboss/wildfly/modules/system/layers/base))

Application Class Loader (componentView.getViewClass().getClassLoader()): ModuleClassLoader for Module "deployment.apm-wildfly-remote-ejb-plugin-it.war" from Service Module Loader

@eyalkoren
Copy link
Contributor

I see, the JBoss module system. So with regards to:

As far as I know the org.jboss.ejb-client JBoss Module is shared between all deployed applications, so I guess it is not possible to load it more than once

I believe you can use multiple versions for a module, is that correct? If so, multiple class loaders may load versions of the class that your plugin is instrumenting.

Either way, this addresses my general consideration - the usefulness of setting the library class loader for this purpose is doubtful.

Another solution would be to use a ThreadLocal instead of MDC when the latter is unavailable and use that when logging actually occurs. These are the two main options I have in mind:

  1. Instrumenting the logging frameworks to capture all logging events and if the dedicated ThreadLocal is set, it means that the MDC could not be set, so set trace IDs in the MDC before logging the event and remove it after (without the risk of not being to load the MDC/ThreadContext class).
    • Pro: generic
    • Con: complicated and riskier - requires a new instrumentation for each logging framework, possibly multiple implementations for different versions
  2. In ECS-logging, if trace/transaction/error IDs are not available through MDC, try reading them through the same ThreadLocal
    • Pro: simpler and cheaper
    • Cons: would take effect only when using ECS-logging

@eyalkoren
Copy link
Contributor

Let me think about it for a bit and see if and how it is related to the fix we need to do anyway (we shouldn't set an agent CL as described in #2350 ).
Maybe we will end up extending the API as you propose eventually or as an additional workaround.
My goal is to find a way for it to work without requiring you to specify a CL and definitely without changing the context class loader.

@tobiasstadler
Copy link
Contributor Author

Shouldn‘t setting the „correct“ class loader (by the external plugin) as context class loader before activating the transaction also work, when the logging plugin falls back to context class loader? It requires the external plugin to set/unset the context class loader, but it is easy to implement in the agent. I think that my problem is an edge case and in most cases the context class loader will be the „correct“ one, so most external plugins probably won‘t need to do anything.
Another option would be to allow the external plugin to set the application class loader when starting the transaction
Or maybe the external plugin can use the context class loader as application class loader when starting a transaction. But using the context class loader will probably break some stuff.

@tobiasstadler
Copy link
Contributor Author

Let me think about it for a bit and see if and how it is related to the fix we need to do anyway (we shouldn't set an agent CL as described in #2350 ).
Maybe we will end up extending the API as you propose eventually or as an additional workaround.
My goal is to find a way for it to work without requiring you to specify a CL and definitely without changing the context class loader.

I am, fine with changing the context class loader in my plugin.

@eyalkoren
Copy link
Contributor

Sorry, I wasn't really clear on that- I definitely think we should fallback to looking in the context CL not only when the "initiating CL" is not set, but also when it is set and fails to load the MDC class, so I assume we want #2374 anyway (that's what I proposed above).
However, I don't want to force you to set the context CL, it may have side effects that you currently don't see, but users of your plugin (or you in the future) do not expect.
Let me digest everything and then we'll merge everything that is in accordance to our solution.

@tobiasstadler
Copy link
Contributor Author

#2350 is not really an issue for me as I will deploy a workaround to make log correlation work for me. So no need to hurry (at least for my issue).

@tobiasstadler
Copy link
Contributor Author

@eyalkoren #2376 works for me. Thanks You!

@AlexanderWert AlexanderWert added this to the 8.1 milestone Jan 10, 2022
@tobiasstadler
Copy link
Contributor Author

@eyalkoren Did you already get a chance to think about this change?

@felixbarny
Copy link
Member

Likely to be superseded by #2412

@tobiasstadler
Copy link
Contributor Author

While #2412 may solve the MDC issue, the mechanism to overwrite the service name will still not work, because the transaction is associated with no/the wrong class loader. In my case the transaction will have jboss-application as service name instead of the service name of my deployment (class loader).

@eyalkoren eyalkoren mentioned this pull request Jan 27, 2022
20 tasks
@tobiasstadler
Copy link
Contributor Author

@felixbarny @eyalkoren While #2412 solves the log correlation case, but overwriting the service name still doesn't work (in my case). Could we please adapt the public api as proposed in this PR.

@SylvainJuge SylvainJuge added the size:medium Medium (M) tasks label Jan 31, 2022
@felixbarny
Copy link
Member

Adding a class loader argument to startTransaction creates a (small) combinatorial explosion and forces users to think about whether and which class loader they need to set.
But I think we could just make it an instance method of Transaction::setInitiatingClassLoader. Then it's clearly optional and folks that don't need it don't have to worry about it. This should work as TraceContext:setApplicationClassLoader is mutable/can be set after starting the transaction. We'll just need to make sure that we also set the service name and version when the class loader is set after transaction start.

There can be issues when setting the class loader after spans have already created. The consequence is that some spans may have the wrong service name and version. I think that risk is acceptable. We can document that the method should be called before creating child spans.

@tobiasstadler
Copy link
Contributor Author

@felixbarny Thank You for your review. I will come up with an implementation in the next days

@tobiasstadler
Copy link
Contributor Author

@felixbarny I implemented you suggestion in #2444

@tobiasstadler
Copy link
Contributor Author

#2444 is the way to go

@tobiasstadler tobiasstadler deleted the external-plugin-application-class-loader branch February 3, 2022 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-java community Issues and PRs created by the community size:medium Medium (M) tasks triage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants