-
Notifications
You must be signed in to change notification settings - Fork 828
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
aws-sdk: Use compile time reference+@NoMuzzle instead of reflection. #8603
aws-sdk: Use compile time reference+@NoMuzzle instead of reflection. #8603
Conversation
This is a first chunk, if we want to follow this approach, I can add more.
Sorry, something is still not right here, I will push an update once I figure it out (it's not only the spotless, but also some NPE I can't fully explain). |
My shortcut of adding the muzzle test to library tests was the problem: It drags in the autoconfigure library indirectly through the agent, and this leads to double instrumentation, and conflicting access to the execution attributes. |
.../src/main/java/io/opentelemetry/instrumentation/awssdk/v2_2/SqsSendMessageRequestAccess.java
Show resolved
Hide resolved
Note that I'd prefer at least a general "yes, let's go ahead with this approach" / "no this does not look good" / etc. before I'd work on the remaining follow-up tasks I'd like to implement for SQS/SNS propagation (as outlined in the description of #8405), since depending on whether this approach finds acceptance, I will have to either implement it with boilerplate reflection classes or not. |
@@ -125,35 +126,22 @@ public SdkRequest modifyRequest( | |||
return request; | |||
} | |||
|
|||
@NoMuzzle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NoMuzzle
turns off a pretty important security feature of the agent -- in general I think I'd prefer all the no-muzzle'd methods to be kept as minimal as possible, contained in classes separate from the rest of the code, and either guarded with extra defensive conditions (e.g. it'd be great to move/copy the SqsSendMessageRequestAccess.isInstance
check to the beginning of this method) or at the very least heavily commented why it is safe.
With that in mind, WDYT about moving all the affected code into a separate class, and minimizing the entry points? It'd be great if it was just one method that was used outside of it, and the rest of them would be private and encapsulated in the new class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you do this then you may run into an issue where classes used only in the @NoMuzzle
method are not found. This is because besides the safety checks muzzle plugin also composes the list of the instrumentation classes used. You may need to manually add these classes using getAdditionalHelperClassNames
method as described in https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/contributing/writing-instrumentation-module.md#inject-additional-instrumentation-helper-classes-manually-with-the-getadditionalhelperclassnames-method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that's true, although it's true for the current code as well. E.g. MessageAttributeValueMapGetter
is only referenced in a @NoMuzzle
method.
If we want to avoid that, we can modify the ReferenceCollectingClassVisitor
to create a special MethodVisitor
for @NoMuzzle
methods, that just collects helper classes and not references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you do this then you may run into an issue where classes used only in the @NoMuzzle method are not found.
I don't think that's true, it seems that @NoMuzzle
is not transitive. I.e. if a method annotated with NoMuzzle calls another method, that method is checked again -- I even had to make a lambda a separate method to add NoMuzzle.
So it unfortunately won't help us minimize NoMuzzle if we minimize the entrypoints. But it would still probably be a nice refactoring to move out SQS-specific code to a separate class.
It'd be great if it was just one method that was used outside of it,
It will not be that well-encapsulated, since we need a few different hooks (I think two) where we do SQS specific things.
@NoMuzzle turns off a pretty important security feature of the agent
But: If you use reflection instead, you turn off pretty important safety features of the Java compiler (I think you mean safety here, not security). And IIUC, reflection stops Muzzle from noticing problems just like @NoMuzzle
(though it is more fine grained with reflection)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it unfortunately won't help us minimize NoMuzzle if we minimize the entrypoints. But it would still probably be a nice refactoring to move out SQS-specific code to a separate class.
👍
It will not be that well-encapsulated, since we need a few different hooks (I think two) where we do SQS specific things.
That's fine. I just want to make sure that any usage of @NoMuzzle
is very deliberate and well documented.
But: If you use reflection instead, you turn off pretty important safety features of the Java compiler (I think you mean safety here, not security).
Yup, safety, I mixed up these two for some reason 🙈
That's true; that's why we've had all these null checks on MethodHandle
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I found a good solution to this build failure by testing the core module and the sqs module in separate muzzle runs using excludedInstrumentationNames (e06b345). When you have multiple other modules (e.g. sqs, and sns) it could get a bit unwieldy, but then adding a feature includedInstrumentationNames to muzzle might not be that hard.
(NB: For some reason, these checks don't run when I run gradle using IntelliJ, but they do run when I execute the same gradle task from the command line)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think excludeInstrumentationName
is a very good solution for now. I didn't know we had such an option, would have thought that you'd need to use extraDependency
to get around this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
excludeInstrumentationName also has the advantage that we test that it works without the SQS references. Which makes me realize, I can probably remove the manual muzzle unit tests as this is now effectively done by this muzzle directive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the whole javaagent-unit-tests module in a095c37. We now lose the check that we actually do reference SQS from the SQS module, but that sanity check does not seem valuable enough to justify the module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think this GH conversation can be considered "resolved" now?
instrumentation/aws-sdk/aws-sdk-2.2/library-extra-test/build.gradle.kts
Outdated
Show resolved
Hide resolved
...est/src/test/java/io/opentelemetry/instrumentation/awssdk/v2_2/SqsNoMuzzleReferenceTest.java
Outdated
Show resolved
Hide resolved
...est/src/test/java/io/opentelemetry/instrumentation/awssdk/v2_2/SqsNoMuzzleReferenceTest.java
Outdated
Show resolved
Hide resolved
To be honest, I do not really feel up to more fundamental reworks of this concept like splitting into multiple instrumentation modules. Personally, I think the current concept is fine enough to count as improvement (after all the reflection were are also not Muzzle checked, now we have slightly more code not covered by Muzzle, but far less boilerplate and more coverage by basic Java compiler type checks). Personally (and for Dynatrace, as I'm doing this in my working time), my priority is to get the features mentioned under "Further follow-up tasks" in the description of #8405 done. I thought that refactoring this a bit would be worth the time but if it becomes a larger undertaking, I would rather go ahead with the reflection approach. These are my first contributions to the java-instrumentation repository, so I don't really know my way around the more intricate parts of the Java agent infrastructure, so it may very well be that my assessment of this PR's approach being an improvement over the status quo is wrong. I would prefer one of these two options (in order of preference):
In the end, I have the feeling that the ideal approach would require some improvements to Muzzle infrastructure. Basically, what we want is all the bells and whistles of Muzzle and no Alternatively, re-architecting the AWS SDK instrumentation to mirror the AWS SDK's plugin split might also work (though not sure if that is possible without introducing much boilerplate) and would have the advantage that you can't accidentally call into classes that are not present, so that would be the safest approach. If you then additionally added a way that the user still gets a single all-in-one maven/gradle module that adds everything without picking manually, that would be great. |
The purpose of muzzle checks is to improve run time safety by verifying that the methods that the instrumentation calls exist and are accessible. The checks reflection code performs are essentially the same as what muzzle does, if it does not find the method it reverts to do nothing and avoids failing at runtime.
Unfortunately what you are trying here is something new, we don't have any sample code that you could learn from. Totally understand it if you want to give up on this. |
Thank you, that branch is quite helpful.
I think this means that I should run the respective unit test from the agent module as well (currently it runs only in the library module with the settings that enable this code path) I also noticed you did manage to put two instrumentation modules in the same gradle module, with the same primary name. Is this OK then? I think it would reduce the boilerplate a bit. |
So here's what I found while running the tests also for the java-agent ae2e8d8:
|
(See #8603 (comment) for latest comments) |
.../aws-sdk-2.2/library/src/main/java/io/opentelemetry/instrumentation/awssdk/v2_2/SqsImpl.java
Outdated
Show resolved
Hide resolved
The relevant code is in Line 308 in f0b9a25
If I understand it correctly then it collects used method parameters & return type, field type, type used in instanceof, checkcast or new, class used in Foo.class. It does not look at generic signatures, which is fine as vm doesn't care about these either. |
.../src/main/java/io/opentelemetry/instrumentation/awssdk/v2_2/TracingExecutionInterceptor.java
Outdated
Show resolved
Hide resolved
<SendMessageResponse> | ||
<SendMessageResult> | ||
<MD5OfMessageBody>d41d8cd98f00b204e9800998ecf8427e</MD5OfMessageBody> | ||
<MD5OfMessageAttributes>3ae8f24a165a8cedc005670c81a27295</MD5OfMessageAttributes> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to drop this test because of this hardcoded MD5. Alternative would be to run the test suite without the experimental message propgator enabled.
However, I instead added an async variant to the SQS-specific test class, so hopefully still covering everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it helps you then setting a flag for only some tests is also quite easy
opentelemetry-java-instrumentation/instrumentation/netty/netty-4.1/javaagent/build.gradle.kts
Lines 45 to 68 in 4a0d725
tasks { | |
val testConnectionSpan by registering(Test::class) { | |
filter { | |
includeTestsMatching("Netty41ConnectionSpanTest") | |
includeTestsMatching("Netty41ClientSslTest") | |
} | |
include("**/Netty41ConnectionSpanTest.*", "**/Netty41ClientSslTest.*") | |
jvmArgs("-Dotel.instrumentation.netty.connection-telemetry.enabled=true") | |
jvmArgs("-Dotel.instrumentation.netty.ssl-telemetry.enabled=true") | |
} | |
test { | |
systemProperty("testLatestDeps", findProperty("testLatestDeps") as Boolean) | |
filter { | |
excludeTestsMatching("Netty41ConnectionSpanTest") | |
excludeTestsMatching("Netty41ClientSslTest") | |
} | |
} | |
check { | |
dependsOn(testConnectionSpan) | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good idea, but would also require some reshuffling of the tests, as I don't think (or wouldn't know how) you can exclude tests based on parameters.
Would it be OK not to add this in this PR anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is fine, as there are not the only tests that exercise sqs instrumentation. Idk whether this is what you are looking for, but to conditionally disable tests we usually use
Line 45 in ca24c38
Assumptions.assumeTrue(!abortPrematch || testAbortPrematch()) |
Assumptions
class.
...ws-sdk-2.2/library/src/main/java/io/opentelemetry/instrumentation/awssdk/v2_2/SqsAccess.java
Outdated
Show resolved
Hide resolved
...c/main/groovy/io/opentelemetry/instrumentation/awssdk/v2_2/AbstractAws2SqsTracingTest.groovy
Show resolved
Hide resolved
Thank you very much for your help @laurit and @mateuszrzeszutek! |
This is a first chunk, if we want to follow this approach, I will add more, also to avoid inconsistencies.
Thank you @laurit for helping me with various hurdles I ran into when trying this! (See #8394)