Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws-sdk: Use compile time reference+@NoMuzzle instead of reflection. #8603

Merged
merged 16 commits into from
Jun 16, 2023

Conversation

Oberon00
Copy link
Member

@Oberon00 Oberon00 commented May 30, 2023

This is a first chunk, if we want to follow this approach, I will add more, also to avoid inconsistencies.

Thank you @laurit for helping me with various hurdles I ran into when trying this! (See #8394)

This is a first chunk, if we want to follow this approach, I can add more.
@Oberon00 Oberon00 requested a review from a team May 30, 2023 16:36
@Oberon00 Oberon00 marked this pull request as draft May 31, 2023 07:32
@Oberon00
Copy link
Member Author

Oberon00 commented May 31, 2023

Sorry, something is still not right here, I will push an update once I figure it out (it's not only the spotless, but also some NPE I can't fully explain).

@github-actions github-actions bot requested a review from theletterf May 31, 2023 09:17
@Oberon00
Copy link
Member Author

My shortcut of adding the muzzle test to library tests was the problem: It drags in the autoconfigure library indirectly through the agent, and this leads to double instrumentation, and conflicting access to the execution attributes.

@Oberon00 Oberon00 marked this pull request as ready for review May 31, 2023 10:09
@Oberon00
Copy link
Member Author

Oberon00 commented Jun 1, 2023

Note that I'd prefer at least a general "yes, let's go ahead with this approach" / "no this does not look good" / etc. before I'd work on the remaining follow-up tasks I'd like to implement for SQS/SNS propagation (as outlined in the description of #8405), since depending on whether this approach finds acceptance, I will have to either implement it with boilerplate reflection classes or not.

@@ -125,35 +126,22 @@ public SdkRequest modifyRequest(
return request;
}

@NoMuzzle
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NoMuzzle turns off a pretty important security feature of the agent -- in general I think I'd prefer all the no-muzzle'd methods to be kept as minimal as possible, contained in classes separate from the rest of the code, and either guarded with extra defensive conditions (e.g. it'd be great to move/copy the SqsSendMessageRequestAccess.isInstance check to the beginning of this method) or at the very least heavily commented why it is safe.

With that in mind, WDYT about moving all the affected code into a separate class, and minimizing the entry points? It'd be great if it was just one method that was used outside of it, and the rest of them would be private and encapsulated in the new class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do this then you may run into an issue where classes used only in the @NoMuzzle method are not found. This is because besides the safety checks muzzle plugin also composes the list of the instrumentation classes used. You may need to manually add these classes using getAdditionalHelperClassNames method as described in https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/contributing/writing-instrumentation-module.md#inject-additional-instrumentation-helper-classes-manually-with-the-getadditionalhelperclassnames-method

Copy link
Member

@mateuszrzeszutek mateuszrzeszutek Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's true, although it's true for the current code as well. E.g. MessageAttributeValueMapGetter is only referenced in a @NoMuzzle method.

If we want to avoid that, we can modify the ReferenceCollectingClassVisitor to create a special MethodVisitor for @NoMuzzle methods, that just collects helper classes and not references.

Copy link
Member Author

@Oberon00 Oberon00 Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do this then you may run into an issue where classes used only in the @NoMuzzle method are not found.

I don't think that's true, it seems that @NoMuzzle is not transitive. I.e. if a method annotated with NoMuzzle calls another method, that method is checked again -- I even had to make a lambda a separate method to add NoMuzzle.

So it unfortunately won't help us minimize NoMuzzle if we minimize the entrypoints. But it would still probably be a nice refactoring to move out SQS-specific code to a separate class.

It'd be great if it was just one method that was used outside of it,

It will not be that well-encapsulated, since we need a few different hooks (I think two) where we do SQS specific things.

@NoMuzzle turns off a pretty important security feature of the agent

But: If you use reflection instead, you turn off pretty important safety features of the Java compiler (I think you mean safety here, not security). And IIUC, reflection stops Muzzle from noticing problems just like @NoMuzzle (though it is more fine grained with reflection)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it unfortunately won't help us minimize NoMuzzle if we minimize the entrypoints. But it would still probably be a nice refactoring to move out SQS-specific code to a separate class.

👍

It will not be that well-encapsulated, since we need a few different hooks (I think two) where we do SQS specific things.

That's fine. I just want to make sure that any usage of @NoMuzzle is very deliberate and well documented.

But: If you use reflection instead, you turn off pretty important safety features of the Java compiler (I think you mean safety here, not security).

Yup, safety, I mixed up these two for some reason 🙈
That's true; that's why we've had all these null checks on MethodHandles.

Copy link
Member Author

@Oberon00 Oberon00 Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I found a good solution to this build failure by testing the core module and the sqs module in separate muzzle runs using excludedInstrumentationNames (e06b345). When you have multiple other modules (e.g. sqs, and sns) it could get a bit unwieldy, but then adding a feature includedInstrumentationNames to muzzle might not be that hard.

(NB: For some reason, these checks don't run when I run gradle using IntelliJ, but they do run when I execute the same gradle task from the command line)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think excludeInstrumentationName is a very good solution for now. I didn't know we had such an option, would have thought that you'd need to use extraDependency to get around this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excludeInstrumentationName also has the advantage that we test that it works without the SQS references. Which makes me realize, I can probably remove the manual muzzle unit tests as this is now effectively done by this muzzle directive.

Copy link
Member Author

@Oberon00 Oberon00 Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the whole javaagent-unit-tests module in a095c37. We now lose the check that we actually do reference SQS from the SQS module, but that sanity check does not seem valuable enough to justify the module.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this GH conversation can be considered "resolved" now?

@Oberon00
Copy link
Member Author

Oberon00 commented Jun 5, 2023

To be honest, I do not really feel up to more fundamental reworks of this concept like splitting into multiple instrumentation modules. Personally, I think the current concept is fine enough to count as improvement (after all the reflection were are also not Muzzle checked, now we have slightly more code not covered by Muzzle, but far less boilerplate and more coverage by basic Java compiler type checks).

Personally (and for Dynatrace, as I'm doing this in my working time), my priority is to get the features mentioned under "Further follow-up tasks" in the description of #8405 done. I thought that refactoring this a bit would be worth the time but if it becomes a larger undertaking, I would rather go ahead with the reflection approach.

These are my first contributions to the java-instrumentation repository, so I don't really know my way around the more intricate parts of the Java agent infrastructure, so it may very well be that my assessment of this PR's approach being an improvement over the status quo is wrong.

I would prefer one of these two options (in order of preference):

  1. Go ahead with the current approach without major changes (i.e., I would address the comments about renaming of course, but would not do larger refactorings like splitting up the module)
  2. Close this PR unmerged for now, and I will go ahead with the old/current reflection approach for the remaining follow-up features I want to add for SQS/SNS.

In the end, I have the feeling that the ideal approach would require some improvements to Muzzle infrastructure. Basically, what we want is all the bells and whistles of Muzzle and no @NoMuzzle anywhere, except that during instrumentation module loading, if a SQS-class is missing (but not present but incompatible), we still want to load the module.

Alternatively, re-architecting the AWS SDK instrumentation to mirror the AWS SDK's plugin split might also work (though not sure if that is possible without introducing much boilerplate) and would have the advantage that you can't accidentally call into classes that are not present, so that would be the safest approach. If you then additionally added a way that the user still gets a single all-in-one maven/gradle module that adds everything without picking manually, that would be great.
But I'm not up for working on such a large refactoring at this time.

@laurit
Copy link
Contributor

laurit commented Jun 6, 2023

To be honest, I do not really feel up to more fundamental reworks of this concept like splitting into multiple instrumentation modules. Personally, I think the current concept is fine enough to count as improvement (after all the reflection were are also not Muzzle checked, now we have slightly more code not covered by Muzzle, but far less boilerplate and more coverage by basic Java compiler type checks).

The purpose of muzzle checks is to improve run time safety by verifying that the methods that the instrumentation calls exist and are accessible. The checks reflection code performs are essentially the same as what muzzle does, if it does not find the method it reverts to do nothing and avoids failing at runtime.
Using plain java is definitely much nicer than using reflection, but the main question here is do we want to give up on the muzzle checks. One could argue that library instrumentation does not have the muzzle checks and if there is an incompatible change in the instrumented library runtime failures are a possibility, why bother with the agent, could let it fail the same way. I prepared a small sample for you that shows how you could use java code with @NoMuzzle and also keep the muzzle checks https://github.com/dynatrace-oss-contrib/opentelemetry-java-instrumentation/compare/aws-sdk-no-muzzle...laurit:opentelemetry-java-instrumentation:aws-no-muzzle-test?expand=1 Something like that only works when you can isolate the code that touches sqs types from the rest of the code.

Personally (and for Dynatrace, as I'm doing this in my working time), my priority is to get the features mentioned under "Further follow-up tasks" in the description of #8405 done. I thought that refactoring this a bit would be worth the time but if it becomes a larger undertaking, I would rather go ahead with the reflection approach.

Unfortunately what you are trying here is something new, we don't have any sample code that you could learn from. Totally understand it if you want to give up on this.

@Oberon00
Copy link
Member Author

Oberon00 commented Jun 6, 2023

Thank you, that branch is quite helpful.

// types that are only used in @NoMuzzle methods are not injected, with agent here you'll get
// NoClassDefFoundError for MessageAttributeValueMapGetter

I think this means that I should run the respective unit test from the agent module as well (currently it runs only in the library module with the settings that enable this code path)

I also noticed you did manage to put two instrumentation modules in the same gradle module, with the same primary name. Is this OK then? I think it would reduce the boilerplate a bit.

@Oberon00
Copy link
Member Author

Oberon00 commented Jun 7, 2023

So here's what I found while running the tests also for the java-agent ae2e8d8:

  • Indeed just as you noted @laurit, the NoMuzzle prevents the class from being injected, causing a class loader exception (most unit tests fail because of a strict context violation that is caused by this instead of directly because of this, interestingly)
  • Muzzle only seems to collect references to methods, so if you just use the type in the signature or even cast to it (at least when the cast only involves the class in a type parameter position), it does not cause Muzzle to collect a reference. Hopefully/probably this is in line with JVM rules. So in this case, the relevant @NoMuzzle annotations could be removed which causes all classes to be found again.
  • The design of the generic SDK tests is not fit for the SQS injection as it hard-codes an MD5 and has no way to calculate it dynamically (since the span + trace ID is part of the message attributes, the MD5 will always be different). This causes an exception in the client SDK. I removed the test and instead updated the SQS-specific test to also test the async client.

@Oberon00
Copy link
Member Author

(See #8603 (comment) for latest comments)

@laurit
Copy link
Contributor

laurit commented Jun 14, 2023

  • Muzzle only seems to collect references to methods, so if you just use the type in the signature or even cast to it (at least when the cast only involves the class in a type parameter position), it does not cause Muzzle to collect a reference. Hopefully/probably this is in line with JVM rules. So in this case, the relevant @NoMuzzle annotations could be removed which causes all classes to be found again.

The relevant code is in


If I understand it correctly then it collects used method parameters & return type, field type, type used in instanceof, checkcast or new, class used in Foo.class. It does not look at generic signatures, which is fine as vm doesn't care about these either.

<SendMessageResponse>
<SendMessageResult>
<MD5OfMessageBody>d41d8cd98f00b204e9800998ecf8427e</MD5OfMessageBody>
<MD5OfMessageAttributes>3ae8f24a165a8cedc005670c81a27295</MD5OfMessageAttributes>
Copy link
Member Author

@Oberon00 Oberon00 Jun 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to drop this test because of this hardcoded MD5. Alternative would be to run the test suite without the experimental message propgator enabled.

However, I instead added an async variant to the SQS-specific test class, so hopefully still covering everything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it helps you then setting a flag for only some tests is also quite easy

tasks {
val testConnectionSpan by registering(Test::class) {
filter {
includeTestsMatching("Netty41ConnectionSpanTest")
includeTestsMatching("Netty41ClientSslTest")
}
include("**/Netty41ConnectionSpanTest.*", "**/Netty41ClientSslTest.*")
jvmArgs("-Dotel.instrumentation.netty.connection-telemetry.enabled=true")
jvmArgs("-Dotel.instrumentation.netty.ssl-telemetry.enabled=true")
}
test {
systemProperty("testLatestDeps", findProperty("testLatestDeps") as Boolean)
filter {
excludeTestsMatching("Netty41ConnectionSpanTest")
excludeTestsMatching("Netty41ClientSslTest")
}
}
check {
dependsOn(testConnectionSpan)
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good idea, but would also require some reshuffling of the tests, as I don't think (or wouldn't know how) you can exclude tests based on parameters.

Would it be OK not to add this in this PR anymore?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine, as there are not the only tests that exercise sqs instrumentation. Idk whether this is what you are looking for, but to conditionally disable tests we usually use

For java tests it is pretty much the same but use junit5 Assumptions class.

@trask trask merged commit 5b4e05a into open-telemetry:main Jun 16, 2023
@trask
Copy link
Member

trask commented Jun 16, 2023

thx @Oberon00 (and @laurit)!

@Oberon00
Copy link
Member Author

Thank you very much for your help @laurit and @mateuszrzeszutek!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants