Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to Netty 4.1.111.Final and switch to use grpc-netty-shaded #4426

Merged
merged 14 commits into from
Jun 13, 2024

Conversation

lhotari
Copy link
Member

@lhotari lhotari commented Jun 11, 2024

Motivation

Netty 4.1.111.Final contains important fixes:

On the Pulsar side these address Broker stability when TLS with Bookkeeper V2 protocol is used between Broker and Bookies. The Bookkeeper client didn't contain a workaround for the Netty SslHandler "feature" like Pulsar did. That's why Bookkeeper client was impacted. Netty 4.1.111.Final will address those stability issues with the Bookkeeper client when using V2 protocol over TLS.

Changes

  • Upgrade Netty to 4.1.111.Final
  • Switch to use grpc-netty-shaded instead of grpc-netty
  • Add module to shade jetcd-core partially so that it can work with grpc-netty-shaded.
    • jetcd-core library and it's transient dependency vertx-grpc has a direct dependency on grpc-netty and it doesn't work with grpc-netty-shaded without this solution.
    • this solution avoids the need to shade grpc libraries completely which would cause a lot of duplication and increase build times.

Copy link
Member

@hezhangjian hezhangjian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI will be fixed in #4429

@hezhangjian
Copy link
Member

Do you think it's valuable cherry-pick to branch-4.16?

@lhotari
Copy link
Member Author

lhotari commented Jun 12, 2024

Do you think it's valuable cherry-pick to branch-4.16?

Yes, however there will be the need to solve the grpc incompatibility issue: https://lists.apache.org/thread/ysr7041jjdgqo0f0bghm9rswzmsddf3v .
Also in Slack thread https://apachebookkeeper.slack.com/archives/C6G5104SF/p1717752326074469?thread_ts=1717748452.052619&cid=C6G5104SF .
My last comment was:

I guess we could try bumping grpc version and keeping protobuf version the same as before. Upgrading protobuf would be a breaking change for Pulsar.
It's possible that upgrading grpc doesn't cause any issues as long as Pulsar uses the same version when it upgrades to use 4.16.6

@hezhangjian
Copy link
Member

@lhotari Okay, my colleague @zhiheng123 is trying to solve branch-4.16 problem. For me, I would like to finish the 4.17.1 release first.

@lhotari
Copy link
Member Author

lhotari commented Jun 12, 2024

@shoothzj @zhiheng123 Pulsar CI is also failing in grpc with 4.1.111.Final. So I'm currently investigating similar issues.
I'm planning to try if upgrading grpc version to 1.60.2 would be an option.

@hezhangjian
Copy link
Member

@lhotari @zhiheng123 has do trying in #4420 and also his fork repo. I think we can share some information about trying upgrade gRPC later.

@lhotari
Copy link
Member Author

lhotari commented Jun 12, 2024

Results on Pulsar side: apache/pulsar#22892 (comment)

@lhotari
Copy link
Member Author

lhotari commented Jun 12, 2024

It seems that grpc-netty dependency will have to be excluded and replaced by grpc-netty-shaded since grpc is compatible only with specific Netty versions.

@hezhangjian
Copy link
Member

@lhotari Nice work! I think it might be the right way. It can solve both master branch error and branch-4.16 error.

@lhotari
Copy link
Member Author

lhotari commented Jun 12, 2024

Unfortunately jetcd-core isn't compatible with grpc-netty-shaded. There are conflicts with io.grpc.netty.GrpcSslContexts/io.grpc.netty.shaded.io.grpc.netty.GrpcSslContexts and io.netty.handler.ssl.SslContext/io.grpc.netty.shaded.io.netty.handler.ssl.SslContext classes. The only solution seems to be to do custom shading for jetcd-core and jetcd-test.

@lhotari
Copy link
Member Author

lhotari commented Jun 12, 2024

I created etcd-io/jetcd#1370 to jetcd project about the issue.

@lhotari
Copy link
Member Author

lhotari commented Jun 12, 2024

Custom shading of jetcd-core and jetcd-test will slow down the build which is unfortunate. One possibility would be to exclude the modules from the main build (or move them to another repository) and publish them only periodically when the version of jetcd gets updated. Not great. It's too bad that this is causing additional headaches.

@lhotari
Copy link
Member Author

lhotari commented Jun 12, 2024

In Pulsar, I found a reasonable solution by shading jetcd-core in a way where vertx is relocated and included, but grpc is
switched to use grpc-netty-shaded instead of grpc-netty. stream-storage-java-client supports switching to use grpc-netty-shaded without making changes to the module.

I'll apply a similar solution to Bookkeeper soon.

@lhotari lhotari force-pushed the lh-netty-4.1.111 branch from d2f33b1 to 4c7aaad Compare June 12, 2024 14:05
@lhotari lhotari force-pushed the lh-netty-4.1.111 branch from 4c7aaad to 527dc8d Compare June 12, 2024 14:23
@lhotari lhotari changed the title Upgrade to Netty 4.1.111.Final Upgrade to Netty 4.1.111.Final and switch to use grpc-netty-shaded Jun 12, 2024
@lhotari
Copy link
Member Author

lhotari commented Jun 12, 2024

@shoothzj @zhiheng123 @dlg99 Please review now. I was finally able to find a solution where grpc-netty is replaced with grpc-netty-shaded. That fixes the compatibility issues. One of the challenges was to preserve IntelliJ support and provide correct metadata for the shaded jetcd library that has a direct dependency on grpc-netty and doesn't work with grpc-netty-shaded. The shading handles the relocation so that jetcd-core and it's transient dependency vertx-grpc will be relocated to use grpc-netty-shaded packages without needing to shade and relocate grpc libraries.

@lhotari lhotari requested review from dlg99 and merlimat June 12, 2024 15:31
Copy link
Member

@hezhangjian hezhangjian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks for lari. I found one nit, I put it in comment

@hezhangjian
Copy link
Member

I didn't see bcpkix-jdk15on in grpc-all's dependnecy, could you please have a check? I run mvn dependency:tree in my laptop.

[INFO] com.shoothzj:grpc-examples:jar:1.0-SNAPSHOT
[INFO] \- io.grpc:grpc-all:jar:1.64.0:compile
[INFO]    +- io.grpc:grpc-api:jar:1.64.0:compile
[INFO]    |  +- com.google.code.findbugs:jsr305:jar:3.0.2:compile
[INFO]    |  \- com.google.errorprone:error_prone_annotations:jar:2.23.0:compile
[INFO]    +- io.grpc:grpc-auth:jar:1.64.0:compile
[INFO]    |  \- com.google.auth:google-auth-library-credentials:jar:1.22.0:compile
[INFO]    +- io.grpc:grpc-core:jar:1.64.0:compile
[INFO]    |  +- com.google.code.gson:gson:jar:2.10.1:compile
[INFO]    |  +- com.google.android:annotations:jar:4.1.1.4:runtime
[INFO]    |  +- org.codehaus.mojo:animal-sniffer-annotations:jar:1.23:runtime
[INFO]    |  +- io.perfmark:perfmark-api:jar:0.26.0:runtime
[INFO]    |  \- io.grpc:grpc-context:jar:1.64.0:compile
[INFO]    +- io.grpc:grpc-grpclb:jar:1.64.0:compile
[INFO]    |  +- com.google.protobuf:protobuf-java:jar:3.25.1:compile
[INFO]    |  \- com.google.protobuf:protobuf-java-util:jar:3.25.1:compile
[INFO]    +- io.grpc:grpc-inprocess:jar:1.64.0:compile
[INFO]    +- io.grpc:grpc-netty:jar:1.64.0:compile
[INFO]    |  +- io.netty:netty-codec-http2:jar:4.1.100.Final:compile
[INFO]    |  |  +- io.netty:netty-common:jar:4.1.100.Final:compile
[INFO]    |  |  +- io.netty:netty-buffer:jar:4.1.100.Final:compile
[INFO]    |  |  +- io.netty:netty-transport:jar:4.1.100.Final:compile
[INFO]    |  |  |  \- io.netty:netty-resolver:jar:4.1.100.Final:compile
[INFO]    |  |  +- io.netty:netty-codec:jar:4.1.100.Final:compile
[INFO]    |  |  +- io.netty:netty-handler:jar:4.1.100.Final:compile
[INFO]    |  |  \- io.netty:netty-codec-http:jar:4.1.100.Final:compile
[INFO]    |  +- io.netty:netty-handler-proxy:jar:4.1.100.Final:runtime
[INFO]    |  |  \- io.netty:netty-codec-socks:jar:4.1.100.Final:runtime
[INFO]    |  \- io.netty:netty-transport-native-unix-common:jar:4.1.100.Final:compile
[INFO]    +- io.grpc:grpc-okhttp:jar:1.64.0:compile
[INFO]    |  \- com.squareup.okio:okio:jar:3.4.0:runtime
[INFO]    |     \- com.squareup.okio:okio-jvm:jar:3.4.0:runtime
[INFO]    |        +- org.jetbrains.kotlin:kotlin-stdlib-jdk8:jar:1.8.0:runtime
[INFO]    |        |  +- org.jetbrains.kotlin:kotlin-stdlib:jar:1.8.0:runtime
[INFO]    |        |  |  \- org.jetbrains:annotations:jar:13.0:runtime
[INFO]    |        |  \- org.jetbrains.kotlin:kotlin-stdlib-jdk7:jar:1.8.0:runtime
[INFO]    |        \- org.jetbrains.kotlin:kotlin-stdlib-common:jar:1.8.0:runtime
[INFO]    +- io.grpc:grpc-opentelemetry:jar:1.64.0:compile
[INFO]    |  +- io.opentelemetry:opentelemetry-api:jar:1.36.0:runtime
[INFO]    |  |  \- io.opentelemetry:opentelemetry-context:jar:1.36.0:runtime
[INFO]    |  \- com.google.auto.value:auto-value-annotations:jar:1.10.4:compile
[INFO]    +- io.grpc:grpc-protobuf:jar:1.64.0:compile
[INFO]    |  +- com.google.api.grpc:proto-google-common-protos:jar:2.29.0:compile
[INFO]    |  \- io.grpc:grpc-protobuf-lite:jar:1.64.0:runtime
[INFO]    +- io.grpc:grpc-rls:jar:1.64.0:compile
[INFO]    +- io.grpc:grpc-services:jar:1.64.0:compile
[INFO]    |  \- com.google.j2objc:j2objc-annotations:jar:2.8:compile
[INFO]    +- io.grpc:grpc-servlet:jar:1.64.0:compile
[INFO]    +- io.grpc:grpc-servlet-jakarta:jar:1.64.0:compile
[INFO]    +- io.grpc:grpc-stub:jar:1.64.0:compile
[INFO]    +- io.grpc:grpc-testing:jar:1.64.0:compile
[INFO]    |  \- junit:junit:jar:4.13.2:compile
[INFO]    |     \- org.hamcrest:hamcrest-core:jar:1.3:compile
[INFO]    +- io.grpc:grpc-util:jar:1.64.0:compile
[INFO]    +- io.grpc:grpc-xds:jar:1.64.0:compile
[INFO]    |  +- io.opencensus:opencensus-proto:jar:0.2.0:compile
[INFO]    |  +- io.grpc:grpc-alts:jar:1.64.0:compile
[INFO]    |  |  +- org.conscrypt:conscrypt-openjdk-uber:jar:2.5.2:compile
[INFO]    |  |  \- com.google.auth:google-auth-library-oauth2-http:jar:1.22.0:compile
[INFO]    |  |     +- com.google.http-client:google-http-client:jar:1.43.3:compile
[INFO]    |  |     |  +- org.apache.httpcomponents:httpclient:jar:4.5.14:compile
[INFO]    |  |     |  |  +- commons-logging:commons-logging:jar:1.2:compile
[INFO]    |  |     |  |  \- commons-codec:commons-codec:jar:1.11:compile
[INFO]    |  |     |  +- org.apache.httpcomponents:httpcore:jar:4.4.16:compile
[INFO]    |  |     |  +- io.opencensus:opencensus-api:jar:0.31.1:compile
[INFO]    |  |     |  \- io.opencensus:opencensus-contrib-http-util:jar:0.31.1:compile
[INFO]    |  |     \- com.google.http-client:google-http-client-gson:jar:1.43.3:compile
[INFO]    |  +- com.google.re2j:re2j:jar:1.7:compile
[INFO]    |  \- io.grpc:grpc-netty-shaded:jar:1.64.0:compile
[INFO]    \- com.google.guava:guava:jar:32.1.3-jre:compile
[INFO]       +- com.google.guava:failureaccess:jar:1.0.1:compile
[INFO]       +- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:compile
[INFO]       \- org.checkerframework:checker-qual:jar:3.37.0:compile

@lhotari
Copy link
Member Author

lhotari commented Jun 13, 2024

I didn't see bcpkix-jdk15on in grpc-all's dependnecy, could you please have a check? I run mvn dependency:tree in my laptop.

@shoothzj I didn't add that exclusion in this PR, I just moved it and made it consistent. It's possible that it is obsolete.

@lhotari lhotari requested a review from hezhangjian June 13, 2024 05:10
@lhotari
Copy link
Member Author

lhotari commented Jun 13, 2024

@shoothzj I removed the obsolete exclusion of bcpkix-jdk15on. PTAL

@hezhangjian hezhangjian merged commit a95f698 into apache:master Jun 13, 2024
25 checks passed
lhotari added a commit that referenced this pull request Jun 13, 2024
…4426)

[Netty 4.1.111.Final](https://netty.io/news/2024/06/11/4-1-111-Final.html) contains important fixes:
- netty/netty#14086
- netty/netty#14093

On the Pulsar side these address Broker stability when TLS with Bookkeeper V2 protocol is used between Broker and Bookies. The Bookkeeper client didn't contain a workaround for the Netty SslHandler "feature" like Pulsar did. That's why Bookkeeper client was impacted. Netty 4.1.111.Final will address those stability issues with the Bookkeeper client when using V2 protocol over TLS.

- Upgrade Netty to 4.1.111.Final
- Switch to use grpc-netty-shaded instead of grpc-netty
  - grpc-java is not compatible with Netty 4.1.111.Final. [grpc-java supports only specific Netty versions](https://github.com/grpc/grpc-java/blob/master/SECURITY.md#netty). The solution is to use grpc-netty-shaded instead of grpc-netty dependency.
- Add module to shade jetcd-core partially so that it can work with grpc-netty-shaded.
  - jetcd-core library and it's transient dependency vertx-grpc has a direct dependency on grpc-netty and it doesn't work with grpc-netty-shaded without this solution.
  - this solution avoids the need to shade grpc libraries completely which would cause a lot of duplication and increase build times.

(cherry picked from commit a95f698)
lhotari added a commit that referenced this pull request Jun 13, 2024
…4426)

[Netty 4.1.111.Final](https://netty.io/news/2024/06/11/4-1-111-Final.html) contains important fixes:
- netty/netty#14086
- netty/netty#14093

On the Pulsar side these address Broker stability when TLS with Bookkeeper V2 protocol is used between Broker and Bookies. The Bookkeeper client didn't contain a workaround for the Netty SslHandler "feature" like Pulsar did. That's why Bookkeeper client was impacted. Netty 4.1.111.Final will address those stability issues with the Bookkeeper client when using V2 protocol over TLS.

- Upgrade Netty to 4.1.111.Final
- Switch to use grpc-netty-shaded instead of grpc-netty
  - grpc-java is not compatible with Netty 4.1.111.Final. [grpc-java supports only specific Netty versions](https://github.com/grpc/grpc-java/blob/master/SECURITY.md#netty). The solution is to use grpc-netty-shaded instead of grpc-netty dependency.
- Add module to shade jetcd-core partially so that it can work with grpc-netty-shaded.
  - jetcd-core library and it's transient dependency vertx-grpc has a direct dependency on grpc-netty and it doesn't work with grpc-netty-shaded without this solution.
  - this solution avoids the need to shade grpc libraries completely which would cause a lot of duplication and increase build times.

(cherry picked from commit a95f698)
Ghatage pushed a commit to sijie/bookkeeper that referenced this pull request Jul 12, 2024
…pache#4426)

### Motivation

[Netty 4.1.111.Final](https://netty.io/news/2024/06/11/4-1-111-Final.html) contains important fixes:
- netty/netty#14086
- netty/netty#14093

On the Pulsar side these address Broker stability when TLS with Bookkeeper V2 protocol is used between Broker and Bookies. The Bookkeeper client didn't contain a workaround for the Netty SslHandler "feature" like Pulsar did. That's why Bookkeeper client was impacted. Netty 4.1.111.Final will address those stability issues with the Bookkeeper client when using V2 protocol over TLS.

### Changes

- Upgrade Netty to 4.1.111.Final
- Switch to use grpc-netty-shaded instead of grpc-netty
  - grpc-java is not compatible with Netty 4.1.111.Final. [grpc-java supports only specific Netty versions](https://github.com/grpc/grpc-java/blob/master/SECURITY.md#netty). The solution is to use grpc-netty-shaded instead of grpc-netty dependency.
- Add module to shade jetcd-core partially so that it can work with grpc-netty-shaded. 
  - jetcd-core library and it's transient dependency vertx-grpc has a direct dependency on grpc-netty and it doesn't work with grpc-netty-shaded without this solution.
  - this solution avoids the need to shade grpc libraries completely which would cause a lot of duplication and increase build times.
hezhangjian pushed a commit that referenced this pull request Nov 18, 2024
### Motivation

There is a potential jar shading issue introduced in #4426 that causes `NoClassDefFoundError` when connecting to an etcd metadata store.

The `jetcd-core-shaded` module was introduced in #4426 to address the compatibility issues between jetcd-core’s grpc-java dependency and Netty. You can find more details [here][1] and in the [grpc-java documentation][2].

[1]: #4426 (comment)
[2]: https://github.com/grpc/grpc-java/blob/master/SECURITY.md#netty

Currently, we use `unpack-shaded-jar` execution unpacks the shaded jar produced by `maven-shade-plugin:shade` into the `jetcd-core-shaded/target/classes` directory. However, the classes in this directory conflict with its dependencies. If the `maven-shade-plugin:shade` runs again without cleaning this directory, it can produce an incorrect shaded jar. You can replicate and verify this issue with the following commands:
```shell
# Step 1: Clean the build directory
mvn clean

# Step 2: Perform an install and unpack the shaded jar into a directory.
# Verify the import statement for `io.netty.handler.logging.ByteBufFormat` in 
# `org/apache/pulsar/jetcd/shaded/io/vertx/core/net/NetClientOptions.class`. 
# The correct import should be: 
# `import io.grpc.netty.shaded.io.netty.handler.logging.ByteBufFormat;`.
mvn install
unzip $M2_REPO/org/apache/bookkeeper/metadata/drivers/jetcd-core-shaded/4.18.0-SNAPSHOT/jetcd-core-shaded-4.18.0-SNAPSHOT-shaded.jar \
  -d metadata-drivers/jetcd-core-shaded/target/first-classes

# Step 3: Run the install command again without cleaning.
# The unpacked jar from the previous step will persist in `target/classes`. 
# Unpack the shaded jar into a different directory (e.g., second-classes) and check the import.
# The incorrect import will be: 
# `import io.grpc.netty.shaded.io.grpc.netty.shaded.io.netty.handler.logging.ByteBufFormat;`.
mvn install
unzip $M2_REPO/org/apache/bookkeeper/metadata/drivers/jetcd-core-shaded/4.18.0-SNAPSHOT/jetcd-core-shaded-4.18.0-SNAPSHOT-shaded.jar \
  -d metadata-drivers/jetcd-core-shaded/target/second-classes

# Step 4: Use IntelliJ IDEA's "Compare Directories" tool to compare the `first-classes` 
# and `second-classes` directories. The differences in imports should become apparent.
```

We can remove the attach and unpack configurations, and it should work fine.

This issue has already been [reported][3] in apache/pulsar, and a similar [patch][patch] has addressed it.

[3]: apache/pulsar#23513
[patch]: apache/pulsar#23604
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants