Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java API mismatch when building Gerrit and JGit from source on 0.16.1 #6027

Closed
davido opened this issue Aug 30, 2018 · 23 comments
Closed

Java API mismatch when building Gerrit and JGit from source on 0.16.1 #6027

davido opened this issue Aug 30, 2018 · 23 comments
Assignees
Labels
P1 I'll work on this now. (Assignee required) team-Rules-Java Issues for Java rules

Comments

@davido
Copy link
Contributor

davido commented Aug 30, 2018

To simplify developing process with Gerrit's main dependency JGit, we support building gerrit with building JGit from source instead of consuming the binary artifacts from Central or from Gerrit's Google storage bucket.

When building Gerrit in that way on Bazel 0.16.1 generated byte code for JGit is screwed up and the final artifact cannot be run on Java 8 any more. Note, that this error doesn't show up any more in 0.17rc1, but I would like to make sure, we understand the problem (and the fix). Besides that, 0.16.1 is the latest Bazel stable release so that I think it worth reporting.

Also note, that if we consume JGit from Central and not build it from source, from an external repository on Bazel 0.16.1, then all is fine, and final artifact can be run on Java 8. This is the stack trace we are seeing:

$ bazel build :gerrit
Target //:gerrit up-to-date:
  bazel-bin/gerrit.war
INFO: Elapsed time: 0.694s, Critical Path: 0.22s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action

$ java -jar bazel-bin/gerrit.war init -d ../test_xxx_yyy
Using secure store: com.google.gerrit.server.securestore.DefaultSecureStore
[2018-08-30 07:48:18,819] [main] INFO  com.google.gerrit.server.config.GerritServerConfigProvider : No /home/davido/projects/gerrit2/../test_xxx_yyy/etc/gerrit.config; assuming defaults

*** Gerrit Code Review 2.15.3-4973-g89aa7c7814-dirty
*** 

Create '/home/davido/projects/gerrit2/../test_xxx_yyy' [Y/n]? Y

*** Git Repositories
*** 

Location of Git repositories   [git]: 

*** SQL Database
*** 

Database server type           [h2]: 

*** NoteDb Database
*** 

Use NoteDb for change metadata?
  See documentation:
  https://gerrit-review.googlesource.com/Documentation/note-db.html
Enable                         [Y/n]? Y

*** Index
*** 

Type                           [lucene/?]: 
Exception in thread "main" java.lang.NoSuchMethodError: java.nio.ByteBuffer.mark()Ljava/nio/ByteBuffer;
	at org.eclipse.jgit.util.RawParseUtils.decodeNoFallback(RawParseUtils.java:1121)
	at org.eclipse.jgit.util.RawParseUtils.decode(RawParseUtils.java:1087)
	at org.eclipse.jgit.util.RawParseUtils.decode(RawParseUtils.java:1046)
	at org.eclipse.jgit.util.RawParseUtils.decode(RawParseUtils.java:1025)
	at org.eclipse.jgit.storage.file.FileBasedConfig.load(FileBasedConfig.java:173)
	at org.eclipse.jgit.util.FS_POSIX.getAtomicFileCreationSupportOption(FS_POSIX.java:124)
	at org.eclipse.jgit.util.FS_POSIX.determineAtomicFileCreationSupport(FS_POSIX.java:110)
	at org.eclipse.jgit.util.FS_POSIX.supportsAtomicCreateNewFile(FS_POSIX.java:354)
	at org.eclipse.jgit.util.FS_POSIX.createNewFile(FS_POSIX.java:382)
	at org.eclipse.jgit.internal.storage.file.LockFile.lock(LockFile.java:158)
	at org.eclipse.jgit.storage.file.FileBasedConfig.save(FileBasedConfig.java:234)
	at com.google.gerrit.server.index.GerritIndexStatus.save(GerritIndexStatus.java:48)
	at com.google.gerrit.server.index.IndexUtils.setReady(IndexUtils.java:60)
	at com.google.gerrit.pgm.init.InitIndex.run(InitIndex.java:71)
	at com.google.gerrit.pgm.init.SitePathInitializer.run(SitePathInitializer.java:93)
	at com.google.gerrit.pgm.init.BaseInit.run(BaseInit.java:136)
	at com.google.gerrit.pgm.util.AbstractProgram.main(AbstractProgram.java:61)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.google.gerrit.launcher.GerritLauncher.invokeProgram(GerritLauncher.java:224)
	at com.google.gerrit.launcher.GerritLauncher.mainImpl(GerritLauncher.java:120)
	at com.google.gerrit.launcher.GerritLauncher.main(GerritLauncher.java:64)
	at Main.main(Main.java:28)

To reproduce:

diff --git a/lib/jgit/jgit.bzl b/lib/jgit/jgit.bzl
index 6ada5bda7b..ba189f5960 100644
--- a/lib/jgit/jgit.bzl
+++ b/lib/jgit/jgit.bzl
@@ -10,7 +10,7 @@ _JGIT_REPO = MAVEN_CENTRAL  # Leave here even if set to MAVEN_CENTRAL.
 
 # set this to use a local version.
 # "/home/<user>/projects/jgit"
-LOCAL_JGIT_REPO = ""
+LOCAL_JGIT_REPO = "/home/davido/projects/jgit"
 
 def jgit_repos():
     if LOCAL_JGIT_REPO:

And run:

  • bazel build :gerrit

Now try to set-up a gerrit site by running:

  $ java -jar bazel-bin/gerrit.war init -d ../test_xxx_yyy

and accept all suggested defaults would fail with the stack trace mentioned above.

FWIW: OP was reported on repo-discuss mailing list: [1].

My environment:

  $ java -fullversion
  openjdk full version "1.8.0_171-b11"

  $ javac -version
  javac 1.8.0_171

  $ bazel version
  Build label: 0.16.1

  $ echo $JAVA_HOME
  <unset>

[1] https://groups.google.com/d/topic/repo-discuss/d_qGIW0PnpU/discussion

//CC @cushon

@cushon
Copy link
Contributor

cushon commented Aug 30, 2018

NoSuchMethodError: java.nio.ByteBuffer.mark()Ljava/nio/ByteBuffer;

This is #5888, which is fixed in 0.17rc1. We're adding better regression test coverage in this area.

@cushon cushon closed this as completed Aug 30, 2018
@davido
Copy link
Contributor Author

davido commented Aug 30, 2018

I'm not sure I'm following, the problem you are linked to explicitly passing --javabase option:

 $ bazel run :main --javabase=:jdk8

but we don't do that. And another specific problem here, that the wrong byte code generation only affects sources compiled form external repository (jgit) and not the main repository (gerrit). How that?

@cushon
Copy link
Contributor

cushon commented Aug 30, 2018

No, the problem I linked to is that default toolchain in 0.16.1 compiles against the JDK 9 APIs. ByteBuffer#mark is a covariant override that was added in JDK 9, and does not exist on JDK 8.

And another specific problem here, that the wrong byte code generation only affects sources compiled form external repository (jgit) and not the main repository (gerrit). How that?

Does the main repository include any code that calls ByteBuffer#mark? This affect affects a fairly small set of JDK methods; all the instances I've seen are related to the new overrides in ByteBuffer.

I think the external repository thing is a red herring, I'd expect the same behaviour if you compiled that code directly in your main repo.

@davido
Copy link
Contributor Author

davido commented Aug 30, 2018

I think the external repository thing is a red herring, I'd expect the same behaviour if you compiled that code directly in your main repo.

Ah, thanks for clarifying, that makes sense!

But then, this is a pretty much show stopper for many Bazel users, so that I'm surprised, that we don't see 0.16.2 release that fixed that major breakage ASAP.

@cushon
Copy link
Contributor

cushon commented Aug 30, 2018

Ack. I think this is the second report of this problem, and the first outside of Bazel code.

cc @lberki @buchgr

@davido
Copy link
Contributor Author

davido commented Aug 30, 2018

@lberki @buchgr @ulfjack

Can you explain Gerrit Code Review and JGit communities, why we are not seeing Patch Release for 0.16.1 for such severe breakage in Bazel latest release, leaving us without a fix for weeks/months?

(Of course, we can build Bazel from source or use 0.17.rc1, but I think the appropriate reaction here is to patch latest release, or alternatively, as adequate reaction immediately promote 0.17rc1 as stable release).

@lberki
Copy link
Contributor

lberki commented Aug 30, 2018

Uh, this is bad.

I also thought this was #5888, but it's a different issue: #5888 came about because we did not use any bootclasspath with a JDK 8 --javabase. This is because we use the JDK 9 bootclasspath even if we target JDK 8.

282b883 fixes this (turns out, @meisterT and @buchgr got hit by this on our continuous integration), but it was not cherry-picked into 0.16.1 because that release was pushed on Aug 13 and the bug was discovered on Aug 16.

@davido , can you check if 0.17rc1 fixes this? If it doesn't, we have an even bigger problem.
@davido : how much pain does this cause on your side?
@laurentlb , when will 0.17 be released?

@lberki lberki reopened this Aug 30, 2018
@lberki lberki self-assigned this Aug 30, 2018
@lberki lberki added P1 I'll work on this now. (Assignee required) category: rules > java labels Aug 30, 2018
@gertvdijk
Copy link
Contributor

can you check if 0.17rc1 fixes this? If it doesn't, we have an even bigger problem.

Yes, 0.17-rc1 fixes this. (Refer to the Gerrit mailing list discussion from here, where @davido reports the same - reproducible with 0.16.1, not with 0.17-rc1.)

how much pain does this cause on your side?

I guess this only affects developers working on JGit-Gerrit, basically. And if older Bazel versions are not affected, this is not a big deal, I think?
These kind of bugs in a build tool scare the hell out of me, though.

For me, being a bit of a newbie in Bazel (and Java, and Gerrit development), it was more like "I'm totally puzzled" and wasted a few hours until @davido saved me. 🤷‍♂️ 😄

@lberki
Copy link
Contributor

lberki commented Aug 30, 2018

That's good news!

(I also classified mentally this as "not a big deal", since we haven't heard about this from anyone else other than @davido , thus my surprise about his classification of "severe" -- is this especially painful for JGit/Gerrit for some reason?)

@davido
Copy link
Contributor Author

davido commented Aug 30, 2018

Even though, JGit is built with Maven, we support Bazel driven build toolchain for this project [1]. Today we know, that 0.16.1 and (earlier Bazel stable releases?) produces broken byte code that cannot be run with default Java 8. This is enough reason for me to conduct a patch release.

Not to mention, that @gertvdijk and myself spent hours investigating this breakage reported here, without linking that to broken Bazel distribution. I find it disappointing, that 282b883 was fixed 14 days ago, and today, we still don't have a working latest Bazel release that fixed that severe problem (yes, producing broken byte code is severe problem to me).

Moreover I have rejected the upgrade of our CI to Bazel 0.16.1 release: [2] after learning that this Bazel release is inherently broken.

@davido : how much pain does this cause on your side?

We can handle this and wait a couple of days for stable 0.17 release.

@laurentlb
Copy link
Contributor

Expected date of release for 0.17 is September 6, but it will depend on release blocking bugs and cherrypicks that are needed. If you want to request a cherrypick, please ask it on #5059.

@davido
Copy link
Contributor Author

davido commented Aug 30, 2018

Expected date of release for 0.17 is September 6, [...]

Thanks, that would be fine.

@cushon
Copy link
Contributor

cushon commented Aug 30, 2018

I also thought this was #5888, but it's a different issue: #5888 came about because we did not use any bootclasspath with a JDK 8 --javabase. This is because we use the JDK 9 bootclasspath even if we target JDK 8.

That's still #5888. If no -bootclasspath is set when cross-compiling javac helpfully prints a warning that bootstrap class path not set in conjunction with -source N, and then defaults to using the bootclasspath from the host JDK.

If we didn't use any bootclasspath at all the compilation would fail with unable to find package java.lang in classpath or bootclasspath.

@msohn
Copy link

msohn commented Aug 30, 2018

I hit the same problem when running JGit tests using bazel 0.16.1:

$ git clone https://git.eclipse.org/jgit/jgit
$ cd jgit
$ bazel build all
this succeeds
$ bazel test //...
many tests fail with the above mentioned java.lang.NoSuchMethodError: java.nio.ByteBuffer.mark()Ljava/nio/ByteBuffer

I installed bazel 0.17.0rc1, this fixes the problem

@gertvdijk
Copy link
Contributor

I hit the same problem when running JGit tests using bazel 0.16.1:

Yikes - that's a much simpler case to run into this issue. Does that make a valid point to mark it "severe", instead "not a big deal"? @lberki

@jin jin added the team-Rules-Java Issues for Java rules label Aug 30, 2018
@msohn
Copy link

msohn commented Aug 31, 2018

bazel is a secondary build system for jgit which is mostly used by those also working on Gerrit.
Otherwise we use Maven since bazel has no support to build OSGi bundles, Eclipse features and p2 repositories which we need for the integration into Eclipse.

Hence I think we can live with 0.17.0rc1 for now and update to 0.17.0 when you release it next week.

@lberki
Copy link
Contributor

lberki commented Aug 31, 2018

@cushon : indeed, this is a special case of #5888 .

@gertvdijk: the issue is in fact severe in terms of how broken we are, but it does not meet the bar of "widespread breakage" for a patch release, especially if @laurentlb's estimate is correct about 0.17 being out next week.

@laurentlb: WDYT? I'm somewhat worried about #6035 throwing a wrench in the works, but it we release 0.17 on September 6 and it works for JGit/Gerrit, let's not do a patch release.

@davido @gertvdijk : how feasible is it to not update to 0.16.1 for the time being?

@gertvdijk
Copy link
Contributor

gertvdijk commented Aug 31, 2018

@lberki:

how feasible is it to not update to 0.16.1 for the time being?

I don't have a big voice in this (being just a random newbie-Bazel/Gerrit developer running into it), but I can also live with no new 0.16.x release for the time being.

@davido
Copy link
Contributor Author

davido commented Aug 31, 2018

@davido @gertvdijk : how feasible is it to not update to 0.16.1 for the time being?

@lberki As I said in my previous comment we can handle that and wait for 0.17. What is worring me, is that there was no appropriate communication from Bazel-Java-Team side and that the severity of #5888 and the consequences it cause (apparently only Bazel code is affected?) were underestimated.

Since minute 1. after you have confirmed, that 0.16.1 that was released in the wild, and under some circumstances (it doesn't matter how rare they are), will produce broken byte code for your users, I would expect that one of two things happen:

  • Patch release published that replaces broken stable release, or
  • Urgent warning announcement on the Bazel mailing list to warn Bazel users about broken byte code generation and actually revoking the release 0.16.1 (does Bazel Team have a Process/Workflow for revoking broken releases?) and asking the users to refrain to upgrade to that inherently broken Bazel release.

(We are all developers here and we are all making mistakes and we fix them later and that's all fine. It's about communication about severe breakages in stable releases that should be improved and avoid your users spending hours investigation in vane -- for known and fixed but not yet released problems.)

@lberki
Copy link
Contributor

lberki commented Aug 31, 2018

I think the disconnect here is that your expectations about how breakages should be handled are not in line with our ideas about the same. Which is, of course, because there aren't clear guidelines (or at least I don't know about them...) and which I why I started this thread here:

https://groups.google.com/d/msg/bazel-discuss/wYocSoGTJOg/aJnD4ZqUCAAJ

It's not strictly about handling breakages, but about dealing with incompatible changes we need to make and these issues are related to each other; we currently do incompatible changes without much thought, which, of course, means that we are unable to treat breakages as automatic showstoppers because, well, they are expected to some degree.

I don't think that's a tenable position forever, but it a balance needs to be struck between backwards compatibility and the ability to remove functionality we regret having.

@davido
Copy link
Contributor Author

davido commented Aug 31, 2018

https://groups.google.com/d/msg/bazel-discuss/wYocSoGTJOg/aJnD4ZqUCAAJ

Yes, I'm aware of this thread and I agree that there should be an agreement on the process of how experimental flags and features are discontinued or promoted and renamed to stable flags and features.

Still, I have a strong impression that we are talking past each other here. I'm talking about the outcome of bazel build :a without any flags or options passed, without using any fancy features. Just build plain Java source file with default Java 8. There are two major breakages (there are probably more, but let simplify for the purpose of this discussion):

  1. It's erroneosly failing to build a valid Java source file
  2. It's succeeding but producing broken byte code

Do you agree that the second failure is much more painful, because it is much harder to track down, understand and fix? To me that problem is fundamentally different from discontinuation policy for experimental flags and features and as such requires adequate reaction when discovered in stable Bazel releases (as mentioned in my previous comment).

@jrn
Copy link

jrn commented Aug 31, 2018

For what it's worth, this affected ~everyone in Gerrit Code Review team at Google. We didn't report it because we didn't realize it was a bazel bug (actually we didn't understand that until today).

@jrn
Copy link

jrn commented Sep 1, 2018

This is a duplicate of #5888. Please feel free to close it.

@cushon cushon closed this as completed Sep 14, 2018
dpursehouse added a commit to dpursehouse/jgit that referenced this issue Oct 10, 2018
Check the bazel version using the checker from bazel_skylib, and
require at least version 0.17.1 which is the minimum version that
does not suffer from the Java API mismatch issue [1].

The implementation is borrowed from the Gerrit project.

[1] bazelbuild/bazel#6027

Change-Id: I224250088a1f5072fcaa3ec81228f4d2cb8cb389
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
hanwen pushed a commit to hanwen/jgit that referenced this issue Sep 20, 2019
Check the bazel version using the checker from bazel_skylib, and
require at least version 0.17.1 which is the minimum version that
does not suffer from the Java API mismatch issue [1].

The implementation is borrowed from the Gerrit project.

[1] bazelbuild/bazel#6027

Change-Id: I224250088a1f5072fcaa3ec81228f4d2cb8cb389
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 I'll work on this now. (Assignee required) team-Rules-Java Issues for Java rules
Projects
None yet
Development

No branches or pull requests

8 participants