Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[class-parse] .jmod file support #891

Merged
merged 1 commit into from
Oct 15, 2021
Merged

Conversation

jonpryor
Copy link
Member

@jonpryor jonpryor commented Oct 13, 2021

Context: #858
Context: https://stackoverflow.com/questions/44732915/why-did-java-9-introduce-the-jmod-file-format/64202720#64202720
Context: https://openjdk.java.net/projects/jigsaw/
Context: https://github.com/xamarin/monodroid/commit/c9e5cbd61fe80e183b44356149abe95578a13d06

JDK 9 replaced the "venerable" (and huge, ~63MB) jre/lib/rt.jar
with a set of .jmod files.

Thus, as of JDK 9, there is no .jar file to try to parse with
class-parse, only .jmod files!

A .jmod file, in turn, is still a ZIP container, much like .jar
files, but:

  1. With a different internal directory structure, and
  2. With a custom file header.

The result of (2) is that while unzip -l can show and extract the
contents of a .jmod file -- with a warning --
System.IO.Compression.ZipArchive cannot process the file:

% mono …/class-parse.exe $HOME/android-toolchain/jdk-11/jmods/java.base.jmod
class-parse: Unable to read file 'java.base.jmod': Number of entries expected in End Of Central Directory does not correspond to number of entries in Central Directory.
<api
  api-source="class-parse" />

Update Xamarin.Android.Tools.Bytecode.ClassPath to support .jmod
files by using PartialStream (73096d9) to skip the first 4 bytes.

Once able to read a .jmod file, lots of debug messages appeared
while parsing java.base.jmod, a'la:

class-parse: method com/xamarin/JavaType$1MyStringList.<init>(Lcom/xamarin/JavaType;Ljava/lang/String;ILjava/lang/StringBuilder;)V:
  Local variables array has 2 entries ('LocalVariableTableAttribute(
      LocalVariableTableEntry(Name='this', Descriptor='Lcom/xamarin/JavaType$1MyStringList;', StartPC=0, Index=0),
      LocalVariableTableEntry(Name='this$0', Descriptor='Lcom/xamarin/JavaType;', StartPC=0, Index=1),
      LocalVariableTableEntry(Name='a', Descriptor='Ljava/lang/String;', StartPC=0, Index=2),
      LocalVariableTableEntry(Name='b', Descriptor='I', StartPC=0, Index=3))'
  ); descriptor has 3 entries!
class-parse: method com/xamarin/JavaType$1MyStringList.<init>(Lcom/xamarin/JavaType;Ljava/lang/String;ILjava/lang/StringBuilder;)V:
Signature ('Signature((Ljava/lang/String;I)V)') has 2 entries; Descriptor '(Lcom/xamarin/JavaType;Ljava/lang/String;ILjava/lang/StringBuilder;)V' has 3 entries!

This was a variation on the "JDK 8?" block that previously didn't
have much detail, in part because it didn't have a repro. Now we
have a repro, based on JDK code which contains a class
declaration within a method declaration

// Java
public List<String> staticActionWithGenerics(…) {
    class MyStringList extends ArrayList<String> {
        public MyStringList(String a, int b) {
        }
        public String get(int index) {
            return unboundedList.toString() + value1.toString();
        }
    }
}

The deal is that staticActionWithGenerics() contains a MyStringList
class, which in turn contains a constructor with two parameters.
However, as far as Java bytecode is concerned, the constructor
contains 3 local variables with StartPC==0, which is what we use to
infer parameter names.

Refactor, cleanup, and otherwise modify huge swaths of Methods.cs
to get to a "happy medium" of:

  • No warnings from our unit tests, ensured by updating
    ClassFileFixture to have a [SetUp] method which sets the
    Log.OnLog field to a delegate which may call Assert.Fail()
    when invoked. This asserts for all messages starting with
    class-parse: methods, which are produced by Methods.cs.

  • No warnings when processing java.base.jmod:

    % mono bin/Debug/class-parse.exe $HOME/android-toolchain/jdk-11/jmods/java.base.jmod >/dev/null
    # no error messages
    
  • No warnings when processing Android API-31:

    % mono bin/Debug/class-parse.exe $HOME/android-toolchain/sdk/platforms/android-31/android.jar >/dev/null
    # no error messages
    

Additionally, improve Log.cs so that there are M(string)
overloads for the existing M(string, params object[]) methods.
This is a "sanity-preserving" change, as "innocuous-looking" code
such as Log.Debug("{foo}") will throw FormatException when the
(string, params object[]) overload is used.

Aside: closures are weird and finicky.
Consider the following Java code:

class ClosureDemo {
    public void m(String a) {
        class Example {
            public Example(int x) {
                System.out.println (a);
            }
        }
    }
}

It looks like the JNI signature for the Example constructor might
be (I)V, but isn't. It is instead:

(LClosureDemo;ILjava/lang/String;)V

Breaking that down:

  • LClosureDemo;: Example is an inner class, and thus has an
    implicit reference to the containing type. OK, easy to forget.

  • I: the int x parameter. Expected.

  • Ljava/lang/String: the String a parameter from the enclosing
    scope! This is the closure parameter.

This does make sense. The problem is that it's selective: only
variables used within Example become extra parameters.
If the Example constructor is updated to remove the
System.out.println(a) statement, then a is no longer used, and
is no longer present as a constructor parameter.

The only way I found to "reasonably" determine if a constructor
parameter was a closure parameter was by checking all fields in the
class with names starting with val$, and then comparing the types
of those fields to types within the enclosing method's descriptor.
I can't think of a way to avoid using val$. :-(

Another aside: closure parameter behavior differs between JDK 1.8
and JDK-11: there appears to be a JDK 1.8 javac bug in which it
assigns the wrong parameter names. Consider MyStringList:
The Java constructor declaration is:

public static <T, …>
void staticActionWithGenerics (
        T value1, …
        List<?> unboundedList, …)
{
    class MyStringList extends ArrayList<String> {
        public MyStringList(String a, int b) {
        }
        // …
    }
}

The JNI signature for the MyStringList constructor is:

(Ljava/lang/String;ILjava/util/List;Ljava/lang/Object;)V

which is:

  • String: parameter a
  • I: parameter b
  • List: closure parameter for unboundedList
  • Object: closure parameter for value1.

If we build with JDK 1.8 javac -parameters, the MethodParameters
annotation states that the closure parameters are:

MyStringList(String a, int b, List val$value1, Object val$unboundedList);

which is wrong; unboundedList is the List, value1 is Object!

This was fixed in JDK-11, with the MethodParameters annotations
specifying:

MyStringList(String a, int b, List val$unboundedList, Object val$value1);

This means that the unit tests need to take this into consideration.

Add a new ConfiguredJdkInfo class, which contains code similar to
tests/TestJVM/TestJVM.cs: it will read bin/Build*/JdkInfo.props
to find the path to the JDK found in make prepare, and then
determine the JDK version from that directory's release file.

@jonpryor jonpryor requested a review from jpobst October 13, 2021 02:48
@jonpryor jonpryor force-pushed the jonp-jmod-support branch 4 times, most recently from a22cc66 to 56ffc3c Compare October 14, 2021 22:29
@@ -23,6 +25,8 @@ public static void Error (string format, params object[] args)
log (TraceLevel.Error, 0, format, args);
}

public static void Error (string message) => Error ("{0}", message);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do these do? These should be equivalent:

public static void Error (string format, params object[] args)
public static void Error (string message)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpobst: these overloads preserve sanity. :-)

What do you expect to happen with:

Console.WriteLine("{foo}");

What happens -- which I hope is what you expect? -- is that {foo} is written to stdout.

What happens if you call Log.Error ("{foo}"), prior to this change?

You get a System.FormatException, by way of `string.Format()!

string.Format("{foo}");                                                                                                                                                                               
System.FormatException: Input string was not in a correct format.                                                                                                                                               
  at System.Text.StringBuilder.AppendFormatHelper (System.IFormatProvider provider, System.String format, System.ParamsArray args)
  at System.String.FormatHelper (System.IFormatProvider provider, System.String format, System.ParamsArray args)
  at System.String.Format (System.String format, System.Object[] args)
  at <InteractiveExpressionClass>.Host (System.Object& $retval)
  at Mono.CSharp.Evaluator.Evaluate (System.String input, System.Object& result, System.Boolean& result_set)

See also: https://github.com/xamarin/monodroid/commit/c75aaaad56ef27bb907ad59a61aab4a741468dfe

The TL/DR: is that providing only a (string format, params object[]) method signature is a recipe for disaster, especially as we continue migrating to C#6 $"..." format strings.

//
// This is reasonable:
// 1-1) Failed : Xamarin.Android.Tools.BytecodeTests.JvmOverloadsConstructorTests.XmlDeclaration_WithJavaType_class
// TraceLevel=Verbose, Verbosity=0, Message=Kotlin: Hiding synthetic default constructor in class 'JvmOverloadsConstructor' with signature '(LJvmOverloadsConstructor;IILjava/lang/String;ZILkotlin/jvm/internal/DefaultConstructorMarker;)V'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line is the key: by being able to parse the Kotlin metadata we see that this is the synthetic default constructor and we remove it.

I suspect 1-2 fails not because synthetic changed, but because it is finding an additional synthetic constructor that shouldn't be there. (Expected string length 3509 but was 4633.)

Copy link
Member Author

@jonpryor jonpryor Oct 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Kotlin-related message is always produced, e.g.

% mono bin/Debug/class-parse.exe tests/Xamarin.Android.Tools.Bytecode-Tests/kotlin/JvmOverloadsConstructor.class >/dev/null
Kotlin: Hiding synthetic default constructor in class 'JvmOverloadsConstructor' with signature '(LJvmOverloadsConstructor;IILjava/lang/String;ZILkotlin/jvm/internal/DefaultConstructorMarker;)V'

Whether it should be produced or not is a different discussion; you presumably wanted it to be emitted, right?

The primary intent to having Log.OnLog call Assert.Fail() within the unit tests was to ensure that the various Methods.cs changes didn't cause new warnings to be introduced. Presumably there were no warning messages before, and there certainly are no non-Kotlin-related warnings with this PR.

Context: dotnet#858
Context: https://stackoverflow.com/questions/44732915/why-did-java-9-introduce-the-jmod-file-format/64202720#64202720
Context: https://openjdk.java.net/projects/jigsaw/
Context: xamarin/monodroid@c9e5cbd

JDK 9 replaced the "venerable" (and huge, ~63MB) `jre/lib/rt.jar`
with a set of `.jmod` files.

Thus, as of JDK 9, there is no `.jar` file to try to parse with
`class-parse`, only `.jmod` files!

A `.jmod` file, in turn, is still a ZIP container, much like `.jar`
files, but:

 1. With a different internal directory structure, and
 2. With a custom file header.

The result of (2) is that while `unzip -l` can show and extract the
contents of a `.jmod` file -- with a warning --
`System.IO.Compression.ZipArchive` cannot process the file:

	% mono …/class-parse.exe $HOME/android-toolchain/jdk-11/jmods/java.base.jmod
	class-parse: Unable to read file 'java.base.jmod': Number of entries expected in End Of Central Directory does not correspond to number of entries in Central Directory.
	<api
	  api-source="class-parse" />

Update `Xamarin.Android.Tools.Bytecode.ClassPath` to support `.jmod`
files by using `PartialStream` (73096d9) to skip the first 4 bytes.

Once able to read a `.jmod` file, lots of debug messages appeared
while parsing `java.base.jmod`, a'la:

	class-parse: method com/xamarin/JavaType$1MyStringList.<init>(Lcom/xamarin/JavaType;Ljava/lang/String;ILjava/lang/StringBuilder;)V:
	  Local variables array has 2 entries ('LocalVariableTableAttribute(
	      LocalVariableTableEntry(Name='this', Descriptor='Lcom/xamarin/JavaType$1MyStringList;', StartPC=0, Index=0),
	      LocalVariableTableEntry(Name='this$0', Descriptor='Lcom/xamarin/JavaType;', StartPC=0, Index=1),
	      LocalVariableTableEntry(Name='a', Descriptor='Ljava/lang/String;', StartPC=0, Index=2),
	      LocalVariableTableEntry(Name='b', Descriptor='I', StartPC=0, Index=3))'
	  ); descriptor has 3 entries!
	class-parse: method com/xamarin/JavaType$1MyStringList.<init>(Lcom/xamarin/JavaType;Ljava/lang/String;ILjava/lang/StringBuilder;)V:
	Signature ('Signature((Ljava/lang/String;I)V)') has 2 entries; Descriptor '(Lcom/xamarin/JavaType;Ljava/lang/String;ILjava/lang/StringBuilder;)V' has 3 entries!

This was a variation on the "JDK 8?" block that previously didn't
have much detail, in part because it didn't have a repro.  Now we
have a repro, based on [JDK code][0] which contains a class
declaration within a method declaration

	// Java
	public List<String> staticActionWithGenerics(…) {
	    class MyStringList extends ArrayList<String> {
	        public MyStringList(String a, int b) {
	        }
	        public String get(int index) {
	            return unboundedList.toString() + value1.toString();
	        }
	    }
	}

The deal is that `staticActionWithGenerics()` contains a `MyStringList`
class, which in turn contains a constructor with two parameters.
*However*, as far as Java bytecode is concerned, the constructor
contains *3* local variables with StartPC==0, which is what we use to
infer parameter names.

Refactor, cleanup, and otherwise modify huge swaths of `Methods.cs`
to get to a "happy medium" of:

  * No warnings from our unit tests, ensured by updating
    `ClassFileFixture` to have a `[SetUp]` method which sets the
    `Log.OnLog` field to a delegate which may call `Assert.Fail()`
    when invoked.  This asserts for all messages starting with
    `class-parse: methods`, which are produced by `Methods.cs`.

  * No warnings when processing `java.base.jmod`:

        % mono bin/Debug/class-parse.exe $HOME/android-toolchain/jdk-11/jmods/java.base.jmod >/dev/null
        # no error messages

  * No warnings when processing Android API-31:

        % mono bin/Debug/class-parse.exe $HOME/android-toolchain/sdk/platforms/android-31/android.jar >/dev/null
        # no error messages

Additionally, improve `Log.cs` so that there are `M(string)`
overloads for the existing `M(string, params object[])` methods.
This is a "sanity-preserving" change, as "innocuous-looking" code
such as `Log.Debug("{foo}")` will throw `FormatException` when the
`(string, params object[])` overload is used.

Aside: closures are *weird* and finicky.
Consider the following Java code:

	class ClosureDemo {
	    public void m(String a) {
	        class Example {
	            public Example(int x) {
	                System.out.println (a);
	            }
	        }
	    }
	}

It looks like the JNI signature for the `Example` constructor might
be `(I)V`, but isn't.  It is instead:

	(LClosureDemo;ILjava/lang/String;)V

Breaking that down:

  * `LClosureDemo;`: `Example` is an inner class, and thus has an
    implicit reference to the containing type.  OK, easy to forget.

  * `I`: the `int x` parameter.  Expected.

  * `Ljava/lang/String`: the `String a` parameter from the enclosing
    scope!  This is the closure parameter.

This does make sense.  The problem is that it's *selective*: only
variables used within `Example` become extra parameters.
If the `Example` constructor is updated to remove the
`System.out.println(a)` statement, then `a` is no longer used, and
is no longer present as a constructor parameter.

The only way I found to "reasonably" determine if a constructor
parameter was a closure parameter was by checking all fields in the
class with names starting with `val$`, and then comparing the types
of those fields to types within the enclosing method's descriptor.
I can't think of a way to avoid using `val$`. :-(

Another aside: closure parameter behavior *differs* between JDK 1.8
and JDK-11: there appears to be a JDK 1.8 `javac` bug in which it
assigns the *wrong* parameter names.  Consider `MyStringList`:
The Java constructor declaration is:

	public static <T, …>
	void staticActionWithGenerics (
	        T value1, …
	        List<?> unboundedList, …)
	{
	    class MyStringList extends ArrayList<String> {
	        public MyStringList(String a, int b) {
	        }
	        // …
	    }
	}

The JNI signature for the `MyStringList` constructor is:

	(Ljava/lang/String;ILjava/util/List;Ljava/lang/Object;)V

which is:

  * `String`: parameter `a`
  * `I`: parameter `b`
  * `List`: closure parameter for `unboundedList`
  * `Object`: closure parameter for `value1`.

If we build with JDK 1.8 `javac -parameters`, the `MethodParameters`
annotation states that the closure parameters are:

	MyStringList(String a, int b, List val$value1, Object val$unboundedList);

which is *wrong*; `unboundedList` is the `List`, `value1` is `Object`!

This was fixed in JDK-11, with the `MethodParameters` annotations
specifying:

	MyStringList(String a, int b, List val$unboundedList, Object val$value1);

This means that the unit tests need to take this into consideration.

Add a new `ConfiguredJdkInfo` class, which contains code similar to
`tests/TestJVM/TestJVM.cs`: it will read `bin/Build*/JdkInfo.props`
to find the path to the JDK found in `make prepare`, and then
determine the JDK version from that directory's `release` file.

[0]: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/WhileOps.java#L334
@jonpryor jonpryor merged commit 8ccb837 into dotnet:main Oct 15, 2021
jonathanpeppers added a commit that referenced this pull request Oct 26, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Apr 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants