
New code to perform managed <-> java lookups (typemap) #3992

Merged (1 commit), Feb 10, 2020

@grendello (Contributor) commented Dec 4, 2019

Xamarin.Android needs to "translate" managed types to Java types and
vice versa in order to provide a bridge between the two worlds. So far this
has been done using a straightforward (and fast) method of performing
the lookups: all the type pairs were stored in two tables of the same
size, with all type names padded to the width of the longest name so
that the bsearch C function can be used to quickly perform a binary
search over the data set. This approach works very well at the expense
of data size (shorter strings are 0-padded to the maximum width) and
slightly degraded performance because of the requirement to perform
string comparisons. Furthermore, the lookup required that reflection be
used to obtain the full managed type name (when translating from managed to
Java) or to get a Type instance from a type name (when translating from
Java to managed).

For Release builds all the above data is placed in the
libxamarin-app.so library; for Debug builds it is also placed in two
files, one for each lookup direction described above.

This commit is a slight improvement over the above scheme. It eliminates
reflection from the process by using managed type tokens (which are
integers) together with the UUID/Guid of the module in which the type is
found. This allows us to perform the binary search over 20-byte keys (16
bytes for the UUID and 4 bytes for the token ID) for managed-to-Java
lookups, and a single string comparison plus a binary search over a set of
integers for Java-to-managed lookups.
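A minimal sketch of the managed-to-Java lookup described above, assuming the 16-byte module UUID and the 4-byte token are compared as one 20-byte key (UUID bytes first, then the token as an integer) and that the table is pre-sorted in that order. `ManagedKey`, `ManagedToJavaEntry`, and `managed_to_java` are hypothetical names, not the runtime's actual types.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical 20-byte key: module UUID (16 bytes) + type token (4 bytes).
struct ManagedKey {
    uint8_t  module_uuid[16];
    uint32_t type_token;
};

// Hypothetical in-memory record mapping a managed key to a JNI type name.
struct ManagedToJavaEntry {
    ManagedKey  key;
    const char *java_name;
};

// Orders by UUID bytes first, then by token value.
static int compare_managed_key (const void *k, const void *e)
{
    const ManagedKey *key = static_cast<const ManagedKey*> (k);
    const ManagedToJavaEntry *entry = static_cast<const ManagedToJavaEntry*> (e);
    int r = std::memcmp (key->module_uuid, entry->key.module_uuid, sizeof key->module_uuid);
    if (r != 0)
        return r;
    if (key->type_token < entry->key.type_token)
        return -1;
    if (key->type_token > entry->key.type_token)
        return 1;
    return 0;
}

// Returns the JNI name for (uuid, token), or nullptr if it is not mapped.
// `entries` must be sorted in the order defined by compare_managed_key.
const char *managed_to_java (const ManagedToJavaEntry *entries, size_t count,
                             const uint8_t (&uuid)[16], uint32_t token)
{
    ManagedKey key;
    std::memcpy (key.module_uuid, uuid, sizeof key.module_uuid);
    key.type_token = token;
    const void *found = std::bsearch (&key, entries, count, sizeof (ManagedToJavaEntry), compare_managed_key);
    return found ? static_cast<const ManagedToJavaEntry*> (found)->java_name : nullptr;
}
```

No managed type name is built or compared here; the only work per probe is a 16-byte `memcmp` and an integer comparison.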

Java type names must still be used because Java doesn't provide any
equivalent to .NET's type token and module UUID. Those names are
still 0-padded to the width of the longest name, but they are no longer
duplicated. Managed type names are eliminated completely.

If Xamarin.Android Instant Run is not used for Debug builds (which is
the case for OSS builds), the operation is performed in the same way for
both Release and Debug builds. If, however, Instant Run is in effect,
the type maps are stored in several files with the .typemap extension,
one per module. The files contain both the Java-to-managed maps and
the managed-to-Java maps (which use indexes into the Java-to-managed
maps). All of those files are loaded during Debug app startup and used
to construct a dataset which is then searched during all the lookups.

Typemap index file format, all data is little-endian:

Header format

[Magic string] # XATI
[Format version] # 32-bit unsigned integer, 4 bytes
[Entry count] # 32-bit unsigned integer, 4 bytes
[Module file name width] # 32-bit unsigned integer, 4 bytes
[Index entries] # Format described below, [Entry count] entries

Index entry format:

[Module UUID][File name]<NUL>

Where:

[Module UUID] is 16 bytes long
[File name] is right-padded with <NUL> characters to the [Module file name width] boundary.
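A sketch of reading the index file format above from an in-memory buffer. It assumes the UUID is stored as 16 raw bytes, reads every multi-byte field as little-endian as the format states, and does not check the format version; `IndexEntry` and `parse_typemap_index` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

struct IndexEntry {
    uint8_t     module_uuid[16];
    std::string file_name;
};

// Reads a little-endian uint32 regardless of host byte order.
static uint32_t read_u32_le (const uint8_t *p)
{
    return static_cast<uint32_t> (p[0]) | (static_cast<uint32_t> (p[1]) << 8) |
           (static_cast<uint32_t> (p[2]) << 16) | (static_cast<uint32_t> (p[3]) << 24);
}

// Parses a typemap index image held in memory. Returns false on a bad
// magic string or truncated input.
bool parse_typemap_index (const uint8_t *data, size_t size, std::vector<IndexEntry> &out)
{
    constexpr size_t header_size = 4 + 3 * sizeof (uint32_t);   // magic + version + count + width
    if (data == nullptr || size < header_size || std::memcmp (data, "XATI", 4) != 0)
        return false;

    const uint32_t entry_count = read_u32_le (data + 8);
    const uint32_t name_width  = read_u32_le (data + 12);
    const uint64_t entry_size  = 16 + uint64_t (name_width);   // UUID + padded name

    // Reject truncated files before touching any entry.
    if ((size - header_size) / entry_size < entry_count)
        return false;

    out.clear ();
    const uint8_t *p = data + header_size;
    for (uint32_t i = 0; i < entry_count; i++, p += entry_size) {
        IndexEntry e;
        std::memcpy (e.module_uuid, p, sizeof e.module_uuid);
        const char *name = reinterpret_cast<const char*> (p + 16);
        // Names are NUL-padded to name_width; stop at the first NUL.
        const void *nul = std::memchr (name, '\0', name_width);
        size_t len = nul ? static_cast<size_t> (static_cast<const char*> (nul) - name) : name_width;
        e.file_name.assign (name, len);
        out.push_back (e);
    }
    return true;
}
```

Because every entry has the same size, the truncation check is a single division done before the loop.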

Typemap file format, all data is little-endian:

Header format

[Magic string] # XATM
[Format version] # unsigned 32-bit integer, 4 bytes
[Module UUID] # 16 bytes
[Entry count] # unsigned 32-bit integer, 4 bytes
[Duplicate count] # unsigned 32-bit integer, 4 bytes (might be 0)
[Java type name width] # unsigned 32-bit integer, 4 bytes
[Assembly name size] # unsigned 32-bit integer, 4 bytes
[Assembly name] # [Assembly name size] bytes, not NUL-terminated
[Java-to-managed map] # Format described below, [Entry count] entries
[Managed-to-java map] # Format described below, [Entry count] entries
[Managed-to-java duplicates map] # Map of unique managed IDs which point to the same Java type name, [Duplicate count] entries (might be empty)

Java-to-managed map format:

[Java type name]<NUL>[Managed type token ID]

Each name is padded with <NUL> to the width specified in the [Java type name width] field above.
Names are written without a size prefix; instead, they are always terminated with a NUL character,
which makes them easier and faster for the native runtime to handle.

Each token ID is an unsigned 32-bit integer, 4 bytes
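Because every record in the Java-to-managed map has the same size (the name width plus 4 bytes for the token), a JNI name can be looked up by binary search directly over the raw table. A sketch, assuming the records are sorted by name in byte order (as `strncmp` compares them); `java_to_managed_token` is a hypothetical helper, and it treats a null name as a miss rather than crashing.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static uint32_t read_u32_le (const uint8_t *p)
{
    return static_cast<uint32_t> (p[0]) | (static_cast<uint32_t> (p[1]) << 8) |
           (static_cast<uint32_t> (p[2]) << 16) | (static_cast<uint32_t> (p[3]) << 24);
}

// Looks up `jni_name` in `count` records of `name_width` bytes of NUL-padded
// name followed by a 4-byte little-endian token. Stores the token on success.
bool java_to_managed_token (const uint8_t *table, uint32_t count, uint32_t name_width,
                            const char *jni_name, uint32_t &token_out)
{
    if (jni_name == nullptr)   // a null name from outside the runtime is a miss, not a crash
        return false;

    const size_t stride = static_cast<size_t> (name_width) + sizeof (uint32_t);
    size_t lo = 0, hi = count;   // half-open search range [lo, hi)
    while (lo < hi) {
        const size_t mid = lo + (hi - lo) / 2;
        const uint8_t *rec = table + mid * stride;
        // Assumes each stored name has a NUL within name_width bytes, so a
        // longer query never matches a stored prefix.
        const int r = std::strncmp (jni_name, reinterpret_cast<const char*> (rec), name_width);
        if (r == 0) {
            token_out = read_u32_le (rec + name_width);
            return true;
        }
        if (r < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return false;
}
```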

Managed-to-java map format:

[Managed type token ID][Java type name table index]

Both fields are unsigned 32-bit integers, for a total of 8 bytes per entry. The index points into the
[Java-to-managed map] table above.

Managed-to-java duplicates map format:

The format is identical to [Managed-to-java map] above.
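A sketch of a managed-to-Java lookup over already-decoded tables: the token is searched in the main managed-to-Java map first and then in the duplicates map, and the resulting index selects the JNI name from the Java-to-managed map. Sorting both tables by token is an assumption made here so that `std::lower_bound` can be used; `find_token` and `token_to_jni_name` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct ManagedToJavaEntry {
    uint32_t managed_type_token;
    uint32_t java_to_managed_index;
};

// Binary search over one table, assumed sorted by managed_type_token.
static const ManagedToJavaEntry *find_token (const ManagedToJavaEntry *table, size_t count, uint32_t token)
{
    const ManagedToJavaEntry *end = table + count;
    const ManagedToJavaEntry *it = std::lower_bound (table, end, token,
        [] (const ManagedToJavaEntry &e, uint32_t t) { return e.managed_type_token < t; });
    return (it != end && it->managed_type_token == token) ? it : nullptr;
}

// Resolves a token to its JNI name: main map first, then the duplicates map.
// `java_names` stands in for the names of the decoded [Java-to-managed map].
const char *token_to_jni_name (const ManagedToJavaEntry *map, size_t map_count,
                               const ManagedToJavaEntry *dups, size_t dup_count,
                               const char *const *java_names, uint32_t token)
{
    const ManagedToJavaEntry *e = find_token (map, map_count, token);
    if (e == nullptr)
        e = find_token (dups, dup_count, token);
    return e ? java_names[e->java_to_managed_index] : nullptr;
}
```

Storing an index rather than a second copy of the name is what lets several managed types share one padded JNI name.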

Size changes (XF integration test, libxamarin-app.so, Release build, sizes in bytes):

  • armeabi-v7a
    • before: 376616
    • after: 97860
  • arm64-v8a
    • before: 377408
    • after: 104192
  • x86
    • before: 376424
    • after: 97604

Performance changes (XF integration test, Release build, times in ms):

Device name: Pixel 3 XL
Device architecture: arm64-v8a
Number of test runs: 10

|             | Native to managed | Runtime init | Displayed | Notes                          |
|-------------|-------------------|--------------|-----------|--------------------------------|
| master      | 141.102           | 160.606      | 839.80    | preload enabled; 32-bit build  |
| this commit | 134.539           | 154.701      | 836.10    |                                |
| master      | 141.743           | 158.325      | 837.20    | preload disabled; 32-bit build |
| this commit | 134.064           | 149.137      | 831.90    |                                |
| master      | 134.526           | 152.640      | 805.10    | preload enabled; 64-bit build  |
| this commit | 126.376           | 143.226      | 788.60    |                                |
| master      | 134.049           | 149.543      | 779.40    | preload disabled; 64-bit build |
| this commit | 124.847           | 139.227      | 776.10    |                                |
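To put the timing table in relative terms, the sketch below computes the percentage change for each master/this-commit pair. The struct and function names are illustrative only; the numbers are copied from the table.

```cpp
#include <cassert>
#include <cstdio>

// Relative change in percent; negative means this commit took less time.
double percent_change (double before, double after)
{
    return (after - before) / before * 100.0;
}

// Timings (ms) from the table: {master, this commit} for each column.
struct TimingPair {
    const char *notes;
    double native_to_managed[2];
    double runtime_init[2];
    double displayed[2];
};

static const TimingPair timings[] = {
    { "preload enabled; 32-bit",  {141.102, 134.539}, {160.606, 154.701}, {839.80, 836.10} },
    { "preload disabled; 32-bit", {141.743, 134.064}, {158.325, 149.137}, {837.20, 831.90} },
    { "preload enabled; 64-bit",  {134.526, 126.376}, {152.640, 143.226}, {805.10, 788.60} },
    { "preload disabled; 64-bit", {134.049, 124.847}, {149.543, 139.227}, {779.40, 776.10} },
};

void print_changes ()
{
    for (const TimingPair &t : timings)
        std::printf ("%-26s native->managed %+.1f%%  init %+.1f%%  displayed %+.1f%%\n",
                     t.notes,
                     percent_change (t.native_to_managed[0], t.native_to_managed[1]),
                     percent_change (t.runtime_init[0], t.runtime_init[1]),
                     percent_change (t.displayed[0], t.displayed[1]));
}
```

For the 64-bit preload-enabled row, native-to-managed and runtime init drop by about 6%, while the end-to-end Displayed time drops by about 2%.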

Build performance (Release build):

Before

389 ms  GenerateJavaStubs                          1 calls

After

247 ms  GenerateJavaStubs                          1 calls

The new code generates either the native assembly or the binary typemap
files, whereas the old code generated both. Initially the new
generator code was moved to a separate task, but Jonathan Peppers
determined that this was suboptimal and re-integrated the code back into
GenerateJavaStubs.

@grendello added the do-not-merge label (PR should not be merged.) Dec 4, 2019
@grendello force-pushed the new-typemap branch 6 times, most recently from 8bf4da4 to cc5e0e5, December 9, 2019 10:08
@grendello force-pushed the new-typemap branch 4 times, most recently from 6f93cf4 to 690efa6, January 13, 2020 19:46
@grendello force-pushed the new-typemap branch 3 times, most recently from 5cdfa14 to 6cc5192, January 14, 2020 21:09
@grendello changed the title from "[WIP] New typemap" to "New code to perform managed <-> java lookups (typemap)" Jan 14, 2020
@grendello added the full-mono-integration-build label (For PRs; run a full build (~6-10h for mono bumps), not the faster PR subset (~2h for mono bumps)) and removed the do-not-merge label Jan 14, 2020
@grendello self-assigned this Jan 14, 2020

@dellis1972 (Contributor) left a comment:

Looks ok. One formatting issue, and we need the performance data. Other than that 👍

{
const char *e = reinterpret_cast<const char*> (bsearch (name, map, header.entry_count, header.entry_length, TypeMappingInfo_compare_key ));
if (e == nullptr)
// This comes from the app, so let's be civil

Member:

if it comes from the app, shouldn't we just SIGSEGV?

...and why am I having a strong sense of deja vu writing that comment?

Member:

...perhaps the issue is I'm misinterpreting "the app", and this could be "not us user code"?

Contributor Author:

The "user code" ("something outside XA runtime") may, somehow, pass a null string to e.g. typemap_java_to_managed and I see no reason to crash in this case. If "they" want to crash, that's OK, but since it's an outside factor from our POV, we should recover gracefully.

}

constexpr size_t size = sizeof(Entry);
while (nmemb > 0) {

Member:

nmemb is a size_t, meaning it's unsigned, so the only way this loop will terminate is when it reaches 0, and only if it reaches zero.

I cannot easily determine if e.g. nmemb -= nmemb / 2 + 1 could ever result in underflow -- never mind the rest of this loop -- resulting in an infinite loop.

Contributor Author:

It will underflow only if nmemb == 0, but that's the loop termination condition so it won't happen.

} else {
base = ret + 1;
}
nmemb -= nmemb / 2 + 1;

Member:

Is there some gcc/clang builtin to crash the process if this results in underflow?

Contributor Author:

It cannot underflow. The lowest value nmemb can reach is 1, and in this case nmemb / 2 + 1 will yield a value of 1 which when subtracted from nmemb will give us 0, which will terminate the loop
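The underflow argument in this thread can be checked against a small sketch. Only fragments of the PR's loop are quoted here, so the function below is a reconstruction under that assumption (a common halving formulation, similar in shape to musl's `bsearch`), not the PR's code; `find_sorted` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Binary search over a sorted array; `compare` returns <0, 0 or >0 like bsearch's comparator.
// Invariant: the key, if present, lies in base[0 .. nmemb).
template<typename Key, typename Entry, typename Compare>
const Entry *find_sorted (const Key &key, const Entry *base, size_t nmemb, Compare compare)
{
    while (nmemb > 0) {
        const Entry *ret = base + nmemb / 2;     // middle of base[0 .. nmemb)
        const int result = compare (key, *ret);
        if (result == 0)
            return ret;
        if (result < 0) {
            nmemb /= 2;                          // keep base[0 .. nmemb/2)
        } else {
            // Keep ret + 1 onward. nmemb >= 1 inside the loop, so
            // nmemb / 2 + 1 <= nmemb and this subtraction cannot wrap;
            // nmemb == 1 yields exactly 0, which ends the loop.
            base = ret + 1;
            nmemb -= nmemb / 2 + 1;
        }
    }
    return nullptr;
}
```

Each iteration strictly shrinks `nmemb` (to `nmemb / 2` or to `nmemb - nmemb / 2 - 1`), so the loop always terminates.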

@grendello (Contributor Author):

/azp run

@azure-pipelines:

Azure Pipelines successfully started running 1 pipeline(s).

@jonpryor merged commit ce2bc68 into dotnet:master Feb 10, 2020
@grendello deleted the new-typemap branch February 10, 2020 20:41
@jonpryor pushed a commit that referenced this pull request Feb 11, 2020:

Typemap data is used to correlate JNI type names to .NET Assembly-
Qualified Type Names, and vice versa:

	java/lang/Object <=> Java.Lang.Object, Mono.Android

Typemap data is used from `JNIEnv.GetJniName()` for managed-to-JNI
lookups, and from `TypeManager.GetJavaToManagedType()` for
JNI-to-managed lookups.

When [typemap files were first introduced][0], they relied on:

 1. A string-oriented mapping from Java type names to .NET Assembly
    Qualified names and vice versa; and

 2. A binary search via **bsearch**(3) over this table to find the
    associated type, using the source type as the "key".

(The introduction of `libxamarin-app.so` (decfbcc) merely moved the
(formerly separate) typemap data into `libxamarin-app.so` for Release
config builds -- Debug builds continued using separate typemap files --
but didn't otherwise change how these mappings work.)

This approach works very well at the expense of data size -- shorter
strings are 0-padded to a common width -- and slightly degraded
performance because of the requirement to perform string comparisons.
Furthermore, the managed-to-JNI lookup required that Reflection be
used to obtain the Assembly Qualified type name
(`Type.AssemblyQualifiedName`), while the JNI-to-managed lookup
likewise required some Reflection to obtain `Type` instances (via
`Type.GetType()`).

Rework the typemap data in an effort to reduce Reflection use:
For the managed-to-JNI mapping, use the combination of
`type.Module.ModuleVersionId` and `Type.MetadataToken` -- a GUID
and an int -- instead of using `Type.AssemblyQualifiedName`.  This
allows us to perform the binary search over 20-byte keys (16
bytes for the UUID and 4 bytes for the token ID).

JNI-to-managed lookups still need to rely on a binary search across
strings, but instead of mapping the JNI name to an Assembly-Qualified
Type Name and using `Type.GetType()`, we instead map the JNI name to
the same GUID+token pair via a new internal call which uses
`mono_class_get()` & `mono_type_get_object()` to return the `Type`.

As a result of this fundamental change, `libxamarin-app.so` decreases
in size, and app startup time is reduced.  For a Release configuration
build of `tests/Xamarin.Forms-Performance-Integration`,
`libs/arm64-v8a/libxamarin-app.so` shrinks from 377KB to 104KB (!),
and on a Pixel 3 XL the app's `ActivityTaskManager: Displayed` time
for a 64-bit build was reduced from 805ms to 789ms
(`$(AndroidEnablePreloadAssemblies)`=True), a roughly 2% improvement.

Build time is also reduced: `<GenerateJavaStubs/>` task time drops
from 389ms to 247ms for the Xamarin.Forms build.


~~ Fast Deployment ~~

When Xamarin.Android Fast Deployment is *not* used for Debug builds
(which is the case for OSS builds of xamarin-android), typemap
generation and deployment are identical for both Release and Debug
builds: `libxamarin-app.so` contains the new typemap information.

In commercial Xamarin.Android builds which use Fast Deployment, the
typemap data is instead stored in two sets of files:

  * `typemap.index`: stores the mapping from module GUIDs to
    assembly filenames.

  * `*.typemap`: one file per .NET *module*, containing both the
    JNI-to-managed and managed-to-JNI maps, the latter using indexes
    into the JNI-to-managed map.

All of these files are loaded during Debug app startup and used to
construct a dataset which is then searched during all the lookups.


~~ File Formats ~~

All data in all file formats is little-endian.

The `typemap.index` file stores the mapping from GUIDs to module
filenames such as `Mono.Android.dll`.  The file format in pseudo-C++:

	struct TypemapIndexHeader {
	    byte                magic [4];              // "XATI"
	    uint32_t            format_version;
	    uint32_t            entry_count;
	    uint32_t            module_filename_width;
	    TypemapIndexEntry   entries [entry_count];
	};

	struct TypemapIndexEntry {
	    UUID        module_uuid;  // 16 bytes
	    byte        file_name [TypemapIndexHeader::module_filename_width];
	};

`TypemapIndexHeader::module_filename_width` is the maximum filename
length of any entry within `TypemapIndexEntry::file_name` + 1 for a
terminating `NUL`.

No particular order is required within `TypemapIndexHeader::entries`.

`TypemapIndexEntry::file_name` is `NUL` padded, filling the entire
array until the next `TypemapIndexEntry` entry.


The `*.typemap` file stores the mappings between JNI type names and
module GUID + type token pairs, in both directions.  The file format
in pseudo-C++:

	struct TypemapFileHeader {
	    byte                            magic [4];              // "XATM"
	    uint32_t                        format_version;
	    UUID                            module_uuid;            // 16 bytes
	    uint32_t                        entry_count;
	    uint32_t                        duplicate_count;
	    uint32_t                        jni_name_width;
	    uint32_t                        assembly_name_size;
	    byte                            assembly_name [assembly_name_size];
	    TypemapFileJavaToManagedEntry   java_to_managed [entry_count];
	    TypemapFileManagedToJavaEntry   managed_to_java [entry_count];
	    TypemapFileManagedToJavaEntry   duplicates [duplicate_count];
	};

	struct TypemapFileJavaToManagedEntry {
	    byte        jni_name [TypemapFileHeader::jni_name_width];
	    uint32_t    managed_type_token;
	};

	struct TypemapFileManagedToJavaEntry {
	    uint32_t    managed_type_token;
	    uint32_t    java_to_managed_index;
	};

`TypemapFileHeader::duplicate_count` may be 0.

`TypemapFileJavaToManagedEntry::jni_name` is `NUL` padded.

`TypemapFileJavaToManagedEntry::managed_type_token` is the value of
`Type.MetadataToken`.

`TypemapFileManagedToJavaEntry::java_to_managed_index` is the index
within `TypemapFileHeader::java_to_managed` that contains the JNI name.
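A sketch of validating a `*.typemap` image against the `TypemapFileHeader` layout above and locating its three tables without copying them. It assumes the whole file is already in memory and that `module_uuid` is 16 raw bytes; `TypemapFileView` and `parse_typemap_file` are hypothetical names. Each region is checked against the bytes that remain, dividing instead of multiplying, so corrupted counts cannot overflow the size arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical view over a *.typemap image; table pointers refer into the input buffer.
struct TypemapFileView {
    uint32_t       format_version;
    uint8_t        module_uuid[16];
    uint32_t       entry_count;
    uint32_t       duplicate_count;
    uint32_t       jni_name_width;
    std::string    assembly_name;
    const uint8_t *java_to_managed;   // entry_count records of jni_name_width + 4 bytes
    const uint8_t *managed_to_java;   // entry_count records of 8 bytes
    const uint8_t *duplicates;        // duplicate_count records of 8 bytes
};

static uint32_t read_u32_le (const uint8_t *p)
{
    return static_cast<uint32_t> (p[0]) | (static_cast<uint32_t> (p[1]) << 8) |
           (static_cast<uint32_t> (p[2]) << 16) | (static_cast<uint32_t> (p[3]) << 24);
}

bool parse_typemap_file (const uint8_t *data, size_t size, TypemapFileView &v)
{
    // magic(4) + format_version(4) + module_uuid(16) + four uint32 fields(16)
    constexpr size_t fixed = 40;
    if (data == nullptr || size < fixed || std::memcmp (data, "XATM", 4) != 0)
        return false;

    v.format_version  = read_u32_le (data + 4);
    std::memcpy (v.module_uuid, data + 8, sizeof v.module_uuid);
    v.entry_count     = read_u32_le (data + 24);
    v.duplicate_count = read_u32_le (data + 28);
    v.jni_name_width  = read_u32_le (data + 32);
    const uint32_t name_size = read_u32_le (data + 36);

    uint64_t remaining = size - fixed;
    const uint64_t j2m_stride = uint64_t (v.jni_name_width) + 4;   // name field + token
    if (remaining < name_size)
        return false;
    remaining -= name_size;
    if (v.entry_count > remaining / j2m_stride)
        return false;
    remaining -= v.entry_count * j2m_stride;
    if (v.entry_count > remaining / 8)
        return false;
    remaining -= uint64_t (v.entry_count) * 8;
    if (v.duplicate_count > remaining / 8)
        return false;

    const uint8_t *p = data + fixed;
    v.assembly_name.assign (reinterpret_cast<const char*> (p), name_size);   // not NUL-terminated on disk
    p += name_size;
    v.java_to_managed = p;
    p += v.entry_count * j2m_stride;
    v.managed_to_java = p;
    p += uint64_t (v.entry_count) * 8;
    v.duplicates = p;
    return true;
}
```

The returned pointers can then be handed to per-table lookups such as the binary searches sketched earlier in the PR description.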

[0]: xamarin/monodroid@e69b76e
[1]: https://github.com/xamarin/java.interop/blob/3226a4b57ad84574a69a151a310b077cfe69ee19/src/Java.Interop.Tools.JavaCallableWrappers/Java.Interop.Tools.JavaCallableWrappers/TypeNameMapGenerator.cs#L16-L56

@github-actions (bot) locked and limited conversation to collaborators Jan 28, 2024
Labels:
  • full-mono-integration-build: For PRs; run a full build (~6-10h for mono bumps), not the faster PR subset (~2h for mono bumps)
  • use-rebase-and-merge: Normally we squash-and-merge PRs. Use this label so we instead rebase & merge.