2.2.10 fails on Ubuntu 16 and 18 #217

Closed
AArnott opened this issue Aug 25, 2018 · 35 comments

@AArnott
Collaborator

AArnott commented Aug 25, 2018

Gah! If it's not one failure, it's another. Now 2.2.10, which fixed the dotnet build failure (but apparently only on Windows), fails on Ubuntu 16 and 18 with the error shown below.

Repro steps

On a Linux machine (or Linux on Windows) such as Ubuntu 16, 18, or Linux Mint 19:

git clone https://github.com/aarnott/nerdbank.streams
cd nerdbank.streams/src/Nerdbank.Streams
git checkout fixLinuxBuild2
LD_DEBUG=versions dotnet build

The output is long, but includes this error:

      2512:	/usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.3/System.Globalization.Native.so: error: symbol lookup error: undefined symbol: GlobalizationNative_ToAsciiW (fatal)
      2512:	/usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.3/System.Globalization.Native.so: error: symbol lookup error: undefined symbol: GlobalizationNative_GetSortKeyW (fatal)

/home/andrew/.nuget/packages/nerdbank.gitversioning/2.2.10/build/Nerdbank.GitVersioning.targets(63,5): error MSB4018: The "Nerdbank.GitVersioning.Tasks.GetBuildVersion" task failed unexpectedly. [/home/andrew/git/nerdbank.streams/src/Nerdbank.Streams/Nerdbank.Streams.csproj]
/home/andrew/.nuget/packages/nerdbank.gitversioning/2.2.10/build/Nerdbank.GitVersioning.targets(63,5): error MSB4018: System.TypeInitializationException: The type initializer for 'LibGit2Sharp.Core.NativeMethods' threw an exception. ---> System.DllNotFoundException: Unable to load shared library 'git2-6311e88' or one of its dependencies. In order to help diagnose loading problems, consider setting the LD_DEBUG environment variable: libgit2-6311e88: cannot open shared object file: No such file or directory [/home/andrew/git/nerdbank.streams/src/Nerdbank.Streams/Nerdbank.Streams.csproj]
/home/andrew/.nuget/packages/nerdbank.gitversioning/2.2.10/build/Nerdbank.GitVersioning.targets(63,5): error MSB4018:    at LibGit2Sharp.Core.NativeMethods.git_libgit2_init() [/home/andrew/git/nerdbank.streams/src/Nerdbank.Streams/Nerdbank.Streams.csproj]
/home/andrew/.nuget/packages/nerdbank.gitversioning/2.2.10/build/Nerdbank.GitVersioning.targets(63,5): error MSB4018:    at LibGit2Sharp.Core.NativeMethods.LoadNativeLibrary() [/home/andrew/git/nerdbank.streams/src/Nerdbank.Streams/Nerdbank.Streams.csproj]
/home/andrew/.nuget/packages/nerdbank.gitversioning/2.2.10/build/Nerdbank.GitVersioning.targets(63,5): error MSB4018:    at LibGit2Sharp.Core.NativeMethods..cctor() [/home/andrew/git/nerdbank.streams/src/Nerdbank.Streams/Nerdbank.Streams.csproj]
/home/andrew/.nuget/packages/nerdbank.gitversioning/2.2.10/build/Nerdbank.GitVersioning.targets(63,5): error MSB4018:    --- End of inner exception stack trace --- [/home/andrew/git/nerdbank.streams/src/Nerdbank.Streams/Nerdbank.Streams.csproj]
/home/andrew/.nuget/packages/nerdbank.gitversioning/2.2.10/build/Nerdbank.GitVersioning.targets(63,5): error MSB4018:    at LibGit2Sharp.Core.NativeMethods.git_libgit2_opts(Int32 option, UInt32 level, String path) [/home/andrew/git/nerdbank.streams/src/Nerdbank.Streams/Nerdbank.Streams.csproj]
/home/andrew/.nuget/packages/nerdbank.gitversioning/2.2.10/build/Nerdbank.GitVersioning.targets(63,5): error MSB4018:    at LibGit2Sharp.GlobalSettings.SetConfigSearchPaths(ConfigurationLevel level, String[] paths) [/home/andrew/git/nerdbank.streams/src/Nerdbank.Streams/Nerdbank.Streams.csproj]
/home/andrew/.nuget/packages/nerdbank.gitversioning/2.2.10/build/Nerdbank.GitVersioning.targets(63,5): error MSB4018:    at Nerdbank.GitVersioning.GitExtensions.OpenGitRepo(String pathUnderGitRepo) [/home/andrew/git/nerdbank.streams/src/Nerdbank.Streams/Nerdbank.Streams.csproj]
/home/andrew/.nuget/packages/nerdbank.gitversioning/2.2.10/build/Nerdbank.GitVersioning.targets(63,5): error MSB4018:    at Nerdbank.GitVersioning.VersionOracle.Create(String projectDirectory, String gitRepoDirectory, ICloudBuild cloudBuild, Nullable`1 overrideBuildNumberOffset, String projectPathRelativeToGitRepoRoot) [/home/andrew/git/nerdbank.streams/src/Nerdbank.Streams/Nerdbank.Streams.csproj]
/home/andrew/.nuget/packages/nerdbank.gitversioning/2.2.10/build/Nerdbank.GitVersioning.targets(63,5): error MSB4018:    at Nerdbank.GitVersioning.Tasks.GetBuildVersion.ExecuteInner() [/home/andrew/git/nerdbank.streams/src/Nerdbank.Streams/Nerdbank.Streams.csproj]
@AArnott AArnott added the bug label Aug 25, 2018
@AArnott
Collaborator Author

AArnott commented Aug 25, 2018

@bording @luqunl @AaronRobinsonMSFT With #215 behind us, this is the next thing that appears as a regression. Interestingly enough, the #216 workaround seems to have somehow made things worse, since now all projects on all versions of Linux fail (whereas #215 only failed on some projects).

Regarding the LD_DEBUG errors:

      2512:	/usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.3/System.Globalization.Native.so: error: symbol lookup error: undefined symbol: GlobalizationNative_ToAsciiW (fatal)
      2512:	/usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.3/System.Globalization.Native.so: error: symbol lookup error: undefined symbol: GlobalizationNative_GetSortKeyW (fatal)

Is this because Core CLR linked against linux .so files that don't have those symbols at runtime?

@bording

bording commented Aug 25, 2018

@AArnott Which version of LibGit2Sharp are you using? I would expect Ubuntu 16 to be working without a problem, assuming you're pointing to the native library correctly.

To support most of the Linux distros that .NET Core runs on, you'd need to be using 0.26.0-preview-0027, and then ensure you're loading the correct native library based on the RID.

Even if you do that, however, it's known that Ubuntu 18 doesn't work, because it shipped a newer version of curl. We'd need to build another separate library for that. I have a PR open that does that.

However, instead of having to keep chasing more and more native binaries, there is another plan brewing to reduce the native dependencies and simplify things as much as possible.

So, I would expect 16 to work and 18 to not work.

@bording

bording commented Aug 25, 2018

@AArnott Can you post the entire LD_DEBUG log?

@AArnott
Collaborator Author

AArnott commented Aug 25, 2018

> you'd need to be using 0.26.0-preview-0027, and then ensure you're loading the correct native library based on the RID

I'd rather not depend on an unstable package, but if it resolved the issue I could live with that for a while. Do I need to do the work to load the correct native library, or does libgit2sharp take care of that (or at least offer something to help)?

> it's known that Ubuntu 18 doesn't work

That's peculiar, since my nbgv tool works fine on Ubuntu 18 (and all other OSs I've tested), and that uses the same version of libgit2sharp and nb.gv. All these issues only show up when running in the context of dotnet build.

@AArnott
Collaborator Author

AArnott commented Aug 25, 2018

> Can you post the entire LD_DEBUG log?

I'd be happy to. But the > redirection operator doesn't seem to work in bash. Any suggestions on how to modify the command line in my repro steps to capture the log in a file? I'm quite inexperienced in bash.

@bording

bording commented Aug 25, 2018

> I'd be happy to. But the > redirection operator doesn't seem to work in bash. Any suggestions on how to modify the command line in my repro steps to capture the log in a file? I'm quite inexperienced in bash.

It's probably outputting to stderr, so try using 2> instead.
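
As a minimal sketch of that suggestion: the helper below runs any command with LD_DEBUG=versions and sends only its stderr to a file, leaving stdout on the terminal. The helper name `capture_ld_debug` and the log file name are made up for illustration.

```shell
# Run a command with LD_DEBUG=versions and send its stderr (fd 2) to a log file.
# stdout is left untouched, so normal build output still reaches the terminal.
# The helper name and the log file name are illustrative, not part of any tool.
capture_ld_debug() {
  log_file=$1
  shift
  LD_DEBUG=versions "$@" 2> "$log_file"
}

# For the repro steps in this issue, the call would be:
#   capture_ld_debug ld_debug.log dotnet build
```

A plain `>` only redirects stdout (fd 1), so anything the loader writes to stderr bypasses it.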

> Do I need to do the work to load the correct native library, or does libgit2sharp take care of that (or at least offer something to help)?

When I was talking with @tmat about this, the conclusion was that there wasn't really something LibGit2Sharp could do, which is why he ended up adding RID-resolving logic to dotnet/sourcelink.

> That's peculiar, since my nbgv tool works fine on Ubuntu 18

Hmm, I'm not sure how that would be, unless your Ubuntu 18 instance also had curl3 installed, which it doesn't by default AFAIK.

@AArnott
Collaborator Author

AArnott commented Aug 25, 2018

I may have added curl, I don't know. But the annoying point to me is that msbuild tasks always have dependency issues and tools tend to work. I may just rewrite my msbuild task to spawn my tool.

@bording

bording commented Aug 25, 2018

> I may have added curl, I don't know. But the annoying point to me is that msbuild tasks always have dependency issues and tools tend to work. I may just rewrite my msbuild task to spawn my tool.

Honestly, that might be the best way to handle it. That way you aren't fighting against .NET Core's assembly loading design, which doesn't really account for these kinds of scenarios.

On the other hand, if you don't want to make that change, if you can wait until it's ready, the 2nd option in dotnet/roslyn#29289 (comment) will definitely help simplify this a lot.

@bording

bording commented Aug 25, 2018

If you do want to keep investigating this, give the stderr redirection a try, because I'd like to see what is actually causing the libgit2 load to fail.

@AArnott
Collaborator Author

AArnott commented Aug 25, 2018

Here is my LD_DEBUG log file. Your stderr redirection idea worked. LD_DEBUG.zip

@bording

bording commented Aug 25, 2018

@AArnott Thanks. Here are the relevant lines:

11132:	checking for version `CURL_OPENSSL_3' in file /usr/lib/x86_64-linux-gnu/libcurl.so.4 [0] required by file /home/andrew/.nuget/packages/nerdbank.gitversioning/2.1.65/build/MSBuildFull/lib/linux/x86_64/libgit2-1196807.so [0]
11132:	/usr/lib/x86_64-linux-gnu/libcurl.so.4: error: version lookup error: version `CURL_OPENSSL_3' not found (required by /home/andrew/.nuget/packages/nerdbank.gitversioning/2.1.65/build/MSBuildFull/lib/linux/x86_64/libgit2-1196807.so) (fatal)

Which version of Ubuntu is the log file from?
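
For anyone digging through a similar log, here is a rough sketch of how lines like these can be pulled out. It assumes the loader marks hard failures with `(fatal)` and prefixes each line with a process ID, as in the excerpts quoted in this thread; the function name is hypothetical.

```shell
# Print only the loader's fatal errors from an LD_DEBUG log, with the leading
# "<pid>:" column stripped and duplicate lines collapsed.
# The function name is illustrative; the "(fatal)" marker and the pid prefix
# are taken from the log excerpts quoted in this thread.
fatal_loader_errors() {
  grep -F '(fatal)' "$1" \
    | sed 's/^[[:space:]]*[0-9][0-9]*:[[:space:]]*//' \
    | sort -u
}

# Example: fatal_loader_errors ld_debug.log
```

The output still needs reading: the `GlobalizationNative_*` symbol lookups quoted earlier are also marked fatal, so not every hit points at the libgit2 load.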

@AArnott
Collaborator Author

AArnott commented Aug 25, 2018

That one is from Linux Mint 19, which I believe is based on Ubuntu 18.
It fails on Ubuntu 16 as well though. Perhaps with a different LD_DEBUG error.

@bording

bording commented Aug 25, 2018

@AArnott Can you get a file from Ubuntu 16 as well?

@AArnott
Collaborator Author

AArnott commented Aug 25, 2018

Here is Ubuntu 16: LD_DEBUG.zip

@bording

bording commented Aug 25, 2018

Looking at the two log files, there are differences. However, the 18 log shows it's trying to use nerdbank.gitversioning 2.1.65 and the 16 log is using 2.2.10, so I don't think we have a fair comparison.

I assume 2.2.10 is the one we actually want to see errors from? If so, can you get another 18 log with that version?

@AArnott
Collaborator Author

AArnott commented Aug 25, 2018

Yes, sorry. I have 3 machines and forgot to keep them in sync, obviously.
Here is Ubuntu 18 with nb.gv 2.2.10: LD_DEBUG.zip

@bording

bording commented Aug 25, 2018

Ok, I believe I see what's going on.

In the 2.1.65 log, it's searching for and finding libgit2 in /home/andrew/.nuget/packages/nerdbank.gitversioning/2.1.65/build/MSBuildFull/lib/linux/x86_64/libgit2-1196807.so

And then failing to load because of the whole curl/OpenSSL problem I would expect to see on an Ubuntu 18-based machine.
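
One rough way to check which libcurl flavor a machine has is to scan the library for its embedded `CURL_OPENSSL_<n>` version tags. This is a sketch that assumes the tags appear as plain text in the file; `readelf -V` from binutils gives a more precise view where it is available. The function name is hypothetical.

```shell
# List the CURL_OPENSSL_<n> version tags that appear in a libcurl binary.
# The function name is illustrative. This only scans the file for the tag
# text, which is a heuristic rather than a real ELF version-table dump.
curl_version_tags() {
  grep -a -o 'CURL_OPENSSL_[0-9][0-9]*' "$1" | sort -u
}

# Example, using the libcurl path from the log excerpt:
#   curl_version_tags /usr/lib/x86_64-linux-gnu/libcurl.so.4
# libgit2-1196807.so asks for CURL_OPENSSL_3, so that tag would need to be listed.
```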

Note that it appears it found the copy in MSBuildFull, not MSBuildCore, so there's probably some bug in what path you're passing in there.

For the two 2.2.10 logs, different behavior can be seen.

It's searching for libgit2, and can't find it. It's looking at /home/andrew/.nuget/packages/nerdbank.gitversioning/2.2.10/build/MSBuildCore/libgit2-6311e88.so

That path doesn't include the /lib/linux/x86_64/ portion it should have.
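
To compare the probed path with what is actually on disk, something like the following sketch can list where the native library sits inside the restored package. The function name is hypothetical, and the example directory is the one from the log.

```shell
# List every libgit2 shared object under a directory, so the real location
# can be compared with the path the loader probed.
# The function name is illustrative.
find_libgit2() {
  find "$1" -type f -name 'libgit2-*.so' | sort
}

# Example, using the package directory from the log:
#   find_libgit2 ~/.nuget/packages/nerdbank.gitversioning/2.2.10/build
```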

@bording

bording commented Aug 25, 2018

Looking at 59adb4b#diff-c518cf4d2fd4bc942aa8b03d9b8c52ff

That makes sense. You're loading LibGit2Sharp into the Default context now, which means your custom LoadUnmanagedDll logic isn't being used.

@AArnott
Collaborator Author

AArnott commented Aug 25, 2018

Thanks.

> You're loading LibGit2Sharp into the Default context now, which means your custom LoadUnmanagedDll logic isn't being used.

Well now I'm between a rock and a hard place then. P/Invoke only works on Core CLR if I load libgit2sharp into the Default context, but it can only load the native DLL if it's in my custom one. Gah! I wonder if I can forcibly load the native dll myself so that the load context of the managed assembly doesn't matter. I might also look for a way to help find native binaries on the Default context.

As for MSBuildFull vs. MSBuildCore, an MSBuild property used to disclose that dotnet MSBuild was just .NET Core, but I think recent versions stopped defining it (I don't know why), so I started reading from MSBuildFull. But I'll double-check.

@bording

bording commented Aug 25, 2018

> I wonder if I can forcibly load the native dll myself so that the load context of the managed assembly doesn't matter.

I'm not sure that's possible. That was what @tmat's changes in libgit2/libgit2sharp#1563 were attempting to do (and I had planned on enhancing in libgit2/libgit2sharp#1571), but it turns out that it only works on Windows.

To take full control of the native library loading, we'd have to discard DllImport altogether and wire up everything manually.

@bording

bording commented Aug 25, 2018

> P/Invoke only works on Core CLR if I load libgit2sharp into the Default context

It seems like it would be worth continuing the investigation into why that would be. Is it something that we can resolve in some way, or is it actually a CoreCLR bug?

@AArnott
Collaborator Author

AArnott commented Aug 25, 2018

> It seems like it would be worth continuing the investigation into why that would be. Is it something that we can resolve in some way, or is it actually a CoreCLR bug?

I agree. The CoreCLR bug is still active, but inconclusive at the moment. I don't know any other way to work around it.

> I'm not sure that's possible.

That's a long PR to search. I haven't made it work yet, but given an AssemblyLoadContext-derived type can call LoadUnmanagedDll, it seems like I can force it to load IMO. That's the approach I'm taking. I'm running into a few random snags, which are slowing me down though.

@AArnott
Collaborator Author

AArnott commented Aug 25, 2018

Maybe this is what you were suggesting @bording, but while it seems I can successfully load the native library on-demand, when it comes to p/invoking, it will try again to load it using the Default context and will fail. It's not smart enough to realize the library is already loaded. Perhaps it's because the p/invoke signature merely states git2-6311e88 but the native image that is already loaded is libgit2-6311e88.so.

It works on Windows, but not Linux. Given Linux is the only OS that changes the image name to prefix a lib string, I'm pretty sure that's the problem. Why is the lib prefix so important? Do we have to keep it?

@bording

bording commented Aug 25, 2018

Yes, that is the problem I'm referring to. I don't believe the lib prefix is the cause, though. You can see in the LD_DEBUG log that .NET Core knows to search for both versions.

@AArnott
Collaborator Author

AArnott commented Aug 25, 2018

I just forcibly renamed it to remove the prefix. It still loaded on demand, but p/invoke did not find it, as you predicted.

@bording

bording commented Aug 25, 2018

From the log:

file=/home/andrew/.nuget/packages/nerdbank.gitversioning/2.2.10/build/MSBuildCore/git2-6311e88.so
file=/home/andrew/.nuget/packages/nerdbank.gitversioning/2.2.10/build/MSBuildCore/libgit2-6311e88.so

It looks for both variations when given git2-6311e88 in the DllImport.
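
A sketch of how probe lines like these can be extracted from a log, assuming each probed path appears as `file=<path>` as in the excerpt above. The function name is hypothetical.

```shell
# Show each distinct path the loader probed for a given library name.
# $1 = LD_DEBUG log file, $2 = library base name as written in the DllImport.
# The function name is illustrative; the "file=<path>" shape is taken from
# the log excerpt quoted in this thread.
probed_paths() {
  grep -F "$2" "$1" | grep -o 'file=[^ ]*' | sed 's/^file=//' | sort -u
}

# Example: probed_paths ld_debug.log git2-6311e88
```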

@bording

bording commented Aug 25, 2018

> I just forcibly renamed it to remove the prefix. It still loaded on demand, but p/invoke did not find it, as you predicted.

Yeah, I really wish it worked properly on Linux as well, because then I'd be able to finish libgit2/libgit2sharp#1571 and handle all of this internally for you.

@bording

bording commented Aug 25, 2018

> I agree. The CoreCLR bug is still active, but inconclusive at the moment. I don't know any other way to work around it.

I'm still curious about the repro you have, and how it might be interacting with dotnet/sourcelink. That is the other tool I'm aware of that could also be hitting the problem you're seeing, but AFAIK it's not being seen there.

@AArnott
Collaborator Author

AArnott commented Aug 25, 2018

> how it might be interacting with dotnet/sourcelink

There is no interaction, because I'm not using SourceLink on any of these repos.
NB.GV 2.2.3 worked on some projects as well. Maybe the investigation should pursue what's different about the ones it works on and those it doesn't. My only guess is that it works on leaf projects, but not on projects with P2Ps that also use NB.GV. I'm going to (in)validate that hypothesis next.

@AArnott
Collaborator Author

AArnott commented Aug 25, 2018

Hypothesis confirmed: I have the most trivial hello world C# project that works. I then copied it, and added a P2P to the original. Then when I build the copy, the original builds first (and succeeds) and the second one fails.
This is with code equivalent to 2.2.3, BTW, which loads libgit2sharp in my custom AssemblyLoadContext so that it can find the native library.

So perhaps the problem is that once I have loaded libgit2 in one custom AssemblyLoadContext, and CoreCLR has generated whatever interop stuff it needs for its custom marshaling, subsequently loading it in another AssemblyLoadContext causes p/invoke handling to get its wires crossed and fail.

@AArnott
Collaborator Author

AArnott commented Aug 25, 2018

Ooh! An early read on a workaround shows promise. I now create only one custom AssemblyLoadContext, store it in a static field, and reuse it for subsequent task invocations. That got two projects building in a row without an error. :)

AArnott added a commit that referenced this issue Aug 25, 2018
@bording

bording commented Aug 25, 2018

Interesting. I wonder if this is a similar issue to what using UniqueId.UniqueIdentifier as the MarshalCookie is intending to solve: https://github.com/libgit2/libgit2sharp/blob/master/LibGit2Sharp/Core/NativeMethods.cs#L100

I'd have to track down the original issues, but IIRC that was added to work around a custom marshaling bug when two different versions of LibGit2Sharp were loaded.

In this case, you've still got two different copies of the assembly loaded, but they share the same MarshalCookie, so things are getting messed up.

@AArnott
Collaborator Author

AArnott commented Aug 25, 2018

Well, this will be fix number 3 for getting nb.gv working consistently everywhere (which of course used to work, but Ubuntu 18 and other recent Linux versions broke it). I hope the 3rd time's the charm on this.

@AArnott
Collaborator Author

AArnott commented Aug 25, 2018

Thanks for your help investigating and brainstorming on this, @bording! I really appreciate it.

@bording

bording commented Aug 25, 2018

Glad to help! I wish this stuff was easier to deal with. Having a native dependency always complicates things.


2 participants