Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify build_native via eng/native/build-commons #1753

Merged
merged 6 commits into from
Jan 23, 2020
Merged

Unify build_native via eng/native/build-commons #1753

merged 6 commits into from
Jan 23, 2020

Conversation

am11
Copy link
Member

@am11 am11 commented Jan 15, 2020

In order to add new platform, architecture, compiler or its newer
version, there are currently multiple places to update, which can be
deduplicated.

This patch unifies:

  • directories creation
  • prerequisites checking
  • version.c generation
  • native build invocation
  • common argument parsing
  • building help menu

for various build scripts under coreclr, installer and libraries.

The common entry-point script is now eng/native/build-commons.sh and
rest of the scripts under eng/native are implementation detail.

Also extracted CMake platform detection in
eng/native/configureplatform.cmake, to share with coreclr and
installer/corehost.

@am11
Copy link
Member Author

am11 commented Jan 16, 2020

@janvorli, arm pri0 test failure looks real going by helix log. Do you have some clues what might be the cause of this assertion:

Assert failure(PID 133 [0x00000085], Thread: 133 [0x0085]): !"Recursion in CLRException::GetThrowable"
    File: /__w/4/s/src/coreclr/src/vm/clrex.cpp Line: 148
    Image: /root/helix/work/correlation/corerun

Alternatively, to investigate it locally using lldb etc., could you tell me the command to run pri0 tests? I have arm build ready via rootfs, I will try to run it on arm device.

@am11 am11 requested a review from janvorli January 16, 2020 14:08
@janvorli
Copy link
Member

This error usually indicates mismatch between libcoreclr.so and System.Private.CoreLib.dll. E.g. when these files are built from different source tree states or one of them is of debug /checked and the other release build.

As for debugging on arm device, what I usually do is to cross-build the tests using the build-test.sh (pass it the same options as to build.sh). Then I clone the same state of the repo on arm, ideally to the same path as on the build device and copy over the artifacts folder from the build machine to the arm one.
Then I run all the tests using the exact command line the build-test prints at the end or run just a single test. For debugging, the easiest way is to run a single test as follows (assuming you are using debug build, which I would highly recommend)

  1. cd into the binary folder of the test (where the .sh file for the test resides). It is somewhere under the artifacts/tests/coreclr/Linux.arm.Debug.
  2. run gdb --args /full/path/to/corerun/in/tests/core_root your_test_assembly.dll (the corerun is located in artifacts/tests/coreclr/Linux.arm.Debug/Tests/Core_Root)

The .sh file for each test can also be used for starting the test under a debugger (-debug option), but it has a problem with passing arguments to gdb / lldb currently, it doesn't put -- after lldb and --args after gdb that is necessary for correctly passing the arguments to the corerun. So I don't use it when I need to debug stuff.

@am11
Copy link
Member Author

am11 commented Jan 17, 2020

Thanks! I could not spot anything obvious in this PR which suggest the configuration mismatch. So I am investigating on arm device.

The error was spotted with Checked build, so I went ahead with that configuration. Here is a stack trace on executing a test on arm device (same path as my az vm running Ubuntu where i cross compiled for arm):

click to expand
pi@raspberrypi:/datadrive/projects/runtime $ bash artifacts/tests/coreclr/Linux.arm.Checked/JIT/CodeGenBringUpTests/ArrayExc_d/ArrayExc_d.sh -coreroot=$(pwd)/artifacts/bin/coreclr/Linux.arm.Checked
BEGIN EXECUTION
/datadrive/projects/runtime/artifacts/bin/coreclr/Linux.arm.Checked/corerun ArrayExc_d.dll ''
Unhandled exception. Process terminated. Infinite recursion during resource lookup within System.Private.CoreLib.  This may be a bug in System.Private.CoreLib, or potentially in certain extensibility points such as assembly resolve events or CultureInfo names.  Resource name: Word_At
   at System.Environment.FailFast(System.String)
   at System.SR.InternalGetResourceString(System.String)
   at System.SR.GetResourceString(System.String, System.String)
   at System.Diagnostics.StackTrace.ToString(TraceFormat, System.Text.StringBuilder)
   at System.Diagnostics.StackTrace.ToString(TraceFormat)
   at System.Diagnostics.DebugProvider.Fail(System.String, System.String)
   at System.Globalization.CultureInfo.get_InvariantCulture()
   at System.Resources.ManifestBasedResourceGroveler.GetNeutralResourcesLanguage(System.Reflection.Assembly, System.Resources.UltimateResourceFallbackLocation ByRef)
   at System.Resources.ResourceManager.CommonAssemblyInit()
   at System.Resources.ResourceManager..ctor(System.Type)
   at System.SR.get_ResourceManager()
   at System.SR.InternalGetResourceString(System.String)
   at System.SR.GetResourceString(System.String, System.String)
   at System.Diagnostics.StackTrace.ToString(TraceFormat, System.Text.StringBuilder)
   at System.Diagnostics.StackTrace.ToString(TraceFormat)
   at System.Diagnostics.DebugProvider.Fail(System.String, System.String)
   at System.Globalization.CultureInfo.get_InvariantCulture()
   at System.Resources.ManifestBasedResourceGroveler.GetNeutralResourcesLanguage(System.Reflection.Assembly, System.Resources.UltimateResourceFallbackLocation ByRef)
   at System.Resources.ResourceManager.CommonAssemblyInit()
   at System.Resources.ResourceManager..ctor(System.Type)
   at System.SR.get_ResourceManager()
   at System.SR.InternalGetResourceString(System.String)
   at System.SR.GetResourceString(System.String, System.String)
   at System.DllNotFoundException..ctor()
   at System.Globalization.GlobalizationMode.GetGlobalizationInvariantMode()
   at System.Globalization.GlobalizationMode..cctor()
   at System.Globalization.CultureData.CreateCultureWithInvariantData()
   at System.Globalization.CultureData.get_Invariant()
   at System.Globalization.CultureInfo..cctor()
   at System.Globalization.CultureInfo.get_InvariantCulture()
   at System.Resources.ManifestBasedResourceGroveler.GetNeutralResourcesLanguage(System.Reflection.Assembly, System.Resources.UltimateResourceFallbackLocation ByRef)
   at System.Resources.ResourceManager.CommonAssemblyInit()
   at System.Resources.ResourceManager..ctor(System.Type)
   at System.SR.get_ResourceManager()
   at System.SR.InternalGetResourceString(System.String)
   at System.SR.GetResourceString(System.String, System.String)
   at System.IO.FileNotFoundException.ToString()
artifacts/tests/coreclr/Linux.arm.Checked/JIT/CodeGenBringUpTests/ArrayExc_d/ArrayExc_d.sh: line 275:  1327 Aborted                 $LAUNCHER $ExePath "${CLRTestExecutionArguments[@]}"
Expected: 100
Actual: 134
END EXECUTION - FAILED

Looks like string resources are missing, not sure if that is the same case as CI, but return code is same as seen in helix log: 134. Can we pinpoint to something concrete in the patch with this info? I can switch to debug build, if that would give more insights.

@janvorli
Copy link
Member

Can you please check if the System.Globalization.Native.so is present and valid?

@am11
Copy link
Member Author

am11 commented Jan 17, 2020

The nearest System.Globalization.Native.so is present at artifacts/bin/coreclr/Linux.arm.Checked/crossgen2. I copied System.Globalization.Native.so and System.Runtime.dll from crossgen2 to artifacts/bin/coreclr/Linux.arm.Checked/ and reran ArrayExc_d.sh and it passed.

Also ran all hardwareintrinsic tests successfully, after copying an assembly from different location: cp artifacts/bin/runtime/netcoreapp5.0-Linux-Release-arm/System.Runtime.Intrinsics.Experimental.dll artifacts/bin/coreclr/Linux.arm.Checked/. On the CI, one or more of these hardwareintrinsic tests are failing with exit code 134: https://helix.dot.net/api/2019-06-17/jobs/ba003676-b60a-4de5-987e-c31b7bfcd86b/workitems/JIT.HardwareIntrinsics/console, which is likely "file not found"; then it sounds like this PR is expected to have some missing copy step?

@am11
Copy link
Member Author

am11 commented Jan 17, 2020

The Helix runs do not give more information (full console output which we see on device), or maybe I am looking at a wrong place? 🤔

@janvorli
Copy link
Member

@am11 the System.Globalization.Native.so should not be located in the artifacts/bin/coreclr/Linux.arm.Checked. It is not built in the coreclr part of the repo anymore. Now it is built in libraries.

The reason why it fixed the test for you is that the testing combines files from artifacts/bin/coreclr/Linux.arm.Checked and from the libraries into the artifacts/tests/coreclr/Linux.arm.Checked/Tests/Core_Root directory.
But it should get there together with other stuff built in libraries (e.g. System.Native.so, System.IO.Ports.Native.so, etc). Maybe these could give you a hint.

@janvorli
Copy link
Member

@am11 the copying from libraries happens in this step in runtest.proj:

  <Target Name="CreateTestOverlay">
    <MSBuild Projects="$(MSBuildProjectFile)"
             Targets="CopyDependencyToCoreRoot"
             Properties="Language=C#;TargetRid=$(TargetRid);RuntimeIdentifier=$(TargetRid)" />
  </Target>

It is invoked from build-test.sh using

build_MSBuild_projects "Tests_Overlay_Managed" "${__ProjectDir}/tests/src/runtest.proj" "Creating test overlay" "/t:CreateTestOverlay"

I have noticed that your change touched code related to RIDs, I wonder if that could somehow cause the problem.

Anyways, maybe you could look at the logged command line of msbuild.sh invoked from the build_MSBuild_projects for this case and compare it with the one without your changes.

In order to add new platform, architecture, compiler or its newer
version, there are currently multiple places to update, which can be
deduplicated.

This patch unifies:

* directories creation
* prerequisites checking
* `version.c` generation
* native build invocation
* common argument parsing
* building help menu

for various build scripts under coreclr, installer and libraries.

The common entry-point script is now `eng/native/build-commons.sh` and
rest of the scripts under `eng/native` are implementation detail.

Also extracted CMake platform detection in
`eng/native/configureplatform.cmake`, to share with coreclr and
`installer/corehost`.
@am11
Copy link
Member Author

am11 commented Jan 18, 2020

/azp run runtime-installer

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 1753 in repo dotnet/runtime

@am11 am11 marked this pull request as ready for review January 18, 2020 03:25
@am11
Copy link
Member Author

am11 commented Jan 18, 2020

Thanks for your tips @janvorli! The issue was with Libraries Linux arm Release job not passing -cross argument explicitly. In master, we have:

$RepoRoot/coreclr.sh -cross # edit: for coreclr, top-level script does not allow -cross argument
  -> src/coreclr/build.sh -cross
$RepoRoot/installer.sh /p:CrossBuild=true
  -> src/installer/corehost/build.sh -cross
$RepoRoot/libraries.sh    # implicitly sets crossbuild=true in build-native.sh
  -> src/libraries/Native/build-native.sh  # still implicit

pr combines the implementation, favoring the majority and expects explicit arguments, so i have made libraries.sh do the same thing as installer.sh in master:

$RepoRoot/coreclr.sh -cross
  -> src/coreclr/build.sh -cross
$RepoRoot/installer.sh /p:CrossBuild=true
  -> src/installer/corehost/build.sh -cross
$RepoRoot/libraries.sh /p:CrossBuild=true
  -> src/libraries/Native/build-native.sh -cross

updated CI yml, where this argument is assigned based on the value of crossrootfsDir (like happens in case of installer).

in future, we can make eng/common/build.sh to accept -cross as first-class argument for cross compilation. This PR focuses on converging internal implementation.

@am11
Copy link
Member Author

am11 commented Jan 18, 2020

WinNT checked failures are unrelated (test timeout). This PR is ready for review. Could you please request runtime-installer jobs? It was triggered automatically in last run, but missed in this one. Thanks!

src/coreclr/build-test.sh Outdated Show resolved Hide resolved
src/coreclr/build-test.sh Show resolved Hide resolved
src/coreclr/build.sh Outdated Show resolved Hide resolved
src/coreclr/build.sh Outdated Show resolved Hide resolved
src/coreclr/build.sh Show resolved Hide resolved
src/coreclr/build.sh Show resolved Hide resolved
src/coreclr/configurecompiler.cmake Show resolved Hide resolved
src/installer/corehost/build.sh Outdated Show resolved Hide resolved
src/libraries/Native/build-native.cmd Show resolved Hide resolved
@am11
Copy link
Member Author

am11 commented Jan 21, 2020

@janvorli, I do not have permission to request installer pipeline run, which was running earlier (total 168 checks were executed) but not anymore. Could you please request:
/azp run runtime-installer
Also, could you manually re-run runtime (Test crossgen-comparison Linux arm checked), which has been cancelled due to unknown reason? This job was failing earlier due to a misplaced quote, so I wanted to make sure if f37f9b4 has fixed it.

@janvorli
Copy link
Member

/azp run runtime-installer

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

@am11
Copy link
Member Author

am11 commented Jan 21, 2020

Just noticed that runtime-installer pipeline no longer exist: https://dev.azure.com/dnceng/public/_build?pipelineNameFilter=runtime (or maybe its Service Connection is dropped due to expired token). Nevertheless, 10 Installer jobs were ran in runtime pipeline.

@am11
Copy link
Member Author

am11 commented Jan 22, 2020

@janvorli, there were merge conflict with recent renaming in configurecompiler.cmake file (in the portions which this PR moves to env/native/configureplatform.cmake to share between coreclr and installer). Windows NT failures are unrelated Condition(s) not met: "NotElevatedAndSupportsEventLogs" and The "DownloadFile" task failed unexpectedly.. Could you please give it another pass?

@am11
Copy link
Member Author

am11 commented Jan 23, 2020

There were some more merge conflicts since the previous resolution.

Linux arm failure is unrelated: The TLS connection was non-properly terminated. while fetching dotnet-install.sh.

@janvorli, seems like more and more PRs are coming in touching these files. Can this get a turn before they get merged? had to revalidate property values with grep/diff tools. 😅

@janvorli
Copy link
Member

I am sorry for the hassle with resolving the conflicts. Let's get this in. I've added a few cosmetic micro nit comments yesterday, unfortunately I've somehow forgotten push the button to send out the comments. We can fix them later.

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I've said, please don't care about the new comments now so that we can get it in as soon as possible to avoid further conflicts.


if [[ ( "$__HostOS" == "Linux" ) && ( "$__HostArch" == "x64" || "$__HostArch" == "arm" || "$__HostArch" == "arm64" ) ]]; then
__IsMSBuildOnNETCoreSupported=1
elif [[ "$__HostArch" == "x64" && ( "$__HostOS" == "OSX" || "$__HostOS" == "FreeBSD" ) ]]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit - could you please move the OS check to the beginning of the condition so that parts of the condition are ordered the same way in this elif as in the if part?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, i will fix these in a followup PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #2071.

;;

armv7l)
echo "Unsupported CPU $CPUName detected, build might not succeed!"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message is clearly obsolete, can you please remove it when you are on it?

@@ -397,7 +400,7 @@ build_MSBuild_projects()
__msbuildWrn="\"/flp1:WarningsOnly;LogFile=${__BuildWrn};Append=${__AppendToLog}\""
__msbuildErr="\"/flp2:ErrorsOnly;LogFile=${__BuildErr};Append=${__AppendToLog}\""

export __TestGroupToBuild=$testGroupToBuild
export __TestGroupToBuild="$testGroupToBuild"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this export ( and the __NumberOfTestGroups above) be separated from the variable setting? I am asking since you have made such a change for the HOME variable below.

@janvorli janvorli merged commit 879e596 into dotnet:master Jan 23, 2020
@am11 am11 deleted the feature/consolidations branch January 23, 2020 09:06
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants