arm64 linux build machines seem to be missing key components? #12395

Closed
5 tasks
AndyAyersMS opened this issue Feb 2, 2023 · 13 comments

Comments

@AndyAyersMS
Member

AndyAyersMS commented Feb 2, 2023

Build

https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=156337

Build leg reported

Build / CoreCLR Product Build linux arm64 checked / Build CoreCLR Runtime

Pull Request

dotnet/runtime#81377

Action required for the engineering services team

To triage this issue (First Responder / @dotnet/dnceng):

  • Open the failing build above and investigate
  • Add a comment explaining your findings

If this issue is causing build breaks across multiple builds and would benefit from being listed on the build analysis check, follow these steps:

  1. Add the label "Known Build Error"
  2. Edit this issue and add an error string in the Json below that can help us match this issue with future build breaks. Refer to the known issues documentation for the expected format.
{
   "ErrorMessage" : "ld.lld: error: cannot open crt1.o: No such file or directory",
   "BuildRetry": false,
   "ErrorPattern": "",
   "ExcludeConsoleLog": false
}
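
For illustration, here is a minimal sketch of how an entry like the one above could be matched against a console log. It assumes that "ErrorMessage" is treated as a literal substring and "ErrorPattern" as a regular expression, with empty values ignored. The function name `matches_known_issue` and the sample log are hypothetical, and the real build analysis matcher may well behave differently.

```python
import json
import re


def matches_known_issue(issue_json: str, console_log: str) -> bool:
    """Return True if any log line matches the known-issue entry.

    Assumption: "ErrorMessage" is a literal substring match and
    "ErrorPattern" is a regular expression; empty values are ignored.
    """
    issue = json.loads(issue_json)
    message = issue.get("ErrorMessage", "")
    pattern = issue.get("ErrorPattern", "")
    regex = re.compile(pattern) if pattern else None
    for line in console_log.splitlines():
        if message and message in line:
            return True
        if regex and regex.search(line):
            return True
    return False


# The JSON entry from this issue, unchanged.
KNOWN_ISSUE = """
{
   "ErrorMessage" : "ld.lld: error: cannot open crt1.o: No such file or directory",
   "BuildRetry": false,
   "ErrorPattern": "",
   "ExcludeConsoleLog": false
}
"""

# Hypothetical two-line excerpt modelled on the failing log.
SAMPLE_LOG = (
    "Linking C executable cmTC_ed78d\n"
    "    ld.lld: error: cannot open crt1.o: No such file or directory\n"
)

print(matches_known_issue(KNOWN_ISSUE, SAMPLE_LOG))  # prints True
```

A literal substring is the safer choice here because the linker message contains regex metacharacters such as `.`, which would need escaping in a pattern.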

Release Note Category

  • Feature changes/additions
  • Bug fixes
  • Internal Infrastructure Improvements

Release Note Description

Additional information about the issue reported

No response

Report

| Build | Definition | Step Name | Console log | Pull Request |
| --- | --- | --- | --- | --- |
| 155366 | dotnet/runtime | Prepare TestHost with runtime Mono | Log | dotnet/runtime#81465 |
| 156168 | dotnet/runtime | Prepare TestHost with runtime CoreCLR | Log | dotnet/runtime#78736 |
| 157049 | dotnet/runtime | Build product | Log | |
| 157051 | dotnet/runtime | Build product | Log | |
| 156662 | dotnet/runtime | Build CoreCLR Runtime | Log | dotnet/runtime#81513 |
| 156337 | dotnet/runtime | Build CoreCLR Runtime | Log | dotnet/runtime#81377 |
| 156713 | dotnet/runtime | Build CoreCLR Runtime | Log | dotnet/runtime#81517 |
| 156490 | dotnet/runtime | Build CoreCLR Runtime | Log | dotnet/runtime#81052 |
| 156698 | dotnet/runtime | Build CoreCLR Runtime | Log | dotnet/runtime#81518 |
| 156666 | dotnet/runtime | Build product | Log | dotnet/runtime#81359 |
| 156653 | dotnet/runtime | Build CoreCLR Runtime | Log | |
| 156632 | dotnet/runtime | Build product | Log | dotnet/runtime#81439 |
| 156147 | dotnet/runtime | Prepare TestHost with runtime CoreCLR | Log | |
| 156631 | dotnet/runtime | Build CoreCLR Runtime | Log | dotnet/runtime#81063 |
| 156421 | dotnet/runtime | Build CoreCLR Runtime | Log | dotnet/runtime#81510 |
| 156596 | dotnet/runtime | Build product | Log | |
| 156518 | dotnet/runtime | Build CoreCLR Runtime | Log | |
| 156503 | dotnet/runtime | Build product | Log | dotnet/runtime#81335 |
| 156483 | dotnet/runtime | Build product | Log | dotnet/runtime#81162 |
| 156435 | dotnet/runtime | Build product | Log | |
| 156142 | dotnet/runtime | Prepare TestHost with runtime CoreCLR | Log | |
| 156431 | dotnet/runtime | Build product | Log | dotnet/runtime#81380 |
| 156413 | dotnet/runtime | Build CoreCLR Runtime | Log | dotnet/runtime#81509 |
| 156150 | dotnet/runtime | Build | Log | dotnet/runtime#81503 |
| 156384 | dotnet/runtime | Build CoreCLR Runtime | Log | dotnet/runtime#81454 |
| 156194 | dotnet/runtime | Build | Log | dotnet/runtime#74820 |
| 156273 | dotnet/runtime | Build CoreCLR Runtime | Log | dotnet/runtime#81409 |
| 156213 | dotnet/runtime | Prepare TestHost with runtime CoreCLR | Log | |

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
| --- | --- | --- |
| 0 | 28 | 28 |
@AndyAyersMS
Member Author

AndyAyersMS commented Feb 2, 2023

CMake configuration fails right out of the gate:

/__w/1/s/src/coreclr/build-runtime.sh checked arm64 -cross  -ci -clang9      $(CoreClrPgoDataArg) --keepnativesymbols
========================== Starting Command Output ===========================
/usr/bin/bash --noprofile --norc /__w/_temp/232183c9-3530-4c83-a3e8-1adda3995083.sh
/__w/_temp/232183c9-3530-4c83-a3e8-1adda3995083.sh: line 1: CoreClrPgoDataArg: command not found
Commencing CoreCLR Repo build
__DistroRid: linux-arm64
Setting up directories for build
Checking prerequisites...
Commencing build of "install" target in "CoreCLR component" for linux.arm64.Checked in /__w/1/s/artifacts/obj/coreclr/linux.arm64.Checked
Invoking "/__w/1/s/eng/native/gen-buildsys.sh" "/__w/1/s/src/coreclr" "/__w/1/s/artifacts/obj/coreclr/linux.arm64.Checked" arm64 linux -clang9 Checked ""  -DCLR_CMAKE_PGO_INSTRUMENT=0 -DCLR_CMAKE_OPTDATA_PATH= -DCLR_CMAKE_PGO_OPTIMIZE=0 -DFEATURE_DISTRO_AGNOSTIC_SSL=1  -DCLR_CMAKE_KEEP_NATIVE_SYMBOLS=true
/__w/1/s/artifacts/obj/coreclr/linux.arm64.Checked /__w/1/s
Not searching for unused variables given on the command line.
loading initial cache file /__w/1/s/eng/native/tryrun.cmake
-- The C compiler identification is Clang 9.0.1
-- The CXX compiler identification is Clang 9.0.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - failed
-- Check for working C compiler: /usr/bin/clang-9
-- Check for working C compiler: /usr/bin/clang-9 - broken
CMake Error at /usr/share/cmake-3.25/Modules/CMakeTestCCompiler.cmake:70 (message):
  The C compiler

    "/usr/bin/clang-9"

  is not able to compile a simple test program.

 It fails with the following output:

    Change Dir: /__w/1/s/artifacts/obj/coreclr/linux.arm64.Checked/CMakeFiles/CMakeScratch/TryCompile-jjq0Ej
    
    Run Build Command(s):/usr/bin/make -f Makefile cmTC_ed78d/fast && /usr/bin/make  -f CMakeFiles/cmTC_ed78d.dir/build.make CMakeFiles/cmTC_ed78d.dir/build
    make[1]: Entering directory '/__w/1/s/artifacts/obj/coreclr/linux.arm64.Checked/CMakeFiles/CMakeScratch/TryCompile-jjq0Ej'
    Building C object CMakeFiles/cmTC_ed78d.dir/testCCompiler.c.o
    /usr/bin/clang-9 --target=aarch64-linux-gnu --gcc-toolchain=/crossrootfs/usr --sysroot=/crossrootfs    -MD -MT CMakeFiles/cmTC_ed78d.dir/testCCompiler.c.o -MF CMakeFiles/cmTC_ed78d.dir/testCCompiler.c.o.d -o CMakeFiles/cmTC_ed78d.dir/testCCompiler.c.o -c /__w/1/s/artifacts/obj/coreclr/linux.arm64.Checked/CMakeFiles/CMakeScratch/TryCompile-jjq0Ej/testCCompiler.c
    Linking C executable cmTC_ed78d
    /usr/bin/cmake -E cmake_link_script CMakeFiles/cmTC_ed78d.dir/link.txt --verbose=1
    /usr/bin/clang-9 --target=aarch64-linux-gnu --gcc-toolchain=/crossrootfs/usr --sysroot=/crossrootfs -Wl,--rpath-link=/crossrootfs/lib/aarch64-linux-gnu -Wl,--rpath-link=/crossrootfs/usr/lib/aarch64-linux-gnu -Wl,--rpath-link=/crossrootfs/lib/aarch64-linux-gnu -Wl,--rpath-link=/crossrootfs/usr/lib/aarch64-linux-gnu -fuse-ld=lld  CMakeFiles/cmTC_ed78d.dir/testCCompiler.c.o -o cmTC_ed78d 
    ld.lld: error: cannot open crt1.o: No such file or directory
    ld.lld: error: cannot open crti.o: No such file or directory
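
The linker errors above point at missing C runtime startup objects in the cross root filesystem. As a rough sketch, a check like the following could confirm that before a build starts. The helper name `find_missing_crt_objects` and the list of directories it searches are assumptions made for illustration; the real linker derives its search paths from the toolchain configuration.

```python
from pathlib import Path

# Startup objects the linker could not open in the log above.
CRT_OBJECTS = ("crt1.o", "crti.o")


def find_missing_crt_objects(sysroot, triple="aarch64-linux-gnu"):
    """Return the names of CRT objects not found under the sysroot.

    Assumption: the objects live in one of the usual multiarch library
    directories; a real linker may search other locations as well.
    """
    root = Path(sysroot)
    lib_dirs = [
        root / "usr" / "lib" / triple,
        root / "lib" / triple,
        root / "usr" / "lib",
    ]
    return [
        name
        for name in CRT_OBJECTS
        if not any((lib_dir / name).is_file() for lib_dir in lib_dirs)
    ]


if __name__ == "__main__":
    # /crossrootfs is the sysroot passed to clang in the log above.
    missing = find_missing_crt_objects("/crossrootfs")
    print("missing:", missing or "none")
```

An empty result does not prove the toolchain works, it only rules out this particular failure.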

@AndyAyersMS
Member Author

Also seeing this in other PRs, eg dotnet/runtime#81516

@AndyAyersMS
Member Author

Potential fix dotnet/runtime#81354

@michellemcdaniel
Contributor

Please let us know if that change addresses this issue. We rolled out the Docker containers yesterday, so it is possible that something changed under the hood when we rebuilt them. @sbomer made the last changes to the arm64 cross image, though I can't see any evidence (there) that cmake was changed. That doesn't preclude it, though. Since we (most likely) get cmake from a Kitware apt repository, and they change things without us knowing about them, it's possible they made a change that broke you. A potential fix is to go back to the previous build, which I believe should be ubuntu-20.04-cross-arm64-20230125182303-e516922. I also know that @jkotas has investigated these sorts of build errors before.

@jkotas
Member

jkotas commented Feb 2, 2023

any evidence (there) that cmake was changed.

This build break was not introduced by cmake changes. It was introduced by the arm64 cross image changes.

Given that container updates are rolled out without any validation gates in the new system, we are going to see sudden widespread build breaks like this one on a regular basis.

@jkotas
Member

jkotas commented Feb 2, 2023

For reference, this was the previous widespread build break caused by a container rollout that I helped fix: dotnet/runtime#78522

@michellemcdaniel
Contributor

In the new system (which we aren't using yet; that's still months away), we do plan on having validation of the images. You are correct that the current system has no way of validating them.

@jkotas
Member

jkotas commented Feb 2, 2023

Do you have a design doc for the new system and how it is going to do validation? The last design doc that I have seen assumed floating container labels that are updated without going through repo CI validation.

@michellemcdaniel
Contributor

I must have misunderstood what you meant by validation. I don't know if we are going to be able to have repo CI validation, beyond folks having pipelines that use the staging Docker images, which exist today and which you are welcome to use. I just mean that we will have validation that everything we said will be installed is installed correctly, like we do for our Helix VMs. This issue is something we could potentially have with the VMs as well: something changes in, for example, a package manager, without being specifically communicated, and it breaks product repos that assume something else.

@jkotas
Member

jkotas commented Feb 2, 2023

we will have validation that everything we said will be installed is installed correctly, like we do for our Helix VMs

Helix VMs are used for tests. Test dependencies are a lot less brittle than build dependencies. Even if this kind of validation has proven sufficient to prevent breaks with test image updates, it is not necessarily going to be sufficient to prevent breaks with build image updates.

We can give it a try and see whether it is causing unreasonable pain.

@AndyAyersMS
Member Author

Please let us know if that change addresses this issue.

Things seem to be fixed.

As Jan says, this was a problem detected by CMake but not a problem with CMake; something broke the ability to compile and link simple programs, and the error was reported by the linker. Very early in a build, CMake will compile and run simple programs to figure out how to properly configure the build settings for the things we actually want to compile. These early compiles are a natural place to hit issues if parts of the build toolchain are not present.
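
To make the idea of an early compile check concrete, here is a minimal sketch of a probe in the same spirit: it writes a trivial C program, asks a given compiler command to compile and link it, and reports whether that worked. This is not CMake's implementation. The function name `can_compile_and_link` is hypothetical, and the flags in the usage section are copied from the failing log in this issue, so they would need adjusting for any other toolchain.

```python
import subprocess
import tempfile
from pathlib import Path

# Roughly the simplest program a toolchain should be able to build.
TEST_SOURCE = "int main(void) { return 0; }\n"


def can_compile_and_link(compiler_cmd):
    """Compile and link a trivial C program; return (ok, error_text).

    compiler_cmd is the compiler executable plus any flags, as a list.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "test.c"
        out = Path(tmp) / "test"
        src.write_text(TEST_SOURCE)
        try:
            result = subprocess.run(
                [*compiler_cmd, str(src), "-o", str(out)],
                capture_output=True,
                text=True,
            )
        except OSError as exc:
            # The compiler itself could not be started (e.g. not installed).
            return False, str(exc)
        return result.returncode == 0, result.stderr


if __name__ == "__main__":
    # Flags taken from the log in this issue; an assumption for other setups.
    cmd = [
        "/usr/bin/clang-9",
        "--target=aarch64-linux-gnu",
        "--gcc-toolchain=/crossrootfs/usr",
        "--sysroot=/crossrootfs",
        "-fuse-ld=lld",
    ]
    ok, err = can_compile_and_link(cmd)
    print("toolchain ok" if ok else f"toolchain broken:\n{err}")
```

A probe like this exercises the compiler, the linker, and the sysroot together, which is why it tends to be the first thing to fail when a piece of the image goes missing.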

@mmitche
Member

mmitche commented Feb 2, 2023

@jkotas Helix VMs are also used for builds. Images are updated regularly with patches and VS updates, including moving to the next minor version of VS when it goes RTM. We provide scouting queues so that repo owners can verify what will get rolled out soon (staging, essentially), or in cases where they need new features outside of the normal rollout schedule.

@ilyas1974
Contributor

As it appears that things are fixed, closing the issue.
