Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use goto to exit in build.cmd so that error code propagates out of cmd #990

Merged
merged 3 commits into from
Dec 19, 2019

Conversation

safern
Copy link
Member

@safern safern commented Dec 17, 2019

If there is a failure in some nested steps of coreclr/build.cmd the exit code was not being propagated due to nested calls and if blocks, instead, go to the end of the script and exit. Because the exit code wasn't propagated, when running build.cmd from the root, coreclr.proj is called and that shells out via Exec into coreclr/build.cmd, and MSBuild was getting a 0 exit code, so the build wouldn't fail if there was a failure. I.e: the sdk.txt failure which was manifesting latter on the build up when the installer tried to crossgen libraries assemblies.

cc: @dotnet/runtime-infrastructure

@ViktorHofer
Copy link
Member

What's the difference between an exit and an goto somewhere and then exit?

@trylek
Copy link
Member

trylek commented Dec 17, 2019

I don't think there's a difference, I guess Santi is basically trying to create a unified choke point for error handling.

@safern
Copy link
Member Author

safern commented Dec 17, 2019

There is a difference if you're within a subroutine or a call. So basically doing goto and then exiting, goto is telling the script to exit the subroutine and then execute exit /b 1.

@dagood
Copy link
Member

dagood commented Dec 17, 2019

This call is the (only) subroutine call in this script, right?

for %%i in (%__BuildArchList%) do (
for %%j in (%__BuildTypeList%) do (
call :BuildOne %%i %%j
)
)

(Based on looking it up--didn't know batch files had this concept.)

@safern
Copy link
Member Author

safern commented Dec 17, 2019

Right. That's the only subroutine call. The weird thing is that if we exit outside of any of these ifs:

https://github.com/dotnet/runtime/blob/master/src/coreclr/build.cmd#L528

The exit code is propagated. However if we exit within the if ( ) blocks, then it is not propagated and the calling process (either Powershell or MSBuild) would get an exit code 0. That's why I decided to do a goto, to be in the root of the script, and that indeed fixes the issue. I've tried everything else, I will try and read the script one more time very careful to see why that can be, however, if any of you have any other better ideas I would really appreciate it.

exit /b 1

:ExitWithCode
exit /b !__exitCode!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, maybe document the weird subroutine behavior here, so someone doesn't try to remove it in the future thinking it's just someone trying to do "single return". (And if someone hits a similar issue, maybe they'll see this and get their fix without as much headache.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the expectation of this method that a zero exit code is acceptable? Basically is this explicitly an error routine or a general purpose exit routine?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a general purpose exit routine is acceptable. That's why there's also ExitWithError which exits with 1.

@safern
Copy link
Member Author

safern commented Dec 18, 2019

@BruceForstall @jkoritzinsky @trylek it seems like the format windows job was already broken but since the exit code wasn't flowing from src\coreclr\build.cmd the python script was not exiting with error. Could you help me understand what is wrong?

-- Configuring done
CMake Error: Error required internal CMake variable not set, cmake may not be built correctly.
Missing variable is:
CMAKE_RC_CREATE_SHARED_LIBRARY
CMake Error: Error required internal CMake variable not set, cmake may not be built correctly.
Missing variable is:
CMAKE_RC_CREATE_SHARED_LIBRARY
CMake Error: Error required internal CMake variable not set, cmake may not be built correctly.
Missing variable is:
CMAKE_RC_CREATE_SHARED_LIBRARY
-- Generating done
CMake Generate step failed.  Build files cannot be regenerated correctly.
BUILD: Error: failed to generate native component build project

@BruceForstall
Copy link
Member

@safern Are you saying that "goto" from within a call, then calling "exit" will exit the entire script? I don't believe that's how it works. I thought that any call :local_label needs a corresponding exit /b XXX (or fall off the end of the script).

@BruceForstall
Copy link
Member

@BruceForstall @jkoritzinsky @trylek it seems like the format windows job was already broken but since the exit code wasn't flowing from src\coreclr\build.cmd the python script was not exiting with error. Could you help me understand what is wrong?

The formatting job probably needs work to make it work in the new repo. If there's not an issue, we should add one. In particular, it looks like src\jit-format\jit-format.cs in the https://github.com/dotnet/jitutils repo might need work to handle the new artifacts directory and build scripts. e.g., it's looking for:

string[] compileCommandsPath = { _rootPath, "bin", "nmakeobj", "Windows_NT." + _arch + "." + _build, "compile_commands.json" };        

where _rootPath is the src\coreclr directory, and it's obviously not going to find it.

@erozenfeld Can you look into getting the formatting job to work?

@erozenfeld
Copy link
Member

@erozenfeld Can you look into getting the formatting job to work?

Sure, I'll fix it tomorrow.

@safern
Copy link
Member Author

safern commented Dec 18, 2019

Sure, I'll fix it tomorrow.

Thanks @erozenfeld could you cc me on whatever it is needed and let me know whenever it is fixed so I can merge this PR?

REM =========================================================================================
REM === These two routines are intended for the exit code to propagate to the parent process
REM === Like MSBuild or Powershell. If we directly exit /b 1 from within a if statement in
REM === any of the routines, the exit code is not propagated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another great item / design principal that we could include in scripting design guideline documents.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a doc for those guidelines yet? If not, do we have an issue?

@erozenfeld
Copy link
Member

@BruceForstall

The formatting job probably needs work to make it work in the new repo. If there's not an issue, we should add one. In particular, it looks like src\jit-format\jit-format.cs in the https://github.com/dotnet/jitutils repo might need work to handle the new artifacts directory and build scripts.

That was already fixed in dotnet/jitutils#227

We had this CMAKE error

CMake Error: Error required internal CMake variable not set, cmake may not be built correctly.
Missing variable is:
CMAKE_RC_CREATE_SHARED_LIBRARY

for at least 3 years, e.g., see the output here: dotnet/jitutils#64

We get this error when we run
build.cmd -usenmakemakefiles

It's benign, compile_commands.json file is generated and clang-tidy uses it to run.

I'm trying to figure out a way to work around this error. cmake documentation is not very helpful.
I found one relevant thread: https://cmake.org/pipermail/cmake/2012-January/048647.html
and the suggested fix here https://octolinker-demo.now.sh/sakura-editor/sakura/commit/537bdf11452f62d10e1dbc09176d7a2d91c7e68a
Not sure if we need something similar.

Will keep digging.

@safern
Copy link
Member Author

safern commented Dec 18, 2019

We had this CMAKE error

@janvorli might have ideas...

@erozenfeld
Copy link
Member

Adding

SET(CMAKE_RC_CREATE_SHARED_LIBRARY "${CMAKE_CXX_CREATE_SHARED_LIBRARY}")

at the end of
https://github.com/dotnet/runtime/blob/0c002ccca614688d2bc666ae44ff0ac07b3add5f/src/coreclr/configurecompiler.cmake

makes the error disappear.

@safern
Copy link
Member Author

safern commented Dec 18, 2019

Thanks @erozenfeld. I'll include that as part of my PR.

@safern
Copy link
Member Author

safern commented Dec 18, 2019

I added: ceefc5f to fix it.

@safern safern merged commit 639122a into dotnet:master Dec 19, 2019
@safern safern deleted the CoreclrBuildcmdExit branch December 19, 2019 01:37
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants