-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Broken C++ runtime on windows-2022 version 20240603.1.0 #10020
Comments
We worked around the problem by switching to using the static msvc runtime (bfgroup/b2@0075644). And plan on staying with the static runtime to avoid this problem again and again. As it's not the first time such botched updates have happened. |
Looks like the same as #10004, there are some workarounds in that issue. |
Glad it works for you. However, I have many DLL's and executables in the project. We can't link with the static runtime. If we did so, each process would end up with as many instances of the runtime as DLL's (+1 for the .exe), which does not work.
Thanks for pointing this. This is a very recent one too. They opened it while I was chasing the reason for the problem (it took a long time to identify the mutex issue). |
In fact, unless you have total control over your users' machines, you should be grateful to Github for showing you the problem. The expression the Windows DLL hell exists for a reason and the only viable solution when shipping binaries for Windows is to either go static, either ship this DLL yourself. At least when it comes for me, this is the last time I got burned by the MSVC runtime. |
There has definately been some changes in the 'default construction' of a mutex. Extracts from various recent versions... 14.32.31326
14.39.35519
14.40.33807 - this seems to be the latest
in 14.32 and 14.39, the default behaviour (when no compiler directive is specified) is
Not providing the |
It's worse than that. I tried to install the same VC runtime as used during the build and the problem remains the same, see #10004 (comment) |
This is normal, if you build with this version of MSVC, you need the new runtime. |
Precisely, I explicitly install on the runner system the MSVC runtime with which I built the code. So, the same runtime is used in compilation and run. But it does not work. Running the application still fails on locking the mutex. There is an inconsistency somewhere. Either the VCRedist package in the VS tree is not the same as used by the compiler, or there is a mess of already installed MSVC runtime which takes precedence because of a PATH settings. So, this is either an inconsistency in VS or an inconsistency in the GH runner. |
MSVC does not build with the runtime. MSVC produces code that expects to find its runtime. This would be the same with |
You misinterpreted what I meant. The compiler and the RTL work together, always. The compiler (well, the compilation environment at large) provides header files. These header files contain specific definitions, here the variants of the mutex constructor. The binary of the runtime must be compatible with these definitions. If a new version of the compiler (and compilation environment) introduces a new version of a constructor, the corresponding code must be in the RTL. Therefore, there is a new version of the RTL which comes (somehow) with the compiler. When you install Visual Studio, the installed tree of files contains a VCRedist package, a package to install on target systems to make sure that they will be able to run applications which are compiled by this compiler. This is what I mean: When you build with a given version of Visual Studio, the headers which are used during the compilation must be compatible with the provided VCRedist package. This is why, when you package an application, you typically include this VCRedist in the package of your application and you install both at the same time. Thus, you can be confident that your application will work on the target system, even if it is a bit older (not too old up to some point). This is why, in my test, I explicitly installed the VCRedist that is found in the Visual Studio setup of the GH runner, before running the application (before compiling in fact). The expected result is that the application which is built is compatible with the RTL that we just installed. And this is what fails... Therefore, something is rotten in the state of GitHub. |
Yes, indeed. I agree, they updated the compiler without updating the VCRedist package. However this is also a huge wake-up call for everyone who ships Windows binaries. In my case, it is a Node.js addon that is installed through |
@lelegard We are looking into the issue, we will get back to you after investigating this issue. |
@RaviAkshintala, you initially wrote:
Then I wrote this:
And, after my comment, you edited your previous comment and you removed the sentence "Actually we look into the issue". It is fortunate that the editing history of posts is available to demonstrate this. Let me say that this is extremely offensive and dishonest. As @MarkCallow and @mprather confirmed with me, the problem is NOT fixed. Not only you close the issue without a complete fix but you also erase the part of the discussion which exhibits this. |
I can confirm that the problem is not solved. Please reopen this issue and, please, solve it asap. At least by reverting to the old working runner. We are experiencing two weeks of broken runners and this is very unprofessional. |
Hi @lelegard We Apologise for the mistake, will look into the carefully.Thanks. |
If GitHub supported it, the right thing to do with this is mark it as a duplicate of #10004 so there aren't multiple threads of discussion going on. The description I gave earlier of the JNI failure is what remains of the original problem since deployment of 20240610.1.0. Actually #10055 was opened specifically re the JNI issue. That too, in my view, is a duplicate of #10004. |
@RaviAkshintala and all GitHub folks, Because characterizing the problem was only possible in a GitHub Actions runner context, I had to run many workflows on a copy of a big repo to come to the conclusion of the C++ Because of this problem, which was created by GitHub with a careless, insufficiently tested, upgrade, I burnt all my Actions credits:
This is the first time it happens to me in 11 years of GitHub usage. GitHub cannot credit back the many hours of my time I lost on this issue (and many others' time as well). However, it would be fare from GitHub to restore my Actions credits. Again, this credit was lost because of a GitHub bug, not for my own usage or the usage of my project. So, please consider recrediting my Actions quota. |
To all, 7 days after complaining that investigating GitHub's problem burnt all my GH Action credits and asking for a refill of the credits, I still got nothing. My GH Actions credit is still zero. All burnt to do what GH should have done: investigating the problem that they created. And the problem is still not fixed. The contempt and disregard of GH for its users seems have no limit. |
@lelegard Did you file a customer support request? Those have always been resolved to my satisfaction. A comment in this thread won't get you the help you seek. |
I have been googling a similar issue, so maybe this is the same problem? It seems to be the JVM in general.... |
Almost certainly. See #10055. As I noted there, since you can't know what JVM your users are using you need to define the macro to avoid the new constexpr mutex constructor to make your JNI modules compatible with older versions of msvcp140.dll. |
@lelegard - I hope your issue got resolved, as i can see there was a workaround been used - |
@RaviAkshintala, do you understand the difference between "fixing a bug" and "implementing a workaround" ? "Implementing a workaround" is a user's temporary survival effort to mitigate the consequences of a bug. Under no circumstances, implementing a workaround should be considered as a bug fix, a license to close the corresponding issue. The problem is demonstrated in project lelegard/gh-runner-lock. The workflow ci.yml tests various configurations, including two "default" configurations (meaning "normal" projects with "default" options) which should work out of the box. As long as at least one of these two default configurations is flagged with a red tick (as in this worlflow log), the problem is NOT FIXED. That is the second time you attempt to close this issue without fixing the problem. If you define your objectives as "closing issues" rather than "fixing problems", this is very unprofessional, at best. |
@RaviAkshintala the issue hasn't been resolved yet. My workflow is still encountering segmentation faults (segfaults) when running some tests that involve mutexes: https://github.com/alemuntoni/vclib/actions/runs/9854078683/job/27206058584 This issue doesn't occur on any of my local Windows machines, nor on any of the runners prior to version 20240603.1.0. Please, reopen this issue and address it directly. Workarounds aren't a permanent solution and shouldn't be considered a fix. |
@alemuntoni, they also closed #10055 with the same type of comment ("thank you for reporting this, we see that your have found a workaround, bla, bla"). It seems they want to bury all open symptoms of the issue, without fixing it. This is silly because the new constexpr constructor for the mutex was a good idea. Not only it speeds up the mutex constructor, but more importantly it solved potential determinism issues in initialization order of static mutexes. However, because of a few guys whose KPI seems to be the issue closing rate, we will have to disable it forever. |
A crash was caused by a combination of old msvcp140.dll and new SDK. Details were discussed on the issue [1]. This commit should be reverted once this repository started to require OBS Studio 30.2 or later since OBS Studio 30.2 will require the updated DLL [2] so that the crash won't happen. - [1] actions/runner-images#10020 - [2] obsproject/obs-studio@00c68981ab9
This is completely unprofessional. Unfortunately, I haven't been able to find a workaround within my repository. As a result, I'm forced to disable the tests on Windows, as they consistently fail due to an issue that GitHub hasn't addressed yet. @RaviAkshintala please reopen and address the issue properly. |
I you have control over the C++ code, defining I also tried to cleanup
I do not know what is So, this is hardly an acceptable workaround. |
A few days ago this issue has started cropping up in our builds:
See this run: https://github.com/EsotericSoftware/spine-runtimes/actions/runs/11912589016/job/33196524119 The project compiles fine on a local Windows machine. The issue seems to be non-deterministic, as the code in question compiles usually, but fails on some runs, without any changes to the code or build system. |
Description
Something was broken in runner image windows-2022 version 20240603.1.0 , maybe in the VC++ runtime.
When running test programs in a workflow, using a C++ mutex (std::mutex) immediately terminates the application with error 1.
As a consequence, all continuous integration pipelines / non-regression test suites for C++ applications are broken, unusable, as soon as the application uses a mutex.
This is a very serious issue which should be addressed with a high priority.
The problem appears after the upgrade of windows-2022 (aka "windows-latest")
Platforms affected
Runner images affected
Image version and build link
The problem is demonstrated in the following simple repo: https://github.com/lelegard/gh-runner-lock
The log with the sample failure: https://github.com/lelegard/gh-runner-lock/actions/runs/9431982526/job/25981214253
Is it regression?
Last worked on windows-2022 runner version 2.316.1, image version 20240514.3.0
Expected behavior
The C++ applications which are built as part of a workflow should not crash during the subsequent test phase.
Actual behavior
See repro section.
Repro steps
The problem is demonstrated in the following simple repo: https://github.com/lelegard/gh-runner-lock
The log with the sample failure: https://github.com/lelegard/gh-runner-lock/actions/runs/9431982526/job/25981214253
The C++ program is quite simple:
Of course, being so simple, this program works well everywhere, including on local Windows development systems.
When executed in a GitHub workflow, starting with Windows runner version 2.317.0, it fails in the lock step. The above-mentioned log contains this:
Background
The problem initially appears after that upgrade on the project TSDuck where all workflows suddenly failed on Windows platforms.
The project is quite big (~ 350,000 loc, C++). Everything works fine on local Windows development systems. Only the GitHub CI failed. Because of the size of the project and the absence of direct interaction with the GitHub runner, identifying the reason for the failure was quite hard. I spent hours of test repos and run 52 versions of the CI workflow to understand the nature of the problem.
The text was updated successfully, but these errors were encountered: