Port git-artifacts pipeline to git-for-windows-automation (#22)
Conversation
The branch was force-pushed from 1101e89 to 5d1a988, then from d76a960 to 100ee78, and again from 8ab9d48 to 26f3828.
@dennisameling after thinking about this, I believe that it makes most sense to add that functionality to the workflow itself instead of keeping it only in a topic branch (and then essentially dropping it again later).
@dennisameling I experienced the same hang that you reported here. I connected via RDP and investigated a little bit. I did not come terribly far, as it was late at night; eventually I had to give up, and I forgot to write down the details. But here is what I remember: there were two stuck processes, and on a hunch I suspected that the subshell was stuck waiting for something after doing its job. Unfortunately, I had forgotten the rest of the details.
It's hanging again. Let me document better what I did. First of all, I RDP'ed into the machine, then opened a Git Bash. In that Bash, I looked for the process that would be hanging by first calling:

$ wmic process where 'CommandLine like "%sign%"' get ProcessId,ParentProcessId,CommandLine
CommandLine ParentProcessId ProcessId
D:\git-sdk-arm64-build-installers\usr\bin\make.exe -f ../mingw-w64-git.mak sign-executables 4408 6100
D:\git-sdk-arm64-build-installers\usr\bin\sh.exe -c "eval git --git-dir=D:/a/_work/git-for-windows-automation/git-for-windows-automation/.git signtool headless-git.exe git-daemon.exe git-http-backend.exe git-imap-send.exe git-sh-i18n--envsubst.exe git-shell.exe git-http-fetch.exe git-http-push.exe git-remote-http.exe git-remote-https.exe git-remote-ftp.exe git-remote-ftps.exe \ contrib/credential/wincred/git-credential-wincred.exe git.exe \ cmd/git{,-gui,k}.exe compat-bash.exe git-{bash,cmd,wrapper}.exe" 1872 2684
D:\git-sdk-arm64-build-installers\usr\bin\sh.exe -c "eval git --git-dir=D:/a/_work/git-for-windows-automation/git-for-windows-automation/.git signtool headless-git.exe git-daemon.exe git-http-backend.exe git-imap-send.exe git-sh-i18n--envsubst.exe git-shell.exe git-http-fetch.exe git-http-push.exe git-remote-http.exe git-remote-https.exe git-remote-ftp.exe git-remote-ftps.exe \ contrib/credential/wincred/git-credential-wincred.exe git.exe \ cmd/git{,-gui,k}.exe compat-bash.exe git-{bash,cmd,wrapper}.exe" 2684 5676
D:\git-sdk-arm64-build-installers\clangarm64\bin\git.exe --git-dir=D:/a/_work/git-for-windows-automation/git-for-windows-automation/.git signtool headless-git.exe git-daemon.exe git-http-backend.exe git-imap-send.exe git-sh-i18n--envsubst.exe git-shell.exe git-http-fetch.exe git-http-push.exe git-remote-http.exe git-remote-https.exe git-remote-ftp.exe git-remote-ftps.exe contrib/credential/wincred/git-credential-wincred.exe git.exe cmd/git.exe cmd/git-gui.exe cmd/gitk.exe compat-bash.exe git-bash.exe git-cmd.exe git-wrapper.exe 5676 5200
sh -c "sh \"/usr/src/build-extra/signtool.sh\" \"$@\"" "sh \"/usr/src/build-extra/signtool.sh\"" headless-git.exe git-daemon.exe git-http-backend.exe git-imap-send.exe git-sh-i18n--envsubst.exe git-shell.exe git-http-fetch.exe git-http-push.exe git-remote-http.exe git-remote-https.exe git-remote-ftp.exe git-remote-ftps.exe contrib/credential/wincred/git-credential-wincred.exe git.exe cmd/git.exe cmd/git-gui.exe cmd/gitk.exe compat-bash.exe git-bash.exe git-cmd.exe git-wrapper.exe 5200 5024
C:\Windows\System32\Wbem\wmic.exe process where "CommandLine like \"%sign%\"" get ProcessId,ParentProcessId,CommandLine 10112 10016

So indeed, there were quite a few, and the process with ID 5024 seemed to be the bottom-most one. To verify, I called:

$ wmic process where 'ParentProcessId=5024' get ProcessId,ParentProcessId,CommandLine
No Instance(s) Available.

Okay. Now on to debugging. For that, I need GDB. Unfortunately, that does not work out of the box because of the OpenSSL situation, where Git for Windows stays with v1.1.1* as long as possible while MSYS2 has already switched to v3.*:

$ gdb.exe
D:/git-sdk-arm64/usr/bin/gdb.exe: error while loading shared libraries: msys-python3.11.dll: cannot open shared object file: No such file or directory

So I downloaded the missing pieces from MSYS2. In that GDB session, I then tried to see the current threads:
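As an illustration of such a session (assuming a working gdb and the Win32 PID 5024 identified earlier), attaching and listing the threads would look roughly like this:

$ gdb -p 5024
(gdb) info threads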
Not very informative, so let's try to see the current thread's stacktrace:
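Illustratively again, the currently selected thread's backtrace would be requested with:

(gdb) bt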
That's it. My next attempt involved downloading Sysinternals' Process Explorer. That gave me more information: apparently there is one thread whose stack trace looks as if it called into a blocking wait from inside the MSYS2 runtime. The other threads' stack traces either cannot be accessed or look like they're completely outside of Bash's and the MSYS2 runtime's source code.
I'm out of ideas what to investigate on that side, so I downloaded Sysinternals' handle tool and ran:

$ ./handle64a.exe -p 5024
Nthandle v5.0 - Handle viewer
Copyright (C) 1997-2022 Mark Russinovich
Sysinternals - www.sysinternals.com
180: Section \BaseNamedObjects\msys-2.0S5-c2a8dd8be8845dc5\shared.5
184: Section \BaseNamedObjects\msys-2.0S5-c2a8dd8be8845dc5\S-1-5-20.1
1DC: File (RWD) D:\git-sdk-arm64-build-installers\usr\src\MINGW-packages\mingw-w64-git\src\git
1EC: Section \BaseNamedObjects\msys-2.0S5-c2a8dd8be8845dc5\cygpid.10728
2C0: Section \BaseNamedObjects\msys-2.0S5-c2a8dd8be8845dc5\cygpid.10728

So for now I'll just call it yet another unexplained hang and try to let the workflow run continue, at least, by force-stopping that process. And sure enough, the workflow run continues...
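With wmic, force-stopping that process would be along these lines (5024 being its Win32 PID, as identified above):

$ wmic process where ProcessId=5024 delete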
Darn, darn, darn. Again I had to RDP into the runner and force-stop the hanging process. This is not good. In this shape, we cannot publish a Git for Windows/ARM64: if it hangs for us so often, it will hang for plenty of users, too.
And it's hanging again, and also here. I worked around this by RDP'ing into the machines and running essentially the following, which collects the child-less processes running out of the build SDK and force-stops the last one found:

candidates= &&
for pid in $(
wmic process where 'ExecutablePath like "%arm64-build-installers%"' get ProcessId |
sed 1d
)
do
test "z${pid#[0-9]}" != "z$pid" || continue
case "$(wmic process where parentprocessid=$pid get processid 2>&1 >/dev/null)" in
"No Instance(s) Available"*)
candidates="$candidates $pid"
;;
esac
done &&
wmic process where processid=${candidates##* } delete
I reached out to the MSYS2 project, and @jturney said:
@jeremyd2019 tried to debug our hangs at one point
Yeah, I couldn't get anywhere, though: none of the debuggers I tried to attach were able to get the thread context for the 'main' thread. There was also another thread running which seemed to be for Cygwin signal handling, and which seemed to be happily waiting. I gave up trying to debug it and instead added workarounds to avoid situations where it seemed to happen. Now that somebody else is reproducing it, maybe it can be debugged.
Right, I feared as much.
I fear that we'll have to find a much more reliable reproducer first. Maybe the following will reproduce the hang reliably enough: calling the MINGW git.exe in a way that shells out to an MSYS2 sh (the signtool call chain shown above). At least that is what hung pretty reliably in those CI runs for me; in the last run it hung every single time that call chain happened. But earlier runs did not hang after code-signing (even if they failed later, for independent reasons). The hang was at the end of the inner-most shell script, just after all the code-signing was done.
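As an untested sketch of that kind of call chain (a MINGW git.exe whose alias shells out to an MSYS2 sh; the alias name and the echoed message are made up for illustration):

$ /clangarm64/bin/git.exe -c alias.try='!sh -c "echo hello from an MSYS2 sh"' try

The real signtool invocation that hung follows the same pattern, as the process listings above show.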
Since you're RDP'ing into the machine anyway, could you maybe collect a full memory dump of a hanging process using procdump? Maybe that could shed some more light on what it's waiting for 🤔
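For reference, collecting such a full memory dump with Sysinternals' procdump would look something like this (5024 being the hanging process' Win32 PID; the dump file name is arbitrary):

$ ./procdump64a.exe -ma 5024 bash-hang.dmp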
Hmm, so …
@Alovchin91 it's different
@Alovchin91 I am curious what you suggest doing with that dump. Sure, I can collect that information, but what would that buy us that the previous investigation (#1 and #2) didn't reveal yet?

@dennisameling @jeremyd2019 did any of you manage to reproduce this in a non-CI setting, i.e. when starting a command interactively? I tried and failed to cause such a hang using interactive commands in an Azure VM, and would love to be able to do that to accelerate the investigation.
That would give some starting point to those who would like to help but cannot reproduce the issue. From your very next sentence I understand that reproducing it is not a trivial task 😅 I've asked around on Mastodon whether somebody (from Microsoft?) could help. That is, of course, only if you're looking for any help 😊
The hangs that I experienced with pacman (both signature verification and info database updates) would happen frequently while updating/installing packages interactively. I know I did try the process dump option in Task Manager, but that dump was also 'missing' the context for the main thread.

From my anecdotal observations, I think this seems more likely to occur on slower machines (it happened all the time on the Raspberry Pi, but was rarer on the QC710), but perhaps also more likely with more cores? (I was playing with a 60-core VM and I think it was more likely there than on the 8-core QC710.)

I am set up to run a KVM virtual machine on a Raspberry Pi, so I can try to reproduce the hangs I got with pacman and try things, if anybody has any ideas to try (I can try procdump and post that somewhere if somebody thinks it would help, for example). My workarounds for the pacman hangs I saw, which seem quite effective for building packages for the MSYS2 project:
@jeremyd2019 did you find a minimal reproducer there, i.e. ideally a command that only verifies the signature but does not necessarily update any package? Also, do you happen to know whether the signature verification happens to use a MINGW program to perform that verification?
@Alovchin91 the thing that makes me most dubious about the memory-dump idea is what I already hinted at above: the previous investigation did not reveal anything actionable. And FWIW there is already somebody from Microsoft helping with this here ticket: me.
No. When I tried, I ended up reproducing a different hang that also happened on x64; that one could be debugged and was fixed in Cygwin, so it wasn't a complete loss, but after that fix my minimal reproducer no longer reproduced the hang.
No, it uses msys2 gpg, via gpgme.
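Since gpgme drives the gpg binary under the hood, a rough command-line approximation of just the signature-verification step might be the following (using pacman's keyring; the package file name is a placeholder):

$ gpg --homedir /etc/pacman.d/gnupg --verify mingw-w64-foo-1.0-1-any.pkg.tar.zst.sig

With only the .sig argument, gpg looks for the signed file next to it, under the same name minus the .sig suffix.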
I tried gdb, windbg, the Task Manager process dump thing, even local kernel debugging, but was never able to get user-mode context for that thread. I think I heard that WinDbgX (the new version in the Store) now supports DWARF symbols, but I haven't verified that myself, and I don't know if it would understand the split debug info that msys2-runtime uses. (Does that even work right with 3.4? I thought I saw some additional stripping happening in msys2-runtime go by at one point.)
@jeremyd2019 that's interesting. It probably means that the hangs are related; at the very least it could give us a better idea which code path to look at. Would you happen to know the commit hash of the fix in Cygwin?
Indeed. That's why I think attaching yet another debugger is unlikely to get us anywhere. But I'm starting to believe that this is a dead end: we might never get a useful stack trace, because the issue more likely is that something in MSYS2's tear-down code fails to tear down the signal event handler thread, and that thread then blocks the correct and expected termination of the process. So I fear that we'll need to work with heavy instrumentation ("debug small_printf() messages littered all over the tear-down code path") and then compare the logs of a hanging to a non-hanging invocation of whatever minimal reproducer we can come up with.
@jeremyd2019 this might actually help address the hangs I observed, as Git for Windows is still on the Cygwin v3.3.* train.
msys2/msys2-runtime@0ce992c. This was actually a crash, but the code was in such a state that it resulted in a hang in the exception handlers. |
@jeremyd2019 thank you. Unfortunately my hope that this could help Git for Windows is dispelled by the fact that this commit made it into v3.3.4, and we're on v3.3.6 already, so we have that fix... |
Update: I managed to reproduce the hang interactively, in my Azure VM, not on the first attempt but on the second, with a self-signed certificate and this command (g.exe being the executable to code-sign):

$ git -c alias.signtool='!sh "/usr/src/build-extra/signtool.sh"' signtool $PWD/g.exe

The self-signed certificate was created via:

$cert = New-SelfSignedCertificate -DnsName www.yourwebsite.com -Type CodeSigning -CertStoreLocation Cert:\CurrentUser\My
$CertPassword = ConvertTo-SecureString -String "my_passowrd" -Force -AsPlainText
Export-PfxCertificate -Cert (Get-ChildItem Cert:\CurrentUser\My -CodeSigningCert)[0] -FilePath ".sig\codesign.p12" -Password $CertPassword
"my_passowrd" | Out-File -NoNewline -Encoding ascii ".sig\codesign.pass" In a second MinTTY, when I run
The full output of that
The really strange thing about this is that no Win32 process has ProcessId 8176 (remember: the MSYS2/Cygwin runtime maintains its own pids, which differ from the actual Win32 ProcessIds); there simply is no such process when I look at it from the Win32 side. The parent process (MSYS2/Cygwin pid 2128, corresponding to the Win32 ProcessId 6172) refers to the signtool invocation:

$ wmic process where 'CommandLine like "%signtool%"' get ProcessId,ParentProcessId,CommandLine
CommandLine ParentProcessId ProcessId
C:\git-sdk-arm64\clangarm64\bin\git.exe -c "alias.signtool=!sh \"/usr/src/build-extra/signtool.sh\"" signtool C:/git-sdk-arm64/usr/src/MINGW-packages/mingw-w64-git/g.exe 5652 6172
sh -c "sh \"/usr/src/build-extra/signtool.sh\" \"$@\"" "sh \"/usr/src/build-extra/signtool.sh\"" C:/git-sdk-arm64/usr/src/MINGW-packages/mingw-w64-git/g.exe 6172 6692
C:\Windows\System32\Wbem\wmic.exe process where "CommandLine like \"%signtool%\"" get ProcessId,ParentProcessId,CommandLine 6244 3288

The Win32 ProcessId of that innermost sh process is 6692. Another strange thing: the parent of the git.exe process, ProcessId 5652, turns out to be the interactive login Bash:

$ wmic process where processid=5652 get ProcessId,ParentProcessId,CommandLine
CommandLine ParentProcessId ProcessId
C:\git-sdk-arm64\usr\bin\bash.exe --login -i 8444 5652
Oh, I didn't know that bit, sorry 😅 I was just trying to be helpful 😬 Anyway, sorry for the distraction.
Well, here's a full procdump of a hung bash.exe, if anyone wants to try to learn anything from it.
Seems familiar... So yes, this does still duplicate with 3.4 (this is msys2-runtime-3.4.5-1) |
FWIW I could reproduce the hang on my Azure VM only once, and then no more :-( |
You might try putting the system under load while trying to reproduce it (trying to replicate my 'slower machines' observation). That was involved when I was trying to come up with a reproducer (which wound up finding that other bug): I was basically stressing fork/exec, hoping it would eventually hang.
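A minimal sketch of such a fork/exec stress loop in Bash (the iteration count and the spawned command are arbitrary choices for illustration, not what was actually used):

i=0
while test $i -lt 10000
do
	# every $(...) forks a subshell, which then execs /usr/bin/true
	out=$(/usr/bin/true) || break
	i=$((i+1))
done
echo "completed $i iterations"

If one iteration hangs, the loop simply stops making progress, which makes the hang easy to spot from another terminal.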
Successful run: https://github.com/git-for-windows/git-for-windows-automation/actions/runs/3985052314