Port git-artifacts pipeline to git-for-windows-automation #22

Merged (3 commits, Feb 11, 2023)

Conversation

@dennisameling (Contributor) commented Jan 22, 2023

@dennisameling self-assigned this Jan 22, 2023
@dennisameling changed the title from "[WIP] Port git-artifacts pipeline to git-for-windows-automation" to "Port git-artifacts pipeline to git-for-windows-automation" Jan 23, 2023
@dennisameling marked this pull request as ready for review January 23, 2023 08:57
@dennisameling requested a review from dscho January 23, 2023 09:04
@dennisameling force-pushed the git-artifacts branch 3 times, most recently from d76a960 to 100ee78 January 23, 2023 09:32
@dscho (Member) commented Feb 15, 2023

@dennisameling after thinking about this, I believe that it makes the most sense to add that functionality to the workflow instead of keeping it only in a topic branch (and then essentially dropping it via -s ours). That is, I want to introduce a new input that takes the name of an existing tag, and another one for the build-extra revision to use. I'll get to it immediately and report back.
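
For illustration, a minimal sketch of what those two inputs would allow the workflow to do; the input names and values here are hypothetical, not necessarily the ones the eventual PR used:

existing_git_tag=v2.39.2.windows.1   # hypothetical input: name of an already-existing tag
build_extra_rev=main                 # hypothetical input: the build-extra revision to use

# build the artifacts from the existing tag instead of tagging a new one
git fetch origin "refs/tags/$existing_git_tag" &&
git checkout FETCH_HEAD &&
# pin build-extra to the requested revision
git -C /usr/src/build-extra fetch origin "$build_extra_rev" &&
git -C /usr/src/build-extra checkout FETCH_HEAD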

@dscho (Member) commented Feb 15, 2023

> I'll get to it immediately and report back.

Here is the PR, and here is the workflow run that I just started to build the Git for Windows v2.39.2 artifacts targeting ARM64.

@dscho (Member) commented Feb 16, 2023

@dennisameling I experienced the same hang that you reported here. I connected via RDP and investigated a little bit. I did not come terribly far, as it was late at night and eventually I had to give up and forgot to write down the details.

But here is what I remember: there were two makepkg-mingw processes, one being the child of the other. When I attached to the child process using gdb, I saw four threads, but somehow gdb was unhappy with the frame pointer, and therefore I could not see any stack traces.

Since the makepkg-mingw program is actually a shell script, I am fairly certain that the child process (which was reported by wmic process to have the exact same command-line as its parent process) was essentially a subshell, probably executing git rev-parse --short HEAD.
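
To illustrate the suspected shape (a sketch, not the actual makepkg-mingw code): a command substitution in a pkgver()-style step forks exactly such a child sh with the same command line as its parent:

pkgver() {
  # the $(...) command substitution forks a subshell; the theory is that
  # this child finished running `git rev-parse --short HEAD` but then hung
  # instead of exiting
  printf 'r%s' "$(git rev-parse --short HEAD)"
}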

On a hunch that the subshell was stuck waiting for something after doing its job, e.g. a signal or something similar, I asked gdb to force-stop ("kill"; I don't want to know who found that nomenclature sensible, *shudder*) that child process, after the workflow had run for around 20 minutes and hung for a good while. And don't you know, this "fixed" it, and the workflow continued! You can see it here (you have to click on the sprocket wheel next to the "Search logs" text box and enable "Show timestamps"): the pkgver step started at Wed, 15 Feb 2023 22:57:53 GMT and should not have taken more than a couple of seconds, yet the workflow run only continued at Wed, 15 Feb 2023 23:39:18 GMT (i.e. over 40 minutes later), after I had forcefully stopped that child process.

Unfortunately, I had forgotten that the git-artifacts workflow insists on making a MINGW-packages commit (which it cannot do because we're not actually modifying the PKGBUILD), and the build failed at that point, even though it had successfully built the Pacman packages.

@dscho (Member) commented Feb 16, 2023

It's hanging again. Let me document in more detail what I did. First of all, I RDP'ed into the machine, then opened a Git Bash. In that Bash, I looked for the hanging process by first calling wmic process where 'ExecutablePath like "%git%"' get ProcessId,ParentProcessId,CommandLine, which suggested that it's probably the signtool script, and then calling:

$ wmic process where 'CommandLine like "%sign%"' get ProcessId,ParentProcessId,CommandLine
CommandLine                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            ParentProcessId  ProcessId
D:\git-sdk-arm64-build-installers\usr\bin\make.exe -f ../mingw-w64-git.mak sign-executables                                                                                                                                                                                                                                                                                                                                                                                                                                                            4408             6100
D:\git-sdk-arm64-build-installers\usr\bin\sh.exe -c "eval git --git-dir=D:/a/_work/git-for-windows-automation/git-for-windows-automation/.git signtool headless-git.exe git-daemon.exe git-http-backend.exe git-imap-send.exe git-sh-i18n--envsubst.exe git-shell.exe git-http-fetch.exe git-http-push.exe git-remote-http.exe git-remote-https.exe git-remote-ftp.exe git-remote-ftps.exe \    contrib/credential/wincred/git-credential-wincred.exe git.exe \ cmd/git{,-gui,k}.exe compat-bash.exe git-{bash,cmd,wrapper}.exe"                          1872             2684
D:\git-sdk-arm64-build-installers\usr\bin\sh.exe -c "eval git --git-dir=D:/a/_work/git-for-windows-automation/git-for-windows-automation/.git signtool headless-git.exe git-daemon.exe git-http-backend.exe git-imap-send.exe git-sh-i18n--envsubst.exe git-shell.exe git-http-fetch.exe git-http-push.exe git-remote-http.exe git-remote-https.exe git-remote-ftp.exe git-remote-ftps.exe \    contrib/credential/wincred/git-credential-wincred.exe git.exe \ cmd/git{,-gui,k}.exe compat-bash.exe git-{bash,cmd,wrapper}.exe"                          2684             5676
D:\git-sdk-arm64-build-installers\clangarm64\bin\git.exe --git-dir=D:/a/_work/git-for-windows-automation/git-for-windows-automation/.git signtool headless-git.exe git-daemon.exe git-http-backend.exe git-imap-send.exe git-sh-i18n--envsubst.exe git-shell.exe git-http-fetch.exe git-http-push.exe git-remote-http.exe git-remote-https.exe git-remote-ftp.exe git-remote-ftps.exe contrib/credential/wincred/git-credential-wincred.exe git.exe cmd/git.exe cmd/git-gui.exe cmd/gitk.exe compat-bash.exe git-bash.exe git-cmd.exe git-wrapper.exe  5676             5200
sh -c "sh \"/usr/src/build-extra/signtool.sh\" \"$@\"" "sh \"/usr/src/build-extra/signtool.sh\"" headless-git.exe git-daemon.exe git-http-backend.exe git-imap-send.exe git-sh-i18n--envsubst.exe git-shell.exe git-http-fetch.exe git-http-push.exe git-remote-http.exe git-remote-https.exe git-remote-ftp.exe git-remote-ftps.exe contrib/credential/wincred/git-credential-wincred.exe git.exe cmd/git.exe cmd/git-gui.exe cmd/gitk.exe compat-bash.exe git-bash.exe git-cmd.exe git-wrapper.exe                                                   5200             5024
C:\Windows\System32\Wbem\wmic.exe process where "CommandLine like \"%sign%\"" get ProcessId,ParentProcessId,CommandLine                                                                                                                                                                                                                                                                                                                                                                                                                                10112            10016

So indeed, there were quite a few, and the process with ID 5024 seemed to be the bottom-most one. To verify, I called:

$ wmic process where 'ParentProcessId=5024' get ProcessId,ParentProcessId,CommandLine
No Instance(s) Available.

Okay. Now on to debugging. For that, I need gdb.exe. So I cloned the entire SDK via git -C /d/ clone --depth=1 https://github.com/git-for-windows/git-sdk-arm64 in the Git Bash. After that, I opened a Git SDK Bash by pressing the Windows key, typing D:\git-sdk-arm64\git-bash.exe, and hitting Enter.

Unfortunately, that did not work, because of the OpenSSL situation where Git for Windows stays with v1.1.1* as long as possible while MSYS2 has already switched to v3.*:

$ gdb.exe
D:/git-sdk-arm64/usr/bin/gdb.exe: error while loading shared libraries: msys-python3.11.dll: cannot open shared object file: No such file or directory

So I downloaded MSYS2's python package via curl -LO https://repo.msys2.org/msys/x86_64/python-3.11.2-1-x86_64.pkg.tar.zst and then installed it via pacman -U ./python-3.11.2-1-x86_64.pkg.tar.zst. This let me start gdb.exe bash.exe 5024 (that stanza asks gdb to load the bash.exe binary and attach to the process with ID 5024).
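
For reference, those recovery steps as one sequence:

$ curl -LO https://repo.msys2.org/msys/x86_64/python-3.11.2-1-x86_64.pkg.tar.zst
$ pacman -U ./python-3.11.2-1-x86_64.pkg.tar.zst
$ gdb.exe bash.exe 5024   # load bash.exe's symbols and attach to PID 5024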

In that GDB session, I then tried to see the current threads:

(gdb) info thread
  Id   Target Id          Frame
  1    Thread 5024.0x10fc error return ../../gdb-11.1/gdb/windows-nat.c:609 was 31
0x0000000000000000 in ?? ()
  2    Thread 5024.0x1978 0x00007fff30ad6a64 in ?? ()
  3    Thread 5024.0x18a8 0x00007fff30ad69c4 in ?? ()
  4    Thread 5024.0x6a8  error return ../../gdb-11.1/gdb/windows-nat.c:609 was 31
0x0000000000000000 in ?? ()
  5    Thread 5024.0x1468 0x00007fff30ad69c4 in ?? ()
* 6    Thread 5024.0x2058 0x00007fff30ae2374 in ?? ()

Not very informative, so let's try to see the current thread's stack trace:

(gdb) bt
#0  0x00007fff30ae2374 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

That's it.

My next attempt involved downloading Sysinternals' Process Explorer. That gave me more information. Apparently there is one thread with this stack trace:

ntoskrnl.exe!ZwUpdateWnfStateData+0xaf60
ntoskrnl.exe!KeUpdateThreadTag+0xdc98
ntoskrnl.exe!KiDispatchInterrupt+0x7298
ntoskrnl.exe!KeWaitForMutexObject+0x440
ntoskrnl.exe!KeUpdateThreadTag+0xa754
ntoskrnl.exe!KiDeliverApc+0x18c
ntoskrnl.exe!KeUpdateThreadTag+0xdd1c
ntoskrnl.exe!KiDispatchInterrupt+0x7298
ntoskrnl.exe!KeWaitForMutexObject+0x440
ntoskrnl.exe!ObWaitForSingleObject+0xb4
ntoskrnl.exe!NtWaitForSingleObject+0x34
ntoskrnl.exe!ZwUpdateWnfStateData+0xadf0
ntoskrnl.exe!ZwUpdateWnfStateData+0xa9e8
!ZwWaitLowEventPair+0x158284
!ZwWaitLowEventPair+0x19d190
!ZwWaitLowEventPair+0x1950a8
!ZwWaitLowEventPair+0x1a9ad0
!ZwWaitLowEventPair+0x290f80
!ZwWaitLowEventPair+0x29ff14
!_assert+0x41c6

So that looks as if it called _assert(). And another one with this stack trace:

ntoskrnl.exe!ZwUpdateWnfStateData+0xaf60
ntoskrnl.exe!KeUpdateThreadTag+0xdc98
ntoskrnl.exe!KiDispatchInterrupt+0x7298
ntoskrnl.exe!KeWaitForMutexObject+0x440
ntoskrnl.exe!FsRtlNotifyVolumeEventEx+0x4470
ntoskrnl.exe!NtQueryVolumeInformationFile+0xfe0
ntoskrnl.exe!NtReadFile+0xb0
ntoskrnl.exe!ZwUpdateWnfStateData+0xadf0
ntoskrnl.exe!ZwUpdateWnfStateData+0xa9e8
!ZwWaitLowEventPair+0x158324
!ZwWaitLowEventPair+0x178bc8
!PackageSidFromProductId+0x2318c0
!PackageSidFromProductId+0x26a360
!sigfillset+0x2d61

This looks as if it was inside the sigfillset() function. But that does not make too much sense because that function does not call any other function.

The other threads' stack traces either cannot be accessed or look like they're completely outside of Bash's and the MSYS2 runtime's source code:

ntoskrnl.exe!ZwUpdateWnfStateData+0xaf60
ntoskrnl.exe!KeUpdateThreadTag+0xdc98
ntoskrnl.exe!KiDispatchInterrupt+0x7298
ntoskrnl.exe!KeWaitForMutexObject+0x440
ntoskrnl.exe!KeUpdateThreadTag+0xa754
ntoskrnl.exe!KiDeliverApc+0x18c
ntoskrnl.exe!KeUpdateThreadTag+0xdd1c
ntoskrnl.exe!KiDispatchInterrupt+0x7298
ntoskrnl.exe!KeDelayExecutionThread+0x614
ntoskrnl.exe!PsWow64IsMachineSupported+0x2d04
ntoskrnl.exe!ZwUpdateWnfStateData+0xadf0
ntoskrnl.exe!ZwUpdateWnfStateData+0xa9e8
!ZwWaitLowEventPair+0x1616f4
!ZwWaitLowEventPair+0x1eea48
!ZwWaitLowEventPair+0x1edea8
!ZwWaitLowEventPair+0x1ebe40
!ZwWaitLowEventPair+0x18495c
!ZwWaitLowEventPair+0x187a54
!ZwWaitLowEventPair+0x1ac658
!ZwWaitLowEventPair+0x1abc70
!ZwWaitLowEventPair+0x1ad458
!uaw_wcsrchr+0x81c70
!ZwWaitLowEventPair+0x1a9bc8

I'm out of ideas about what to investigate on that side, so I downloaded Sysinternals' handle utility to see which handles are open in that process, but that also fails to shed any light on the issue:

$ ./handle64a.exe -p 5024

Nthandle v5.0 - Handle viewer
Copyright (C) 1997-2022 Mark Russinovich
Sysinternals - www.sysinternals.com

  180: Section       \BaseNamedObjects\msys-2.0S5-c2a8dd8be8845dc5\shared.5
  184: Section       \BaseNamedObjects\msys-2.0S5-c2a8dd8be8845dc5\S-1-5-20.1
  1DC: File  (RWD)   D:\git-sdk-arm64-build-installers\usr\src\MINGW-packages\mingw-w64-git\src\git
  1EC: Section       \BaseNamedObjects\msys-2.0S5-c2a8dd8be8845dc5\cygpid.10728
  2C0: Section       \BaseNamedObjects\msys-2.0S5-c2a8dd8be8845dc5\cygpid.10728

So for now I'll just call it yet another unexplained hang and try to let the workflow run continue, at least, by force-stopping that process: wmic process where ProcessID=5024 delete.

And sure enough, the workflow run continues...

@dscho (Member) commented Feb 16, 2023

Darn, darn, darn. Again I had to RDP into the runner and force-stop the signtool process in order to let it complete (toggle the timestamps to see the 24-minute gap where there should have been less than 10 seconds).

This is not good. In this shape, we cannot publish a Git for Windows/ARM64. If it hangs for us so often, it will hang for plenty of users, too.

@dscho (Member) commented Feb 16, 2023

And it's hanging again, and also here. I worked around this by RDP'ing into the machines and running essentially:

# find all processes running from the build SDK…
candidates= &&
for pid in $(
    wmic process where 'ExecutablePath like "%arm64-build-installers%"' get ProcessId |
    sed 1d  # drop the "ProcessId" header line
)
do
    # skip anything that does not start with a digit (e.g. stray blank fields)
    test "z${pid#[0-9]}" != "z$pid" || continue
    # …and remember those that have no child processes; wmic prints the
    # "No Instance(s) Available." notice to stderr, hence the 2>&1 >/dev/null
    case "$(wmic process where parentprocessid=$pid get processid 2>&1 >/dev/null)" in
     "No Instance(s) Available"*)
         candidates="$candidates $pid"
         ;;
    esac
done &&
# …then force-stop the last such leaf process
wmic process where processid=${candidates##* } delete

@dscho (Member) commented Feb 16, 2023

I reached out to #cygwin-devel on libera.chat, and @lazka pointed me to msys2/msys2-autobuild#62, noting:

> it only happening for those two things and not in, say, autotools/configure/make seems to suggest it's not a general problem

(those two things being gpgme signature verification and install-info)

And @jturney said:

"Not a known issue, but now we know about it.." 😉

@lazka commented Feb 16, 2023

@jeremyd2019 tried to debug our hangs at one point

@jeremyd2019 commented:

Yeah, I couldn't get anywhere, though; none of the debuggers I tried to attach were able to get the thread context for the 'main' thread. There was also another thread running, which seemed to be for Cygwin signal handling and which seemed to be happily waiting. I gave up trying to debug it and instead added workarounds to avoid the situations where it seemed to happen. Now that somebody else is reproducing it, maybe it can be debugged.

Regarding your _assert and sigfillset observations, notice the offsets from the symbols. It seems unlikely that those are actually the functions involved; more likely they are just the nearest symbols it could find.

@dscho (Member) commented Feb 16, 2023

> Regarding your _assert and sigfillset observations, notice the offsets from the symbols. It seems unlikely that those are actually the functions involved; more likely they are just the nearest symbols it could find.

Right, I feared so much.

> Now that somebody else is reproducing it, maybe it can be debugged.

I fear that we'll have to get a much more reliable reproducer first.

Maybe the following will reproduce the hang reliably enough: calling the MINGW executable osslsigncode.exe from the MSYS2 Bash script signtool.sh, which in turn is called as a Git alias from the MINGW executable git.exe that is called from either an MSYS2 Bash script or MSYS2 make.exe.

At least, that is what hung pretty reliably in those CI runs for me; in the last run, it hung every single time that call chain happened. But earlier runs did not hang after code-signing (even if they failed later on, for independent reasons). The hang occurred at the end of the inner-most shell script, just after all the code-signing was done.
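
If that call chain is indeed the culprit, it can be exercised directly with something like the following (the alias trick and the script path are the ones used elsewhere in this thread; the target binary is a placeholder):

git -c alias.signtool='!sh "/usr/src/build-extra/signtool.sh"' signtool /path/to/some.exe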

@Alovchin91 commented Feb 17, 2023

Since you're RDP'ing into the machine anyway, could you maybe collect a full memory dump of a hanging process using procdump? Maybe that could shed some more light on what it's waiting for 🤔

@Alovchin91 commented Feb 17, 2023

> ... calling the MINGW executable osslsigncode.exe from the MSYS2 Bash script signtool.sh, which in turn is called as a Git alias from the MINGW executable git.exe ...

Hmm, so git.exe calls signtool.sh, which signs git.exe, produces git.signed.exe (according to the script), and then moves it in place of the original? It sounds like the script tries to overwrite git.exe while it is in the process of executing the script. Or is it hanging while signing a different executable?

@dscho (Member) commented Feb 18, 2023

@Alovchin91 they're different git.exe files. The running one is /clangarm64/bin/git.exe, and the signed one is /usr/src/MINGW-packages/src/git/git.exe. So that's not the problem.

@dscho (Member) commented Feb 18, 2023

> Since you're RDP'ing into the machine anyway, could you maybe collect a full memory dump of a hanging process using procdump? Maybe that could shed some more light on what it's waiting for 🤔

@Alovchin91 I am curious what you suggest doing with that dump. Sure, I can collect that information, but what does that buy us that the previous investigation (#1 and #2) didn't reveal yet?

@dennisameling @jeremyd2019 did any of you manage to reproduce this in a non-CI setting, i.e. when starting a command interactively? I tried and failed to cause such a hang using interactive commands in an Azure VM, and would love to do that to accelerate the investigation.

@Alovchin91 commented Feb 18, 2023

> @Alovchin91 I am curious what you suggest doing with that dump. Sure, I can collect that information, but what does that buy us that the previous investigation (#1 and #2) didn't reveal yet?

That would give a starting point to those who would like to help but cannot reproduce the issue. From your very next sentence I understand that reproducing it is not a trivial task 😅 I've asked around on Mastodon whether somebody (from Microsoft?) could help. That is, of course, only if you're looking for help 😊

@jeremyd2019 commented Feb 18, 2023

The hangs that I experienced with pacman (both signature verification and info database updates) would happen frequently while updating/installing packages interactively. I know I did try the process dump option in task manager, but that dump was also 'missing' the context for the main thread.

From my anecdotal observations, I think this seems more likely to occur on slower machines (it happened all the time on the Raspberry Pi, but was rarer on the QC710), but perhaps also more likely with more cores? (I was playing with a 60-core VM, and I think it was more likely there than on the 8-core QC710.)

I am set up to run a KVM virtual machine on a Raspberry Pi, so I can try to reproduce the hangs I got with pacman and experiment, if anybody has ideas to try (I can run procdump and post the dump somewhere if somebody thinks it would help, for example).

My workarounds for the pacman hangs I saw, which seem quite effective for building packages for the MSYS2 project:
https://github.com/jeremyd2019/winautoconfig/blob/544a3c019fcbd8955e91af3d4a86e12e3d1b6a49/msys2-runner-setup/setupscripts/setup.cmd#L28-L31
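
For context, this is the kind of workaround meant here, as a hedged sketch that merely disables the two operations known to hang; the linked setup.cmd may well do it differently:

# assumption: skip pacman's gpgme-based signature verification
sed -i 's/^SigLevel.*$/SigLevel = Never/' /etc/pacman.conf
# assumption: neuter install-info so post-install info database updates are no-ops
mv /usr/bin/install-info /usr/bin/install-info.disabled
ln -s /usr/bin/true /usr/bin/install-info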

@dscho (Member) commented Feb 18, 2023

> The hangs that I experienced with pacman (both signature verification and info database updates) would happen frequently while updating/installing packages interactively.

@jeremyd2019 did you find a minimal reproducer there, i.e. ideally a command that only verifies the signature but does not necessarily update any package?

Also, do you happen to know whether the signature verification happens to use a MINGW program to perform that verification (e.g. /clangarm64/bin/gpg.exe)? In my cases, the hang always happened in the MSYS process associated with spawning a non-MSYS program...

@dscho (Member) commented Feb 18, 2023

@Alovchin91 the thing that makes me most dubious about the procdump suggestion is that we already attached to the process using gdb.exe, and that neither procdump nor any other regular Win32 tool has any idea about the MSYS2 runtime or about GCC's DWARF debug symbols; they are therefore even less likely than gdb.exe to show any useful information.

And FWIW there is already somebody from Microsoft helping this here ticket: me.

@jeremyd2019 commented:

> @jeremyd2019 did you find a minimal reproducer there, i.e. ideally a command that only verifies the signature but does not necessarily update any package?

No. When I tried, I ended up reproducing a different hang, one that also happened on x64 and could be debugged and was fixed in Cygwin, so it wasn't a complete loss; but after that fix, my minimal reproducer no longer reproduced the hang.

> Also, do you happen to know whether the signature verification happens to use a MINGW program to perform that verification (e.g. /clangarm64/bin/gpg.exe)? In my cases, the hang always happened in the MSYS process associated with spawning a non-MSYS program...

No, it uses MSYS2 gpg, via gpgme.

@jeremyd2019 commented Feb 18, 2023

> @Alovchin91 the thing that makes me most dubious about the procdump suggestion is that we already attached to the process using gdb.exe, and that neither procdump nor any other regular Win32 tool has any idea about the MSYS2 runtime or about GCC's DWARF debug symbols; they are therefore even less likely than gdb.exe to show any useful information.

I tried gdb, windbg, the Task Manager process dump thing, even local kernel debugging, but was never able to get user-mode context for that thread.

I think I heard that windbgx (the new version in the store) now supports DWARF symbols, but I haven't verified that myself, and I don't know whether it would understand the split debug info that msys2-runtime uses. (Does that even work right with 3.4? I thought I saw some additional stripping in msys2-runtime go by at one point.)

@dscho (Member) commented Feb 18, 2023

> When I tried, I ended up reproducing a different hang, one that also happened on x64 and could be debugged and was fixed in Cygwin, so it wasn't a complete loss; but after that fix, my minimal reproducer no longer reproduced the hang.

@jeremyd2019 that's interesting. It probably means that the hangs are related; at least it could give us a better idea which code path to look at. Would you happen to know the commit hash of the fix in Cygwin?

> I think I heard that windbgx (the new version in the store) now supports DWARF symbols, but I haven't verified that myself, and I don't know whether it would understand the split debug info that msys2-runtime uses.

Indeed. That's why I think attaching with gdb.exe is our best bet to get any useful stack trace.

But I'm starting to believe that this is a dead end: we might never get a useful stack trace, because the issue is more likely that something in MSYS2's tear-down code fails to tear down the signal event handler thread, and that thread then blocks the correct and expected termination of the process.

So I fear that we'll need to work with heavy instrumentation ("debug small_printf() messages littered all over the tear-down code path") and then compare the logs of a hanging and a non-hanging invocation of whatever minimal reproducer we can come up with.
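
The comparison part of that plan could look like this (the file and script names are hypothetical; the actual instrumentation would have to live in the msys2-runtime tear-down code itself):

./reproducer.sh 2>trace-good.log       # an invocation that terminates normally
./reproducer.sh 2>trace-hung.log       # an invocation that hangs (force-stopped as above)
diff -u trace-good.log trace-hung.log  # find where the tear-down paths diverge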

@dscho (Member) commented Feb 18, 2023

> Would you happen to know the commit hash of the fix in Cygwin?

@jeremyd2019 this might actually help address the hangs I observed, as Git for Windows is still on the Cygwin v3.3.* train.

@jeremyd2019 commented Feb 18, 2023

msys2/msys2-runtime@0ce992c. This was actually a crash, but the code was in such a state that it resulted in a hang in the exception handlers.

@dscho (Member) commented Feb 18, 2023

@jeremyd2019 thank you. Unfortunately, my hope that this could help Git for Windows is dispelled by the fact that this commit made it into v3.3.4, and we're on v3.3.6 already, so we already have that fix...

@dscho (Member) commented Feb 18, 2023

Update: I managed to reproduce the hang interactively, in my Azure VM, not on the first attempt, but the second. With a self-signed certificate and the /clangarm64/bin/git.exe of git-for-windows/git-sdk-arm64, I ran this:

git -c alias.signtool='!sh "/usr/src/build-extra/signtool.sh"' signtool $PWD/g.exe

where /usr/src/build-extra/signtool.sh comes from here and g.exe was a copy of a just-built git.exe (by now already signed, due to the first attempt).

The self-signed certificate was created via:

$cert = New-SelfSignedCertificate -DnsName www.yourwebsite.com -Type CodeSigning -CertStoreLocation Cert:\CurrentUser\My
$CertPassword = ConvertTo-SecureString -String "my_password" -Force -AsPlainText
Export-PfxCertificate -Cert (Get-ChildItem Cert:\CurrentUser\My -CodeSigningCert)[0] -FilePath ".sig\codesign.p12" -Password $CertPassword
"my_password" | Out-File -NoNewline -Encoding ascii ".sig\codesign.pass"

In a second MinTTY, when I run ps, I see this line:

     2129       1    2128       8176  pty0      197108 20:16:51 /usr/bin/sh <defunct>

The full output of that ps invocation is:

      PID    PPID    PGID     WINPID   TTY         UID    STIME COMMAND
     2021       1    2021      10056  ?         197108 20:14:27 /usr/bin/mintty
     2129       1    2128       8176  pty0      197108 20:16:51 /usr/bin/sh <defunct>
     2133    2132    2133       4132  pty1      197108 20:17:21 /usr/bin/bash
     2132       1    2132       9120  ?         197108 20:17:20 /usr/bin/mintty
     2217    2133    2217       1320  pty1      197108 20:28:19 /usr/bin/ps
     2022    2021    2022       8444  pty0      197108 20:14:27 /usr/bin/bash
     2128    2022    2128       6172  pty0      197108 20:16:51 /clangarm64/bin/git

Note that the pty0 processes refer to the hanging session; the pty1 processes are the ones running in the second MinTTY window where I ran ps.

The really strange thing about this: ps claims that the Win32 ProcessId (WINPID) of that defunct sh is 8176 (remember: the MSYS2/Cygwin runtime maintains its own pid that is different from the actual Win32 ProcessId), but no Win32 process has ProcessId 8176, nor does any child of such a process show up when I look via wmic process where parentprocessid=8176 get ProcessId,ParentProcessId,CommandLine.
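
To make the mismatch explicit: the WINPID column of ps is the value that is supposed to correspond to wmic's ProcessId, so a direct lookup should have found it (output reconstructed from the observations above):

$ ps | awk '$1 == 2129 { print $4 }'   # WINPID of the defunct sh
8176
$ wmic process where processid=8176 get ProcessId,ParentProcessId,CommandLine
No Instance(s) Available.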

The parent process (MSYS2/Cygwin pid 2128, corresponding to the Win32 ProcessId 6172) refers to the git.exe process that I called, i.e. that sh should be the one that ran the signtool.sh script. But when I look for such a process, I see a different Win32 ProcessId:

$ wmic process where 'CommandLine like "%signtool%"' get ProcessId,ParentProcessId,CommandLine
CommandLine                                                                                                                                                                ParentProcessId  ProcessId
C:\git-sdk-arm64\clangarm64\bin\git.exe -c "alias.signtool=!sh \"/usr/src/build-extra/signtool.sh\"" signtool C:/git-sdk-arm64/usr/src/MINGW-packages/mingw-w64-git/g.exe  5652             6172
sh -c "sh \"/usr/src/build-extra/signtool.sh\" \"$@\"" "sh \"/usr/src/build-extra/signtool.sh\"" C:/git-sdk-arm64/usr/src/MINGW-packages/mingw-w64-git/g.exe               6172             6692
C:\Windows\System32\Wbem\wmic.exe process where "CommandLine like \"%signtool%\"" get ProcessId,ParentProcessId,CommandLine                                                6244             3288

The Win32 ProcessId of that sh process is 6692, and there is no line in the ps output corresponding to that process.

Another strange thing is that the ps output says that the parent process of that git.exe process has Win32 ProcessId 8444, but the wmic output suggests that the parent process has Win32 ProcessId 5652. So I looked for that process, and it seems to be running, with its parent being the process with Win32 ProcessId 8444 and with the same command line as that one:

$ wmic process where processid=5652 get ProcessId,ParentProcessId,CommandLine
CommandLine                                   ParentProcessId  ProcessId
C:\git-sdk-arm64\usr\bin\bash.exe --login -i  8444             5652

@Alovchin91 commented Feb 18, 2023

> And FWIW there is already somebody from Microsoft helping this here ticket: me.

Oh, I didn't know that bit, sorry 😅 I was just trying to be helpful 😬

I've been thinking about procdump because it can create a full dump; AFAIK Task Manager creates a minidump. My understanding is that a full dump can be analysed by any tool afterwards. But that was just a suggestion, and I don't think it's useful anymore 😀

Anyway, sorry for the distraction.

@jeremyd2019 commented Feb 18, 2023

Well, here's a full procdump of a hung bash.exe, if anyone wants to try to learn anything from it.

bash.exe_230218_214907.dmp.gz

WARNING: Teb 0 pointer is NULL - defaulting to 00000000`7ffde000
WARNING: 00000000`7ffde000 does not appear to be a TEB
WARNING: Thread 1bd0 context retrieval failure during dump writing, Win32 error 0n31

Seems familiar... So yes, this does still reproduce with 3.4 (this is msys2-runtime-3.4.5-1).

@dscho (Member) commented Feb 18, 2023

FWIW I could reproduce the hang on my Azure VM only once, and then no more :-(

@jeremyd2019 commented:

You might try putting the system under load while trying to reproduce it (to replicate my 'slower machines' observation). Load was involved when I was trying to come up with a reproducer (the one that wound up finding that other bug): I was basically stressing fork/exec, hoping it would eventually hang.
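
A sketch of such a load generator, assuming nothing about the original beyond "stressing fork/exec":

# keep a handful of fork/exec cycles going until a child hangs
while true; do
  for i in 1 2 3 4 5 6 7 8; do
    sh -c 'true' &
  done
  wait
done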
