Tests timing out with incomplete log #35752
Comments
The log is incomplete. This is likely a combination of #35451 and an infrastructure problem (the machine getting recycled or something similar).
Tagging subscribers to this area: @safern, @ViktorHofer
I think that when a work item exceeds the Helix timeout, which is 15 minutes, Helix kills the process, and that's why the log is incomplete. @MattGal can confirm.
Yes. This is a timeout. I've had the conversation several times recently, and after investigation we determined that we would have to change the Helix API to make this more obvious, as adding a new enum would break clients. If you just delete "/console" from that URI you get the other log: ... which contains:
I sympathize that this isn't handing you the information you need as directly as you'd like, but if you fix your hangs or slow tests, and/or increase your timeouts, this will go away.
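For anyone who wants to script the "delete /console" step, here is a minimal sketch. It only rewrites the URI string and does not call Helix; the helper name `work_item_uri` is made up for illustration, and the example URI is the console link from this issue.

```python
from urllib.parse import urlsplit, urlunsplit


def work_item_uri(console_uri: str) -> str:
    """Drop a trailing '/console' path segment from a Helix console log URI.

    Hypothetical helper: it performs a plain string rewrite and makes no
    assumptions about what the resulting URI returns.
    """
    parts = urlsplit(console_uri)
    path = parts.path.rstrip("/")
    if path.endswith("/console"):
        path = path[: -len("/console")]
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# Console log URI taken from this issue.
console = (
    "https://helix.dot.net/api/2019-06-17/jobs/"
    "6bdbf48d-5deb-436c-8f91-8f96823af464/workitems/"
    "System.Runtime.Extensions.Tests/console"
)
print(work_item_uri(console))
```

Opening the printed URI in a browser should show the other log mentioned above.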
Well, we cannot fix our hangs unless we know where they are. We need process dumps taken when things are killed due to timeouts like this.
@MattGal @davidfowl I believe Helix now allows collecting dumps on hangs, is that right?
Helix sets up its clients so that dumps are created and stored in a common folder, and uploads them alongside the results of a run. It does not create hang dumps directly, because that requires special knowledge that only the work item itself has; otherwise the dumps all end up being of the entry point, which is usually a script interpreter (cmd.exe / bash) and not at all useful to the person trying to debug the hang. This comes from the experience of folks trying to get this sort of thing working (e.g. "dumpling"). @davidfowl was working on a way for work items to do this within the context of the work item; I'll defer to him as to where we're at with that work.
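To illustrate why the work item has to do this itself, here is a rough sketch of a watchdog that starts the test process directly, so it knows the real PID rather than the PID of a wrapping cmd.exe / bash. This is not the approach being worked on for Helix or the one in #39923; `run_with_hang_dump` and the `collect_dump` callback are hypothetical names, and the callback stands in for whatever dump tool is actually available on the machine.

```python
import subprocess
import sys
from typing import Callable, Optional, Sequence


def run_with_hang_dump(
    cmd: Sequence[str],
    timeout_s: float,
    collect_dump: Callable[[int], None],
) -> Optional[int]:
    """Run cmd; if it exceeds timeout_s, dump it, kill it, and return None.

    collect_dump is a hypothetical callback that receives the PID of the
    hung process. Because this function launches the process itself, that
    PID belongs to the test process and not to a script interpreter.
    """
    proc = subprocess.Popen(cmd)
    try:
        # Normal case: the process exits in time and we return its exit code.
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        try:
            # Take the dump while the hung process is still alive.
            collect_dump(proc.pid)
        finally:
            # Always clean up, even if dump collection fails.
            proc.kill()
            proc.wait()
        return None


if __name__ == "__main__":
    dumped = []  # records PIDs instead of writing real dumps
    hang = [sys.executable, "-c", "import time; time.sleep(30)"]
    result = run_with_hang_dump(hang, timeout_s=1.0, collect_dump=dumped.append)
    print("timed out" if result is None else f"exit code {result}")
```

The key point is the ordering: the dump is taken before the kill, and the timeout used here has to be shorter than the Helix work item timeout, otherwise Helix kills everything first.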
Will be fixed with #39923.
I am getting another kind of timeout from
and the executing DevOps agent reports:
@ViktorHofer, will #39923 also cover this kind of timeout and help get information on the root cause (for example, which test is taking forever)?
Yes, that PR will abort long-running tests and create a dump. In this case, the timeout is likely an infra issue where not enough Helix clients are available to run the tests in time.
https://dev.azure.com/dnceng/public/_build/results?buildId=627022&view=ms.vss-test-web.build-test-results-tab&runId=19596708&resultId=183795&paneView=debug
https://helix.dot.net/api/2019-06-17/jobs/6bdbf48d-5deb-436c-8f91-8f96823af464/workitems/System.Runtime.Extensions.Tests/console
Failed in:
#35169
Configuration: netcoreapp5.0-Windows_NT-Debug-x64-CoreCLR_release-Windows.7.Amd64
cc: @jkotas @stephentoub