XUnitLogChecker causes GCStress test failures #82143

BruceForstall · 2023-02-15T01:03:28Z

https://dev.azure.com/dnceng-public/public/_build/results?buildId=171904&view=ms.vss-test-web.build-test-results-tab&runId=3436659&resultId=112296&paneView=debug

It appears that a test that is excluded from GCStress causes the new XUnitLogChecker (#80751) to report a test failure.

e.g.,

+ JIT/HardwareIntrinsics/HardwareIntrinsics_ro/HardwareIntrinsics_ro.sh
SKIPPING EXECUTION BECAUSE DOTNET_GCStress IS SET
+ dotnet /root/helix/work/correlation/XUnitLogChecker/XUnitLogChecker.dll JIT/HardwareIntrinsics/HardwareIntrinsics_ro HardwareIntrinsics_ro
[XUnitLogChecker]: No logs were found. Something went very wrong with this item...
[XUnitLogChecker]: Expected log name: 'HardwareIntrinsics_ro.tempLog.xml'
+ export _commandExitCode=254

@ivdiazsa @trylek @hoyosjs

Thus, all the HardwareIntrinsics tests are marked as failing, for example.

The text was updated successfully, but these errors were encountered:

ghost · 2023-02-15T01:03:43Z

Tagging subscribers to this area: @hoyosjs
See info in area-owners.md if you want to be subscribed.

Issue Details

https://dev.azure.com/dnceng-public/public/_build/results?buildId=171904&view=ms.vss-test-web.build-test-results-tab&runId=3436659&resultId=112296&paneView=debug

It appears that a test that is excluded from GCStress causes the new XUnitLogChecker (#80751) to report a test failure.

e.g.,

+ JIT/HardwareIntrinsics/HardwareIntrinsics_ro/HardwareIntrinsics_ro.sh
SKIPPING EXECUTION BECAUSE DOTNET_GCStress IS SET
+ dotnet /root/helix/work/correlation/XUnitLogChecker/XUnitLogChecker.dll JIT/HardwareIntrinsics/HardwareIntrinsics_ro HardwareIntrinsics_ro
[XUnitLogChecker]: No logs were found. Something went very wrong with this item...
[XUnitLogChecker]: Expected log name: 'HardwareIntrinsics_ro.tempLog.xml'
+ export _commandExitCode=254

@ivdiazsa @trylek @hoyosjs

Author:	BruceForstall
Assignees:	ivdiazsa
Labels:	`area-Infrastructure-coreclr`, `blocking-clean-ci-optional`
Milestone:	-

jkoritzinsky · 2023-02-15T06:50:28Z

I guess this means that we're sending work items that don't do anything in GCStress? We should definitely add some condition to not construct no-op work items, but getting the lane clean is more important.

ivdiazsa · 2023-02-15T22:21:43Z

So, if I'm understanding correctly, there are certain conditions that make work items be skipped altogether, like DOTNET_GCStress being set for the Hardware Intrinsics ones. Is there a way to know this beforehand, so that we might avoid sending that work item to Helix altogether like Jeremy said?

trylek · 2023-02-15T22:25:48Z

I think you've nailed the problem exactly Ivan, at the time of generating Helix work items we don't yet know that the runtime cmd / sh script will filter all tests out in GC stress mode, while I agree it's somewhat wasteful I suspect it's a very exceptional case and my initial thinking is that we should just fix the tooling to tolerate it.

ivdiazsa · 2023-02-15T22:29:23Z

I think you've nailed the problem exactly Ivan, at the time of generating Helix work items we don't yet know that the runtime cmd / sh script will filter all tests out in GC stress mode, while I agree it's somewhat wasteful I suspect it's a very exceptional case and my initial thinking is that we should just fix the tooling to tolerate it.

I see. I added that check in hopes of also catching items that crashed badly enough that we ended up with nothing to work with. But if this can also happen on purpose, like in this example, I can remove it altogether and have the checker exit successfully in these cases as well.

trylek · 2023-02-15T22:32:02Z

Could we perhaps emit something like an empty canary item into the xml at startup? I agree with you that it would be unfortunate to conflate an early crash in the merged wrapper with this situation.

ivdiazsa · 2023-02-15T22:33:46Z

That could be a good suggestion Tomas. But could you explain in more detail what you mean by "an empty canary item into the xml at startup"?

trylek · 2023-02-15T22:37:29Z

Well, if I recall correctly, the xml is basically a header followed by a list of entries corresponding to the individual unit tests. I don't remember how exactly your xml validation logic works but I guess that putting something like an empty test entry at the top might let us communicate "OK, I, the merged wrapper, have successfully launched, emitted the xml file header and the initial entry and I'm now launching the first test so consider this xml valid despite the fact that it may contain no entries if all tests cases get filtered out. I am still somewhat unsure how that could happen though - it would mean that a particular test wrapper runs all test cases out-of-proc and each of them is marked as CGStressIncompatible, that sounds somewhat far-fetched. I'm right now somewhat swamped with other bugs but I can try to take a closer look tomorrow to see what is the exact situation where this is happening.

ivdiazsa · 2023-02-15T22:47:07Z

I think that could work but would require more effort. Here, the fixer is failing due to there not being any log at all. This is because the sh script is skipping everything entirely because of GCStress being there, which means the logic that writes the XML logs is not getting a chance to run in the first place. We would have to add some more logic that generates the empty log before this happens, and then make it collaborate with the usual log writer.

I think I can just remove this additional check in the fixer for now, just to keep our CI dirty for as little as possible, and then we can discuss more in-depth your proposed approach and also get @jkoritzinsky's input on the matter :)

BruceForstall added area-Infrastructure-coreclr blocking-clean-ci-optional Blocking optional rolling runs labels Feb 15, 2023

BruceForstall assigned ivdiazsa Feb 15, 2023

ghost added the untriaged New issue has not been triaged by the area owner label Feb 15, 2023

runfoapp bot mentioned this issue Feb 15, 2023

Infrastructure - Status/Health #702

Closed

ivdiazsa added this to the 8.0.0 milestone Feb 15, 2023

ghost removed the untriaged New issue has not been triaged by the area owner label Feb 15, 2023

ivdiazsa mentioned this issue Feb 15, 2023

Prevent XUnitLogChecker from failing if no logs are found #82212

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label Feb 15, 2023

ivdiazsa closed this as completed in #82212 Feb 16, 2023

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Feb 16, 2023

ghost locked as resolved and limited conversation to collaborators Mar 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

XUnitLogChecker causes GCStress test failures #82143

XUnitLogChecker causes GCStress test failures #82143

BruceForstall commented Feb 15, 2023 •

edited

Loading

ghost commented Feb 15, 2023

jkoritzinsky commented Feb 15, 2023

ivdiazsa commented Feb 15, 2023

trylek commented Feb 15, 2023

ivdiazsa commented Feb 15, 2023 •

edited

Loading

trylek commented Feb 15, 2023 •

edited

Loading

ivdiazsa commented Feb 15, 2023

trylek commented Feb 15, 2023

ivdiazsa commented Feb 15, 2023

XUnitLogChecker causes GCStress test failures #82143

XUnitLogChecker causes GCStress test failures #82143

Comments

BruceForstall commented Feb 15, 2023 • edited Loading

ghost commented Feb 15, 2023

jkoritzinsky commented Feb 15, 2023

ivdiazsa commented Feb 15, 2023

trylek commented Feb 15, 2023

ivdiazsa commented Feb 15, 2023 • edited Loading

trylek commented Feb 15, 2023 • edited Loading

ivdiazsa commented Feb 15, 2023

trylek commented Feb 15, 2023

ivdiazsa commented Feb 15, 2023

BruceForstall commented Feb 15, 2023 •

edited

Loading

ivdiazsa commented Feb 15, 2023 •

edited

Loading

trylek commented Feb 15, 2023 •

edited

Loading