Test failure : System.Diagnostics.Tests.ProcessTests.ProcessNameMatchesScriptName #13757
Comments
Failed again in https://dev.azure.com/dnceng/public/_build/results?buildId=508452&view=logs&jobId=79ac38c7-786b-5db2-e2cf-2984d488b711. Configuration:
|
Configuration:
|
Failed again, from console logs:
|
@tmds any idea why we're occasionally not successfully getting a process name on Linux? |
Do we have some data on how long the test runs in case it fails? In particular: is it about the duration of the script that we're running as the child process?
|
@dotnet/area-system-diagnostics-process this issue hit the rolling builds today, specifically the
|
To answer @tmds's question, these seem to be finishing quickly:
TestResults
| join kind=inner WorkItems on WorkItemId
| join kind=inner Jobs on JobId
| where Method == "ProcessNameMatchesScriptName"
| where Finished >= now(-30d)
| where Result == "Fail"
| where Message startswith "Assert.Equal() Failure" //and Message contains "···"
| project Type, Method,
Pipeline = tostring(parse_json(Properties).DefinitionName),//WorkItemFriendlyName,
Pipeline_Configuration = tostring(parse_json(Properties).configuration),
OS = QueueName,
Arch = tostring(parse_json(Properties).architecture),
// Test = Type1,
//Result,
Finished,
Build,//, WorkItemFriendlyName, WorkItemName,
Duration,
// Method,
// Build = tostring(parse_json(Properties).BuildNumber),
Message//,//
,StackTrace
|
@tmds any other ideas here? Not sure where to look next. |
Failing about every 2 days. @agocke @steveisok what is "windows.10.amd64.android.open"? From the message, it's clearly not Windows.
TestResults
| join kind=inner WorkItems on WorkItemId
| join kind=inner Jobs on JobId
| where Method == "ProcessNameMatchesScriptName"
| where Finished >= now(-30d)
| where Result == "Fail"
| where Message startswith "Assert.Equal() Failure" //and Message contains "···"
| project Type, Method,
Pipeline = tostring(parse_json(Properties).DefinitionName),//WorkItemFriendlyName,
Pipeline_Configuration = tostring(parse_json(Properties).configuration),
OS = QueueName,
Arch = tostring(parse_json(Properties).architecture),
// Test = Type1,
//Result,
Finished,//,Message,
Build,//, WorkItemFriendlyName, WorkItemName,
Duration//,
// Method,
// Build = tostring(parse_json(Properties).BuildNumber),
//,StackTrace
|
My guess was this is the Android emulation layer in Windows 10, but I don't understand why we would be running coreclr in that environment. |
The queue is for Android devices. They are tethered to a windows machine. @directhex do you have any insight on why this may be flaky on Android? |
From the tables that got shared, the distros seem to be 'old'. Does it happen on newer kernels? |
Assuming I'm doing this right, the tests mostly run in containers on the same kernel, either Ubuntu 18.04 or 16.04. @MattGal @agocke am I reading this right? Have we considered running containers on newer kernels? Also, do we use e.g. Ubuntu 18.04 with its original kernel (4.15), or do we use the refreshed versions that pick up newer kernels (https://ubuntu.com/kernel/lifecycle)?
TestResults
| join kind=inner WorkItems on WorkItemId
| join kind=inner Jobs on JobId
| where Method == "ProcessNameMatchesScriptName"
| where Finished >= now(-30d)
| extend IsOneOfTheseFailures = (Result == "Fail" and Message startswith "Assert.Equal() Failure")
| project IsOneOfTheseFailures,Type, Method,
Pipeline = tostring(parse_json(Properties).DefinitionName),//WorkItemFriendlyName,
Pipeline_Configuration = tostring(parse_json(Properties).configuration),
Container = tostring(parse_json(Properties).operatingSystem),
Kernel = QueueName,
Arch = tostring(parse_json(Properties).architecture),
// Test = Type1,
//Result,
Finished,
Build,//, WorkItemFriendlyName, WorkItemName,
Duration,
// Method,
// Build = tostring(parse_json(Properties).BuildNumber),
Message//,//
,StackTrace
| summarize count() by Kernel, Container, IsOneOfTheseFailures
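The counts from the `summarize` step only become comparable across kernels once they are turned into a rate per (Kernel, Container) pair. A minimal Python sketch of that post-processing follows. It assumes the query result has been exported as rows of (kernel, container, is_failure, count); the function name `failure_rates` and the sample rows are hypothetical and are not real Helix data.

```python
from collections import defaultdict


def failure_rates(rows):
    """Turn summarized rows into a failure rate per (kernel, container).

    rows: iterable of (kernel, container, is_failure, count) tuples, one per
    group produced by `summarize count() by Kernel, Container, IsOneOfTheseFailures`.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for kernel, container, is_failure, count in rows:
        key = (kernel, container)
        totals[key] += count
        if is_failure:
            failures[key] += count
    # Skip groups with no runs to avoid dividing by zero.
    return {key: failures[key] / totals[key] for key in totals if totals[key] > 0}


# Hypothetical rows, made up for illustration only.
example_rows = [
    ("queue-a", "container-x", False, 990),
    ("queue-a", "container-x", True, 10),
    ("queue-b", "container-y", False, 500),
]
rates = failure_rates(example_rows)
```

Sorting the resulting dictionary by rate would show whether failures cluster on a particular kernel or are spread evenly across queues.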
|
That's correct; my understanding is the choice is yours, you could instead be using
We strive to stay on the latest version published in the public Azure gallery; currently that's 18.04.202205270, or May 27th, so fairly fresh. |
OK, opened an issue, and we can throw another kernel version into our mix. |
Failed again in https://github.com/dotnet/runtime/pull/71013/checks?check_run_id=6972717112:
|
Frequency updated in top post ... lately it is 1 hit per week (on average 1-2 hits per week). Removing blocking-clean-ci label now. |
Failed in #73912 in NativeAOT run on arm64
|
And here - failed again:
This is on |
Is it happening at all on Ubuntu 22.04 (newer kernel)? |
It's possible this fails a lot more often with NativeAOT. I've seen it fail in several of my PRs as well. Or we just send it to a different Helix queue for NativeAOT testing. But with NativeAOT it repros in a significant percentage of runs. I would estimate one out of five runs. |
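As a rough guide for anyone trying to reproduce this, the one-in-five estimate can be turned into a number of runs needed to see at least one failure. The sketch below assumes runs are independent and that the rate really is about 0.2, neither of which has been confirmed in this thread; `runs_needed` is a hypothetical helper name.

```python
import math


def runs_needed(failure_rate, confidence):
    """Smallest n such that 1 - (1 - failure_rate)**n >= confidence.

    Assumes each run fails independently with the same probability.
    """
    if not 0 < failure_rate < 1 or not 0 < confidence < 1:
        raise ValueError("failure_rate and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))


# With the estimated one-in-five failure rate and a 95% target:
n = runs_needed(0.2, 0.95)  # 14
```

Under those assumptions, about 14 clean runs in a row would be fairly strong evidence that a candidate fix changed the behavior on that queue.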
Frequency:
Job:
coreclr-corefx-jitstress:20191106.1
Details:
https://helix.dot.net/api/2019-06-17/jobs/dc1d0990-ba17-4d58-bba6-42bfb71b843b/workitems/System.Diagnostics.Process.Tests/console
OS & Arch:
Linux x64
Mode:
export COMPlus_TieredCompilation=0
export COMPlus_DbgEnableMiniDump=1
export COMPlus_DbgMiniDumpName=$HELIX_DUMP_FOLDER/coredump.%d.dmp
export COMPlus_JitStress=2
Log: