Try to make ProcessNameMatchesScriptName test more reliable #46292

adamsitnik · 2020-12-21T14:47:47Z

This very simple test can sometimes fail: #13757

I was not able to reproduce it, but based on the logs attached to the original issue I believe that the GetTestFileName might in theory be returning some non-ascii characters and Linux might not like it, so I've replaced it with const value

According to man docs the file might be empty if the process is a zombie (it has already exited but parent process has not acknowledged that yet) so I've added a check for that.

This read-only file holds the complete command line for the
process, unless the process is a zombie. In the latter case,
there is nothing in this file: that is, a read on this file
will return 0 characters.

I am not a Linux expert, but I guess that sleep 600 could become a zombie right away only if there is a custom alias for the sleep command. I don't know if it's possible, but I think that we can just use full path to be 100% sure we are calling the actual /bin/sleep.

Fixes #13757

@tmds is there any chance you could take a look? It's nothing urgent, I just want to get rid of unstable tests from my areas.

ghost · 2020-12-21T14:47:51Z

Tagging subscribers to this area: @eiriktsarpalis
See info in area-owners.md if you want to be subscribed.

Issue Details

This very simple test can sometimes fail: #13757

I was not able to reproduce it, but based on the logs attached to the original issue I believe that the GetTestFileName might in theory be returning some non-ascii characters and Linux might not like it, so I've replaced it with const value

According to man docs the file might be empty if the process is a zombie (it has already exited but parent process has not acknowledged that yet) so I've added a check for that.

This read-only file holds the complete command line for the
process, unless the process is a zombie. In the latter case,
there is nothing in this file: that is, a read on this file
will return 0 characters.

I am not a Linux expert, but I guess that sleep 600 could become a zombie right away only if there is a custom alias for the sleep command. I don't know if it's possible, but I think that we can just use full path to be 100% sure we are calling the actual /bin/sleep.

Fixes #13757

@tmds is there any chance you could take a look? It's nothing urgent, I just want to get rid of unstable tests from my areas.

Author:	adamsitnik
Assignees:	-
Labels:	`area-System.Diagnostics.Process`, `test bug`
Milestone:	6.0.0

tmds · 2021-01-04T08:19:32Z

src/libraries/System.Diagnostics.Process/tests/ProcessTests.Unix.cs

@@ -159,9 +159,9 @@ public void ProcessStart_UseShellExecute_OnUnix_SuccessWhenProgramInstalled(bool
        [PlatformSpecific(~TestPlatforms.OSX)] // On OSX, ProcessName returns the script interpreter.
        public void ProcessNameMatchesScriptName()
        {
-            string scriptName = GetTestFileName();
+            const string scriptName = nameof(ProcessNameMatchesScriptName);


Unless the underlying filesystem imposes additional restrictions, Linux filenames can contain any bytes except \0 and /.
.NET assumes the bytes are UTF-8.
So, I think the filename is not an issue. If it were, we should make sure GetTestFileName returns a valid filename on Linux instead of doing something specific in this test.

I agree I do'nt see a problem with GetTestFileName(), and if it has a problem, it will be affecting other tests. Either way no change is needed here.

tmds · 2021-01-04T08:21:07Z

src/libraries/System.Diagnostics.Process/tests/ProcessTests.Unix.cs


-                    Assert.Equal(scriptName, process.ProcessName);
+                    // cmdLine can sometimes be null: https://github.com/dotnet/runtime/issues/13757
+                    // according to docs, it's when the process is a zombie: https://man7.org/linux/man-pages/man5/proc.5.html


The Assert that fails was added to understand better why ProcessName doesn't end up with the proper value. It verifies some expectations of the .NET implementation.

Can we verify the failing case has a zombie process? We can check /stat for the Z flag.

Very good point, will do

Thinking about it a bit more.

Probably this test fails when the Assert happens after the script has already exited. .NET will have reaped the process, and everything under /proc is gone.

This happens when sleep doesn't work, or when the asserts are delayed for 10 min. We can safely increase the 10 min timeout. And add an assert to check !process.HasExited.

This happens when sleep doesn't work

How would this be possible?

when the asserts are delayed for 10 min

this is unlikely, as we just start the process and execute asserts right away

everything under /proc is gone

Based on the CI logs it looks like not everything is gone. The previous line that asserts the content of /proc/{process.Id}/stat works just fine. Based on the available docs the only explanation I was able to find was the zombie process.

tmds · 2021-01-04T08:21:49Z

src/libraries/System.Diagnostics.Process/tests/ProcessTests.Unix.cs

                }
                finally
                {
-                    process.Kill();
-                    process.WaitForExit();
+                    if (!process.HasExited)


Why did you add this condition?

tmds · 2021-01-04T08:25:19Z

src/libraries/System.Diagnostics.Process/tests/ProcessTests.Unix.cs

            string filename = Path.Combine(TestDirectory, scriptName);
-            File.WriteAllText(filename, $"#!/bin/sh\nsleep 600\n"); // sleep 10 min.
+            File.WriteAllText(filename, $"#!/bin/sh\n/bin/sleep 600\n"); // sleep 10 min.


Is there a reason for changing the to explicitly using /bin/sleep? I don't think we care how the sleep is implemented.

I am not a Linux expert, but I guess that sleep 600 could become a zombie right away only if there is a custom alias for the sleep command. I don't know if it's possible, but I think that we can just use full path to be 100% sure we are calling the actual /bin/sleep.

I know on Alpine it's busybox, not an actual executable file, but it's symlinked to the same place so I guess that's OK:

LAPTOP-P6UJDVTA:~$ ls -l /bin/sleep lrwxrwxrwx 1 root root 12 Oct 21 09:23 /bin/sleep -> /bin/busybox

tmds · 2021-01-04T08:38:45Z

Adam, thanks for working through flaky tests.
I think the timeout race (#46292 (comment)) is the most likely cause for occasional failures of this test.

jeffhandley · 2021-02-13T02:17:54Z

@adamsitnik It looks like this PR is close to ready with just a little feedback left to address.

danmoseley · 2021-02-13T06:25:04Z

@adamsitnik looking at history I only see one failure
(@MattGal can you tell me why the query below does not show the failures reported in #13757 (comment) ? )

https://engsrvprod.kusto.windows.net/engineeringdata?query=H4sIAAAAAAAEAG2OQUvDQBCF74H8h7GXKERiUIqXePBQiFBRG%2fC83X022yS7YWZDEfzxZrWYi6dh3vc%2beA0kvEGmPgilyRcdvXXUWWcq6xyY3j13dcAg5N3fU5t%2fu09%2b%2f1Obb21i4dSCQRvrrLQw9FCR86fL69v1jbla%2bBah9YaqilYv7DVEntWArQq6hew02zHEYLUYv4OjkW2U7bOFPLJyuqWLmTA%2bpBinvi%2fu1uV9WQzgAzIqCtorQyNHaWR%2fhA60C4oDTH72c2o%2bR%2bTnZTm9TpgQN5QxE1GHGc6O7hpWGmmSJt%2bi3mERSAEAAA%3d%3d&web=0

Started	Branch	Type	Method	QueueName1	Message	StackTrace
2021-01-06 16:38:19.6450000	refs/pull/46634/merge	System.Diagnostics.Tests.ProcessTests	ProcessNameMatchesScriptName	ubuntu.1804.armarch.open	Assert.Contains() Failure\nNot found: (ProcessNameMatc)\nIn value: 374 (dotnet) R 23 1 1 0 -1 256 1 0 0 0 0 0 0 0 20 0 1 0 87610 4096 1 18446744073709551615 0 0 0 0 0 0 0 4096 0 0 0 0 17 5 0 0 0 0 0 0 0 0 0 0 0 0 0\n	at System.Diagnostics.Tests.ProcessTests.ProcessNameMatchesScriptName() in //src/libraries/System.Diagnostics.Process/tests/ProcessTests.Unix.cs:line 172 at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture) in //src/mono/netcore/System.Private.CoreLib/src/System/Reflection/RuntimeMethodInfo.cs:line 378

This failure is in the first assert -- it's reading /proc/NNN/stat and finding 'dotnet' instead of 'ProcessNameMatchesScriptName_... something'.

In the original bug, the assert was finding blank when reading /proc/NNN/cmdline.

I don't know exactly how the contents of these files change when the process exits, but I wonder whether these are consistent with it having exited, but not all been cleaned up yet. (The slightly surprising thing is we're not seeing eg FileNotFoundException, but then, there's only one failure in the history)

As you point out, it seems hard to imagine that 10 minutes could have elapsed.

I suggest to just add Assert(!process.HasExited) just before the Process.Kill() and possibly Assert(File.Exists("/bin/sleep")) at the start. And call it fixed. If it's not, we will have better info. I don't think other changes are needed.

MattGal · 2021-02-16T16:46:30Z

(@MattGal can you tell me why the query below does not show the failures reported in #13757 (comment) ? )

Taking a look. We did have a volume issue last week hotfixed by doubling the capacity of machines ingesting to Kusto, which would have resulted in delayed ingestion. Not sure this is that though.

MattGal · 2021-02-16T16:50:22Z

@danmoseley Actually, it's obvious no investigation needed. The thing you linked is from 11/7/2019, 467 days ago. We keep a bit more than 120 days of test results in Kusto, and even that costs a fair bit of money. Once results age out they are naturally reclaimed in the data cluster.

carlossanlop · 2021-03-15T19:06:25Z

Ping @adamsitnik

adamsitnik · 2021-05-12T15:34:39Z

thank you all for great feedback, I am going to close the PR for now and get back to it when I have some free cycles

adamsitnik added 4 commits December 21, 2020 15:25

use simpler name to avoid issues with special characters

10b055b

use full path to avoid potential alias issues

ad512a3

handle potential zombie process

11c2197

add comments

81d97c3

adamsitnik added area-System.Diagnostics.Process test-bug Problem in test source code (most likely) labels Dec 21, 2020

adamsitnik added this to the 6.0.0 milestone Dec 21, 2020

adamsitnik requested review from tmds and eiriktsarpalis December 21, 2020 14:47

tmds reviewed Jan 4, 2021

View reviewed changes

jeffhandley mentioned this pull request Jan 13, 2021

Fix System.Diagnostics.Process tests #46941

Closed

6 tasks

jeffhandley assigned adamsitnik Feb 13, 2021

Base automatically changed from master to main March 1, 2021 09:07

jeffhandley marked this pull request as draft April 18, 2021 06:40

jeffhandley modified the milestones: 6.0.0, Future May 12, 2021

adamsitnik closed this May 12, 2021

ghost locked as resolved and limited conversation to collaborators Jun 11, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Try to make ProcessNameMatchesScriptName test more reliable #46292

Try to make ProcessNameMatchesScriptName test more reliable #46292

adamsitnik commented Dec 21, 2020

ghost commented Dec 21, 2020

tmds Jan 4, 2021

danmoseley Feb 13, 2021

tmds Jan 4, 2021

adamsitnik Jan 4, 2021

tmds Jan 4, 2021

adamsitnik Jan 4, 2021

tmds Jan 4, 2021

tmds Jan 4, 2021

adamsitnik Jan 4, 2021

danmoseley Feb 13, 2021

tmds commented Jan 4, 2021

jeffhandley commented Feb 13, 2021

danmoseley commented Feb 13, 2021

MattGal commented Feb 16, 2021

MattGal commented Feb 16, 2021

carlossanlop commented Mar 15, 2021

adamsitnik commented May 12, 2021

Try to make ProcessNameMatchesScriptName test more reliable #46292

Try to make ProcessNameMatchesScriptName test more reliable #46292

Conversation

adamsitnik commented Dec 21, 2020

ghost commented Dec 21, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tmds commented Jan 4, 2021

jeffhandley commented Feb 13, 2021

danmoseley commented Feb 13, 2021

MattGal commented Feb 16, 2021

MattGal commented Feb 16, 2021

carlossanlop commented Mar 15, 2021

adamsitnik commented May 12, 2021