-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try to make ProcessNameMatchesScriptName test more reliable #46292
Conversation
Tagging subscribers to this area: @eiriktsarpalis Issue DetailsThis very simple test can sometimes fail: #13757 I was not able to reproduce it, but based on the logs attached to the original issue I believe that the According to man docs the file might be empty if the process is a zombie (it has already exited but parent process has not acknowledged that yet) so I've added a check for that.
I am not a Linux expert, but I guess that Fixes #13757 @tmds is there any chance you could take a look? It's nothing urgent, I just want to get rid of unstable tests from my areas.
|
@@ -159,9 +159,9 @@ public void ProcessStart_UseShellExecute_OnUnix_SuccessWhenProgramInstalled(bool | |||
[PlatformSpecific(~TestPlatforms.OSX)] // On OSX, ProcessName returns the script interpreter. | |||
public void ProcessNameMatchesScriptName() | |||
{ | |||
string scriptName = GetTestFileName(); | |||
const string scriptName = nameof(ProcessNameMatchesScriptName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless the underlying filesystem imposes additional restrictions, Linux filenames can contain any bytes except \0
and /
.
.NET assumes the bytes are UTF-8.
So, I think the filename is not an issue. If it were, we should make sure GetTestFileName
returns a valid filename on Linux instead of doing something specific in this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree I do'nt see a problem with GetTestFileName(), and if it has a problem, it will be affecting other tests. Either way no change is needed here.
|
||
Assert.Equal(scriptName, process.ProcessName); | ||
// cmdLine can sometimes be null: https://github.com/dotnet/runtime/issues/13757 | ||
// according to docs, it's when the process is a zombie: https://man7.org/linux/man-pages/man5/proc.5.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Assert
that fails was added to understand better why ProcessName
doesn't end up with the proper value. It verifies some expectations of the .NET implementation.
Can we verify the failing case has a zombie process? We can check /stat
for the Z
flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good point, will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it a bit more.
Probably this test fails when the Assert
happens after the script has already exited. .NET will have reaped the process, and everything under /proc
is gone.
This happens when sleep doesn't work, or when the asserts are delayed for 10 min. We can safely increase the 10 min timeout. And add an assert to check !process.HasExited
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This happens when sleep doesn't work
How would this be possible?
when the asserts are delayed for 10 min
this is unlikely, as we just start the process and execute asserts right away
everything under /proc is gone
Based on the CI logs it looks like not everything is gone. The previous line that asserts the content of /proc/{process.Id}/stat
works just fine. Based on the available docs the only explanation I was able to find was the zombie process.
} | ||
finally | ||
{ | ||
process.Kill(); | ||
process.WaitForExit(); | ||
if (!process.HasExited) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you add this condition?
string filename = Path.Combine(TestDirectory, scriptName); | ||
File.WriteAllText(filename, $"#!/bin/sh\nsleep 600\n"); // sleep 10 min. | ||
File.WriteAllText(filename, $"#!/bin/sh\n/bin/sleep 600\n"); // sleep 10 min. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for changing the to explicitly using /bin/sleep
? I don't think we care how the sleep is implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a Linux expert, but I guess that
sleep 600
could become a zombie right away only if there is a custom alias for thesleep
command. I don't know if it's possible, but I think that we can just use full path to be 100% sure we are calling the actual/bin/sleep
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know on Alpine it's busybox, not an actual executable file, but it's symlinked to the same place so I guess that's OK:
LAPTOP-P6UJDVTA:~$ ls -l /bin/sleep
lrwxrwxrwx 1 root root 12 Oct 21 09:23 /bin/sleep -> /bin/busybox
Adam, thanks for working through flaky tests. |
@adamsitnik It looks like this PR is close to ready with just a little feedback left to address. |
@adamsitnik looking at history I only see one failure TestResults
This failure is in the first assert -- it's reading /proc/NNN/stat and finding 'dotnet' instead of 'ProcessNameMatchesScriptName_... something'. In the original bug, the assert was finding blank when reading /proc/NNN/cmdline. I don't know exactly how the contents of these files change when the process exits, but I wonder whether these are consistent with it having exited, but not all been cleaned up yet. (The slightly surprising thing is we're not seeing eg FileNotFoundException, but then, there's only one failure in the history) As you point out, it seems hard to imagine that 10 minutes could have elapsed. I suggest to just add |
Taking a look. We did have a volume issue last week hotfixed by doubling the capacity of machines ingesting to Kusto, which would have resulted in delayed ingestion. Not sure this is that though. |
@danmoseley Actually, it's obvious no investigation needed. The thing you linked is from 11/7/2019, 467 days ago. We keep a bit more than 120 days of test results in Kusto, and even that costs a fair bit of money. Once results age out they are naturally reclaimed in the data cluster. |
Ping @adamsitnik |
thank you all for great feedback, I am going to close the PR for now and get back to it when I have some free cycles |
This very simple test can sometimes fail: #13757
I was not able to reproduce it, but based on the logs attached to the original issue I believe that the
GetTestFileName
might in theory be returning some non-ascii characters and Linux might not like it, so I've replaced it withconst
valueAccording to man docs the file might be empty if the process is a zombie (it has already exited but parent process has not acknowledged that yet) so I've added a check for that.
I am not a Linux expert, but I guess that
sleep 600
could become a zombie right away only if there is a custom alias for thesleep
command. I don't know if it's possible, but I think that we can just use full path to be 100% sure we are calling the actual/bin/sleep
.Fixes #13757
@tmds is there any chance you could take a look? It's nothing urgent, I just want to get rid of unstable tests from my areas.