Work around EPROTOTYPE race condition on macOS #27221

Closed
halter73 opened this issue Aug 23, 2018 · 8 comments · Fixed by dotnet/corefx#37208 or #69882

@halter73
Member

There are some flaky tests in Kestrel that are caused by a race condition on macOS since at least Yosemite where EPROTOTYPE will sometimes be raised instead of EPIPE when sends occur during connection teardown. This results in a SocketError.ProtocolType being raised instead of the expected SocketError.Shutdown from Socket.SendAsync().

This was fixed in libuv 3 years ago by retrying the write operation. You can find more background info about this issue here.
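
For illustration, the retry approach might look roughly like this in C. This is a minimal sketch, not libuv's actual code: `send_retry_eprototype` and `MAX_EPROTOTYPE_RETRIES` are hypothetical names, and the retry cap is an assumption added so the loop cannot spin indefinitely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical cap (assumption) so a persistent EPROTOTYPE cannot spin forever. */
#define MAX_EPROTOTYPE_RETRIES 16

/*
 * Hypothetical helper, not a libuv or .NET API: calls send() and retries
 * while it fails with EPROTOTYPE. Any other result is returned unchanged,
 * with errno left exactly as send() set it.
 */
ssize_t send_retry_eprototype(int fd, const void *buf, size_t len, int flags)
{
    assert(buf != NULL || len == 0);

    ssize_t n;
    int attempts = 0;
    do {
        n = send(fd, buf, len, flags);
    } while (n == -1 && errno == EPROTOTYPE &&
             ++attempts <= MAX_EPROTOTYPE_RETRIES);

    return n;
}
```

The idea is that EPROTOTYPE here is treated as transient: the expectation is that a retried call during teardown then fails with the usual EPIPE, which callers already handle. If the cap is ever reached, the caller still sees -1 with errno set to EPROTOTYPE.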

@wfurt
Member

wfurt commented Sep 19, 2018

Is this specific to Yosemite (10.10)? That version is long out of support, IMHO.
If this is reproducible on 10.12+, could you please try to put together a repro?

@halter73
Member Author

halter73 commented Sep 19, 2018

It is reproducible on 10.12 (the macOS version running on the last agent where we saw this error). It very well could have been an issue before 10.10, but that's the timeframe when it was first reported to libuv.

This is a race condition that occurs when a send and a connection reset occur simultaneously, so it requires a loop to repro. Here's a repro written in C against sys/socket.h. This is the very detailed blog post from the original finder of the bug four years ago that tracks the bug all the way to the offending kernel code.

@wfurt
Member

wfurt commented Sep 19, 2018

Thanks @halter73. I'll give it a try.

@karelz
Member

karelz commented Oct 12, 2018

@wfurt do we have enough info to make it actionable?

@wfurt
Member

wfurt commented Oct 12, 2018

I think so. The next step is to isolate the repro and gather some more data.
If/when we get repro with provided C code, we could do the same in C# or simply guess at a fix.
I don't think we will be able to cover this with unit tests, as it is a race condition.

@halter73
Member Author

> If/when we get repro with provided C code

Is there something wrong with this repro that I mentioned in my last comment? I haven't tried it myself, but it seems like what you're looking for.

@karelz
Member

karelz commented Oct 12, 2018

@halter73 we likely just didn't get to it ... too many high-pri bugs elsewhere and this one does not seem to be super high pri ;)

@wfurt
Member

wfurt commented Oct 13, 2018

Yes, that is the one I was planning to use, @halter73.

msftgits transferred this issue from dotnet/corefx on Jan 31, 2020
msftgits added this to the 3.0 milestone on Jan 31, 2020
ghost locked as resolved and limited conversation to collaborators on Dec 15, 2020
4 participants