Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Suppress legit OS errors on socket shutdown #342

Merged
merged 3 commits into from
Dec 5, 2020

Conversation

webknjaz
Copy link
Member

@webknjaz webknjaz commented Nov 30, 2020

When closing a socket we normally want to terminate the transport-
level connection by sending a TCP FIN packet over the wire. This works
great if the client is willing to perform a 3-way handshake but
sometimes the client-side just drops the connection by sending an RST
instead of a FIN. In this case, our server-side socket.shutdown()
call will raise an OSError (socket.error on Python 2) or its
derivatives. Which is perfectly fine and it's alright for us to ignore
those.

The kernel socket close helper rewrite introduced a behavior of only
suppressing ENOTCONN in v8.4.8 (the case when the client is no longer
with us) but it left own a few other cases that may be happening too.

This change fixes that by extending the list of errors that are to be
suppressed. Here's what's handled from now on:

  • ENOTCONN — client is no longer connected
  • EPIPE — write on a pipe while the other end has been closed
  • ESHUTDOWN — write on a socket which has been shutdown for writing
  • ECONNRESET — connection is reset by the peer

What kind of change does this PR introduce?

  • 🐞 bug fix
  • 🐣 feature
  • 📋 docs update
  • 📋 tests/coverage improvement
  • 📋 refactoring
  • 💥 other

📋 What is the related issue number (starting with #)

Fixes #341

What is the current behavior? (You can also link to an open issue here)

Connection drop/reset/shutdown initiated by the client-side while the server attempts to send a TCP FIN triggers an unhandled OSError/socket.error during the socket shutdown and thus leaks into the logs.

What is the new behavior (if this is a feature change)?

It's handled gracefully and suppressed.

📋 Other information:

Refs:

📋 Checklist:

  • I think the code is well written
  • I wrote good commit messages
  • I have squashed related commits together after the changes have been approved
  • Unit tests for the changes exist
  • Integration tests for the changes exist (if applicable)
  • I used the same coding conventions as the rest of the project
  • The new code doesn't generate linter offenses
  • Documentation reflects the changes
  • The PR relates to only one subject with a clear title
    and description in grammatically correct, complete sentences

This change is Reviewable

@webknjaz webknjaz requested review from cyraxjoe and jaraco November 30, 2020 18:10
@webknjaz webknjaz force-pushed the bugfixes/341-conn-reset-on-sock-shutdown branch from 3095268 to b527eb2 Compare November 30, 2020 18:10
@webknjaz
Copy link
Member Author

cc @liamstask && @The-Compiler

@webknjaz webknjaz force-pushed the bugfixes/341-conn-reset-on-sock-shutdown branch from b527eb2 to 211581e Compare November 30, 2020 18:42
Copy link
Contributor

@liamstask liamstask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the change lgtm, not sure of the best way to test

cheroot/errors.py Outdated Show resolved Hide resolved

acceptable_sock_shutdown_error_codes = {
errno.ENOTCONN,
errno.EPIPE, errno.ESHUTDOWN, # corresponds to BrokenPipeError in Python 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do both of these map to BrokenPipeError? maybe worth putting on their own line so the comment is unambiguous

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's why I've put them on the same line: https://docs.python.org/3/library/exceptions.html#BrokenPipeError

Copy link
Contributor

@The-Compiler The-Compiler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay! I set up a CI run which runs 10 Windows jobs resulting in 4 of them failing:

Then I tested this branch and all jobs pass again:

So I'm guessing this PR indeed fixes my issue - thanks for the quick fix! 👍

@webknjaz
Copy link
Member Author

webknjaz commented Dec 3, 2020

Thanks for the feedback! I'll see if I'll be able to come up with some regression tests soonish and if not, I'll just address the suggestions and merge the PR as is.
P.S. I can't promise a quick release though — our automated publishing is affected by #343 and I'm working to fix that.

When closing a socket we normally want to terminate the transport-
level connection by sending a TCP FIN packet over the wire. This works
great if the client is willing to perform a 3-way handshake but
sometimes the client-side just drops the connection by sending an RST
instead of a FIN. In this case, our server-side `socket.shutdown()`
call will raise an OSError (socket.error on Python 2) or its
derivatives. Which is perfectly fine and it's alright for us to ignore
those.

The kernel socket close helper rewrite introduced a behavior of only
suppressing ENOTCONN in v8.4.8 (the case when the client is no longer
with us) but it left own a few other cases that may be happening too.

This change fixes that by extending the list of errors that are to be
suppressed. Here's what's handled from now on:
  * ENOTCONN — client is no longer connected
  * EPIPE — write on a pipe while the other end has been closed
  * ESHUTDOWN — write on a socket which has been shutdown for writing
  * ECONNRESET — connection is reset by the peer

Fixes #341

Refs:
  * https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Connection_termination
  * http://cs.baylor.edu/~donahoo/practical/CSockets/TCPRST.pdf
  * #341 (comment)

Co-Authored-By: Liam Staskawicz <liam@stask.net>
@webknjaz webknjaz force-pushed the bugfixes/341-conn-reset-on-sock-shutdown branch from 211581e to 384e6d0 Compare December 4, 2020 14:04
@webknjaz
Copy link
Member Author

webknjaz commented Dec 4, 2020

This is how one can trigger RST (ConnectionResetError: [Errno 104] Connection reset by peer) https://stackoverflow.com/a/40779258/595220

@webknjaz
Copy link
Member Author

webknjaz commented Dec 4, 2020

I can only reproduce RST on recv() calls but not on shutdown(), but I was able to consistently cause OSError(107, 'Transport endpoint is not connected') in tests... Can it be that shutdown() only raises ECONNRESET on Windows?

@webknjaz
Copy link
Member Author

webknjaz commented Dec 5, 2020

CPython devs appear to be struggling with reproducers too: https://bugs.python.org/issue30319 / https://bugs.python.org/issue30329 / python/cpython@83a2c28 / https://github.com/python/cpython/blob/c39b52f/Lib/poplib.py#L297-L302 / https://docs.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-shutdown.

I guess I'll just try to monkey-patch shutdown to raise states that are hard to reproduce until somebody with Windows can take a look.

@webknjaz
Copy link
Member Author

webknjaz commented Dec 5, 2020

Hey @mxii-ca, I recall you performed an excellent debugging of TLS earlier this year so I thought you may be able to help us out here. Do you know of any ways to trigger TCP RST to reproduce an issue when the client sends RST while the server initiates a shutdown sequence with FIN? Anything Windows-specific maybe?

@jaraco
Copy link
Member

jaraco commented Dec 5, 2020

I can't promise a quick release though

Releases shouldn't be blocked on automation. I'll cut a manual release if needed.

@jaraco
Copy link
Member

jaraco commented Dec 5, 2020

Since this change fixes an observable regression and as the user was able to confirm this change addresses that regression, I'd like to proceed with merging this fix and getting it out. I've filed #344 to capture the need to follow-up with a proper regression test.

@jaraco jaraco merged commit 6b10c6c into master Dec 5, 2020
@jaraco jaraco deleted the bugfixes/341-conn-reset-on-sock-shutdown branch December 5, 2020 16:36
@webknjaz
Copy link
Member Author

webknjaz commented Dec 5, 2020

Oh, I had some local tests that I was about to push today.

@jaraco
Copy link
Member

jaraco commented Dec 5, 2020

Great! Just push them as a separate PR (if you wish), or just push them to maint/8.x, and that'll close out #344.

@webknjaz
Copy link
Member Author

webknjaz commented Dec 5, 2020

I'll work on it. I had an amended commit so I need to deal with that first.

@webknjaz
Copy link
Member Author

webknjaz commented Dec 5, 2020

@jaraco I noticed you had to make a manual release. That should work now that I've applied a hotfix to GHA: 71e03b1. Since there's some flaky test jobs, it may fail in the middle of the test stage, though. If that happens, just restarting the workflow should get things going.

@webknjaz
Copy link
Member Author

webknjaz commented Dec 5, 2020

@The-Compiler This is released as v8.5.0 FYI

@jaraco
Copy link
Member

jaraco commented Dec 6, 2020

I'll work on it. I had an amended commit so I need to deal with that first.

Gah. I hadn't considered that contingency. Let me know if I can help.

@mxii-ca
Copy link
Contributor

mxii-ca commented Dec 7, 2020

@webknjaz you can change the behavior of close from the graceful FIN shutdown to sending a RST by setting the SO_LINGER socket option

when l_onoff is non-zero (true), the connection will linger open after close has been called to allow for send queues to empty.
however, if the amount of time to wait, l_linger, is set to 0, then instead of waiting, it will hard close the connection by sending a reset

import socket

s = socket.create_connection(("127.0.0.1", 5673))

linger = bytearray()
linger.extend((1).to_bytes(2, "little", signed=False))  # l_onoff
linger.extend((0).to_bytes(2, "little", signed=False))  # l_linger
s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, bytes(linger))

# send RST
s.close()

this should work cross-platform

@mxii-ca
Copy link
Contributor

mxii-ca commented Dec 7, 2020

unfortunately, using SO_LINGER to send a RST in the middle of a FIN shutdown is nearly impossible as it becomes a timing issue

however, reliably sending a RST in the middle of a FIN shutdown should be possible with a raw socket; you'd have to write your own TCP stack or modify someone else's to send the RST at the desired time.
a quick google search brought up this example for using raw sockets in python:
https://www.binarytides.com/raw-socket-programming-in-python-linux/

note that using raw sockets requires running as an Administrator on Windows and the CAP_NET_RAW capability on Linux

@webknjaz
Copy link
Member Author

webknjaz commented Dec 7, 2020

unfortunately, using SO_LINGER to send a RST in the middle of a FIN shutdown is nearly impossible as it becomes a timing issue

yeah.. that's what I've concluded after some experiments.

note that using raw sockets requires running as an Administrator on Windows and the CAP_NET_RAW capability on Linux

Woah, this adds a lot of complications. I'll have to postpone this idea and just simulate the exceptions with a Mock(side_effect=) where possible (https://github.com/cherrypy/cheroot/compare/testing/341-conn-reset-on-sock-shutdown-regression).

P.S. Thanks for the feedback! I did have in mind some ideas that I wanted to emit RST in a controlled way but never thought about the raw sockets... I also wanted to see if mitmproxy would help me achieve the required result but it doesn't look like it'd be helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[8.4.8 regression?] ConnectionResetError on Windows
5 participants