Threadpool worker threads exit on socket errors #358

Closed
cameronbrunner opened this issue Jan 15, 2021 · 5 comments · Fixed by #649
Labels
bug (Something is broken), reproducer: missing (This PR or issue lacks code that reproduces the problem described, or a clearly understandable STR)

Comments

cameronbrunner (Contributor) commented Jan 15, 2021

❓ I'm submitting a ...

  • [x] 🐞 bug report
  • [ ] 🐣 feature request
  • [ ] ❓ question about the decisions made in the repository

🐞 Describe the bug. What is the current behavior?
When a cheroot worker thread handles a connection that raises a socket error, the thread exits. The easiest way to trigger this is to create an HTTPS CherryPy server with a cheroot backend and then issue plain HTTP requests to it. An SSL error is detected for each such connection and a worker thread exits. Once all worker threads have exited, CherryPy becomes unresponsive.

The issue appears to be due to this commit for v8.1.0...

6f9a8f9

when the code was broken out of `tick` into `_from_server_socket`. Previously, any socket error (or other error now handled by `_from_server_socket`) prevented the connection from being put on the queue. Now these events cause a `None` to be put on the queue, and `None` is equivalent to a threadpool shutdown event, so the worker thread that picks it up exits. The relevant lines in master are linked in the proposed fix below; a minimal sketch of this sentinel behaviour follows.
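
A minimal sketch of the sentinel mechanism described above (illustrative only, not cheroot's actual code; the worker loop and connection object are hypothetical):

    import queue
    import threading

    work = queue.Queue()

    def worker():
        """Worker loop that treats ``None`` as the shutdown sentinel."""
        while True:
            item = work.get()
            if item is None:
                # Interpreted as "shut the pool down", so the thread exits,
                # even when None only meant "accepting this connection failed".
                break
            item.communicate()  # hypothetical per-connection handling

    threading.Thread(target=worker).start()
    work.put(None)  # a failed accept/handshake enqueues None -> the worker exits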

❓ What is the motivation / use case for changing the behavior?

This issue leads to many scenarios where a server using cheroot has fewer than `min` worker threads running, and could very easily end up with 0 worker threads. In general, thread pools should provide some guarantee that the number of running/active threads lies between the configured min and max values.

πŸ’‘ To Reproduce

Steps to reproduce the behavior:

  1. Set up an HTTPS server with a thread pool of 10 listening on port 8443 (a minimal server sketch is shown after these steps).
  2. Validate functionality with curl https://$(hostname -f):8443
  3. Make 10 HTTP requests to the server: curl http://$(hostname -f):8443
  4. Validate that curl https://$(hostname -f):8443 hangs.
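
A minimal CherryPy server sketch matching the steps above (assumptions: the built-in SSL module and a self-signed certificate at the hypothetical paths /tmp/cert.pem and /tmp/key.pem):

    import cherrypy

    class Root:
        @cherrypy.expose
        def index(self):
            return 'hello'

    cherrypy.config.update({
        'server.socket_host': '0.0.0.0',
        'server.socket_port': 8443,
        'server.thread_pool': 10,
        'server.ssl_module': 'builtin',
        'server.ssl_certificate': '/tmp/cert.pem',  # hypothetical path
        'server.ssl_private_key': '/tmp/key.pem',   # hypothetical path
    })
    cherrypy.quickstart(Root())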

Example... Note the 10-second delay on the final curl before it breaks out of the hang.

[root@ip-10-1-0-64 ~]# curl https://$(hostname -f):8443/ > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   513  100   513    0     0   3841      0 --:--:-- --:--:-- --:--:--  3857
[root@ip-10-1-0-64 ~]# for i in $(seq 1 10); do curl http://$(hostname -f):8443; done
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
[root@ip-10-1-0-64 ~]# curl https://$(hostname -f):8443/ > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:10 --:--:--     0

πŸ’‘ Expected behavior
Threads should never exit or threads should be recreated after they exit.

πŸ“‹ Details

πŸ“‹ Environment

  • Cheroot version: 8.5.0
  • CherryPy version: 18.6.0
  • Python version: 3.6.12
  • OS: Centos 7
  • Browser: all

πŸ“‹ Additional context
I think an appropriate, simple fix would be to check the return value of `_from_server_socket` and not call `process_conn` if it is `None`.

The change is shown below:
https://github.com/cherrypy/cheroot/blob/master/cheroot/connections.py#L216

                if conn is self.server:
                    # New connection
                    new_conn = self._from_server_socket(self.server.socket)
                    if new_conn is not None:
                        self.server.process_conn(new_conn)
cameronbrunner added the bug and triage labels on Jan 15, 2021
cameronbrunner (Contributor, Author) commented:

@dtougas

cameronbrunner (Contributor, Author) commented:

It would be a much larger change, but it might be worth having some thread maintenance that checks for threads that have exited. Wrapping the worker run code to catch all exceptions would be another option to consider; a rough sketch of that approach follows.
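
A rough sketch of the second option, wrapping the worker loop so that per-connection failures cannot take the thread down (illustrative only, not cheroot's actual WorkerThread; the queue and connection objects are hypothetical):

    import queue
    import ssl
    import threading

    class ResilientWorker(threading.Thread):
        """Worker that survives socket/TLS failures on individual connections."""

        def __init__(self, conn_queue):
            super().__init__(daemon=True)
            self._conn_queue = conn_queue

        def run(self):
            while True:
                conn = self._conn_queue.get()
                if conn is None:  # explicit shutdown sentinel
                    return
                try:
                    conn.communicate()  # hypothetical per-connection handling
                except (OSError, ssl.SSLError):
                    # A broken client connection must not kill the worker.
                    pass
                finally:
                    conn.close()

    pool_queue = queue.Queue()
    ResilientWorker(pool_queue).start()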

cameronbrunner added a commit to cameronbrunner/cheroot that referenced this issue Jan 15, 2021
The `_from_server_socket` routine may return a `None` which should
not be sent to workers to process.

for cherrypy#358
webknjaz added the reproducer: missing label and removed the triage label on Jan 17, 2021
webknjaz pushed a commit to cameronbrunner/cheroot that referenced this issue Jan 18, 2021
The `_from_server_socket` routine may return a `None` which should
not be sent to workers to process.

for cherrypy#358
webknjaz added a commit that referenced this issue Jan 18, 2021
This change makes sure that only valid connections are processed.

The `_from_server_socket` routine may return `None` when the attempt
to establish the connection goes wrong, for example when a client
tries to speak plain-text HTTP to an HTTPS socket. Earlier, this
caused `None` to be forwarded to the workers for processing, making
them exit. Eventually, there would be no workers left in the pool and
the server would reset any attempts to establish new connections.

Ref #358
pavelbrych commented Apr 26, 2021

Hi, we just ran into a similar issue in our project; see the traceback below. Checking for a non-empty connection does not help here.

The SSLError raised in parse_request() triggers _conditional_error(), but a subsequent exception in simple_response() is not handled and interrupts the WorkerThread.run() method.

When all worker threads are stopped, the server becomes unresponsive.

I see no harm in the proposed solution of catching all connection-related exceptions inside the WorkerThread.run() method; it should not be worse than leaving the WorkerThread instance in an unusable state. What do you think?

2021-04-26 20:37:13 WARNING  [26/Apr/2021:20:37:13] ENGINE socket.error 1
Traceback (most recent call last):
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 1277, in communicate
    req.parse_request()
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 706, in parse_request
    success = self.read_request_line()
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 747, in read_request_line
    request_line = self.rfile.readline()
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 304, in readline
    data = self.rfile.readline(256)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\_pyio.py", line 557, in readline
    b = self.read(nreadahead())
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\_pyio.py", line 536, in nreadahead
    readahead = self.peek(1)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\_pyio.py", line 1129, in peek
    return self._peek_unlocked(size)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\_pyio.py", line 1136, in _peek_unlocked
    current = self.raw.read(to_read)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
ssl.SSLError: [SSL] internal error (_ssl.c:2623)

Exception in thread CP Server Thread-17:
Traceback (most recent call last):
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 1277, in communicate
    req.parse_request()
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 706, in parse_request
    success = self.read_request_line()
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 747, in read_request_line
    request_line = self.rfile.readline()
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 304, in readline
    data = self.rfile.readline(256)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\_pyio.py", line 557, in readline
    b = self.read(nreadahead())
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\_pyio.py", line 536, in nreadahead
    readahead = self.peek(1)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\_pyio.py", line 1129, in peek
    return self._peek_unlocked(size)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\_pyio.py", line 1136, in _peek_unlocked
    current = self.raw.read(to_read)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
ssl.SSLError: [SSL] internal error (_ssl.c:2623)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\workers\threadpool.py", line 125, in run
    keep_conn_open = conn.communicate()
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 1306, in communicate
    self._conditional_error(req, '500 Internal Server Error')
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 1349, in _conditional_error
    req.simple_response(response)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 1115, in simple_response
    self.conn.wfile.write(EMPTY.join(buf))
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\makefile.py", line 438, in write
    res = super().write(val, *args, **kwargs)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\makefile.py", line 36, in write
    self._flush_unlocked()
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\makefile.py", line 45, in _flush_unlocked
    n = self.raw.write(bytes(self._write_buf))
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\socket.py", line 687, in write
    return self._sock.send(b)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\ssl.py", line 1173, in send
    return self._sslobj.write(data)
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2473)
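
In the traceback above, the fatal secondary failure happens while writing the 500 response to an already-broken TLS socket. A minimal sketch of guarding that kind of write (illustrative only; send_simple_response and conn are hypothetical stand-ins, not cheroot's API):

    import ssl

    def send_error_response_safely(conn, status='500 Internal Server Error'):
        """Try to report an error to the client, but tolerate a dead socket."""
        try:
            conn.send_simple_response(status)  # hypothetical helper
        except (OSError, ssl.SSLError):
            # The client is already gone or the TLS channel is broken;
            # dropping the response is better than crashing the worker thread.
            pass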

thezoggy commented Dec 8, 2021

Note: we see this with our app when older SSL requests come in from random scanners or from local apps that still try SSL rather than TLS. After about 5 requests in a row, CherryPy crashes.

webknjaz (Member) commented:

> I see no harm in the proposed solution of catching all connection-related exceptions inside the WorkerThread.run() method; it should not be worse than leaving the WorkerThread instance in an unusable state. What do you think?

Thanks for the suggestion! I missed this but also came to the same conclusion while looking into it more.

There are exceptions to handle better on other levels too, though.

webknjaz added a commit to webknjaz/cheroot that referenced this issue Mar 12, 2024
This patch aims to prevent a situation when the working threads are
killed by arbitrary exceptions that bubble up to their entry point
layers that aren't handled properly or at all. This was causing DoS
in many situations, including TLS errors and attempts to close the
underlying sockets erroring out.

Fixes cherrypy#358
Ref cherrypy#375
Resolves cherrypy#365
webknjaz added a commit that referenced this issue Mar 31, 2024
A DoS would happen in many situations, including TLS errors and
attempts to close the underlying sockets erroring out.

This patch aims to prevent a situation when the worker threads are
killed by arbitrary exceptions that bubble up to their entry point
layers that aren't handled properly or at all.

PR #649

Fixes #358
Fixes #354

Ref #310
Ref #346
Ref #375
Ref #599
Ref #641

Resolves #365