Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexplained CancelledErrors Over Extended Period #2056

Closed
owaaa opened this issue Jul 5, 2017 · 14 comments
Closed

Unexplained CancelledErrors Over Extended Period #2056

owaaa opened this issue Jul 5, 2017 · 14 comments

Comments

@owaaa
Copy link

owaaa commented Jul 5, 2017

Long story short

I am experiencing intermittent but frequent unexplained CancelledErrors under load in AIOHTTP2. I'm not sure if I am doing something wrong or their is an issue. Looking at server logs and metrics, I'm not seeing any evidence this is a downstream hangup on aiohttp. Within the aiohttp code, I catch the CancelledError and timed the aiohttp requests, and the cancellation is always immediate such as 0.001 seconds (timex).

File "/usr/local/lib/python3.5/site-packages/aiohttp/client.py", line 637, in __aenter__
    self._resp = yield from self._coro
  File "/usr/local/lib/python3.5/site-packages/aiohttp/client.py", line 241, in _request
    yield from resp.start(conn, read_until_eof)
  File "/usr/local/lib/python3.5/site-packages/aiohttp/client_reqrep.py", line 559, in start
    (message, payload) = yield from self._protocol.read()
  File "/usr/local/lib/python3.5/site-packages/aiohttp/streams.py", line 509, in read
    yield from self._waiter
  File "/usr/local/lib/python3.5/asyncio/futures.py", line 380, in __iter__
    yield self  # This tells Task to wait for completion.
  File "/usr/local/lib/python3.5/asyncio/tasks.py", line 304, in _wakeup
    future.result()
  File "/usr/local/lib/python3.5/asyncio/futures.py", line 285, in result
    raise CancelledError
concurrent.futures._base.CancelledError

Reading through the source code, it looks like this is in trying to acquire a future. I'm not sure if i'm bumping into some sort of limit or something?

This is an issue for me as i'm consistently losing many requests due to this.

Expected behaviour

CancelledErrors when valid to not be so immediate or frequent, and to be the result of a downstream timeout or similar.

Actual behaviour

Seeing sporadic cancelled errors, but consistently occurring over periods of multiple hours, all in the same place inside aiohttp of trying to acquire a future, and always immediate.

Steps to reproduce

Running many requests though an aiohttp server over time. ClientSession is shared across all requests. I currently have timeouts set as ClientSession(loop=app.loop, read_timeout=120, conn_timeout=30)

Your environment

aiohttp 2.2.0, ubuntu, python 3.5,

@asvetlov
Copy link
Member

Do you use client session from aiohttp web handler?

@owsd
Copy link

owsd commented Jul 11, 2017

Yes, there is a single shared ClientSession object being stored in request.app['client'] style variable. I don't think this is the issue.

@asvetlov
Copy link
Member

The issue is: client (or web browser) used for making requests to your web server sometimes closes or just looses connection to server.
On peer disconnection aiohttp web server cancels a task used for processing web-handler async function.
In turn it cancels current IO operation which is ClientSession request instantiated from by your web-handler code.

@owsd
Copy link

owsd commented Jul 11, 2017

That's exactly right, the way to reproduce easily is to make a request on localhost with an nginx server setup like this:

location = / {
            proxy_pass http://127.0.0.1:8080/;
            proxy_read_timeout 1s;
            proxy_connect_timeout 1s;
        }

Run a simple aiohttp server on 8080, and have an await asyncio.sleep(0.99) followed by any other async code. You'll get a timeout in the sleep, or in the following async code (such as await client.get(url) or await response.json()).

@asvetlov
Copy link
Member

asvetlov commented Jul 11, 2017 via email

@owsd
Copy link

owsd commented Jul 11, 2017

Ok thanks for clarifying that point. However, making this the case makes it difficult to protect against and/or respond to. How would you advise mitigating this (other than never having clients disconnect).

The problem as I understand it is that it can be thrown on any async line of code anywhere in the server, and is indistinguishable from other ways that cancel may occur.

To be clear, in something like this, if I get past the client.get, I expect request.json to succeed, not be a potential line for failure.

async with client.get(url) as response:
    return await response.json()   # request finished, now json fails due to client disconnect

It just seems problematic to essentially make the client disconnecting able to disrupt the server code at essentially any line.

@owsd
Copy link

owsd commented Jul 11, 2017

Let me phrase it another way, I would expect failure point to be in responding to client perhaps (at the return point of web handler), or maybe a ClientDisconnect exception at the very least.

@owsd
Copy link

owsd commented Jul 11, 2017

As another aside, suppose you have a long-running process that is updating many documents, and the client is awaiting a response, but its for display purposes only ({ updateCount: 1500 }). If the client unexpectedly disconnects in the middle, the recovery cases are complex since it could stop your job at literally any line in the code -- you have to guard that literally every single line could be a failure point. There's no good way to finish the job even if you wanted to (with the way the task nesting is working at the moment).

@asvetlov
Copy link
Member

For protecting your code from cancellation you should spawn a new task.
I've created https://github.com/aio-libs/aiojobs library for providing handy helpers.

@socketpair
Copy link
Contributor

Please close issue. Everything is clearly explained.

@thehesiod
Copy link
Contributor

I tend to agree with @owsd, I'd imagine server should raise something like a ClientDisconnected error so differentiate between something that's expected, and unexpected. Cancelled errors are unexpected, ClientDisconnected are expected.

@asvetlov
Copy link
Member

@thehesiod there is no way to pass custom exception into task.cancel().
And there is no way to stop a task other than task.cancel() call.

@surfacepatterns
Copy link

I just came across this bug today.

I'm confused as to why this bug was ever closed. AFAICT, there's no way to distinguish between the case where CancelledError is raised due to the application explicitly cancelling the task, and the case where CancelledError is raised due to aiohttp closing the task when the server disconnects, and it's very important to be able to distinguish between the two so that you can implement different behavior in the two different cases. Frankly, it doesn't matter that there's no way to pass a custom exception to cancel() - that doesn't stop this bug from being a legitimate bug in aiohttp.

@asvetlov
Copy link
Member

I feel your pain.
We have no much space for a maneuver in 3.x branch.
For 4.x there is #4080 draft PR (not finished yet)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants