
await content.read() reads partial content under high load #3881

Closed
yapith-suplari opened this issue Jul 1, 2019 · 2 comments
Labels: needs-info (Issue is lacking sufficient information and will be closed if not provided), Stale

Comments


yapith-suplari commented Jul 1, 2019

Long story short

Under heavy network load and high latency, the aiohttp client's await read() sometimes returns truncated content.

Expected behaviour

await read() always returns the full response content.

Actual behaviour

The returned content is sometimes truncated

Steps to reproduce

I haven't been able to reproduce the intermittent issue in a test bed yet. I'll update the ticket once I have a reliable repro. I know it's not actionable without one, but I'm creating the ticket anyway to see if you have any inkling/intuition of what might be going on.

I've laid out a few leads below that I haven't followed up on thoroughly; I'll update the post when I discover more clues.

Analysis

I wonder if, under heavy load, there's a time window in which the StreamReader's internal buffer is empty and read() terminates prematurely.

I'm dealing with ~100 concurrent downloads, about 1-3 MB of content each, which I push to disk or S3. When writing to a local SSD it almost never fails; it fails more often when I swapped the SSD for S3. It looks like higher latency increases the likelihood of it happening.

When I reduce the concurrency to fewer than 10 downloads, the problem goes away. I use asyncio.gather(*futures) to set up the downloads, so this points to the number of concurrent coroutines affecting the likelihood of the issue.

The issue is more likely with payloads larger than ~2.5 MB, which may suggest a flaw in the buffer resizing in StreamReader.

I'm running without a response Content-Length, so await read() is supposed to read until EOF.

I'm running with a TCPConnector:

    session = aiohttp.ClientSession(
        timeout=aiohttp.ClientTimeout(total=3600),
        connector=aiohttp.TCPConnector(keepalive_timeout=600),
    )

I'm running with a single main event loop.

FWIW, setting a breakpoint in the buffering step reliably makes the problem go away, so it could be memory-barrier related? I'm still hypothesizing.
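
For reference, here's a minimal sketch of the download pattern described above (the URL list, payload endpoints, and size check are placeholders, not the actual ones):

    import asyncio
    import aiohttp

    URLS = ["https://example.com/payload/%d" % i for i in range(100)]  # placeholder endpoints

    async def fetch(session, url):
        async with session.get(url) as resp:
            body = await resp.content.read()  # no Content-Length, so read until EOF
            return url, len(body)

    async def main():
        timeout = aiohttp.ClientTimeout(total=3600)
        connector = aiohttp.TCPConnector(keepalive_timeout=600)
        async with aiohttp.ClientSession(timeout=timeout, connector=connector) as session:
            results = await asyncio.gather(*(fetch(session, url) for url in URLS))
        for url, size in results:
            print(url, size)  # compare against the known payload sizes out of band

    asyncio.run(main())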

Your environment

OS: Ubuntu 18.04
Python: 3.7.2
aiohttp: 3.5.4

aiohttp client only.

I'm running with pipenv (both dockerized and local execution)

Plenty of RAM/CPU/network bandwidth. I swapped aiohttp out for requests; the load takes ~30 minutes longer because it stages all the payloads in memory, but the downloads complete without content corruption.
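
A rough sketch of that kind of blocking cross-check (the worker count and URL list are placeholders):

    from concurrent.futures import ThreadPoolExecutor
    import requests

    URLS = ["https://example.com/payload/%d" % i for i in range(100)]  # placeholder endpoints

    def fetch_blocking(url):
        resp = requests.get(url, timeout=3600)
        return url, len(resp.content)  # whole body staged in memory

    with ThreadPoolExecutor(max_workers=10) as pool:
        for url, size in pool.map(fetch_blocking, URLS):
            print(url, size)  # compare against the sizes from the aiohttp run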

Let me know if you have any questions!

Potential Solution

I've also noticed that aiohttp's version of StreamReader doesn't have readuntil(separator=b'\n'), as introduced in asyncio's streams in Python 3.5.2. That would be handy for me when pulling data from web services that don't return a reliable Content-Length. This is a follow-up to the hypothesis that, under heavy load, read() can find its internal buffer empty and terminate prematurely.

The RFE ticket for readuntil was closed because nobody needed it: #1151. I wonder if this is enough justification to re-open it.
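
In the meantime, a rough sketch of delimiter-based reading with the readline() that aiohttp's StreamReader already exposes (the URL and per-record handling are placeholders):

    import asyncio
    import aiohttp

    async def read_records(url):
        records = []
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                while True:
                    line = await resp.content.readline()  # reads up to and including b'\n'
                    if not line:  # b'' signals EOF
                        break
                    records.append(line)
        return records

    asyncio.run(read_records("https://example.com/stream"))  # placeholder URL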

yapith-suplari changed the title from "await content.read() reads partial content" to "await content.read() reads partial content under high load" on Jul 7, 2019
@alexforster

I'm seeing this as well. I'm dealing with a ~20 Mbit/s data stream, and if I don't read quickly enough, the library appears to silently drop chunks of the payload.
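
For context, a sketch of the kind of streaming consumption involved (the URL, output path, and chunk size are placeholders):

    import asyncio
    import aiohttp

    async def stream_to_file(url, path, chunk_size=64 * 1024):
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                with open(path, "wb") as f:
                    async for chunk in resp.content.iter_chunked(chunk_size):
                        f.write(chunk)  # slow writes here are where the consumer falls behind

    asyncio.run(stream_to_file("https://example.com/stream", "payload.bin"))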

@Dreamsorcerer
Member

This needs a reproducer for us to test with. The issue might already be fixed (if it even was an issue with aiohttp).

Dreamsorcerer added the needs-info label on Aug 13, 2024
github-actions bot added the Stale label on Sep 13, 2024
github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Sep 21, 2024