This repository has been archived by the owner on Sep 30, 2022. It is now read-only.

Use readchunk and fix timeouts #8

Merged
merged 2 commits on Mar 15, 2019

Conversation

livioso
Contributor

@livioso livioso commented Jun 6, 2018

Hi again o/

I think this is a bug, or at least in our case (^1) it caused an aiohttp timeout while reading chunks.


When the chunk_size is that small (512 bytes), the loop in which we read is executed so fast that our CPU usage goes up to 100% since we practically don't await anymore. This caused a hanging state ultimately resulting in a timeout (after 60s). Increasing the chunk_size to 512kb instead of 512 bytes seemed reasonable (or was this chosen to be sooo small on purpose?).

^1: We use an ARM Cortex A7, single core (imx6ul)

@Bastian-Krause
Member

Hi!

We could increase the read size or use aiohttp.StreamReader.readchunk(). But I would like to understand the problem first.

When the chunk_size is that small (512 bytes), the loop in which we read is executed so fast that our CPU usage goes up to 100% since we practically don't await anymore.

I can understand this, but 512 bytes does not sound that small to me. The chunk write is performed per loop cycle, but I guess the target directory is fast tmpfs, so yes, the loop will run pretty fast.

This caused a hanging state ultimately resulting in a timeout (after 60s). Increasing the chunk_size to 512kb instead of 512 bytes seemed reasonable (or was this chosen to be sooo small on purpose?).

This is the part I do not understand. The aiohttp.Timeout is applied per loop cycle. I don't get why the process should stall completely (for over 60 seconds) - even if it might be inefficient on a low-performance embedded system currently. That feels strange to me.
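To make this concrete: the loop in question looks roughly like this (a sketch, not the exact rauc-hawkbit code; the function name and file handling are assumptions, and aiohttp.Timeout is the per-read helper referred to above):

import aiohttp


async def download(session, url, dest):
    async with session.get(url) as response:
        with open(dest, 'wb') as f:
            while True:
                # the timeout guards each individual read, i.e. one
                # loop cycle, not the download as a whole
                with aiohttp.Timeout(3600):
                    chunk = await response.content.read(512)
                if not chunk:
                    break
                f.write(chunk)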

Are you sure that the file system you are writing to is large enough?

When does this happen? How much of the bundle is written until the timeout is triggered? What bundle file size are we talking about?

Could you write the bundle to a destination not backed by RAM? What's happening then?

Regards,
Bastian

@Bastian-Krause
Member

Bonus question: could you provide the full timeout traceback?

@livioso
Contributor Author

livioso commented Jun 6, 2018

Thanks for the fast reply. :)

This is the part I do not understand. The aiohttp.Timeout is applied per loop cycle. I don't get why the process should stall completely (for over 60 seconds) - even if it might be inefficient on a low-performance embedded system currently. That feels strange to me.

Agreed, it's a bit strange. :)

Are you sure that the file system you are writing to is large enough?

Yes, we have enough space.

When does this happen? How much of the bundle is written until the timeout is triggered?

Not immediately (not on the first chunks); it seems to happen after about 40-50 MB.

What bundle file size are we talking about?

Currently, ~70MB.

Could you write the bundle to a destination not backed by RAM? What's happening then?

Same outcome.

Bonus question: could you provide the full timeout traceback?

I'll try to follow up with it tomorrow. Right now we don't have a full traceback (we only see the timeout log, and the update is retried after 5 minutes).

@livioso
Contributor Author

livioso commented Jun 11, 2018

Hi, some follow up. :)

  • I was able to reproduce this locally (with a stripped-down example, fetching chunk-wise).
  • You're right, the chunk size does not fix it entirely: e.g. if I fetch a 10GB file chunk-wise on my MacBook (Python 3.6.5, aiohttp 3.3) using a chunk size of 512KB, I get the same "Timeout".
  • The timeout we see is https://github.com/aio-libs/aiohttp/blob/3.3/aiohttp/streams.py#L250 (<< not in rauc-hawkbit).
  • When using HTTP + a small chunk_size I can observe that the client struggles a lot: it constantly sends TCP Zero Windows, until at a certain point the read times out because no new data was fed (feed_data) in time. It happens more often / earlier when the processing power is lower, but if the file is big enough I can see the same happening - just more randomly and at a later stage.
  • When using HTTPS it's a bit harder to debug, but I can also see that the client struggles and constantly pauses / resumes transmission, until eventually it also fails. Presumably the same thing happens (the read times out because new data wasn't fed in time).

How to fix this? I'm not sure; the timeout is AFAIK at 0.1s in aiohttp... Catching this exception and just retrying might be an option, but it's not ideal (right?).

Example that crashes

import aiohttp
import asyncio


async def fetch(session, url):
    async with session.get(url) as response:
        while True:
            # read in 512 KB chunks; the data is discarded, this is
            # only a minimal reproduction of the timeout
            chunk = await response.content.read(512000)

            # a) same issue, eventually breaks
            # chunk, _ = await response.content.readchunk()

            # b) no effect (note: the await is required, otherwise the
            #    sleep coroutine is never actually run)
            # await asyncio.sleep(0.001)

            if not chunk:
                break


async def main():
    async with aiohttp.ClientSession() as session:
        await fetch(session, 'https://speed.hetzner.de/10GB.bin')
        # await fetch(session, 'http://localhost:2280/huge.zip')
        # await fetch(session, 'http://localhost:2280/segin-rootfs-update-02.raucb')
        # await fetch(session, 'https://speed.hetzner.de/100MB.bin')
        print('Success')

loop = asyncio.get_event_loop()
loop.set_debug(True)
loop.run_until_complete(main())

Stacktrace

Traceback (most recent call last):
  File "aiohttp-rauc-hawkbit.py", line 30, in <module>
    loop.run_until_complete(main())
  File "/usr/local/opt/asdf/installs/python/3.6.5/lib/python3.6/asyncio/base_events.py", line 468, in run_until_complete
    return future.result()
  File "aiohttp-rauc-hawkbit.py", line 22, in main
    await fetch(session, 'https://speed.hetzner.de/10GB.bin')
  File "/usr/local/opt/asdf/installs/python/3.6.5/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "aiohttp-rauc-hawkbit.py", line 8, in fetch
    chunk = await response.content.read(51200)
  File "/usr/local/opt/asdf/installs/python/3.6.5/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/Users/livioso/.local/share/virtualenvs/third-party-api-experiments-pge9BDZy/lib/python3.6/site-packages/aiohttp/streams.py", line 319, in read
    await self._wait('read')
  File "/usr/local/opt/asdf/installs/python/3.6.5/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/Users/livioso/.local/share/virtualenvs/third-party-api-experiments-pge9BDZy/lib/python3.6/site-packages/aiohttp/streams.py", line 250, in _wait
    await waiter
  File "/Users/livioso/.local/share/virtualenvs/third-party-api-experiments-pge9BDZy/lib/python3.6/site-packages/aiohttp/helpers.py", line 671, in __exit__
    raise asyncio.TimeoutError from None
concurrent.futures._base.TimeoutError

TCP ZeroWindows

[screenshot, 2018-06-08: packet capture of the server at port 2280 showing repeated TCP ZeroWindow frames]

@Bastian-Krause
Member

Hi,

thanks for the follow-up. I tried to reproduce this using your example, but no luck. I successfully downloaded huge files from local and remote HTTP servers. I noticed that the CPU usage is pretty high, but as you said, your example is pretty much copied from the aiohttp Client API.

Maybe it's worth opening an issue with your example code at https://github.com/aio-libs/aiohttp? I think it's actually an aiohttp client issue.

If your bundle download works with..

chunk, _ = await response.content.readchunk()

..I guess we should merge that.
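For reference, a download loop based on readchunk() could look roughly like this (a sketch with assumed function name and file handling, not the actual PR code):

import aiohttp


async def download(session, url, dest):
    async with session.get(url) as response:
        with open(dest, 'wb') as f:
            while True:
                # readchunk() returns whatever the protocol has
                # buffered, so the loop paces itself to the network
                # instead of spinning on tiny fixed-size reads
                chunk, end_of_http_chunk = await response.content.readchunk()
                if not chunk:
                    break
                f.write(chunk)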

@Bastian-Krause
Member

Oh, you already opened an issue there, nice!

@Bastian-Krause
Member

Bastian-Krause commented Jun 12, 2018

Looking at the aiohttp issue, it seems we are running into aiohttp.ClientSession's default 5-minute total timeout. I think we want to create the aiohttp.ClientSession with something like:

timeout = aiohttp.ClientTimeout(total=None, connect=x, sock_connect=y, sock_read=z)
async with aiohttp.ClientSession(timeout=timeout) as session:
    ...
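Filled in with concrete values, that could look like the following (the numbers are purely illustrative placeholders, not the ones this PR ends up using):

import aiohttp

# total=None disables the overall timeout (5 minutes by default);
# the per-operation values below are illustrative only
timeout = aiohttp.ClientTimeout(total=None, connect=30,
                                sock_connect=30, sock_read=60)


async def main():
    async with aiohttp.ClientSession(timeout=timeout) as session:
        ...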

@Bastian-Krause
Member

@livioso Would you like to make a PR for this and test your use case with the new code?

@livioso
Contributor Author

livioso commented Jun 12, 2018

Let's do this: (a) set the timeout properly (e.g. remove the Timeout(3600) context manager and set the timeout on the session) and (b) use readchunk, which leads to better throughput (since read(512) eventually led to the timeout for us). I'll update this PR tomorrow. :)
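Put together, (a) and (b) might look roughly like this (a sketch, not the actual diff; the timeout values and file handling are assumptions):

import aiohttp

# (a) configure timeouts on the session up front instead of wrapping
#     each read in a Timeout(3600) context manager (placeholder values)
timeout = aiohttp.ClientTimeout(total=None, sock_connect=30, sock_read=60)


async def download(url, dest):
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(url) as response:
            with open(dest, 'wb') as f:
                while True:
                    # (b) readchunk() reads as much as is available
                    chunk, _ = await response.content.readchunk()
                    if not chunk:
                        break
                    f.write(chunk)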

Replaces read(chunk_size) with readchunk, which reads as much as it can.
This is a better approach than choosing a chunk_size that works well
under all circumstances (HTTP, HTTPS).

Also sets the timeouts properly. Before, the context helper didn't do
anything since the requests would time out after 5 minutes anyway.
Now the timeouts are respected as expected.

Signed-off-by: Livio Bieri <livio@livio.li>
@livioso livioso reopened this Jun 14, 2018
support for ClientTimeout was added in 3.3.1

Signed-off-by: Livio Bieri <livio@livio.li>
@livioso livioso changed the title Increase default chunk_size to 512kb Use read chunk and fix timeouts Jun 14, 2018
@livioso livioso changed the title Use read chunk and fix timeouts Use readchunk and fix timeouts Jun 14, 2018
@livioso
Contributor Author

livioso commented Jun 14, 2018

👋 Let me know if you have any inputs / concerns; this works for us. :)

@livioso
Contributor Author

livioso commented Jun 24, 2018

@BastianStender Did you have time to go over it yet? :)

@Bastian-Krause
Member

@livioso Sorry for keeping you waiting. Unfortunately I can't dive into this until (hopefully) next week.

Member

@ejoerns ejoerns left a comment


Successfully tested on our demo hardware.

Wow, this is amazingly faster than the previous approach!
