This repository has been archived by the owner on Sep 30, 2022. It is now read-only.

Use readchunk and fix timeouts #8

Merged
merged 2 commits on Mar 15, 2019

Conversation

livioso
Contributor

@livioso livioso commented Jun 6, 2018

Hi again o/

I think this is a bug, or at least in our case (^1) it caused an aiohttp timeout while reading chunks.


When the chunk_size is that small (512 bytes), the loop in which we read is executed so fast that our CPU usage goes up to 100% since we practically don't await anymore. This caused a hanging state ultimately resulting in a timeout (after 60s). Increasing the chunk_size to 512kb instead of 512 bytes seemed reasonable (or was this chosen to be sooo small on purpose?).

^1: We use an ARM Cortex A7, single core (imx6ul)

@Bastian-Krause
Member

Hi!

We could increase the read size or use aiohttp.StreamReader.readchunk(). But I would like to understand the problem first.

When the chunk_size is that small (512 bytes), the loop in which we read is executed so fast that our CPU usage goes up to 100% since we practically don't await anymore.

I can understand this, but 512 bytes does not sound that small to me. The chunk write is performed per loop cycle, but I guess the target directory is fast tmpfs, so yes, the loop will run pretty fast.

This caused a hanging state ultimately resulting in a timeout (after 60s). Increasing the chunk_size to 512kb instead of 512 bytes seemed reasonable (or was this chosen to be sooo small on purpose?).

This is the part I do not understand. The aiohttp.Timeout is applied per loop cycle. I don't get why the process should stall completely (for over 60 seconds) - even if it might be inefficient on a low-performance embedded system currently. That feels strange to me.
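To make this concrete: the loop in question looks roughly like this (a sketch, not the exact rauc-hawkbit code; the function name and file handling are assumptions, and aiohttp.Timeout is the per-read helper referred to above):

import aiohttp


async def download(session, url, dest):
    async with session.get(url) as response:
        with open(dest, 'wb') as f:
            while True:
                # the timeout guards each individual read, i.e. one
                # loop cycle, not the download as a whole
                with aiohttp.Timeout(3600):
                    chunk = await response.content.read(512)
                if not chunk:
                    break
                f.write(chunk)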

Are you sure that the file system you are writing to is large enough?

When does this happen? How much of the bundle is written until the timeout is triggered? What bundle file size are we talking about?

Could you write the bundle to a destination not backed by RAM? What's happening then?

Regards,
Bastian

@Bastian-Krause
Member

Bonus question: could you provide the full timeout traceback?

@livioso
Contributor Author

livioso commented Jun 6, 2018

Thanks for the fast reply. :)

This is the part I do not understand. The aiohttp.Timeout is applied per loop cycle. I don't get why the process should stall completely (for over 60 seconds) - even if it might be inefficient on a low-performance embedded system currently. That feels strange to me.

Agreed, it's a bit strange. :)

Are you sure that the file system you are writing to is large enough?

Yes, we have enough space.

When does this happen? How much of the bundle is written until the timeout is triggered?

Not immediately (not on the first chunks); it seems to happen after about 40-50 MB.

What bundle file size are we talking about?

Currently, ~70MB.

Could you write the bundle to a destination not backed by RAM? What's happening then?

Same outcome.

Bonus question: could you provide the full timeout traceback?

I'll try to follow up with it tomorrow. Right now we don't have a full traceback (we only see the timeout log, and the update is retried after 5 minutes).

@livioso
Contributor Author

livioso commented Jun 11, 2018

Hi, some follow up. :)

  • I was able to reproduce this locally (with a stripped-down example, fetching chunk-wise).
  • You're right, the chunk size does not fix it entirely: e.g. if I fetch a 10GB file chunk-wise on my MacBook (Python 3.6.5, aiohttp 3.3) using a chunk size of 512KB, I get the same "Timeout".
  • The timeout we see is https://github.com/aio-libs/aiohttp/blob/3.3/aiohttp/streams.py#L250 (<< not in rauc-hawkbit).
  • When using HTTP + a small chunk_size I can observe that the client struggles a lot: it constantly sends TCP Zero Windows, until at a certain point the read times out because no new data was fed (feed_data) in time. It happens more often / earlier when the processing power is lower, but if the file is big enough I can see the same happening - just more randomly and at a later stage.
  • When using HTTPS it's a bit harder to debug, but I can also see that the client struggles and constantly pauses / resumes transmission, until eventually it also fails. Presumably the same thing happens (the read times out because new data wasn't fed in time).

How to fix this? I'm not sure; the timeout is AFAIK at 0.1s in aiohttp... Catching this exception and just retrying might be an option, but it's not ideal (right?).

Example that crashes

import aiohttp
import asyncio


async def fetch(session, url):
    async with session.get(url) as response:
        while True:
            # read in 512 KB chunks; the data is discarded, this is
            # only a minimal reproduction of the timeout
            chunk = await response.content.read(512000)

            # a) same issue, eventually breaks
            # chunk, _ = await response.content.readchunk()

            # b) no effect (note: the await is required, otherwise the
            #    sleep coroutine is never actually run)
            # await asyncio.sleep(0.001)

            if not chunk:
                break


async def main():
    async with aiohttp.ClientSession() as session:
        await fetch(session, 'https://speed.hetzner.de/10GB.bin')
        # await fetch(session, 'http://localhost:2280/huge.zip')
        # await fetch(session, 'http://localhost:2280/segin-rootfs-update-02.raucb')
        # await fetch(session, 'https://speed.hetzner.de/100MB.bin')
        print('Success')

loop = asyncio.get_event_loop()
loop.set_debug(True)
loop.run_until_complete(main())

Stacktrace

Traceback (most recent call last):
  File "aiohttp-rauc-hawkbit.py", line 30, in <module>
    loop.run_until_complete(main())
  File "/usr/local/opt/asdf/installs/python/3.6.5/lib/python3.6/asyncio/base_events.py", line 468, in run_until_complete
    return future.result()
  File "aiohttp-rauc-hawkbit.py", line 22, in main
    await fetch(session, 'https://speed.hetzner.de/10GB.bin')
  File "/usr/local/opt/asdf/installs/python/3.6.5/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "aiohttp-rauc-hawkbit.py", line 8, in fetch
    chunk = await response.content.read(51200)
  File "/usr/local/opt/asdf/installs/python/3.6.5/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/Users/livioso/.local/share/virtualenvs/third-party-api-experiments-pge9BDZy/lib/python3.6/site-packages/aiohttp/streams.py", line 319, in read
    await self._wait('read')
  File "/usr/local/opt/asdf/installs/python/3.6.5/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/Users/livioso/.local/share/virtualenvs/third-party-api-experiments-pge9BDZy/lib/python3.6/site-packages/aiohttp/streams.py", line 250, in _wait
    await waiter
  File "/Users/livioso/.local/share/virtualenvs/third-party-api-experiments-pge9BDZy/lib/python3.6/site-packages/aiohttp/helpers.py", line 671, in __exit__
    raise asyncio.TimeoutError from None
concurrent.futures._base.TimeoutError

TCP ZeroWindows

[screenshot, 2018-06-08: packet capture of the server at port 2280 showing repeated TCP ZeroWindow frames]

@Bastian-Krause
Member

Hi,

thanks for the follow-up. I tried to reproduce this using your example, but no luck. I successfully downloaded huge files from local and remote HTTP servers. I noticed that the CPU usage is pretty high, but as you said, your example is pretty much copied from the aiohttp Client API.

Maybe it's worth opening an issue with your example code at https://github.com/aio-libs/aiohttp? I think it's actually an aiohttp client issue.

If your bundle download works with..

chunk, _ = await response.content.readchunk()

..I guess we should merge that.
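For reference, a download loop based on readchunk() could look roughly like this (a sketch with assumed function name and file handling, not the actual PR code):

import aiohttp


async def download(session, url, dest):
    async with session.get(url) as response:
        with open(dest, 'wb') as f:
            while True:
                # readchunk() returns whatever the protocol has
                # buffered, so the loop paces itself to the network
                # instead of spinning on tiny fixed-size reads
                chunk, end_of_http_chunk = await response.content.readchunk()
                if not chunk:
                    break
                f.write(chunk)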

@Bastian-Krause
Member

Oh, you already opened an issue there, nice!

@Bastian-Krause
Member

Bastian-Krause commented Jun 12, 2018

Looking at the aiohttp issue, it seems we are running into aiohttp.ClientSession's default 5-minute total timeout. I think we want to create the aiohttp.ClientSession with something like:

timeout = aiohttp.ClientTimeout(total=None, connect=x, sock_connect=y, sock_read=z)
async with aiohttp.ClientSession(timeout=timeout) as session:
    ...
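Filled in with concrete values, that could look like the following (the numbers are purely illustrative placeholders, not the ones this PR ends up using):

import aiohttp

# total=None disables the overall timeout (5 minutes by default);
# the per-operation values below are illustrative only
timeout = aiohttp.ClientTimeout(total=None, connect=30,
                                sock_connect=30, sock_read=60)


async def main():
    async with aiohttp.ClientSession(timeout=timeout) as session:
        ...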

@Bastian-Krause
Member

@livioso Would you like to make a PR for this and test your use case with the new code?

@livioso
Contributor Author

livioso commented Jun 12, 2018

Let's do this: (a) set the timeout properly (e.g. remove the Timeout(3600) context manager and set the timeout on the session) and (b) use readchunk, which leads to better throughput (since read(512) eventually led to the timeout for us). I'll update this PR tomorrow. :)
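Put together, (a) and (b) might look roughly like this (a sketch, not the actual diff; the timeout values and file handling are assumptions):

import aiohttp

# (a) configure timeouts on the session up front instead of wrapping
#     each read in a Timeout(3600) context manager (placeholder values)
timeout = aiohttp.ClientTimeout(total=None, sock_connect=30, sock_read=60)


async def download(url, dest):
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(url) as response:
            with open(dest, 'wb') as f:
                while True:
                    # (b) readchunk() reads as much as is available
                    chunk, _ = await response.content.readchunk()
                    if not chunk:
                        break
                    f.write(chunk)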

Replaces read(chunk_size) with readchunk, which reads as much as it can.
This is a better approach than choosing a chunk_size that works well
under all circumstances (HTTP, HTTPS).

Also sets the timeouts properly. Before, the context helper didn't do
anything since the requests would time out after 5 minutes anyway.
Now the timeouts are respected as expected.

Signed-off-by: Livio Bieri <livio@livio.li>
@livioso livioso reopened this Jun 14, 2018
support for ClientTimeout was added in 3.3.1

Signed-off-by: Livio Bieri <livio@livio.li>
@livioso livioso changed the title Increase default chunk_size to 512kb Use read chunk and fix timeouts Jun 14, 2018
@livioso livioso changed the title Use read chunk and fix timeouts Use readchunk and fix timeouts Jun 14, 2018
@livioso
Contributor Author

livioso commented Jun 14, 2018

👋 Let me know if you have any inputs / concerns; this works for us. :)

@livioso
Contributor Author

livioso commented Jun 24, 2018

@BastianStender Did you have time to go over it yet? :)

@Bastian-Krause
Member

@livioso Sorry for keeping you waiting. Unfortunately I can't dive into this until (hopefully) next week.

Member

@ejoerns ejoerns left a comment


Successfully tested on our demo hardware.

Wow, this is amazingly faster than the previous approach!
