Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[copy] fix the TimeoutError and ServerDisconnected issues in copy #195

Merged
merged 3 commits into from
May 13, 2022

Conversation

vladsavelyev
Copy link

Daniel King added 3 commits May 13, 2022 10:24
OK, there were two problems:

1. A timeout of 5s appears to be now too short for Google Cloud Storage. I am not sure why but we
   timeout substantially more frequently. I have observed this myself on my laptop. Just this
   morning I saw it happen to Daniel.

2. When using an `aiohttp.AsyncIterablePayload`, it is *critical* to always check if the coroutine
   which actually writes to GCS (which is stashed in the variable `request_task`) is still
   alive. In the current `main`, we do not do this which causes hangs (in particular the timeout
   exceptions are never thrown ergo we never retry).

To understand the second problem, you must first recall how writing works in aiogoogle. There are
two Tasks and an `asyncio.Queue`. The terms "writer" and "reader" are somewhat confusing, so let's
use left and right. The left Task has the owning reference to both the source "file" and the
destination "file". In particular, it is the *left* Task which closes both "files". Moreover, the
left Task reads chunks from the source file and places those chunks on the `asyncio.Queue`. The
right Task takes chunks off the queue and writes those chunks to the destination file.

This situation can go awry in two ways.

First, if the right Task encounters any kind of failure, it will stop taking chunks off of the
queue. When the queue (which has a size limit of one) is full, the left Task will hang. The system
is stuck. The left Task will wait forever for the right Task to empty the queue.

The second scenario is exactly the same except that the left Task is trying to add the "stop"
message to the queue rather than a chunk.

In either case, it is critical that the left Task waits simultaneously on the queue operation *and*
on the right Task completing. If the right Task has died, no further writes can occur and the left
Task must raise an exception. In the first scenario, we do not observe the right Task's exception
because that will be done when we close the `InsertObjectStream` (which represents the destination
"file").

---

I also added several types, assertions, and a few missing `async with ... as resp:` blocks.
@vladsavelyev vladsavelyev requested a review from lgruen May 13, 2022 00:28
@vladsavelyev vladsavelyev merged commit ad1fc0e into main May 13, 2022
@vladsavelyev vladsavelyev deleted the merge-fix-gcs-copy branch May 13, 2022 00:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants