Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[services] reliably retry all requests #13029

Merged
merged 7 commits into from
May 12, 2023

Conversation

danking
Copy link
Contributor

@danking danking commented May 10, 2023

request_retry_transient_errors is a charlatan. It does not retry errors that occur in reading the response body. I eliminated it in favor of the tried and true retry_transient_errors and some new helper methods that initiate the request and read the response.

`request_retry_transient_errors` is a charlatan. It does not retry errors that occur
in reading the response body. I eliminated it in favor of the tried and true
`retry_transient_errors` and some new helper methods that initiate the request *and*
read the response.
session,
'GET',
resp_json = await retry_transient_errors(
session.get_return_json,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also thinking about calling this get_read_json. Thoughts? I'm trying to convey a distinction between this method and the one which returns a body reading handle.

@danking
Copy link
Contributor Author

danking commented May 10, 2023

An example of an error that wasn't retried properly:
https://ci.hail.is/batches/7377528/jobs/81

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiotools/copy.py", line 129, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiotools/copy.py", line 119, in main
    await copy_from_dict(
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiotools/copy.py", line 85, in copy_from_dict
    await copy(
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiotools/copy.py", line 58, in copy
    copy_report = await Copier.copy(
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiotools/fs/copier.py", line 455, in copy
    await copier._copy(sema, copy_report, transfer, return_exceptions)
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiotools/fs/copier.py", line 548, in _copy
    raise e
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiotools/fs/copier.py", line 540, in _copy
    await bounded_gather2(sema, *[
  File "/usr/local/lib/python3.8/dist-packages/hailtop/utils/utils.py", line 545, in bounded_gather2
    return await bounded_gather2_raise_exceptions(sema, *pfs, cancel_on_error=cancel_on_error)
  File "/usr/local/lib/python3.8/dist-packages/hailtop/utils/utils.py", line 525, in bounded_gather2_raise_exceptions
    return await asyncio.gather(*tasks)
  File "/usr/local/lib/python3.8/dist-packages/hailtop/utils/utils.py", line 515, in run_with_sema
    return await pf()
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiotools/fs/copier.py", line 525, in _copy_one_transfer
    raise e
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiotools/fs/copier.py", line 504, in _copy_one_transfer
    await self.copy_source(sema, transfer, src_report, src, dest_type_task, return_exceptions)
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiotools/fs/copier.py", line 490, in copy_source
    await src_copier.copy(sema, source_report, return_exceptions)
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiotools/fs/copier.py", line 435, in copy
    raise e
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiotools/fs/copier.py", line 424, in copy
    raise result
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiotools/fs/copier.py", line 365, in copy_as_dir
    srcentries: Optional[AsyncIterator[FileListEntry]] = await files_iterator()
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiotools/fs/copier.py", line 357, in files_iterator
    return await self.router_fs.listfiles(src, recursive=True)
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiotools/router_fs.py", line 100, in listfiles
    return await fs.listfiles(url, recursive, exclude_trailing_slash_files)
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiocloud/aiogoogle/client/storage_client.py", line 715, in listfiles
    first_entry = await it.__anext__()
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiocloud/aiogoogle/client/storage_client.py", line 671, in _listfiles_recursive
    async for page in await self._storage_client.list_objects(bucket, params=params):
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiocloud/aiogoogle/client/storage_client.py", line 50, in __anext__
    self._page = await self._client.get(self._path, params=self._request_params, **self._request_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiocloud/common/base_client.py", line 23, in get
    async with await self._session.get(url, **kwargs) as resp:
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiocloud/common/session.py", line 18, in get
    return await self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiocloud/common/session.py", line 87, in request
    auth_headers = await self._credentials.auth_headers()
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiocloud/aiogoogle/credentials.py", line 89, in auth_headers
    self._access_token = await self._get_access_token()
  File "/usr/local/lib/python3.8/dist-packages/hailtop/aiocloud/aiogoogle/credentials.py", line 158, in _get_access_token
    return GoogleExpiringAccessToken.from_dict(await resp.json())
  File "/usr/local/lib/python3.8/dist-packages/aiohttp/client_reqrep.py", line 1099, in json
    await self.read()
  File "/usr/local/lib/python3.8/dist-packages/aiohttp/client_reqrep.py", line 1037, in read
    self._body = await self.content.read()
  File "/usr/local/lib/python3.8/dist-packages/aiohttp/streams.py", line 375, in read
    block = await self.readany()
  File "/usr/local/lib/python3.8/dist-packages/aiohttp/streams.py", line 397, in readany
    await self._wait("readany")
  File "/usr/local/lib/python3.8/dist-packages/aiohttp/streams.py", line 304, in _wait
    await waiter
  File "/usr/local/lib/python3.8/dist-packages/aiohttp/helpers.py", line 721, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError

Copy link
Contributor

@daniel-goldstein daniel-goldstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really take issue with these names. What I care most about here is that those methods properly use the response as a context manager, which I feel is easy to miss and a footgun, so I'm in favor of more convenience functions like these that return the end data type (like a Dict) instead of a httpx.ClientResponse

@@ -101,10 +101,11 @@ async def async_copy_paste_login(copy_paste_token: str, namespace: Optional[str]
raise_for_status=True,
timeout=aiohttp.ClientTimeout(total=5),
headers=headers) as session:
async with await request_retry_transient_errors(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an httpx.ClientSession

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch. I removed another case of aiohttp.ClientSession from job.py as well. It now only appears in httpx.py.

@danking danking merged commit e553ebc into hail-is:main May 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants