[batch] ImageCannotBePulled can trigger FileNotFoundError #13907

Closed
danking opened this issue Oct 25, 2023 · 0 comments · Fixed by #13911

danking commented Oct 25, 2023

What happened?

When the image cannot be pulled, the resulting ImageCannotBePulled exception can trigger a secondary FileNotFoundError when the worker tries to read the main container's log, which was never written.

https://cloudlogging.app.goo.gl/5h9Q9MUG7KdZRVXN9

Version

0.2.124

Relevant log output

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/batch/worker/worker.py", line 551, in pull
    await docker_call_retry(
  File "/usr/local/lib/python3.9/dist-packages/hailtop/utils/utils.py", line 840, in retry_transient_errors_with_debug_string
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/batch/worker/worker.py", line 460, in timed_out_f
    return await asyncio.wait_for(f(*args, **kwargs), timeout)
  File "/usr/lib/python3.9/asyncio/tasks.py", line 479, in wait_for
    return fut.result()
  File "/usr/local/lib/python3.9/dist-packages/batch/worker/worker.py", line 484, in _pull_with_auth_refresh
    return await docker.images.pull(image_ref_str, auth=credentials)
  File "/usr/local/lib/python3.9/dist-packages/aiodocker/images.py", line 133, in _handle_list
    async with cm as response:
  File "/usr/local/lib/python3.9/dist-packages/aiodocker/utils.py", line 309, in __aenter__
    resp = await self._coro
  File "/usr/local/lib/python3.9/dist-packages/aiodocker/docker.py", line 275, in _do_query
    raise DockerError(response.status, json.loads(what.decode("utf8")))
aiodocker.exceptions.DockerError: DockerError(500, 'Head "https://us-docker.pkg.dev/v2/1/does-not-exist/manifests/latest": denied: Permission "artifactregistry.repositories.downloadArtifacts" denied on resource "projects/1/locations/us/repositories/does-not-exist" (or it may not exist)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/batch/worker/worker.py", line 915, in run
    await self.create()
  File "/usr/local/lib/python3.9/dist-packages/batch/worker/worker.py", line 840, in create
    await self._run_until_done_or_deleted(self.image.pull)
  File "/usr/local/lib/python3.9/dist-packages/batch/worker/worker.py", line 1012, in _run_until_done_or_deleted
    return await run_until_done_or_deleted(self.deleted_event, f, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/batch/worker/worker.py", line 682, in run_until_done_or_deleted
    return step.result()
  File "/usr/local/lib/python3.9/dist-packages/batch/worker/worker.py", line 657, in pull
    await asyncio.shield(self._localize_rootfs())
  File "/usr/local/lib/python3.9/dist-packages/batch/worker/worker.py", line 634, in _localize_rootfs
    await self._pull_image()
  File "/usr/local/lib/python3.9/dist-packages/batch/worker/worker.py", line 587, in _pull_image
    await pull()
  File "/usr/local/lib/python3.9/dist-packages/batch/worker/worker.py", line 566, in pull
    raise ImageCannotBePulled from e
ImageCannotBePulled

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/batch/worker/worker.py", line 1887, in run_container
    await container.run(on_completion)
  File "/usr/local/lib/python3.9/dist-packages/batch/worker/worker.py", line 920, in run
    await on_completion(*args, **kwargs)
  File "/usr/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.9/dist-packages/hailtop/utils/utils.py", line 1154, in step
    yield
  File "/usr/local/lib/python3.9/dist-packages/batch/worker/worker.py", line 1873, in on_completion
    await self.worker.fs.read(container.log_path),
  File "/usr/local/lib/python3.9/dist-packages/hailtop/aiotools/fs/fs.py", line 281, in read
    async with await self.open(url) as f:
  File "/usr/local/lib/python3.9/dist-packages/hailtop/aiotools/router_fs.py", line 76, in open
    return await fs.open(url)
  File "/usr/local/lib/python3.9/dist-packages/hailtop/aiotools/local_fs.py", line 252, in open
    f = await blocking_to_async(self._thread_pool, open, self._get_path(url), 'rb')
  File "/usr/local/lib/python3.9/dist-packages/hailtop/utils/utils.py", line 181, in blocking_to_async
    return await asyncio.get_event_loop().run_in_executor(
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.9/dist-packages/hailtop/utils/utils.py", line 182, in <lambda>
    thread_pool, lambda: fun(*args, **kwargs))
FileNotFoundError: [Errno 2] No such file or directory: '/batch/00a8b257731544b494247db2813c7a83/main/container.log'
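The traceback shows two chained failures. First, create() raises ImageCannotBePulled before the container ever starts. Then the completion step reads container.log, which does not exist. The sketch below reproduces that ordering under stated assumptions. create, read_log, run, and reproduce are hypothetical stand-ins for the worker's methods, not Hail's actual code. The completion callback is assumed to run in a finally block, as the "During handling of the above exception" message suggests. The log is also read inline rather than on a thread pool.

```python
import asyncio
import os
import tempfile


class ImageCannotBePulled(Exception):
    """Hypothetical stand-in for the worker exception of the same name."""


async def create() -> None:
    # Simulate the image pull failing before the container ever starts,
    # so no container.log is written.
    raise ImageCannotBePulled


async def read_log(path: str) -> bytes:
    # The real worker offloads open() to a thread pool; it is done inline
    # here to keep the sketch short.
    with open(path, 'rb') as f:
        return f.read()


async def run(log_path: str) -> None:
    try:
        await create()
    finally:
        # Assumed: the completion step always reads the log, even when
        # create() failed and the file was never created.
        await read_log(log_path)


async def reproduce(log_path: str) -> FileNotFoundError:
    try:
        await run(log_path)
    except FileNotFoundError as e:
        return e
    raise AssertionError('expected FileNotFoundError')


if __name__ == '__main__':
    missing = os.path.join(tempfile.mkdtemp(), 'main', 'container.log')
    err = asyncio.run(reproduce(missing))
    # The original failure is kept as the implicit __context__.
    print(type(err).__name__, 'during', type(err.__context__).__name__)
```

Because the FileNotFoundError is raised while ImageCannotBePulled is still being handled, Python records the pull failure as its __context__. This is why the log above reports both exceptions, with the log read failure replacing the original error as the one that propagates.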
daniel-goldstein self-assigned this Oct 26, 2023
danking pushed a commit that referenced this issue Oct 27, 2023
If a container is deleted before it ever runs, the log files won't
exist.

Fixes #13906
Fixes #13907
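The commit message implies that a missing log is an expected state, not an error. One way to express that idea is to treat FileNotFoundError on the log path as "no log". The sketch below assumes this approach. read_log_or_none is a hypothetical helper and is not necessarily the code merged in #13911.

```python
import asyncio
import os
import tempfile
from typing import Optional


async def read_log_or_none(path: str) -> Optional[bytes]:
    """Return the log's contents, or None if it was never written.

    Hypothetical helper illustrating the idea in the commit message.
    """
    try:
        # Read inline for brevity; a real async worker would offload this.
        with open(path, 'rb') as f:
            return f.read()
    except FileNotFoundError:
        # The container was deleted (or its image could not be pulled)
        # before it ever ran, so there is no log to report.
        return None


if __name__ == '__main__':
    d = tempfile.mkdtemp()
    print(asyncio.run(read_log_or_none(os.path.join(d, 'main', 'container.log'))))
```

Catching the exception, rather than checking os.path.exists() before opening, avoids a race where the file disappears between the check and the open. The helper also deliberately lets other OSErrors, such as permission problems, propagate.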