completion: container diff: no such pod #23282

Closed
edsantiago opened this issue Jul 15, 2024 · 6 comments · Fixed by #23325

Comments

@edsantiago
Member

Seen when running system tests in parallel:

# [12:02:39.744447467] $ bin/podman __completeNoDesc  container diff
# [12:02:40.521000319] [Debug] [Error] no such pod
# :4
# Completion ended with directive: ShellCompDirectiveNoFileComp
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #|     FAIL: container diff: actual container listed in suggestions
# #| expected: '.*-ctrt111_bcqa22xf
# ' (using expr)
# #|   actual: '[Debug] [Error] no such pod'
# #|         > ':4'
# #|         > 'Completion ended with directive: ShellCompDirectiveNoFileComp'
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I've been unable to reproduce it.

@edsantiago edsantiago added the flakes Flakes from Continuous Integration label Jul 15, 2024
@Luap99
Member

Luap99 commented Jul 17, 2024

This looks very strange; I cannot see where the pod error comes from here.

@Luap99
Member

Luap99 commented Jul 17, 2024

Do you have the actual test log? Is this sqlite or boltdb?

@edsantiago
Member Author

Seems to be both sqlite and boltdb. Here's f40 rootless in CI.

@edsantiago
Member Author

If you want a reproducer, pull #23275 and run hack/bats --rootless --tag=para (I absolutely need to come up with a better tag name than that!). It fails about half the time on my laptop. You will need to instrument podman somehow to dump debug info when this happens, because no useful state is preserved on failure.

@Luap99
Member

Luap99 commented Jul 17, 2024

What I don't get, though, is why it only prints "no such pod". This is a valid error message, but I don't see any call site where such a message would not be wrapped with more context.

@Luap99
Member

Luap99 commented Jul 18, 2024

As I expected, this is not related to shell completion at all; completion just lists things. Running

while :; do bin/podman ps -a --pod || break; done

in parallel with the system tests reproduces this as well. The important bit is --pod, which is always used by the shell completion code but not by most normal podman ps calls, which is why you did not notice it elsewhere so far.
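
For illustration only, here is a minimal, self-contained Go sketch of that interleaving. The names (store, podName, removePod) are hypothetical stand-ins, not libpod code: the listing path reads the container's pod ID, a concurrent pod removal runs, and the later name lookup fails with a bare "no such pod".

```go
// Hypothetical sketch of the race between "podman ps -a --pod" and "podman pod rm".
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errNoSuchPod = errors.New("no such pod")

type store struct {
	mu   sync.Mutex
	pods map[string]string // pod ID -> pod name
}

func (s *store) podName(id string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if name, ok := s.pods[id]; ok {
		return name, nil
	}
	return "", errNoSuchPod
}

func (s *store) removePod(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.pods, id)
}

func main() {
	s := &store{pods: map[string]string{"p1": "test-pod"}}

	// The listing path reads the container's pod ID first...
	ctrPodID := "p1"

	// ...then a concurrent "podman pod rm" removes the pod (called inline
	// here to make the interleaving deterministic)...
	s.removePod("p1")

	// ...so the later name lookup fails with a bare "no such pod", which is
	// exactly what leaked into the completion output above.
	if _, err := s.podName(ctrPodID); err != nil {
		fmt.Println("[Error]", err)
	}
}
```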

@Luap99 Luap99 self-assigned this Jul 18, 2024
@Luap99 Luap99 added the kind/bug Categorizes issue or PR as related to a bug. label Jul 18, 2024
Luap99 added a commit to Luap99/libpod that referenced this issue Jul 18, 2024
The pod name was queried without holding the container lock, thus it was
possible that the pod was deleted in the meantime and podman just failed
with "no such pod" as the errors.Is() check matched the wrong error.

Move it into the locked code; this should prevent anyone from removing
the pod while the container is part of it. Also fix the returned error:
there is no reason to special-case one specific error, just wrap any error
here so callers at least know where it happened.

Fixes containers#23282

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
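
As an editorial illustration of the pattern that commit message describes (not the actual libpod diff), here is a sketch with hypothetical ctr and podStore types: the pod-name lookup happens while the container lock is held, and any lookup error is wrapped with context instead of being special-cased.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errNoSuchPod = errors.New("no such pod")

// podStore and ctr are hypothetical stand-ins for libpod's state and Container.
type podStore struct {
	mu   sync.Mutex
	pods map[string]string // pod ID -> pod name
}

func (s *podStore) podName(id string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if name, ok := s.pods[id]; ok {
		return name, nil
	}
	return "", errNoSuchPod
}

type ctr struct {
	id    string
	podID string
	lock  sync.Mutex // stands in for the per-container lock
}

// podName resolves the container's pod name inside the locked section. In the
// real fix, holding the container lock keeps the pod from being removed while
// the container is part of it; here the lock only marks where the lookup now
// happens. Any error is wrapped so callers can tell where it came from.
func (c *ctr) podName(s *podStore) (string, error) {
	c.lock.Lock()
	defer c.lock.Unlock()

	name, err := s.podName(c.podID)
	if err != nil {
		return "", fmt.Errorf("looking up pod %s of container %s: %w", c.podID, c.id, err)
	}
	return name, nil
}

func main() {
	s := &podStore{pods: map[string]string{"p1": "test-pod"}}
	c := &ctr{id: "c1", podID: "p1"}
	fmt.Println(c.podName(s))
}
```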
edsantiago pushed a commit to edsantiago/libpod that referenced this issue Jul 18, 2024
The pod name was queried without holding the container lock, thus it was
possible that the pod was deleted in the meantime and podman just failed
with "no such pod" as the errors.Is() check matched the wrong error.

Move it into the locked code; this should prevent anyone from removing
the pod while the container is part of it. Also fix the returned error:
there is no reason to special-case one specific error, just wrap any error
here so callers at least know where it happened. However, this is not
good enough, because the batch doesn't update the state, which means it
sees everything from before the container was locked. In this case it might
be possible that the ctr and pod were already removed, so let the caller
skip both ctr and pod removed errors.

Fixes containers#23282

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
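
A hedged sketch of the caller-side handling that last paragraph describes: entries whose container or pod vanished mid-listing are skipped rather than failing the whole command. The error values and the lookupPodName helper are local stand-ins, not libpod's define package.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the "no such container" / "no such pod" errors.
var (
	errNoSuchCtr = errors.New("no such container")
	errNoSuchPod = errors.New("no such pod")
)

// lookupPodName is a hypothetical per-container lookup; here it always fails
// as if the pod was removed between listing and locking.
func lookupPodName(ctrID string) (string, error) {
	return "", fmt.Errorf("looking up pod of container %s: %w", ctrID, errNoSuchPod)
}

func main() {
	for _, id := range []string{"c1", "c2"} {
		podName, err := lookupPodName(id)
		if err != nil {
			// The container or its pod was removed while we were listing:
			// skip this entry instead of aborting "podman ps --pod" outright.
			if errors.Is(err, errNoSuchCtr) || errors.Is(err, errNoSuchPod) {
				continue
			}
			fmt.Println("error:", err)
			return
		}
		fmt.Println(id, "is in pod", podName)
	}
}
```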
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Oct 17, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Oct 17, 2024