podman cp under vfs: ENOENT #20282

Closed
edsantiago opened this issue Oct 5, 2023 · 7 comments · Fixed by #20912
Labels
  • flakes: Flakes from Continuous Integration
  • kind/bug: Categorizes issue or PR as related to a bug.
  • locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@edsantiago (Member)

podman cp system tests flake under VFS, with pretty consistent symptoms:

not ok podman cp XXXXX
...
# podman cp something
Error: "/tmp/sub/weirdlink" could not be found on container cpcontainer: no such file or directory

or

# podman cp cpcontainer:/srv b55ec9a6aadd4d115ff483614fd5ac7e606acfb09b6a4516883538eca3357893:/
(works)
# podman exec b55ec9a6aadd4d115ff483614fd5ac7e606acfb09b6a4516883538eca3357893 cat //srv/subdir/containerfile0 //srv/subdir/containerfile1
random-0-QrZE7sZYZs     <------ this is containerfile0
cat: can't open '//srv/subdir/containerfile1': No such file or directory

I've been chasing this one all week because it's a hard blocker for #20161. It's a BAD flake, failing on almost every run, and pretty easily reproducible in 1minutetip. Despite my hopes this morning upon seeing containers/storage#1724, it does not fix the problem: the 10-05 failures below include a vendored c/storage.

Seen in: podman/remote fedora-38 root/rootless boltdb/sqlite

"fedora-38" is only special because that's prior-fedora which means force VFS under the rules of #20161. It could possibly be a kernel or distro bug only on f38, but I choose not to even bother considering that right now. I'm treating it as a VFS driver bug.

@edsantiago edsantiago added flakes Flakes from Continuous Integration kind/bug Categorizes issue or PR as related to a bug. labels Oct 5, 2023
@edsantiago (Member, Author)

Seems to be a race. A subsequent retry succeeds.

@edsantiago (Member, Author)

Sigh. I had been hoping this was fixed by #20299 ... but nope.

@giuseppe (Member) commented Oct 11, 2023

I think this is something different from #20299: there is no real mountpoint with VFS, since it is a plain directory, so it cannot be "unmounted" unless it is deleted.
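That difference is easy to see from the command line. A minimal illustration (image name and paths are placeholders, not taken from the failing tests; with vfs the container root is just a directory under the graph root rather than a mount):

    podman --storage-driver vfs run -d --name c alpine sleep 600
    # rootless users need to run the next commands under `podman unshare`
    mnt=$(podman --storage-driver vfs mount c)
    echo "$mnt"                               # a plain directory, e.g. .../vfs/dir/<layer-id>
    findmnt "$mnt" || echo "not a mountpoint" # nothing to unmount; it only goes away when deleted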


A friendly reminder that this issue had no activity for 30 days.

@edsantiago (Member, Author)

  • fedora-38 : sys podman fedora-38 root host boltdb
    • 10-24 14:55 in [sys] podman cp dir from container to container
    • 10-17 17:48 in [sys] podman cp file from container to host
  • fedora-38 : sys podman fedora-38 root host sqlite
    • 10-19 17:52 in [sys] podman cp file from host to container
  • fedora-38 : sys podman fedora-38 rootless host boltdb
    • 10-17 15:43 in [sys] podman cp into container: weird symlink expansion
  • fedora-38 : sys remote fedora-38 root host boltdb [remote]
    • 10-17 15:30 in [sys] podman cp file from container to container
    • 10-17 15:30 in [sys] podman cp dir from container to container
    • 10-17 15:30 in [sys] podman cp symlinked directory from container
    • 10-17 13:53 in [sys] podman cp into a subdirectory matching GraphRoot
    • 11-06 11:35 in [sys] [065] podman cp - dot notation - container to host
  • fedora-38 : sys remote fedora-38 root host sqlite [remote]
    • 10-26 16:53 in [sys] podman cp dir from container to container
    • 10-25 09:40 in [sys] podman cp file from container to container
    • 10-17 17:30 in [sys] podman cp symlinked directory from container

Seen in: sys podman+remote fedora-38 root+rootless host boltdb+sqlite

@cevich (Member) commented Dec 4, 2023

Possibly related: podman cp - dot notation - container to host

Annotated log

Looked at the test in test/system/065-cp.bats. I think the assumption that "I got a container ID" implies "my container command finished executing" isn't safe. IIUC you get a container ID when 'bash' (or whatever command) has been started, not necessarily when it has finished (esp. because of run -d). Perhaps more significantly, storage speed in the cloud can be unpredictably slow and "hiccup" prone.

Ideally, this test would use some mechanism to first check (inside the container) whether the launch command (i.e. mkdir...touch...etc) completed. Practically, lazily, I'd just toss a sleep 3s between the container start and the check 😆
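A rough sketch of the first option, polling for a sentinel inside the container; the container name, sentinel path, and setup command are illustrative, not the actual 065-cp.bats code:

    podman run -d --name cpcontainer $IMAGE sh -c \
        'mkdir -p /srv/subdir; touch /srv/subdir/containerfile0; touch /READY; sleep 600'
    # poll for the sentinel instead of assuming the setup finished when `run -d` returned
    for i in $(seq 1 30); do
        podman exec cpcontainer test -e /READY && break
        sleep 0.5
    done
    podman cp cpcontainer:/srv/subdir /tmp/dest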

@edsantiago (Member, Author)

Nice catch, thank you! Fix in progress.

edsantiago added a commit to edsantiago/libpod that referenced this issue Dec 5, 2023
Some of the tests were doing "podman run -d" without wait_for_ready.
This may be the cause of some of the CI flakes. Maybe even all?
It's not clear why the tests have been working reliably for years
under overlay, and only started failing under vfs, but shrug.

Thanks to Chris for making that astute observation.

Fixes: containers#20282  (I hope)

Signed-off-by: Ed Santiago <santiago@redhat.com>
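
The podman system-test suite has a wait_for_ready helper for exactly this. The usual pattern is sketched below as an assumption about the shape of the fix, not the exact diff in the linked commits; names like cpcontainer and $dstdir are placeholders:

    run_podman run -d --name cpcontainer $IMAGE sh -c \
        'mkdir -p /srv/subdir; echo random-0 > /srv/subdir/containerfile0; echo READY; sleep 600'
    # wait_for_ready blocks until READY shows up in the container output,
    # so the files are guaranteed to exist before podman cp runs
    wait_for_ready cpcontainer
    run_podman cp cpcontainer:/srv/subdir "$dstdir"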
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Mar 6, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 6, 2024