podman cp under vfs: ENOENT #20282

Closed
edsantiago opened this issue Oct 5, 2023 · 7 comments · Fixed by #20912
Labels
  • flakes: Flakes from Continuous Integration
  • kind/bug: Categorizes issue or PR as related to a bug.
  • locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@edsantiago (Member)

podman cp system tests flake under VFS, with pretty consistent symptoms:

not ok podman cp XXXXX
...
# podman cp something
Error: "/tmp/sub/weirdlink" could not be found on container cpcontainer: no such file or directory

or

# podman cp cpcontainer:/srv b55ec9a6aadd4d115ff483614fd5ac7e606acfb09b6a4516883538eca3357893:/
(works)
# podman exec b55ec9a6aadd4d115ff483614fd5ac7e606acfb09b6a4516883538eca3357893 cat //srv/subdir/containerfile0 //srv/subdir/containerfile1
random-0-QrZE7sZYZs     <------ this is containerfile0
cat: can't open '//srv/subdir/containerfile1': No such file or directory

I've been chasing this one all week because it's a hard blocker for #20161. It's a BAD flake, failing on almost every run, and pretty easily reproducible in 1minutetip. Despite my hopes this morning upon seeing containers/storage#1724, it does not fix the problem: the 10-05 failures below include a vendored c/storage.

Seen in: podman/remote fedora-38 root/rootless boltdb/sqlite

"fedora-38" is only special because that's prior-fedora which means force VFS under the rules of #20161. It could possibly be a kernel or distro bug only on f38, but I choose not to even bother considering that right now. I'm treating it as a VFS driver bug.

@edsantiago edsantiago added flakes Flakes from Continuous Integration kind/bug Categorizes issue or PR as related to a bug. labels Oct 5, 2023
@edsantiago (Member, Author)

Seems to be a race. A subsequent retry succeeds.

@edsantiago (Member, Author)

Sigh. I had been hoping this was fixed by #20299 ... but nope.

@giuseppe (Member) commented Oct 11, 2023

I think this is something different from #20299: there is no real mountpoint with VFS, since it is a plain directory, so it cannot be "unmounted" unless it is deleted.
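That difference is easy to see from the command line. A minimal illustration (image name and paths are placeholders, not taken from the failing tests; with vfs the container root is just a directory under the graph root rather than a mount):

    podman --storage-driver vfs run -d --name c alpine sleep 600
    # rootless users need to run the next commands under `podman unshare`
    mnt=$(podman --storage-driver vfs mount c)
    echo "$mnt"                               # a plain directory, e.g. .../vfs/dir/<layer-id>
    findmnt "$mnt" || echo "not a mountpoint" # nothing to unmount; it only goes away when deleted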


A friendly reminder that this issue had no activity for 30 days.

@edsantiago (Member, Author)

  • fedora-38 : sys podman fedora-38 root host boltdb
    • 10-24 14:55 in [sys] podman cp dir from container to container
    • 10-17 17:48 in [sys] podman cp file from container to host
  • fedora-38 : sys podman fedora-38 root host sqlite
    • 10-19 17:52 in [sys] podman cp file from host to container
  • fedora-38 : sys podman fedora-38 rootless host boltdb
    • 10-17 15:43 in [sys] podman cp into container: weird symlink expansion
  • fedora-38 : sys remote fedora-38 root host boltdb [remote]
    • 10-17 15:30 in [sys] podman cp file from container to container
    • 10-17 15:30 in [sys] podman cp dir from container to container
    • 10-17 15:30 in [sys] podman cp symlinked directory from container
    • 10-17 13:53 in [sys] podman cp into a subdirectory matching GraphRoot
    • 11-06 11:35 in [sys] [065] podman cp - dot notation - container to host
  • fedora-38 : sys remote fedora-38 root host sqlite [remote]
    • 10-26 16:53 in [sys] podman cp dir from container to container
    • 10-25 09:40 in [sys] podman cp file from container to container
    • 10-17 17:30 in [sys] podman cp symlinked directory from container

Seen in: sys podman+remote fedora-38 root+rootless host boltdb+sqlite

@cevich (Member) commented Dec 4, 2023

Possibly related: podman cp - dot notation - container to host

Annotated log

Looked at the test in test/system/065-cp.bats. I think the assumption that "I got a container ID" implies "my container command finished executing" isn't safe. IIUC you get a container ID when 'bash' (or whatever command) has been started, not necessarily when it has finished (esp. because of run -d). Perhaps more significantly, storage speed in the cloud can be unpredictably slow and "hiccup" prone.

Ideally, this test would use some mechanism to first check (inside the container) whether the launch command (i.e. mkdir...touch...etc) completed. Practically, lazily, I'd just toss a sleep 3s between the container start and the check 😆
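A rough sketch of the first option, polling for a sentinel inside the container; the container name, sentinel path, and setup command are illustrative, not the actual 065-cp.bats code:

    podman run -d --name cpcontainer $IMAGE sh -c \
        'mkdir -p /srv/subdir; touch /srv/subdir/containerfile0; touch /READY; sleep 600'
    # poll for the sentinel instead of assuming the setup finished when `run -d` returned
    for i in $(seq 1 30); do
        podman exec cpcontainer test -e /READY && break
        sleep 0.5
    done
    podman cp cpcontainer:/srv/subdir /tmp/dest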

@edsantiago (Member, Author)

Nice catch, thank you! Fix in progress.

edsantiago added a commit to edsantiago/libpod that referenced this issue Dec 5, 2023
Some of the tests were doing "podman run -d" without wait_for_ready.
This may be the cause of some of the CI flakes. Maybe even all?
It's not clear why the tests have been working reliably for years
under overlay, and only started failing under vfs, but shrug.

Thanks to Chris for making that astute observation.

Fixes: containers#20282  (I hope)

Signed-off-by: Ed Santiago <santiago@redhat.com>
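
The podman system-test suite has a wait_for_ready helper for exactly this. The usual pattern is sketched below as an assumption about the shape of the fix, not the exact diff in the linked commits; names like cpcontainer and $dstdir are placeholders:

    run_podman run -d --name cpcontainer $IMAGE sh -c \
        'mkdir -p /srv/subdir; echo random-0 > /srv/subdir/containerfile0; echo READY; sleep 600'
    # wait_for_ready blocks until READY shows up in the container output,
    # so the files are guaranteed to exist before podman cp runs
    wait_for_ready cpcontainer
    run_podman cp cpcontainer:/srv/subdir "$dstdir"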
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Mar 6, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 6, 2024