fs.copyFileSync hangs for file created within same layer in overlay filesystem #40200

Closed
joe-barnett opened this issue Sep 23, 2021 · 17 comments
Labels
fs Issues and PRs related to the fs subsystem / file system. libuv Issues and PRs related to the libuv dependency or the uv binding.

Comments

@joe-barnett

joe-barnett commented Sep 23, 2021

Environment

  • Platform: macOS Big Sur 11.6 on intel macbook pro
  • Docker Version: version 20.10.8, build 3967b7d
  • Node.js Version: v16.10.0
  • Image Tag: node:current-alpine3.11

Expected Behavior

See repro here. In the Dockerfile there, I copy a file with cp and then try to copy the same file again with fs.copyFileSync. I expect this to work and docker build . to succeed.

Current Behavior

docker build . hangs indefinitely.

The same issue applies to fs.copyFile and fs.promises.copyFile.

Possible Solution

Don't know

Steps to Reproduce

See https://github.com/joe-barnett/node-docker-issue

On macOS:

git clone git@github.com:joe-barnett/node-docker-issue.git
cd node-docker-issue
docker build .
@yosifkit

This is reproducible on a regular Linux host as well. It happens specifically when using copyFileSync on a file created in the same layer, regardless of whether the file is copied within the overlay filesystem or to a volume (i.e., a non-overlay folder). So this looks like a bug in fs.copyFileSync when acting on an overlay filesystem. It should be possible to test it outside Docker with mount (https://wiki.archlinux.org/title/Overlay_filesystem).

@nschonni
Member

@Trott does it make sense to transfer this to the regular node issue tracker?

@Trott
Member

Trott commented Sep 24, 2021

> @Trott does it make sense to transfer this to the regular node issue tracker?

I think so. (And if not, we can transfer it back.) Transferring....

Trott transferred this issue from nodejs/docker-node Sep 24, 2021
@Trott
Member

Trott commented Sep 24, 2021

@nodejs/fs

Mesteery added the fs label Sep 24, 2021
@stefandesu

stefandesu commented Oct 13, 2021

I'm encountering this issue as well. Is there any way to work around it for now? I tried not using fs.copyFileSync, but the build tool I'm using seems to use it as well and I can't change that.

Edit: It works on Ubuntu 18.04!
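
For code you control, one possible stopgap is to copy through an in-memory buffer, so the data moves with plain read and write calls instead of the copy path that fs.copyFileSync takes. This is only a minimal sketch and is not something confirmed in this thread: the helper name copyFileViaBuffer is hypothetical, it assumes the file fits in memory, and it has not been verified on an affected system.

```javascript
const fs = require('node:fs');

// Hypothetical helper: copies src to dest via a buffer in memory.
// Unlike fs.copyFileSync, this does not copy the source file's mode.
function copyFileViaBuffer(src, dest) {
  const data = fs.readFileSync(src); // whole file is read into memory
  fs.writeFileSync(dest, data);      // dest is created or truncated
}
```

This does not help when the copy happens inside a third-party build tool, as in the case described above.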

@Trott
Member

Trott commented Oct 13, 2021

Is everyone that is experiencing this on a Linux host or in a Linux container? I'm trying to replicate on macOS without docker and I can't seem to do it.

@stefandesu

> Is everyone that is experiencing this on a Linux host or in a Linux container? I'm trying to replicate on macOS without docker and I can't seem to do it.

I feel like this is related to Docker (and/or the overlay file system). I encountered the issue in my GitHub Workflow to build a container for Docker. After switching to Ubuntu 18.04, without any other changes, it worked again.

joe-barnett changed the title from "fs.copyFileSync hangs after using cp" to "fs.copyFileSync hangs for file created within same layer in overlay filesystem" Oct 14, 2021
@joe-barnett
Author

yes I believe this is specific to overlay filesystems - I've updated the title accordingly

@Trott
Member

Trott commented Oct 16, 2021

@nodejs/fs @nodejs/docker Is anyone able to debug this to figure out if the issue is in Node.js core or libuv or docker or this is not-intuitive-but-working-as-expected or something else?

@cjihrig
Contributor

cjihrig commented Oct 16, 2021

This is almost certainly a libuv issue.

Trott added the libuv label Oct 19, 2021
@Trott
Member

Trott commented Oct 19, 2021

@nodejs/libuv

@Linkgoron
Member

Given that Node freezes, I think that there might be an issue with the while loop here in libuv (maybe a race condition?):

https://github.com/nodejs/node/blob/master/deps/uv/src/unix/fs.c#L1326

I'm not very familiar with the code, and sadly couldn't reproduce the hang locally, but I can think of two things that might go wrong here: a race condition where the size of the file changes and bytes_to_send becomes negative, or bytes_written becomes 0 and the while loop never ends?

I think this is the equivalent code for Windows. It's a bit different: it checks that the value is non-negative and also checks whether there are 0 bytes to write:
https://github.com/nodejs/node/blob/master/deps/uv/src/win/fs.c#L2109

@ntr-808

ntr-808 commented Nov 5, 2021

I experienced the same issue copying from /tmp in a container to a folder that's volume mounted.
Putting the tmp file in the volume mount resolved the issue.

I can give more information if needed.

v14.17.5

@cdaringe

cdaringe commented Nov 5, 2021

  • mac docker
  • alpine only (maybe, tested debian distros and it worked fine)

one command repro: curl https://gist.githubusercontent.com/cdaringe/e38880c5138a6ab4f6d48455e9fcc212/raw | bash (warning--because sync code is looping, node will be completely unresponsive)

what's particularly interesting in the repro is that i do copyFileSync on one file and it works perfectly. i then cp f1 f2, and afterwards copyFileSync(f2, f3), and the failure is induced. stat, diff, etc seem to look OK between files.

@dpchamps did an strace and found an infinite loop in there, he can add the deets

> bytes_written becomes 0 and the while loop never ends?

i believe this is what we observe w/ strace
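
The f1/f2/f3 sequence described above can be sketched so that it does not freeze the caller. This is an illustration rather than the contents of the gist: the helper names copyInChild and reproduce, the file contents, and the 5 second timeout are all assumptions. The copy runs in a child process, so a hang shows up as a timeout instead of an unresponsive node. It also assumes a cp binary is on the PATH; per the reports in this thread, whether the hang appears depends on the image and kernel.

```javascript
const { spawnSync } = require('node:child_process');
const fs = require('node:fs');
const path = require('node:path');

// Runs fs.copyFileSync(src, dest) in a child node process.
// Returns 'ok', 'hung' (child killed after timeoutMs), or 'failed'.
function copyInChild(src, dest, timeoutMs) {
  const script =
    'require("node:fs").copyFileSync(process.argv[1], process.argv[2])';
  const result = spawnSync(process.execPath, ['-e', script, src, dest], {
    timeout: timeoutMs,
  });
  if (result.error && result.error.code === 'ETIMEDOUT') return 'hung';
  return result.status === 0 ? 'ok' : 'failed';
}

// Creates f1, copies it to f2 with cp, then copies f2 to f3 with copyFileSync.
function reproduce(dir, timeoutMs = 5000) {
  const f1 = path.join(dir, 'f1');
  const f2 = path.join(dir, 'f2');
  const f3 = path.join(dir, 'f3');
  fs.writeFileSync(f1, 'hello overlay\n');
  spawnSync('cp', [f1, f2]); // f2 is created by cp, not by node
  return copyInChild(f2, f3, timeoutMs);
}
```

Calling reproduce on a directory inside the container's overlay filesystem would be expected to return 'hung' on an affected setup and 'ok' elsewhere.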

@dpchamps

dpchamps commented Nov 7, 2021

After digging into this issue pretty deeply, I'm not sure it's a nodejs or libuv bug (initially, I was convinced it was 😅).

Here are my notes and some reproducible examples that have nothing to do with node: https://github.com/dpchamps/sendfile-overlay-bug-repro.

Something at the overlay/overlay2 storage driver layer is causing some syscalls to behave unexpectedly in very specific circumstances. It's beyond my skillset to figure out what's going on with those storage drivers.

It might be helpful to the node/libuv devs to recap what I discovered here in this thread:

  • After a file has had bytes copied to it via the sendfile syscall, copy_file_range and sendfile start copying zero bytes from that file to another file
  • This is problematic because any file created via cp (and probably by other tools) can trigger this condition, and things start behaving unexpectedly.
  • Running strace on a hanging node process showed it stuck in an infinite loop on the following call:
    • copy_file_range($fd_in, [0], $fd_out, NULL, $file_size, 0) = 0

I'm not really sure what you might do at the libuv level to preempt this, but it's worth pointing out that copy_file_range returning zero indicates a condition that might be handled:

> If the file offset of fd_in is at or past the end of file, no bytes are copied, and copy_file_range() returns zero
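
The condition quoted above suggests one defensive option for a copy loop: treat a zero-byte transfer while bytes are still outstanding as "no progress" and stop, rather than retrying forever. The sketch below illustrates that idea in JavaScript with a stand-in callback. It is not libuv's code (which is C), and the names copyWithProgressCheck and transfer are hypothetical.

```javascript
// Illustrative only: `transfer(offset, remaining)` is a hypothetical stand-in
// for a syscall such as copy_file_range and returns the number of bytes copied.
function copyWithProgressCheck(totalBytes, transfer) {
  let offset = 0;
  while (offset < totalBytes) {
    const n = transfer(offset, totalBytes - offset);
    if (n < 0) throw new Error('transfer failed');
    if (n === 0) {
      // No progress although bytes remain: report it instead of looping.
      return { copied: offset, complete: false };
    }
    offset += n;
  }
  return { copied: offset, complete: true };
}
```

A caller that gets complete: false could then report an error or fall back to a plain read/write copy; which of those is appropriate is a design choice this sketch leaves open.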

@ethomson

ethomson commented Dec 8, 2021

We started seeing this after a node update that pulled in the changes that moved fs.copyFileSync over to using copy_file_range. We've been seeing this in certain circumstances, especially in EKS and AKS runtimes using overlay filesystems.

Digging in a bit more, we're seeing it in copy_file_range on overlay filesystems (independent of sendfile). Here's my reproducer that simplifies @dpchamps's reproducer a bit: https://gist.github.com/ethomson/1caaa1773d3ce6a79cce6648456998ec

It looks like Docker has been tracking this issue here: docker/for-linux#1015

And it looks like this was a kernel bug in certain 5.x kernels, and is being tracked here: https://lore.kernel.org/stable/Yanx6KobwiQoBQfU@kroah.com/

@santigimeno
Member

It seems this was already fixed in the kernel.
