open(), rename(), fstat() -> ENOENT #8443
WSL wasn't keeping track of renamed files (microsoft/WSL#8443), so the fstat calls in os_file_set_size returned ENOENT and no fallocate fallback was possible. Users reported that MySQL was unaffected; it uses my_seek to determine the size. We copy that approach here to avoid the WSL bug.
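For readers who want to see the shape of that workaround, here is a minimal Python sketch. It assumes that my_seek amounts to an lseek() to the end of the file, which is not confirmed in this thread, and `file_size` and `demo` are hypothetical names, not MariaDB or MySQL functions. It tries fstat() first and falls back to seeking only when fstat() reports ENOENT.

```python
import errno
import os
import tempfile


def file_size(fd: int) -> int:
    """Size in bytes of the file behind fd (hypothetical helper, not MariaDB code)."""
    try:
        return os.fstat(fd).st_size
    except OSError as exc:
        # Only ENOENT is treated as the WSL symptom; anything else is re-raised.
        if exc.errno != errno.ENOENT:
            raise
    # Fallback: seeking to the end returns the offset of end-of-file.
    saved = os.lseek(fd, 0, os.SEEK_CUR)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.lseek(fd, saved, os.SEEK_SET)  # put the file offset back


def demo() -> int:
    """Write five bytes to a temporary file and report its size."""
    with tempfile.TemporaryFile() as f:
        f.write(b"hello")
        f.flush()
        return file_size(f.fileno())
```

The fallback moves the shared file offset for a moment, so a real implementation would need to make sure no other thread reads or writes through the same descriptor at that point.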
Failed to reproduce on Windows 11 22000.795 and Windows 10 - 19044.1826 |
Our users are reporting this is back with Windows 10 19044.2311. |
Still a problem - MDEV-31486:
|
Did you try without Docker Desktop? Try to run the dockerd service with systemd or manually with |
Works on Windows 11 Pro 22H2 22621.1848 when running everything under WSL 2 with Ubuntu 22.04. The bug is still there when running latest version of Docker Desktop on Windows itself (can't run Linux container when starting dockerd inside cmd.exe on Windows). But even when running Docker inside WSL, there is still the rename table error "Tablespace is missing for a table" as mentioned in this issue which points to this one in MariaDB tracker which finally leads to the current issue of WSL. |
Hi, please fix this. Edition: Windows 11 Pro. I'm using WSL 2. Please let me know if you need any further information from me. |
I'm also facing this issue with WSL2 and MariaDB. |
May I suggest you try this on an Ubuntu image running on Windows with VirtualBox? Use the same version of Ubuntu and Docker, and see if it also fails. The idea is to try to narrow down where the problem might be. |
If it's a VirtualBox VM, that wouldn't be using WSL. No other container users of MariaDB based on the Ubuntu image, nor users of native or VM Ubuntu, have reported a problem, and the simple ALTER TABLE is executed daily in CI on Ubuntu-based tests. However, the following minimal container images have been made, and the wsl8443.c file is at the top of the image.
Occurs to me that tmpfs/tmp storage may not be enough hence volume. https://quay.io/repository/danielgblack/wsl8443?tab=tags
A reproduction on this would look like "error on fstat errno 2" on output. |
This issue was opened in May of 2022. |
Issue still persists as of today, 23.04.2024. Using Docker Desktop v4.29.0, WSL 2 (incl. Windows Subsystem for Linux Update - 5.10.102.2) The issue only occurs when I use bind mounted volumes on an external SSD (bind mount /h/stuff/source to /data). Using regular docker volumes works fine. Can we have someone look into this please? |
Ok, finally got around to look at this myself on a windows machine. Can confirm that this (whatever the root cause is) is still happening on
Last Update: KB5037853
I can confirm the observation from @holzerseb that it only happens if you mount a directory from the host file system into the
$ docker volume create storage
$ docker run --rm -v "storage:/tmp" quay.io/danielgblack/wsl8443:ubunu2204 ./wsl8443
file /tmp/testfile opened - fd 3
stat fd 3 successful
This is likely because volumes are hosted on the "normal" filesystem of the WSL distro hosting docker engine (c.f. https://stackoverflow.com/a/64430683/4087068), which would use a "normal" block device filesystem driver like
$ docker run --rm -v "volume:/tmp" docker.io/bash sh -c "mount | grep /tmp"
/dev/sdf on /tmp type ext4 (rw,relatime,discard,errors=remount-ro,data=ordered)
If you compare that to a mounted path, you'll see that it's using the
$ mkdir hostdir
$ docker run --rm -v "$($(pwd).Path)\hostdir:/tmp" docker.io/bash sh -c "mount | grep /tmp"
C:\ on /tmp type 9p (rw,dirsync,noatime,aname=drvfs;path=C:\;uid=0;gid=0;metadata;symlinkroot=/mnt/,mmap,access=client,msize=65536,trans=fd,rfd=4,wfd=4)
And running the reproducer above indeed results in the expected "error on fstat errno 2". The full strace I get there is:
$ mkdir hostdir
$ docker run --rm -v "$($(pwd).Path)\hostdir:/tmp" quay.io/danielgblack/wsl8443:ubunu2204 ./wsl8443
file /tmp/testfile opened - fd 3
error on fstat errno 2
$ docker run --rm -v "$($(pwd).Path)\hostdir:/tmp" quay.io/danielgblack/wsl8443:ubunu2204 strace ./wsl8443
execve("./wsl8443", ["./wsl8443"], 0x7ffe38267590 /* 3 vars */) = 0
brk(NULL) = 0x55b6b28d7000
arch_prctl(0x3001 /* ARCH_??? */, 0x7fff8d629830) = -1 EINVAL (Invalid argument)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0f1586a000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=8735, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 8735, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0f15867000
close(3) = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\237\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
pread64(3, "\4\0\0\0 \0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0"..., 48, 848) = 48
pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0\244;\374\204(\337f#\315I\214\234\f\256\271\32"..., 68, 896) = 68
newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=2216304, ...}, AT_EMPTY_PATH) = 0
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
mmap(NULL, 2260560, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0f1563f000
mmap(0x7f0f15667000, 1658880, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x28000) = 0x7f0f15667000
mmap(0x7f0f157fc000, 360448, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x7f0f157fc000
mmap(0x7f0f15854000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x214000) = 0x7f0f15854000
mmap(0x7f0f1585a000, 52816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0f1585a000
close(3) = 0
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0f1563c000
arch_prctl(ARCH_SET_FS, 0x7f0f1563c740) = 0
set_tid_address(0x7f0f1563ca10) = 9
set_robust_list(0x7f0f1563ca20, 24) = 0
rseq(0x7f0f1563d0e0, 0x20, 0, 0x53053053) = 0
mprotect(0x7f0f15854000, 16384, PROT_READ) = 0
mprotect(0x55b6b1a7f000, 4096, PROT_READ) = 0
mprotect(0x7f0f158a4000, 8192, PROT_READ) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
munmap(0x7f0f15867000, 8735) = 0
openat(AT_FDCWD, "/tmp/testfile", O_RDWR|O_CREAT|O_EXCL, 0660) = 3
write(2, "file /tmp/testfile opened - fd 3"..., 33file /tmp/testfile opened - fd 3
) = 33
rename("/tmp/testfile", "/tmp/movedtestfile") = 0
newfstatat(3, "", 0x7fff8d629840, AT_EMPTY_PATH) = -1 ENOENT (No such file or directory)
write(2, "error on fstat errno 2\n", 23error on fstat errno 2
) = 23
exit_group(1) = ?
+++ exited with 1 +++
So given the reproducer works, there are two options:
Hopefully this helps @grooverdan and/or the WSL team to narrow down the issue.
Edit:
$ wsl -v
WSL-Version: 2.1.5.0
Kernelversion: 5.15.146.1-2
WSLg-Version: 1.0.60
MSRDC-Version: 1.2.5105
Direct3D-Version: 1.611.1-81528511
DXCore-Version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows-Version: 10.0.22631.3672 |
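The ext4 versus 9p comparison made with `mount | grep /tmp` can also be done programmatically. The following Python sketch parses the Linux mount table (`/proc/self/mounts`) and reports which mount a path lives on. The helper names are hypothetical, and treating an option that starts with `aname=drvfs` as the marker for a Windows-backed mount is an assumption taken from the single mount line quoted in this thread.

```python
import os
import re


def _unescape(field: str) -> str:
    """Decode octal escapes such as \\040 (space) and \\134 (backslash)."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def mount_for_path(path, mounts_text=None):
    """Return (mount_point, fstype, options) for the mount containing path, or None."""
    if mounts_text is None:
        with open("/proc/self/mounts", encoding="utf-8", errors="replace") as f:
            mounts_text = f.read()
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a "device mountpoint fstype options ..." line
        mount_point = _unescape(fields[1])
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or (path + "/").startswith(prefix):
            # Longest mount point wins; on a tie the later entry shadows the earlier.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fields[2], _unescape(fields[3]).split(","))
    return best


def is_drvfs_9p(path, mounts_text=None) -> bool:
    """True if path sits on a 9p mount whose options mention aname=drvfs (assumed marker)."""
    mount = mount_for_path(path, mounts_text)
    if mount is None:
        return False
    _, fstype, options = mount
    return fstype == "9p" and any(opt.startswith("aname=drvfs") for opt in options)
```

Calling `is_drvfs_9p("/var/lib/mysql")` inside a container would then indicate whether the data directory is on the kind of mount where the reproducer fails in this thread, though a positive result does not prove the bug is present.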
As for what the behavior should be. The
So this should not happen and is indeed some sort of WSL bug. Interestingly, the POSIX definition of |
One more datapoint: I was curious and tried the reproducer on a directory mounted with
$ ./wsl8443
file /tmp/9c/testfile opened - fd 3
stat fd 3 successful
My takeaway from this is that it's probably a bug with the 9P server in Windows.
EDIT: If someone wants to try:
|
Since it seems to be specific to
$ ./wsl8443
file /mnt/c/Users/{username}/testfile opened - fd 3
error on fstat errno 2
This is using a "normal" Ubuntu WSL distro opened with
EDIT: Sorry for spamming y'all with emails. Thought I was done with this, but there's always something more to look at.
EDIT 2: https://learn.microsoft.com/de-de/archive/blogs/wsl/wsl-file-system-support#interoperability-with-windows-1 feels relevant for this case. The reason I'm talking about drvfs at all is that the |
Version
Windows 11 Pro 22H2 22621.1848
Windows 11 Pro 22631.2506
Microsoft Windows [Version 10.0.19044.1706]
Windows 10 19044.2311
WSL Version
Kernel Version
5.10.102.1
Distro Version
Ubuntu 20.04
Other Software
Docker Desktop (Windows), version 4.8.2
Repro Steps
At a high level:
The system calls are made in this sequence: open(), rename(), fstat() (as in the issue title).
Because the fstat is on an open file descriptor, it should succeed and return the information about the file opened in the first step.
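A minimal Python sketch of that sequence follows. It is an approximation of the wsl8443 reproducer used in the comments, whose C source is not shown in this thread; the function name is hypothetical, and the file names and open flags are copied from the strace above.

```python
import os
import sys
import tempfile


def fstat_after_rename(directory: str) -> int:
    """Run open(), rename(), fstat() in directory; return 0 or the errno from fstat()."""
    src = os.path.join(directory, "testfile")
    dst = os.path.join(directory, "movedtestfile")
    fd = os.open(src, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o660)
    try:
        os.rename(src, dst)
        try:
            os.fstat(fd)  # operates on the descriptor, so the rename should not matter
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        os.close(fd)
        for leftover in (src, dst):
            try:
                os.unlink(leftover)
            except FileNotFoundError:
                pass


if __name__ == "__main__":
    if len(sys.argv) > 1:
        result = fstat_after_rename(sys.argv[1])
    else:
        with tempfile.TemporaryDirectory() as scratch:
            result = fstat_after_rename(scratch)
    print("fstat successful" if result == 0 else f"error on fstat errno {result}")
    sys.exit(0 if result == 0 else 1)
```

Passing a directory on the filesystem under test as the first argument (for example a bind-mounted host directory) exercises that mount; with no argument the script uses a scratch directory under the default temporary location.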
This sequence works fine on all Linux, BSD, and Solaris systems that MariaDB has been running on. An ALTER TABLE is all that's required to trigger this, and no non-WSL systems are affected.
This was reproduced with:
docker run --env MARIADB_ROOT_PASSWORD=bob --env MARIADB_DATABASE=test --env MARIADB_USER=test --env MARIADB_PASSWORD=test --name m107 -d mariadb:10.7.3
docker exec -ti m107 mariadb -u test -ptest test
Paste the following SQL at the prompt
To get a strace, add --cap-add SYS_PTRACE to the docker run command line, then run:
docker exec m107 sh -c 'apt-get update && apt-get install -y strace'
docker exec m107 strace -f -p1 -s 99 -o /var/lib/mysql/m.strace
Expected Behavior
All SQL is successful and there are no server crashes. show create table player_report should show the result of the table alterations and created indexes.
Actual Behavior
mariadb-crash-1.log
(evidently wollysocial was the original database name)
Diagnostic Logs
Original bug report: https://jira.mariadb.org/browse/MDEV-28580
extract from mariadb-1.strace
Note the preallocation ERROR isn't fatal (fallocate is not implemented in WSL, and we do a write-based fallback).
My analysis:
In the strace there is no closing of file descriptor 50.
So it looks like the final fstat on an open file descriptor is losing track of the file because it was renamed earlier. Without a successful fstat, the result is a file size of -1 (from the error) and the assertion failure.
Procmon trace from same time as strace: LOGFile3.PML.