Yarn fails with ESOCKETTIMEDOUT while installing a large package on a slow disk #8242
Comments
Windows 10. Executed: Failed:
I'm experiencing the exact same issue.
@AmirTugi the workaround that works for me is to do However, that's just a workaround, not a fix.
Right, I tried to raise it to
We are facing this issue on our pipeline servers too. Has anything changed in yarn?
Me too. I have tried more than 30 times in recent days and always got a timeout, which is very annoying.
I see the same problem in the App Center build service, too. Can anyone please look into this issue? Is there a yarn status page available?
The previous failures I observed corresponded to npm outages shown on this page: https://status.npmjs.org/
@darkk wrote a wonderful description of the bug. I proposed a PR with his proposed fix in mind:
I'm also seeing this in my builds on a DigitalOcean (SSD) build server over the last couple of days (since setting the build server up).
This is happening on GitHub Actions for us as well. Every day a few of our checks fail because of it. Current solution: rerun...
For GitHub Actions failures, you might like https://github.com/nick-invision/retry/
Closing as fixed in v2, where the timeout logic is less susceptible to this sort of issue.
I have a few questions.
I would appreciate it if you could share some information. Thanks!
@jjangga0214 Regarding Q#2: it might happen on any OS. The bug is just more likely to trigger on macOS and Windows due to the performance characteristics of their filesystems. Using an HDD (or any other high-latency medium) instead of an SSD also increases the probability.
Thanks so much!
Bug description
The Windows build of our Electron app is consistently failing.
yarn install --frozen-lockfile failed to download https://registry.yarnpkg.com/date-fns/-/date-fns-2.12.0.tgz and https://registry.yarnpkg.com/@material-ui/icons/-/icons-4.9.1.tgz. The failure to download @material-ui/icons was reported with ESOCKETTIMEDOUT. However, I expected the build host to have a well-provisioned network, as it was a GitHub Actions runner, and the Linux build was working fine. I assumed that high-latency disk IO might be the reason and managed to get a test case that reproduces the issue reliably: ESOCKETTIMEDOUT is reliably triggered on Linux when a small and realistic delay (8 ms) is injected into disk IO system calls.
ESOCKETTIMEDOUT being reported because of slow disk IO is very confusing behavior: it sounds like a temporary network error, while the root cause is different. It does not match my understanding of the "Developing Javascript projects shouldn't leave the door open to surprises" motto, so I'm reporting this test case as a separate issue despite possible duplicates in the issue tracker. 🙂

Command
What is the current behavior?
What is the expected behavior?
If I run exactly the same command with delay_exit=1 (0.001 ms) instead of delay_exit=8000 (8 ms), I get the expected behavior:
Steps to Reproduce
First, strace adds some overhead of its own, which may affect reproducibility. For example, yarn add @material-ui/icons@^4.5.1 reports "Done in 5.76s." in the very same environment without the strace wrapper. That's why I compare strace-with-delay to strace-without-delay rather than to a "clean" run.

Second, I picked the stat() call from the output of the following commands:
strace -f -o ~/yarn-trace yarn add @material-ui/icons@^4.5.1
grep -F AccessAlarmsRounded.d.ts ~/yarn-trace
That file had 5 openat() calls, 4 lstat() calls, 1 stat() call and 1 chmod() call. So I chose stat(/usr/local/share/.cache/yarn/v6/.../AccessAlarmsRounded.d.ts) as the place to inject the delay.

Third, I chose an 8 ms delay, assuming that there is a single stat() system call per unpacked file, to emulate an HDD-based system with 125 IOPS (1 s / 8 ms = 125). It's all a ballpark estimate: a 1 ms delay works on my system, 2 ms fails with ESOCKETTIMEDOUT once but manages to install the package after a retry, and 4 ms and 8 ms fail reliably.

Fourth, since TCP buffering is involved (see the comment on TCP ZeroWindow below), the available network bandwidth and the socket buffer size may also play a role in reproducibility. I reproduced the bug with these exact values on an Ubuntu 16.04 laptop connected by a 100 Mbit/s link in St. Petersburg, Russia, and on a Linode VM in Newark (see below).
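The numbers in the third and fourth notes can be checked with a little arithmetic. This is a rough sketch under the author's stated assumption of one delayed syscall per unpacked file, plus an extra assumption that the delayed calls run strictly one after another; the helper names are hypothetical, and the 6 MiB receive buffer is an illustrative figure, not a measured one. The file count 15667 is the @material-ui/icons@4.0.1 figure from the "Possible duplicates" list. Note that an idle socket timeout depends on the longest gap between reads, not on the total, so the cumulative figure only bounds how long unpacking can keep the reader busy.

```python
def emulated_iops(delay_us: float) -> float:
    """Operations per second of a 'disk' that takes delay_us microseconds per operation."""
    return 1_000_000 / delay_us


def total_stall_seconds(files: int, delay_us: float, delayed_calls_per_file: int = 1) -> float:
    """Cumulative injected latency, assuming the delayed calls run one after another."""
    return files * delayed_calls_per_file * delay_us / 1_000_000


def buffer_fill_seconds(buffer_bytes: int, link_mbit_s: float) -> float:
    """Time for a receive buffer to fill at full link speed once the reader stops reading."""
    return buffer_bytes * 8 / (link_mbit_s * 1_000_000)


# strace's delay_exit is given in microseconds: 8000 us = 8 ms.
for delay_us in (1_000, 2_000, 4_000, 8_000):
    print(f"{delay_us / 1000:.0f} ms per file: {emulated_iops(delay_us):.0f} IOPS, "
          f"{total_stall_seconds(15_667, delay_us):.1f} s of disk latency for 15667 files")

# Hypothetical 6 MiB receive buffer on the 100 Mbit/s link from the fourth note.
print(f"buffer full after {buffer_fill_seconds(6 * 1024 * 1024, 100):.2f} s")
```

At 8 ms per file this gives 125 IOPS and roughly 125 s of cumulative disk latency, while a stalled reader's buffer fills in about half a second. After that, the sender sees a zero window and the socket carries no data until the reader resumes.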
Fifth, your node build may interact with the OS kernel a bit differently; for example, it may use open() instead of openat(). So, if the test case does not reproduce the failure for you, try increasing the injected latency or injecting it into a different disk-related system call. I reproduced the issue on an Ubuntu 18.04 VM in the Linode Newark availability zone, but I had to use openat instead of stat as the latency-injection point: that VM made 4 statx() and 3 openat() syscalls for the aforementioned filename.

Comments and assumptions
SQLite has "faster than FS" benchmarks showing that Windows has pretty bad performance (compared to Linux) when operating on lots of small files. Both date-fns and @material-ui/icons have thousands of files, as do the packages mentioned in the "Possible duplicates" section. That explains why Windows users suffer far more from ESOCKETTIMEDOUT while installing packages with thousands of files.

@FredyC came to the same conclusion, that a high-latency HDD used instead of a low-latency SSD triggers ESOCKETTIMEDOUT, in #6221 (comment).

@Hinaser made an excellent comment describing a packet capture in #5259 (comment): yarn probably stops reading from the socket (so the client OS sends TCP ZeroWindow) and eventually closes the socket from the client side.

I assume that node or yarn is busy unpacking a well-compressed tarball full of small files and does not resume reading from the socket for long enough that ESOCKETTIMEDOUT is triggered. I also assume that the code does not disable the socket timeout while putting the stream into the paused state.

I assume a possible fix is to download the .tgz to a temporary file, with timeouts for the network interactions, and then to unpack it without any timeouts, as the disk can't write any faster anyway. Unfortunately, I'm not familiar enough with the yarn codebase to provide a good PR.

Environment
Node version: 12.18.2
Yarn version: 1.22.4
OS: node:12-buster running on top of Ubuntu 16.04 or 18.04

yarn-error.log is the following:

Possible duplicates:
- grid-styled@4.1.0 with 29090 files on Windows
- nyc@11.7.3 with 4742 files on Windows 10
- material-design-icons@3.0.1 on macOS 10.13; 14 Mib
- rxjs-6.5.3 on Windows
- material-design-icons@3.0.1 on Windows
- material-design-icons on Windows
- rxjs-compat@6.2.2 with 3115 files on Windows 10
- material-design-icons-3.0.1 with 89814 files on Windows 10
- lottie-react-native-2.3.2 with 6275 files on Windows 10
- rxjs-5.5.12 with 3661 files on Windows
- @material-ui/icons@4.0.1 with 15667 files on Windows
- npm-6.11.3 with 4086 files on Windows
- @carbon/icons-react@0.0.1-beta.5 with 6165 files on Windows
- core-js@2.6.11 with 1489 files on Windows
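The possible fix proposed under "Comments and assumptions" splits installation into two phases: network IO guarded by timeouts, then unpacking with no timer at all. Below is a minimal Python sketch of that idea. It is not yarn's code (yarn is written in JavaScript and its fetcher is structured differently). The function name download_then_unpack, the 30-second timeout, and the 64 KiB chunk size are hypothetical choices made for illustration.

```python
import os
import shutil
import tarfile
import tempfile
import urllib.request


def download_then_unpack(url: str, dest_dir: str, timeout_s: float = 30.0) -> int:
    """Download a .tgz to a temp file under a socket timeout, then unpack it with no timeout.

    Returns the number of regular files extracted.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=".tgz")
    try:
        # Phase 1: network. timeout_s applies to each blocking socket operation.
        # The loop keeps reading because each chunk goes into one sequential
        # temp-file write, not into thousands of small-file creations.
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url, timeout=timeout_s) as resp:
            shutil.copyfileobj(resp, tmp, length=64 * 1024)
        # Phase 2: disk only. The connection is already closed, so no socket timer can fire.
        with tarfile.open(tmp_path, "r:gz") as tar:
            members = tar.getmembers()
            if hasattr(tarfile, "data_filter"):
                # Reject unsafe member paths where the interpreter supports extraction filters.
                tar.extractall(dest_dir, filter="data")
            else:
                tar.extractall(dest_dir)
        return sum(1 for m in members if m.isfile())
    finally:
        os.remove(tmp_path)
```

However slow phase 2 is, it cannot produce ESOCKETTIMEDOUT, because no socket exists by then. The cost is one extra sequential write and read of the compressed tarball.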