-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Watchman leaks open files (Linux) #27
Comments
Are there any processes being triggered? Watchman could potentially be leaking fds to children, though I think we use |
The FWIW, we run with a |
(I'm going to answer your questions wrt 3.13.5 on virtualbox, because it's annoying to reboot my dev machine). There are no triggers. What's holding descriptors: 1519 ? Ss 0:00 init --user |
That seems like you aren't doing any filtering based on the kind of descriptor -- I'm particularly interested in processes that are holding inotify descriptors open. Basically the PIDs you get when you run |
Sorry about that -- this is the list if I add the grep anon_inode:inotify at the appropriate place in the previous command. 1577 ? Ss 0:00 init --user |
While running other simultaneous tests with watchman, I noticed the tests suddenly freezing up. I'm not quite sure of the precise sequence of events (because automated testing), but the tests were doing (watch, since query) one or more times, then deleting the watched directory. Now watchman is in a funny state: there appear to be 8 watchman processes running (I think usually there is only one). watchman watch-list hangs attempting to read from the socket, as does watchman watch. However, providing bad syntax (eg "watchman since /tmp/foo") doesn't hang, indicating that the problem isn't the socket but whatever's going on inside watchman. And it's definitely a kernel bug -- I found this in my dmesg: [152513.914195] watchman[4963]: segfault at 7ff04ddb09d0 ip 00007ff05b831f60 sp 00007ff04c5acce8 error 4 in libpthread-2.15.so[7ff05b825000+18000] |
Killing all of the watchman processes appears to restore order to the universe, mostly, but watchman is now rather fragile -- I get a fair number of "synchronization failed: Connection timed out" errors when doing queries, which I normally don't; killing watchman makes them go away, but only for a little while. |
I don't think it is especially productive to try sanity check watchman behavior on a system with a broken inotify implementation :-/ Watchman relies on the filesystem notification layer of the system on which it runs; if that isn't working then watchman is going to have a hard time telling you what you want to know. If you can't get things running on a working kernel, then my suggestions would be to try running your tests on a different filesystem (maybe this is a filesystem specific kernel bug?) Regarding the blocking/stuck behavior you mentioned, it would be useful to see a gstack of the watchman server process and compare gstacks of the other processes that you saw (assuming that those are acting in client mode. It's possible that they are just the threads of the server process that are showing up in whatever tool you're using to inspect the system). |
I'm going to close this out because it really seems to be a kernel problem and not something we can solve in watchman. Sorry! |
FWIW, this seems to be fixed on 3.14-rc5. On Sun, Mar 30, 2014 at 10:25 PM, Wez Furlong notifications@github.comwrote:
|
Summary: Pull Request resolved: facebook/sapling#27 Pull Request resolved: facebookexperimental/rust-shed#9 Original diffs: D22417488 (f4de30f), D22528869 (d90ddb5) Reviewed By: markbt Differential Revision: D22571972 fbshipit-source-id: c6f013565680a757b642dd79e647207fce3351ec
Summary: We have seen deadlock running `terminationHandler` -> `hasSubscribers` in 2 threads. It's unclear which other thread is holding the lock. To make things easier to debug next time, let's change terminationHandler (and also main.cpp) to bypass the logging lock and write to stderr directly. Related threads (all threads in P536343453): Thread 11 (LWP 3275661): #0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38 #1 0x0000000001cc995b in folly::detail::(anonymous namespace)::nativeFutexWaitImpl (addr=<optimized out>, expected=<optimized out>, absSystemTime=<optimized out>, absSteadyTime=<optimized out>, waitMask=<optimized out>) at fbcode/folly/detail/Futex.cpp:126 #2 folly::detail::futexWaitImpl (futex=0x89, futex@entry=0x7f1c3ac2ef90, expected=994748889, absSystemTime=absSystemTime@entry=0x0, absSteadyTime=<optimized out>, absSteadyTime@entry=0x0, waitMask=waitMask@entry=1) at fbcode/folly/detail/Futex.cpp:254 #3 0x0000000001d34bce in folly::detail::futexWait<std::atomic<unsigned int> > (futex=0x7f1c3ac2ef90, expected=137, waitMask=1) at buck-out/v2/gen/fbcode/110b607930331a92/folly/detail/__futex__/headers/folly/detail/Futex-inl.h:96 #4 folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::WaitForever::doWait (this=<optimized out>, futex=..., expected=137, waitMask=1) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__shared_mutex__/headers/folly/SharedMutex.h:718 #5 folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::futexWaitForZeroBits<folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::WaitForever> (this=0x7f1c3ac2ef90, state=@0x7f1c149f88e4: 118379409, goal=128, waitMask=1, ctx=...) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__shared_mutex__/headers/folly/SharedMutex.h:1184 #6 0x0000000001cd42b2 in folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::yieldWaitForZeroBits<folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::WaitForever> (this=0x7f1c3ac2ef90, state=@0x7f1c149f88e4: 118379409, goal=128, waitMask=1, ctx=...) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__shared_mutex__/headers/folly/SharedMutex.h:1151 #7 folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::waitForZeroBits<folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::WaitForever> (this=0x7f1c3ac2ef90, state=@0x7f1c149f88e4: 118379409, goal=128, waitMask=1, ctx=...) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__shared_mutex__/headers/folly/SharedMutex.h:1109 #8 0x0000000001e7e14c in folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::lockSharedImpl<folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::WaitForever> (this=0x7f1c3ac2ef90, state=@0x7f1c149f88e4: 118379409, token=0x0, ctx=...) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__shared_mutex__/headers/folly/SharedMutex.h:1664 #9 folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::lockSharedImpl<folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::WaitForever> (this=0x7f1c3ac2ef90, token=0x0, ctx=...) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__shared_mutex__/headers/folly/SharedMutex.h:1356 #10 folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::lock_shared (this=0x7f1c3ac2ef90) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__shared_mutex__/headers/folly/SharedMutex.h:495 #11 std::shared_lock<folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault> >::shared_lock (this=<optimized out>, __m=...) at fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/shared_mutex:727 #12 0x0000000002d765fd in folly::LockedPtr<folly::Synchronized<watchman::Publisher::state, folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault> > const, folly::detail::SynchronizedLockPolicy<(folly::detail::SynchronizedMutexLevel)2, (folly::detail::SynchronizedMutexMethod)0> >::doLock<folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>, std::shared_lock<folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault> >, folly::detail::SynchronizedLockPolicy<(folly::detail::SynchronizedMutexLevel)2, (folly::detail::SynchronizedMutexMethod)0>, 0> (mutex=...) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__synchronized__/headers/folly/Synchronized.h:1493 #13 folly::LockedPtr<folly::Synchronized<watchman::Publisher::state, folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault> > const, folly::detail::SynchronizedLockPolicy<(folly::detail::SynchronizedMutexLevel)2, (folly::detail::SynchronizedMutexMethod)0> >::LockedPtr (this=0x7f1c149f8928, parent=<optimized out>) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__synchronized__/headers/folly/Synchronized.h:1272 #14 folly::SynchronizedBase<folly::Synchronized<watchman::Publisher::state, folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault> >, (folly::detail::SynchronizedMutexLevel)2>::rlock (this=<optimized out>) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__synchronized__/headers/folly/Synchronized.h:229 #15 watchman::Publisher::hasSubscribers (this=<optimized out>) at fbcode/watchman/PubSub.cpp:117 #16 0x0000000002eca798 in watchman::Log::log<char const (&) [39], char const*, char const (&) [3]> (this=<optimized out>, level=level@entry=watchman::ABORT, args=..., args=..., args=...) at buck-out/v2/gen/fbcode/110b607930331a92/watchman/__logging__/headers/watchman/Logging.h:42 #17 0x0000000002ec9ba7 in watchman::log<char const (&) [39], char const*, char const (&) [3]> (level=watchman::ABORT, args=..., args=..., args=...) at buck-out/v2/gen/fbcode/110b607930331a92/watchman/__logging__/headers/watchman/Logging.h:121 #18 (anonymous namespace)::terminationHandler () at fbcode/watchman/SignalHandler.cpp:159 #19 0x00007f1c3b0c7b3a in __cxxabiv1::__terminate (handler=<optimized out>) at ../../.././libstdc++-v3/libsupc++/eh_terminate.cc:48 #20 0x00007f1c3b0c7ba5 in std::terminate () at ../../.././libstdc++-v3/libsupc++/eh_terminate.cc:58 #21 0x0000000001c38c8b in __clang_call_terminate () #22 0x0000000003284c9e in folly::detail::terminate_with_<std::runtime_error, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&> (args=...) at buck-out/v2/gen/fbcode/110b607930331a92/folly/lang/__exception__/headers/folly/lang/Exception.h:93 #23 0x0000000003281bae in folly::terminate_with<std::runtime_error, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&> (args=...) at buck-out/v2/gen/fbcode/110b607930331a92/folly/lang/__exception__/headers/folly/lang/Exception.h:123 #24 folly::SingletonVault::fireShutdownTimer (this=<optimized out>) at fbcode/folly/Singleton.cpp:499 #25 0x0000000003281ad9 in folly::(anonymous namespace)::fireShutdownSignalHelper (sigval=...) at fbcode/folly/Singleton.cpp:454 #26 0x00007f1c3b42b939 in timer_sigev_thread (arg=<optimized out>) at ../sysdeps/unix/sysv/linux/timer_routines.c:55 #27 0x00007f1c3b41fc0f in start_thread (arg=<optimized out>) at pthread_create.c:434 #28 0x00007f1c3b4b21dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 ... Thread 1 (LWP 3201992): #0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38 #1 0x0000000001cc995b in folly::detail::(anonymous namespace)::nativeFutexWaitImpl (addr=<optimized out>, expected=<optimized out>, absSystemTime=<optimized out>, absSteadyTime=<optimized out>, waitMask=<optimized out>) at fbcode/folly/detail/Futex.cpp:126 #2 folly::detail::futexWaitImpl (futex=0x89, futex@entry=0x7f1c3ac2ef90, expected=994748889, absSystemTime=absSystemTime@entry=0x0, absSteadyTime=<optimized out>, absSteadyTime@entry=0x0, waitMask=waitMask@entry=1) at fbcode/folly/detail/Futex.cpp:254 #3 0x0000000001d34bce in folly::detail::futexWait<std::atomic<unsigned int> > (futex=0x7f1c3ac2ef90, expected=137, waitMask=1) at buck-out/v2/gen/fbcode/110b607930331a92/folly/detail/__futex__/headers/folly/detail/Futex-inl.h:96 #4 folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::WaitForever::doWait (this=<optimized out>, futex=..., expected=137, waitMask=1) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__shared_mutex__/headers/folly/SharedMutex.h:718 #5 folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::futexWaitForZeroBits<folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::WaitForever> (this=0x7f1c3ac2ef90, state=@0x7ffd2d5be924: 118379408, goal=128, waitMask=1, ctx=...) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__shared_mutex__/headers/folly/SharedMutex.h:1184 #6 0x0000000001cd42b2 in folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::yieldWaitForZeroBits<folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::WaitForever> (this=0x7f1c3ac2ef90, state=@0x7ffd2d5be924: 118379408, goal=128, waitMask=1, ctx=...) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__shared_mutex__/headers/folly/SharedMutex.h:1151 #7 folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::waitForZeroBits<folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::WaitForever> (this=0x7f1c3ac2ef90, state=@0x7ffd2d5be924: 118379408, goal=128, waitMask=1, ctx=...) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__shared_mutex__/headers/folly/SharedMutex.h:1109 #8 0x0000000001e7e14c in folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::lockSharedImpl<folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::WaitForever> (this=0x7f1c3ac2ef90, state=@0x7ffd2d5be924: 118379408, token=0x0, ctx=...) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__shared_mutex__/headers/folly/SharedMutex.h:1664 #9 folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::lockSharedImpl<folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::WaitForever> (this=0x7f1c3ac2ef90, token=0x0, ctx=...) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__shared_mutex__/headers/folly/SharedMutex.h:1356 #10 folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>::lock_shared (this=0x7f1c3ac2ef90) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__shared_mutex__/headers/folly/SharedMutex.h:495 #11 std::shared_lock<folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault> >::shared_lock (this=<optimized out>, __m=...) at fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/shared_mutex:727 #12 0x0000000002d765fd in folly::LockedPtr<folly::Synchronized<watchman::Publisher::state, folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault> > const, folly::detail::SynchronizedLockPolicy<(folly::detail::SynchronizedMutexLevel)2, (folly::detail::SynchronizedMutexMethod)0> >::doLock<folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault>, std::shared_lock<folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault> >, folly::detail::SynchronizedLockPolicy<(folly::detail::SynchronizedMutexLevel)2, (folly::detail::SynchronizedMutexMethod)0>, 0> (mutex=...) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__synchronized__/headers/folly/Synchronized.h:1493 #13 folly::LockedPtr<folly::Synchronized<watchman::Publisher::state, folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault> > const, folly::detail::SynchronizedLockPolicy<(folly::detail::SynchronizedMutexLevel)2, (folly::detail::SynchronizedMutexMethod)0> >::LockedPtr (this=0x7ffd2d5be968, parent=<optimized out>) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__synchronized__/headers/folly/Synchronized.h:1272 #14 folly::SynchronizedBase<folly::Synchronized<watchman::Publisher::state, folly::SharedMutexImpl<false, void, std::atomic, folly::SharedMutexPolicyDefault> >, (folly::detail::SynchronizedMutexLevel)2>::rlock (this=<optimized out>) at buck-out/v2/gen/fbcode/110b607930331a92/folly/__synchronized__/headers/folly/Synchronized.h:229 #15 watchman::Publisher::hasSubscribers (this=<optimized out>) at fbcode/watchman/PubSub.cpp:117 #16 0x0000000002ecac20 in watchman::Log::log<char const (&) [59]> (this=<optimized out>, level=level@entry=watchman::ABORT, args=...) at buck-out/v2/gen/fbcode/110b607930331a92/watchman/__logging__/headers/watchman/Logging.h:42 #17 0x0000000002ec9b24 in watchman::log<char const (&) [59]> (level=watchman::ABORT, args=...) at buck-out/v2/gen/fbcode/110b607930331a92/watchman/__logging__/headers/watchman/Logging.h:121 #18 (anonymous namespace)::terminationHandler () at fbcode/watchman/SignalHandler.cpp:165 #19 0x00007f1c3b0c7b3a in __cxxabiv1::__terminate (handler=<optimized out>) at ../../.././libstdc++-v3/libsupc++/eh_terminate.cc:48 #20 0x00007f1c3b0c7ba5 in std::terminate () at ../../.././libstdc++-v3/libsupc++/eh_terminate.cc:58 #21 0x0000000002d8cde1 in std::thread::~thread (this=0x7f1c3ac2ef90) at fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/std_thread.h:152 #22 0x00007f1c3b3cc8f8 in __run_exit_handlers (status=1, listp=0x7f1c3b598658 <__exit_funcs>, run_list_atexit=<optimized out>, run_dtors=<optimized out>) at exit.c:113 #23 0x00007f1c3b3cca0a in __GI_exit (status=<optimized out>) at exit.c:143 #24 0x00007f1c3b3b165e in __libc_start_call_main (main=0x2d11220 <main(int, char**)>, argc=2, argv=0x7ffd2d5bec78) at ../sysdeps/nptl/libc_start_call_main.h:74 #25 0x00007f1c3b3b1718 in __libc_start_main_impl (main=0x2d11220 <main(int, char**)>, argc=2, argv=0x7ffd2d5bec78, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd2d5bec68) at ../csu/libc-start.c:409 #26 0x0000000002d0e181 in _start () at ../sysdeps/x86_64/start.S:116 Reviewed By: xavierd Differential Revision: D40166374 fbshipit-source-id: 7017e20234e5e0a9532eb61a63ac49ac0020d443
I believe that this is actually a Linux kernel bug that watchman is triggering. I'm reporting it here because maybe there's a way to work around it in watchman. I'm also going to report it to LKML; I'll comment with a link to the thread once it's posted.
Here's a script:
https://gist.github.com/dturner-tw/9336584
If you run it on Linux, you'll soon (after a few hundred lines of output) see something like:
{u'version': u'2.9.3', u'error': u'unable to resolve root /tmp/tmpL0H6OT: watch(/tmp/tmpL0H6OT): inotify_init error: Too many open files'}
Here's where it gets really fun: I kill the script. Then I kill watchman. Then I delete my /tmp/.watchman.{username}* files. So I should be ready to go again. But no, next time I run the script, I get the "Too many open files" message much more quickly. And in fact, inotify is messed up generally (not just for watchman):
$ tail -f /etc/hosts >/dev/null
tail: inotify cannot be used, reverting to polling: Too many open files
(I sometimes need to do that in two or three separate windows to trigger the error)
Only rebooting will make inotify work correctly again.
Some other diagnostics:
$ cat /proc/sys/fs/inotify/max_user_instances
128
$ ls -l /proc//fd/ 2>/dev/null | grep -c anon_inode:inotify
18
$ sudo ls -l /proc//fd/ 2>/dev/null | grep -c anon_inode:inotify
32
System info:
uname -a
Linux stross 3.11.0-17-generic #18 31~precise1-Ubuntu SMP Tue Feb 4 21:25:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
I also tried on an Ubuntu 3.8.0-36-generic kernel, and on a stock 3.13.5 that I built from source (this one in virtualbox).
The text was updated successfully, but these errors were encountered: