optional daemonic threads for trio.to_thread.run_sync #2046
Can you elaborate on the actual downsides? The threads don't need cleanup, and how do they not play well with subinterpreters? (Also, is there any reason to care about subinterpreter support in the first place?)
Some threads can still acquire resources that need to be waited for at interpreter shutdown. Re subinterpreters: I'd really like subinterpreter support, to be able to use multi-CPU hypercorn without multiprocessing where possible.
Do you have an example in mind? How does this cause problems for Trio's current use of daemonic threads, and what would the alternative be?
@graingert linked some more context in chat: https://vstinner.github.io/threading-shutdown-race-condition.html#forbid-daemon-threads-in-subinterpreters

@richardsheridan also pointed to https://bugs.python.org/issue40234, and I think Graham Dumpleton's response is particularly relevant. I respect @vstinner a lot, but daemon threads are a totally standard thing that every threading system supports, including Python's, and like Graham points out, that's because they're kind of important? So I think Victor and the subinterpreter folks are going to have to hold their nose and figure this one out, even if it sucks.

In any case, subinterpreters are not exactly a real thing currently, and multi-core subinterpreter support is definitely not a real thing, and personally I still think the odds are heavily stacked against it ever becoming a real thing. And if it ever does become a real thing, then that will just increase the pressure on subinterpreters to support stuff like daemon threads, or at least provide new APIs to handle the cases where we use daemon threads currently. So... I don't think we should make our code more complicated and fragile now to avoid a theoretical future problem that might never even happen – if daemon threads in subinterpreters ever become a real problem for Trio, we can always revisit.
The problem is sharpest with anyio, where asyncio users are used to the non-daemonic asyncio.to_thread (cleaned up at exit), and trio users expect the opposite.
Currently, Py_EndInterpreter() causes a fatal error if a daemon thread is still running while a subinterpreter exits. That makes Py_EndInterpreter() not reliable at all: it depends on whether a daemon thread completes or not. To support daemon threads in the main interpreter, ceval.c must exit daemon threads immediately as soon as they try to acquire the GIL: see tstate_must_exit() in Python/ceval_gil.h. Currently, this code is specific to the main interpreter, and so doesn't solve the Py_EndInterpreter() issue. Another concern is that Python doesn't know whether a daemon thread is still using an object when Py_EndInterpreter() is called. Pseudo-code:
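A minimal Python sketch of the scenario: a daemon thread blocked in read() on a buffer the interpreter owns (the pipe is illustrative, and os.readv is Unix-only):

```python
import os
import threading

r, w = os.pipe()

def worker():
    buf = bytearray(4096)   # heap object owned by the (sub)interpreter
    os.readv(r, [buf])      # blocks in read(), writing into buf's memory

threading.Thread(target=worker, daemon=True).start()
# If Py_EndInterpreter() runs while the thread is still blocked in read(),
# destroying buf means the pending read() scribbles on freed memory (crash),
# while keeping buf alive means the memory is never released (leak).
```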
If Py_EndInterpreter() destroys the bytearray object of the daemon thread, read() can write into freed memory: you can get a crash. asyncio still has a corner-case issue in overlapped_dealloc().
If Py_EndInterpreter() leaves the bytearray object alive, you leak memory. Daemon threads provide a weird contract: don't care whether Python exits, clean up everything magically, and hope that it works. It doesn't work; magic doesn't exist. I would like to deny daemon threads in subinterpreters to make Py_EndInterpreter() safe again. If tomorrow someone finds a clever trick to implement them: good. But today, there is a major design flaw.
By the way, I'm against daemon threads in general, in the main interpreter too. There is no way to make such beasts fully safe. They're full of corner cases and make crashes at exit way more likely.
@vstinner Hmm, I do see what you mean. In the main interpreter, you can absolutely make them fully safe, because the OS will help you -- process exit atomically destroys all threads and objects simultaneously, so there's no way for things to leak or get used-after-free. That's the basic problem that subinterpreters have always had... you're trying to reimplement subprocess semantics in userspace, but it's never going to work as well as actual OS subprocesses, because the kernel has resources that you don't. If you implement subinterpreters as subprocesses, then daemon threads will work fine :-). Or, I guess you could give up on
As a concrete example of why OS-level daemon threads are necessary: whenever Trio needs to call getaddrinfo, it pushes the call into a worker thread, because that's the only portable way to keep the blocking C call from stalling the event loop. Now, what happens if you want to cancel the getaddrinfo call? You can't: there's no reliable way to interrupt a thread blocked inside arbitrary C code, so the only option is to abandon the thread and let it finish (or stall) on its own. So here's the problem: whenever your program finishes, there might be stalled getaddrinfo threads still hanging around, and daemon threads are what let the process exit anyway instead of hanging indefinitely.
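For illustration, here's roughly what this looks like with trio's public API as of this issue (the lookup wrapper is hypothetical; cancellable=True is what lets trio abandon the thread on cancellation):

```python
import socket
import trio

async def lookup(host, port):
    # getaddrinfo blocks inside C with no way to interrupt it, so run it
    # in a worker thread; on cancellation the thread is simply abandoned
    # and keeps running as a daemon thread until getaddrinfo returns.
    return await trio.to_thread.run_sync(
        socket.getaddrinfo, host, port, cancellable=True
    )
```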
asyncio runs getaddrinfo via loop.run_in_executor on the loop's default executor, which since https://bugs.python.org/issue39812 does not use daemon threads.
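For comparison, a sketch of the asyncio side (loop.getaddrinfo delegates the blocking call to the loop's default executor):

```python
import asyncio
import socket

async def main():
    loop = asyncio.get_running_loop()
    # Runs getaddrinfo in the default ThreadPoolExecutor; since bpo-39812
    # (Python 3.9) its worker threads are non-daemon, so interpreter
    # shutdown waits for any lookup still in flight.
    infos = await loop.getaddrinfo("example.com", 80, type=socket.SOCK_STREAM)
    print(infos)

asyncio.run(main())
```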
No, the OS doesn't help. Python doesn't attempt to kill daemon threads in Py_Finalize(). It uses tstate_must_exit() to call pthread_exit() if a daemon thread tries to (re)acquire the GIL while Python is exiting or has exited. This work was done because we got many crashes related to daemon threads (without subinterpreters). The problem with tstate_must_exit() is that it currently uses a global variable to check whether Python is exiting, and this function is not compatible with subinterpreters. It would be possible to make it compatible with subinterpreters; it's just that nobody has decided to work on this issue yet. Note: pthread_exit() comes with its own set of issues: see https://bugs.python.org/issue42969 and https://bugs.python.org/issue44434. There are technical reasons why daemon threads are hard to implement correctly.
pthread_kill() can be used to kill a thread, but I'm not sure there is a portable way to kill a thread, so Python doesn't rely on it. I think the most portable option is to spawn a child process and kill it to cancel a task. A better solution would be to use an asynchronous implementation of a DNS resolver (there are some written in Python), or to implement an asynchronous name resolver in the C library.
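A sketch of that child-process approach (the function names and the timeout-as-cancellation are illustrative, not a real API):

```python
import multiprocessing
import socket

def _resolve(host, port, conn):
    conn.send(socket.getaddrinfo(host, port))

def cancellable_getaddrinfo(host, port, timeout):
    # On spawn-based platforms this needs the usual __main__ guard.
    parent_conn, child_conn = multiprocessing.Pipe()
    proc = multiprocessing.Process(target=_resolve, args=(host, port, child_conn))
    proc.start()
    if parent_conn.poll(timeout):   # wait for a result, with a deadline
        result = parent_conn.recv()
        proc.join()
        return result
    proc.kill()                     # "cancel": the OS reclaims everything
    proc.join()
    raise TimeoutError(f"getaddrinfo({host!r}) timed out")
```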
Alas, pthread_kill() doesn't actually kill a thread; it just delivers a signal to it, and there's no portable, safe way to terminate a thread that's blocked in arbitrary C code.
Yeah, but no-one would accept spawning child processes for every hostname lookup :-). I guess you could get it down to 1 subprocess for all the DNS lookups in a single subinterpreter but... by the time you make that reliable, you might as well just use the subprocess for the entire subinterpreter instead?
Absolutely, in theory. In practice, it turns out that this isn't really viable, because getaddrinfo is the only resolver that reliably respects the host's full name-lookup configuration (nsswitch, mDNS, VPN and corporate directory setups, and so on). Custom DNS resolvers are totally useful as an option -- they can work great in some configurations. But you can't count on them. Pretty much everyone who's ever built a networking library has gone down the path of getting cranky about getaddrinfo, trying to replace it with a custom resolver, and eventually discovering that it can't be fully replaced.
This would be fabulous of course, but it's not realistic -- no major OS offers this today, none of them have plans to, and it takes years for new OS features to percolate out to users.
Yeah, I totally believe this. I'm not claiming daemon threads are, like, objectively a good thing! Everything I just wrote about is a fractal of interlocking sadness and frustration. Just... daemon threads are what the industry has collectively settled on as the frustrating feature that has to be supported to make this stuff work. Sorry to be the bearer of bad news :-/
Daemon threads (which pthreads calls detached threads) also seem to be a feature put into POSIX "because it would be nice" without the full implications being thought through... Why Python supported the concept is presumably simply because it existed in pthreads and sounded nice. The implications weren't really thought through. Code executing in your address space that you can never control or stop once you start it? What could possibly go wrong? :P

The concept makes safe sense only in the simplest of cases in C, where the code is constrained, not complex, and not calling off into other libraries that could have their own concurrency and global-state needs. A Python interpreter is by definition complex and has all of those. I.e., if you want a safe-as-possible daemon thread for calling libc getaddrinfo, it should be constrained C code, not something running a full Python interpreter.

I'm glad you brought up what golang does, because they're where I would've looked next, as they're understandably libc-averse. The comments atop https://golang.org/src/net/net.go explain their current logic. It is indeed a pain, and libc getaddrinfo simply cannot be avoided in significant common real-world scenarios. The cgo getaddrinfo-calling code limits the number of getaddrinfo threads that can run at once to avoid cascading failure (blocking the channel when there are too many): https://golang.org/src/net/net.go?s=21309:21329#L658 acquireThread, called from the getaddrinfo-calling functions in https://golang.org/src/net/cgo_unix.go. But that's about it. Effectively the same thing that asyncio does by using a threadpool, per graingert's links above.

Simplest: just don't use daemon threads for Python getaddrinfo calls. Use normal threads, ideally pooled, with a hard limit on the number pending before even adding to the pending pool blocks (see the sketch after this comment). Ordinarily getaddrinfo calls aren't supposed to take a long time. They can, but this is rare. Don't optimize for the rare instant-application-termination desire in this case; just allow the "hang" to happen. If this turns out annoying to users, they can use strace or ltrace or their platform equivalents to see what's up and point the finger at "it's always DNS (or really the abstraction one layer up known as getaddrinfo logic)". If you need a program to exit immediately without waiting for any cleanup, that's what Ctrl-\ (SIGQUIT) is for.
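A minimal sketch of that simplest option, assuming a plain ThreadPoolExecutor (whose workers are non-daemon since bpo-39812) plus a semaphore for the hard limit; the names and the limit of 16 are illustrative:

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor

_MAX_LOOKUPS = 16
_pending = threading.BoundedSemaphore(_MAX_LOOKUPS)
_pool = ThreadPoolExecutor(max_workers=_MAX_LOOKUPS)  # non-daemon workers

def resolve(host, port):
    # Block *before* queueing when too many lookups are already in flight,
    # like golang's acquireThread; at interpreter shutdown, any running
    # getaddrinfo calls are waited for ("allow the hang"), not abandoned.
    with _pending:
        return _pool.submit(socket.getaddrinfo, host, port).result()
```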
I mean, yes, that is the basic problem with threads :-). But what does it have to do with the daemon/joinable distinction? Joinable threads have a join handle, but that's just an event + an int. It doesn't let you affect the thread at all, and you can emulate it trivially with a daemon thread. (In fact IIRC that's exactly what CPython does -- joinable threading.Thread objects are implemented on top of detached OS threads plus a lock.) AFAIK the only fundamental difference between joinable and detached threads is that if your main thread exits without waiting for the thread, then the runtime will automatically join any joinable threads, but if there are detached threads then it just exits anyway and lets the OS tear them down.
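A toy illustration of that point: emulating a join handle on top of a daemon thread with nothing but an event (all names here are made up):

```python
import threading

def spawn_joinable(fn, *args):
    done = threading.Event()
    box = []

    def wrapper():
        try:
            box.append(fn(*args))
        finally:
            done.set()              # the "join handle" is just this event

    threading.Thread(target=wrapper, daemon=True).start()

    def join():
        done.wait()                 # emulated join: wait for the event
        return box[0] if box else None

    return join

# usage: join = spawn_joinable(sum, [1, 2, 3]); print(join())  # -> 6
```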
It's not so much that it's easy, as that Trio already has its own high-quality and deeply-integrated threading code, and tacking on some external thread pool for a subset of operations would just make things jankier for no benefit. In fact, glibc and Windows both have built-in APIs for shoving getaddrinfo into background threads (getaddrinfo_a and GetAddrInfoExW, respectively).

If hypothetically we did want to port Trio to work with subinterpreters, I guess the most natural way to handle this would be to move our thread cache into C, so we could keep a single process-global set of detached worker threads. Then individual subinterpreters could "check out" a thread whenever they had a job to do, do the work, and check the thread back in afterwards (a rough sketch of the idea follows below).

(Or just don't have subinterpreters, that also solves the problem ;-).)
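The real version would live in C, but a toy Python sketch of the check-out idea, loosely modeled on trio's existing start_thread_soon, might look like this (all names illustrative):

```python
import queue
import threading

class ThreadCache:
    def __init__(self):
        self._idle = queue.SimpleQueue()        # mailboxes of parked workers

    def start_thread_soon(self, fn, deliver):
        try:
            mailbox = self._idle.get_nowait()   # check out a parked worker
        except queue.Empty:
            mailbox = queue.SimpleQueue()       # or spawn a fresh detached one
            threading.Thread(
                target=self._worker, args=(mailbox,), daemon=True
            ).start()
        mailbox.put((fn, deliver))

    def _worker(self, mailbox):
        while True:
            fn, deliver = mailbox.get()
            try:
                result = fn()
            except BaseException as exc:
                result = exc
            self._idle.put(mailbox)             # check the worker back in
            deliver(result)
```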
These are good excuses if we want to make excuses, but... right now things just work? Why would we choose to de-optimize and force people to run strace to figure out why their program won't exit?

(Also FWIW I've been using Linux daily for almost 25 years, and I'm only vaguely aware of the existence of Ctrl-\ :-). I think you might overestimate how widespread that knowledge is...)
daemon threads skip cleanup at interpreter shutdown and don't play well with subinterpreters
trio should maintain two pools of threads, daemonic and non-daemonic, so trio can push that choice onto users, e.g.:

trio.to_thread.run_sync(fn, daemon=False)