Getting warning message "Using fork() can cause Polars to deadlock in the child process" #20255
Comments
Fork is outright dangerous, as it assumes no other library holds a lock. It is an insane default, and Python 3.14 moves away from it (the default on Linux becomes forkserver). This isn't Polars-specific: NumPy also has some parallelism in BLAS and can also be corrupted (though with a lower probability). You can ignore it. Btw, Polars should not be parallelized by users; it handles parallelism itself. https://pytorch.org/docs/stable/notes/multiprocessing.html#avoiding-and-fighting-deadlocks
From your answer I guess we are in (1), so I just need to avoid using any external multiprocessing / multithreading library together with Polars.
We're quite explicit in the fact that we're multithreaded. (And this whole warning is also quite explicit, isn't it :) )
Forking a process. It can deadlock any process that holds a mutex. Multithreading is fine. Multiprocessing with fork is dangerous. It is not something we can fix.
With any library that does anything multithreaded. Even numpy: Don't fork after threads are created: python/cpython#96971 (comment)
I'm also facing the same issue but can't figure out what causes it. The only part that seems related to explicit multithreading is the call to collect_all. What should we do about this warning?
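For readers who do need worker processes next to a multithreaded library, the advice above amounts to not starting them with fork. A minimal sketch using only the standard library follows; the helper names `make_spawn_context` and `run_with_spawn` are hypothetical and do not come from Polars or from this thread.

```python
import multiprocessing as mp


def make_spawn_context():
    """Return a multiprocessing context bound to the 'spawn' start method.

    get_context() leaves the process-wide default start method unchanged,
    so other code in the same program is not affected.
    """
    return mp.get_context("spawn")


def run_with_spawn(func, items, processes=2):
    """Map func over items in worker processes that are spawned, not forked.

    A spawned worker is a fresh interpreter, so it does not inherit a copy
    of locks that threads in the parent happened to hold.
    func must be picklable, e.g. a function defined at module level.
    """
    ctx = make_spawn_context()
    with ctx.Pool(processes=processes) as pool:
        return pool.map(func, items)
```

Call `run_with_spawn` from under an `if __name__ == "__main__":` guard, since spawned children re-import the main module. Spawned workers also start more slowly than forked ones and do not see state the parent built up after import.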
The error message is pretty clear:
E.g. silence it via the If you get a deadlock, you know what caused it.
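One generic way to silence a specific warning is Python's standard `warnings` module. The sketch below assumes the message is emitted through that module and starts with the text quoted in the issue title; the helper name `silence_fork_warning` is hypothetical, and this is not necessarily the mechanism the comment above refers to.

```python
import warnings

# Assumption: the warning text starts as quoted in the issue title.
# The `message` argument is a regex matched against the start of the text.
FORK_WARNING_PATTERN = r"Using fork\(\) can cause Polars to deadlock"


def silence_fork_warning():
    """Ignore only warnings whose message starts with the fork() text."""
    warnings.filterwarnings("ignore", message=FORK_WARNING_PATTERN)
```

Filtering on the message rather than on a whole warning category keeps unrelated warnings visible. Silencing it does not remove the underlying risk of forking after threads exist.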
The warning could be raised in a situation that I think is a false positive.
Where the warning is more invasive is that, the hook on
Yeah, I agree that this is too broad of a warning for Polars. I will revert it.
Thanks for considering this input.
This is a request to make this message a bit more explicit. We are gradually switching a complex ML pipeline from pandas to Polars. At the moment we are testing the switch on a subset of the feature computations. This has worked perfectly fine!
However, at the start of the processing pipeline (so much earlier than the new Polars block), we got the following warning:
We wonder what exactly we should do about it: whether we can ignore it, and what we have to avoid doing. We see a lot of discussion about this specific warning, but to the best of our knowledge there is no clear answer on how to proceed. It is also not clear to me what the consequences of switching from fork to spawn are, so any detailed explanation is welcome!
We found out that we should only take care to avoid using multiprocessing and Polars together. If that is the case, I would highlight it in the message; if there are other things we should take care of, we would like to know them.
We see this specific warning only on Linux.