[Python][FlightRPC] Interpreter deadlock when using GeneratorStream and Acero + Python UDFs #40004
Labels
Component: FlightRPC
Component: Python
Critical Fix
Type: bug
Describe the bug, including details regarding any error messages, version, and platform.
Hello,
I'm on Arrow 15.0, Python 3.11. I have a Flight RPC service that does roughly the following:
- DoGet returns a `flight.GeneratorStream()`
- with the reader created using `Declaration.to_reader()` on an Acero plan that runs Python UDFs

I use `GeneratorStream` instead of `flight.RecordBatchStream` because there is a likelihood that `GetFlightInfo` could produce multiple pieces to concatenate into the DoGet stream.

This service is then called from Java, which quite reliably triggers a full interpreter deadlock of the server process.
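Why a generator fits the multi-piece case: in pyarrow, `flight.RecordBatchStream` wraps a single table or reader, while `flight.GeneratorStream` takes a schema plus an iterator of batches, so several pieces can be chained into one DoGet response. Below is a minimal stdlib-only sketch of that chaining. `concat_pieces` is a hypothetical helper, and plain lists of strings stand in for the pyarrow readers so it runs without pyarrow.

```python
from typing import Iterable, Iterator


def concat_pieces(pieces: Iterable[Iterable[str]]) -> Iterator[str]:
    """Yield every batch of every piece, in order, as one stream.

    Each piece is consumed only after the previous one is exhausted,
    so a piece backed by a reader is not pulled from before it is needed.
    """
    for piece in pieces:
        yield from piece


# Hypothetical pieces: plain lists stand in for record batch readers.
pieces = [["piece0-batch0", "piece0-batch1"], ["piece1-batch0"]]
print(list(concat_pieces(pieces)))
# ['piece0-batch0', 'piece0-batch1', 'piece1-batch0']
```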
The thread dump of the stuck process: https://gist.github.com/lupko/c4491df7a36247b48ba0248c2d5f9ae6
I have been experimenting, and when I took `GeneratorStream` out of the picture (e.g. used `RecordBatchStream` instead), there were no deadlocks.

Following the traces from the thread dump, I believe the problem is here:
arrow/python/pyarrow/_flight.pyx, line 2016 in de3cdc0
So in the end, I think the situation is this: the call does not release the GIL before it calls `Next()`. But `Next()` will never return, because to produce the next batch Acero needs to run a Python UDF, which needs the GIL as well.

I tried a fix on a local build, wrapping the call in `with nogil`, and all is good. I will create a PR shortly.

Component(s)
FlightRPC, Python
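The lock-ordering problem described above can be modeled in plain Python. This is a sketch under stated assumptions, not the pyarrow code: a `threading.Lock` named `gil` stands in for the GIL (a thread blocked in native code while holding the real GIL behaves like a thread holding this lock), `udf_worker` stands in for the Acero thread that must run the UDF, and a queue read stands in for the blocking `Next()`. All names are hypothetical, and a timeout replaces the real infinite wait so the sketch terminates. The `release_while_waiting=True` path mirrors the `with nogil` fix.

```python
import queue
import threading

# Hypothetical model: a plain lock stands in for the GIL.
gil = threading.Lock()


def udf_worker(results: queue.Queue) -> None:
    """Stands in for the Acero thread that must run a Python UDF."""
    with gil:  # running the UDF needs the "GIL"
        results.put("batch")


def _wait_for_batch(results: queue.Queue, timeout: float):
    """Stands in for the blocking Next() call on the reader."""
    try:
        return results.get(timeout=timeout)
    except queue.Empty:
        return None  # the real server has no timeout: it would wait forever


def read_next(release_while_waiting: bool, timeout: float = 1.0):
    """Stands in for the Flight thread pulling the next batch."""
    results: queue.Queue = queue.Queue()
    with gil:  # the calling thread starts out holding the "GIL"
        threading.Thread(target=udf_worker, args=(results,), daemon=True).start()
        if not release_while_waiting:
            # Bug: block while still holding the lock the worker needs.
            return _wait_for_batch(results, timeout)
        gil.release()  # analogous to wrapping the blocking call in `with nogil`
        try:
            return _wait_for_batch(results, timeout)
        finally:
            gil.acquire()  # take it back before touching Python state again


print(read_next(release_while_waiting=False))  # None: the worker never ran
print(read_next(release_while_waiting=True))   # batch
```

Releasing the lock only around the blocking wait, and reacquiring it before returning, is the same shape as a `with nogil` block in Cython: Python objects are only touched while the lock is held.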