Python Kernel crash when calling flight_client.do_get() #37852

Closed
Jayclifford345 opened this issue Sep 25, 2023 · 3 comments

@Jayclifford345

Describe the bug, including details regarding any error messages, version, and platform.

We are currently facing a Python kernel crash in a VS Code notebook, caused by:
flight_client.do_get()

We are trying to return a dataset with 165 columns using PyArrow Flight. This works for queries that return fewer columns. If we return the full set of columns, we get the following crash:

14:43:41.160 [info] Handle Execution of Cells 16 for ~\OneDrive - Vertical Aerospace Group Ltd\Documents\Test Data Access\vertical-data\examples.ipynb
14:43:41.166 [info] Kernel acknowledged execution of cell 16 @ 1695390221162
14:43:43.635 [info] End cell 16 execution @ 1695390223632, started @ 1695390221162, elapsed time = 2.47s
14:44:45.415 [error] Disposing session as kernel process died ExitCode: 3221225477, Reason: 
14:44:45.415 [info] Dispose Kernel process 12772.
14:44:45.473 [info] End cell 16 execution @ undefined, started @ 1695390223632, elapsed time = -1695390223.632s
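
For reference, the exit code in the log above can be decoded with a few lines of Python. The value 3221225477 is 0xC0000005 in hexadecimal, which on Windows is the NTSTATUS code STATUS_ACCESS_VIOLATION, so the process was most likely terminated by an invalid memory access in native code rather than by a Python exception. The variable names below are illustrative only.

```python
# Exit code reported for the kernel process in the log above.
exit_code = 3221225477

# Windows reports NTSTATUS values as unsigned 32-bit integers,
# so the code is easier to recognise in hexadecimal.
as_hex = f"0x{exit_code:08X}"
print(as_hex)

# 0xC0000005 is STATUS_ACCESS_VIOLATION on Windows.
is_access_violation = exit_code == 0xC0000005
print(is_access_violation)
```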

This issue appears to occur predominantly with PyArrow running natively on Windows. This functionality is required as it will eventually be interfaced with MATLAB.

We would like to understand why the crash occurs at flight_client.do_get() rather than when the dataset is returned. Here is the function for completeness:

    def query(self, query, language="sql", mode="all", database=None, **kwargs):
        """
        Query data from InfluxDB.

        :param query: The query string.
        :type query: str
        :param language: The query language; "sql" or "influxql" (default is "sql").
        :type language: str
        :param mode: The mode of fetching data (all, pandas, chunk, reader, schema).
        :type mode: str
        :param database: The database to query from. If not provided, uses the database provided during initialization.
        :type database: str
        :param kwargs: Additional arguments for the query.
        :return: The queried data.
        """
        

        if database is None:
            database = self._database
        
        # Pass the token as an authorization header on the Flight call
        headers = [(b"authorization", f"Bearer {self._token}".encode("utf-8"))]
        _options = FlightCallOptions(headers=headers, **kwargs)

        # The ticket carries the query as a JSON payload
        ticket_data = {"database": database, "sql_query": query, "query_type": language}
        ticket = Ticket(json.dumps(ticket_data).encode("utf-8"))
        flight_reader = self._flight_client.do_get(ticket, _options)

        # Map each mode to a zero-argument callable; unknown modes fall back to read_all
        mode_func = {
            "all": flight_reader.read_all,
            "pandas": flight_reader.read_pandas,
            "chunk": lambda: flight_reader,
            "reader": flight_reader.to_reader,
            "schema": lambda: flight_reader.schema
        }.get(mode, flight_reader.read_all)

        return mode_func()

Many thanks in advance for any help. Please let us know if you need any more information.

Component(s)

Python
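
One way to narrow down where the process dies is to run the failing call in a child interpreter with Python's standard `faulthandler` enabled, so that a native crash produces a Python-level traceback on stderr and does not take the notebook kernel down with it. The sketch below is only an illustration: `run_isolated` is a hypothetical helper, and the script passed to it would need to contain the actual `do_get()` call.

```python
import subprocess
import sys


def run_isolated(code, timeout=120.0):
    """Run `code` in a fresh interpreter with faulthandler enabled.

    Returns (returncode, stderr). If the child crashes in native code,
    faulthandler writes the Python stack of each thread to stderr.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


# Trivial stand-in for the real script; replace with the code that calls do_get().
returncode, stderr_text = run_isolated("print('query finished')")
print(returncode)
```

A non-zero return code together with the traceback in `stderr_text` would show which Python call was active when the crash happened, which may help distinguish a failure inside `do_get()` from one that occurs later while reading the stream.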

@lidavidm
Member

Do you have the dataset or a self-contained reproduction?

@Jayclifford345
Author

Hi @lidavidm,
I am ever so sorry for the late reply. Let me see if I can get you a reproducible example.

@Jayclifford345
Author

Hi all, just going to close this issue for now as I cannot reproduce it currently with a basic Flight Server. Going to investigate this more internally first and see if we can find a better root cause.

@Jayclifford345 closed this as not planned on Oct 9, 2023