Streaming with query_row_block_stream crashes after few reads #399
Comments
It's crashing because your connection is breaking: 'Connection broken: IncompleteRead(134 bytes read)'. This could be caused by something like an idle timeout anywhere in your chain. How much time is processing taking per chunk?
The average processing time for a single block is ~10 seconds.
After digging into this, I believe the problem is that the ClickHouse server times out when pushing more data because the client has not read all of the data off the socket. When trying to reproduce this, I get the following error in the ClickHouse logs:
The socket is still busy/full when ClickHouse tries to send the error:
The end result is that if reading data falls more than 30 seconds behind ClickHouse sending the data, ClickHouse will close the connection, causing the error you see. There's not an easy fix for this directly in clickhouse-connect. However, in the next release I'm looking at adding an intermediate buffer with a configurable size to temporarily store the HTTP data until it is requested by the stream processing. So if your total query size is something like 50MB, and the new intermediate buffer is configured at 100MB, you should not have this issue. But there will definitely be a tradeoff between using the additional memory and ensuring that your connection isn't closed while processing.
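Until that buffer exists in the library, one workaround on the application side is to keep draining the HTTP stream quickly and hand the blocks to the slow processing through a bounded queue, so the socket never sits unread for 30 seconds. This is only a minimal sketch of that idea, not the planned library feature; the query text, queue size, and `process_block` are placeholders:

```python
import queue
import threading

import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")  # connection details assumed

# A bounded queue plays the role of the intermediate buffer: the reader thread
# keeps pulling blocks off the socket so the server never waits long, while the
# slow per-block processing happens in the main thread.
blocks = queue.Queue(maxsize=100)  # tune for available memory
SENTINEL = object()


def process_block(block):
    ...  # placeholder for the ~10 second per-block work


def read_stream():
    with client.query_row_block_stream("SELECT * FROM my_table") as stream:
        for block in stream:
            blocks.put(block)  # only blocks if the queue is already full
    blocks.put(SENTINEL)  # signal end of stream


reader = threading.Thread(target=read_stream, daemon=True)
reader.start()

while True:
    block = blocks.get()
    if block is SENTINEL:
        break
    process_block(block)

reader.join()
```

The tradeoff is the same one described above: if processing falls far enough behind that the queue fills, the reader blocks and the timeout can still occur, so the queue size has to cover the worst-case backlog in memory.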
@genzgd thank you for the investigation. I understand the problem more precisely now.

```python
client = clickhouse_connect.get_client(
    ...
    settings={"max_block_size": 30 / seconds_per_row}
)
```
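A minimal sketch of how that setting could be derived, assuming `seconds_per_row` is measured on a small sample first; the table name and `process_row` are placeholders, and the result is rounded down to an integer:

```python
import time

import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")  # connection details assumed


def process_row(row):
    ...  # placeholder for the real per-row work


# Estimate the per-row processing cost on a small sample.
sample = client.query("SELECT * FROM my_table LIMIT 1000").result_rows
start = time.monotonic()
for row in sample:
    process_row(row)
seconds_per_row = (time.monotonic() - start) / len(sample)

# Size blocks so each one is processed well inside the ~30 second window.
client = clickhouse_connect.get_client(
    host="localhost",
    settings={"max_block_size": max(1, int(30 / seconds_per_row))},
)
```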
I see.
Unfortunately, no, there is no server-side cursor. If you can break your query up into chunks based on the primary key, you could read each chunk into memory (using just the client `query` method).
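A minimal sketch of that chunking approach, assuming an integer primary key column `id` and a table `my_table` (both placeholders); each chunk is read fully into memory with a plain `query` call, so nothing is left pending on the socket while the slow processing runs:

```python
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")  # connection details assumed

CHUNK = 1_000_000  # key range per chunk; size it so one chunk fits in memory


def process_row(row):
    ...  # placeholder for the slow per-row work


# Bounds of the integer primary key (column name is an assumption).
lo, hi = client.query("SELECT min(id), max(id) FROM my_table").result_rows[0]

start = lo
while start <= hi:
    stop = start + CHUNK
    # Read the whole chunk before processing it, so the connection is idle
    # while process_row runs.
    result = client.query(
        f"SELECT * FROM my_table WHERE id >= {start} AND id < {stop}"
    )
    for row in result.result_rows:
        process_row(row)
    start = stop
```

How many rows each key range actually returns depends on how dense the key is, so the chunk width may need tuning.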
Describe the bug
The streaming read crashes after some time if there is any processing happening between reads.
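The pattern is roughly the following; the table name is a placeholder and the sleep stands in for the real ~10 second per-block processing:

```python
import time

import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")  # connection details assumed

with client.query_row_block_stream("SELECT * FROM my_table") as stream:
    for block in stream:
        # Stand-in for the real per-block processing, which takes ~10 seconds.
        time.sleep(10)
```

After a few blocks the read fails with an error like "Connection broken: IncompleteRead(134 bytes read)".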
Steps to reproduce
Expected behaviour
I expect it to read the whole dataset. If I disable the processing, the dataset is read fine (40 million records).
This leads me to think that the issue is not related to the actual data or the response, but to something inside the implementation.
Code example
clickhouse-connect and/or ClickHouse server logs
Configuration
Environment
ClickHouse server
CREATE TABLE statements for tables involved: