No data about SELECT queries in Superset in ClickHouse table system.processes #62604
Comments
Most probably this query returned data to Superset instantly, and Superset is showing its own processing time.
@den-crane - It helps, thank you! This solution creates some additional problems, but they can be resolved. Users of our app don't want to add a "sleep(1)" column to their SELECT queries themselves, so I modified the SQL query in Superset in the method execute_sql_statement() in superset/sql_lab.py, and then I have to remove the extra "sleep(1)" column from the result set in Superset / SQL Lab. A second solution was to read active queries from the Superset table "query", but it has a lot of issues: many expired SQL queries stay in that table in "running" status, and we would have to join the Superset table with the ClickHouse table system.query_log to get the query_id needed to kill a query, but the rows are not identical, etc. So we have a workaround, but the original problem still exists. Should I close the issue?
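For illustration, a minimal sketch of what the rewritten query could look like (the _keepalive alias and the exact rewrite point are assumptions, not the actual Superset patch): sleep(1) pauses for one second per processed block, so the query stays visible in system.processes long enough to be observed, and the extra column then has to be dropped from the result set.

-- user's original query:
--   SELECT v, generateUUIDv4() AS uid FROM eso.t2 LIMIT 500000
-- query as rewritten inside execute_sql_statement() (sketch):
SELECT v, generateUUIDv4() AS uid, sleep(1) AS _keepalive
FROM eso.t2
LIMIT 500000;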
I don't really understand what the problem is. Why do you need the queries in system.processes?
We are building a solution where users can run arbitrary queries against a ClickHouse DB via Superset, and an admin should be able to see all queries and kill some of them (long-running ones). We saw that the admin cannot see SELECT queries, and we need some way to fix that.
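As a rough illustration of that admin workflow (not taken from the thread; the 30-second threshold and the placeholder query_id are assumptions), the admin could run something like this directly against ClickHouse:

-- list queries that have been running for more than 30 seconds
SELECT query_id, user, elapsed, query
FROM system.processes
WHERE elapsed > 30;

-- kill one of them by its query_id (placeholder value)
KILL QUERY WHERE query_id = '<query_id from the SELECT above>' ASYNC;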
That wasn't a workaround; it was to help visualise that the problem is not at the ClickHouse level. Please check the logs of those supposedly running queries to confirm. If the query has already finished, there is obviously no way, or need, to cancel it.
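One way to do that check (a sketch, assuming query logging is enabled with the default settings) is to look the query up in system.query_log and see whether it has already finished and how long it actually ran:

-- did the 'running' query actually finish, and how long did it take?
SELECT query_id, event_time, query_duration_ms, read_rows, result_rows
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query LIKE '%eso.t2%'
ORDER BY event_time DESC
LIMIT 10;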
@Algunenano - No, the problem is not that the "query has already finished [so] there is obviously no way, or need, to cancel it". The query was really long (>30 sec), and I did not see that SELECT query in system.processes. Fortunately, I applied the solution from @den-crane (add a sleep(1) column to the SELECT query, then remove that column from the result). As I understand it, the problem will not be fixed on the Superset side (apache/superset#28007 (comment)) nor on the ClickHouse side (#62604 (comment)). Fortunately, we have a workaround, so the ticket can be closed.
If adding a sleep(1) makes the query show up, that suggests the query simply finishes too quickly to be observed. If the query doesn't appear in system.processes while it's still running, that would be a major bug on the CH side.
I tested the same query in DBeaver and in Superset.
@berlicon Please share the system.query_log rows for both runs. I guess the difference is that DBeaver fetches the data slowly using the HTTP protocol, and this is accounted as query execution, while Superset fetches it instantly using the TCP protocol (because of a different column format and a different block size). It should be visible in system.query_log ProfileEvents.
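A comparison along those lines might look like the following (a sketch, assuming a ClickHouse version where ProfileEvents is a Map column; the network counter names used here are an assumption based on the documented ProfileEvents keys):

-- compare total duration vs. time spent sending the result to the client
SELECT query_id, query_duration_ms,
       ProfileEvents['NetworkSendElapsedMicroseconds'] AS send_us,
       ProfileEvents['NetworkSendBytes'] AS send_bytes
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query LIKE '%eso.t2%'
ORDER BY event_time DESC;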
it looks like this
@den-crane
DBeaver:
Hint: I am not going to look into xlsx files. Too cumbersome. The last time I used MS Excel was in 1996.
I don't see anything new. A query (without sleep) from Superset is fast.
Original issue description:
In ClickHouse there is a table, system.processes, which contains info about all queries running right now, and we can kill any query that runs for too long.
See:
https://clickhouse.com/docs/en/operations/system-tables/processes
https://clickhouse.com/docs/en/sql-reference/statements/kill#kill-query
https://clickhouse.com/docs/en/operations/system-tables/query_log
This table works great: I can run any long (>5 sec) query of any type (SELECT, INSERT, UPDATE, DELETE, ALTER), see that query in system.processes, and kill it by its query_id. I checked this functionality in DBeaver, and I also tested it in Superset / SQL Lab. I can see all queries run in SQL Lab in system.processes except SELECT queries. I want to see all queries from Superset, not only INSERT/UPDATE/DELETE but also SELECT queries.
How to reproduce the bug:
CREATE TABLE eso.t2 (v String) ENGINE = MergeTree() ORDER BY v;
INSERT INTO eso.t2 (v) VALUES (generateUUIDv4());
INSERT INTO eso.t2 (v) SELECT v FROM eso.t2;  -- repeat this statement 20+ times to get 1M+ rows in eso.t2
SELECT v, generateUUIDv4() AS uid FROM eso.t2 LIMIT 500000;
SELECT query, * FROM system.processes WHERE query NOT LIKE '%processes%';
Success case: I run this query in Superset:
INSERT INTO eso.t2 (v) SELECT v FROM eso.t2;
... and I can see that query in system.processes.