Query's `query_hash` is incorrect for queries with parameters #1909

Comments
Thank you for the elaborate description. Another possible solution is to have the query hash generated from the query template after rendering it with the default parameter values. WDYT?
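For illustration, here is a minimal sketch of that idea. The `render_with_defaults` helper and the default-parameter plumbing are assumptions for the example; only the normalize-and-md5 behavior is modeled on `redash.utils.gen_query_hash`.

```python
import hashlib

def gen_query_hash(sql):
    # Roughly mirrors redash.utils.gen_query_hash: strip all whitespace,
    # lowercase, then md5 (comment stripping omitted for brevity).
    return hashlib.md5("".join(sql.split()).lower().encode("utf-8")).hexdigest()

def render_with_defaults(template, defaults):
    # Hypothetical stand-in for Redash's {{ param }} substitution.
    for name, value in defaults.items():
        template = template.replace("{{ %s }}" % name, str(value))
    return template

template = "SELECT count(*) FROM events WHERE ds = '{{ ds }}'"
# Hash the template rendered with default values; this matches the hash of
# the executed query whenever it runs with those same defaults.
saved_hash = gen_query_hash(render_with_defaults(template, {"ds": "2017-06-01"}))
```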
@arikfr I'm not sure I understand how that will work. I assume you mean the hash computed in `enqueue_query`:

```python
def enqueue_query(query, data_source, user_id, scheduled_query=None, metadata={}):
    query_hash = gen_query_hash(query)
    logging.info("Inserting job for %s with metadata=%s", query_hash, metadata)
```

OTOH, perhaps I'm misunderstanding something about the architecture. I realize there are quite a few moving parts here and I'm still working my way through the code.

Anyway, I want to propose another approach. Ultimately, the Query and QueryResult are linked via:

```python
queries = db.session.query(Query).filter(
    Query.query_hash == query_hash,
    Query.data_source == data_source)
for q in queries:
    q.latest_query_data = query_result
    db.session.add(q)
```

Suppose we altered the interfaces a bit to pass along a `query_id`:

```python
queries = db.session.query(Query).filter(
    or_(Query.query_hash == query_hash, Query.id == query_id),  # Change here (sqlalchemy.or_)
    Query.data_source == data_source)
for q in queries:
    q.latest_query_data = query_result
    db.session.add(q)
```

I believe this would preserve existing behavior (in that "duplicate" queries are updated with the latest data) and that the source parameterized query also gets linked; a fuller sketch of threading the `query_id` through follows below.

I'm happy to continue kicking this around. And, once we land on an approach, I'm happy to send out a PR.
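For concreteness, a hedged sketch of how that `query_id` might be threaded through the enqueue path. The extra keyword argument and the metadata key are assumptions for illustration, not Redash's actual interface:

```python
import logging
from redash.utils import gen_query_hash

def enqueue_query(query, data_source, user_id,
                  scheduled_query=None, metadata={}, query_id=None):
    query_hash = gen_query_hash(query)
    if query_id is not None:
        # Hypothetical: carry the originating query's id in the job metadata
        # so the result-storing side can link back to the saved query even
        # when the rendered query's hash differs from the template's hash.
        metadata["query_id"] = query_id
    logging.info("Inserting job for %s with metadata=%s", query_hash, metadata)
    # ... enqueue the execution task as before ...
```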
Exactly. It definitely can be, and in this case there is no reason to update the query's latest result reference.
Ah, it's the point about shared dashboards that I was missing. Thanks for the explanation. In that case I think your solution is appropriate. IIUC, saving the query would also need to render the template with default values and generate the hash from that. Is that what you mean?
Yes.
Here's one other thing to consider, that I tried to call out in my initial report. If you create a brand new query with a parameter, such that the first query save is already templated, you can never share this query in a dashboard, because `latest_query_data_id` never gets populated.

Anyway, I can understand being somewhat queasy with this change, but I think it addresses a pretty clear bug. And, as you said, it's a pretty small change. I'll craft a PR.
Thanks, @redshirtrob.
Hi @redshirtrob & @arikfr, I'm not super familiar with the shared dashboard you were talking about, but I tested the above PR with both simple and parameterised queries, included them in a dashboard, and so far it seems to work as intended.
Issue Summary
When using parameters in queries, the fully rendered query's `query_hash` does not match the `query_hash` stored in the `queries` table. Consequently, the `latest_query_data_id` does not get updated.

If the query is created with a parameter, saved, run, used in a dashboard, and that dashboard is subsequently shared, the shared dashboard will crash because there will be no cached data for the query, i.e. `latest_query_data_id` is empty.

Steps to Reproduce
1. Create a query with a parameter.
2. Save the query.
3. Observe the data stored in the `queries` table, i.e. `query_hash` and `latest_query_data_id`.
4. Execute the query.
5. Observe the log messages. Note the logged `query_hash` does NOT match the table's value, and `latest_query_data_id` is not updated (see the sketch after this list).
6. Save the query.
7. Observe the new values in the database. Note that `query_hash` now matches the value from the log message in Step 5.
8. Execute the query.
9. Observe the log messages. Note the logged `query_hash` now matches the table's value, and `latest_query_data_id` has been updated.

I expect parameterized queries to behave just like any other queries. When they succeed, the data should be linked to the `queries` table, and the latest results should be available in any shared dashboards.
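A minimal sketch of the mismatch described in the steps above, assuming only the normalize-and-md5 behavior of `redash.utils.gen_query_hash` (the table and log plumbing is omitted):

```python
import hashlib

def gen_query_hash(sql):
    # Approximation of redash.utils.gen_query_hash: strip whitespace,
    # lowercase, md5.
    return hashlib.md5("".join(sql.split()).lower().encode("utf-8")).hexdigest()

template = "SELECT count(*) FROM events WHERE ds = '{{ ds }}'"    # saved text
rendered = "SELECT count(*) FROM events WHERE ds = '2017-06-01'"  # executed text

stored_hash = gen_query_hash(template)  # what the queries table holds (step 3)
logged_hash = gen_query_hash(rendered)  # what execution logs (step 5)
assert stored_hash != logged_hash  # so latest_query_data_id never gets linked
```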
Technical details:
I believe the simplest solution is to pass along a `query_hash` field in the `metadata` parameter to `tasks:queries.py::enqueue_query()`. This way `QueryExecutor` could choose the `metadata` version when calling `models.QueryResult.store_result()`. The `query_hash` field in `metadata` should contain the same value as the `query_hash` field in the `queries` table. A sketch of this idea follows below.

My main concern with replacing the value of the `query_hash` formal parameter to `enqueue_query` is that it's used in a bunch of places in the `redis_pipeline`, and I'm not sure what kind of unintended side-effects that might cause.
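A hedged sketch of that metadata approach. The executor-side helper and its name are assumptions for illustration; only `enqueue_query`'s first lines are quoted from the issue:

```python
import logging
from redash.utils import gen_query_hash

def enqueue_query(query, data_source, user_id, scheduled_query=None, metadata={}):
    query_hash = gen_query_hash(query)  # hash of the *rendered* query text
    logging.info("Inserting job for %s with metadata=%s", query_hash, metadata)
    # ... enqueue the execution task, passing metadata along ...

# The caller would supply the saved query's hash, e.g.:
# enqueue_query(rendered_sql, data_source, user_id,
#               metadata={"query_hash": saved_query.query_hash})

def hash_for_store_result(rendered_hash, metadata):
    # Hypothetical executor-side choice: prefer the hash carried in metadata
    # (the value stored in the queries table) over the rendered query's hash
    # when calling models.QueryResult.store_result().
    return metadata.get("query_hash", rendered_hash)
```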