You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem? Please describe.
In an example such as ClickHouse, there are many settings that can be applied at the server, session, or query level to optimize performance. However, the best setting for one query may not be ideal for others. Relying on server-level settings for query performance often leads to suboptimal results for certain queries, especially those with varying parameters. Many queries can benefit from runtime hints/settings to improve performance based on their specific needs (Telemetry might be another case).
Currently, Cube.js forces us to use SQL instead of table names in cube definitions as a workaround to apply settings. This causes subqueries to be formed where the setting is only applied to the subquery, rather than the entire query. As a result, the intended settings are not fully applied, leading to performance inefficiencies, particularly for real-time applications. There are also potentially databases where applying settings at the subquery level is not possible, making this workaround even more problematic.
Describe the solution you'd like
We would like Cube.js to support the ability to pass query hints or settings to the query builder, specific to a given driver. Ideally, these hints could be calculated dynamically during query_rewrite and then passed down to the query generation.
Additionally, allowing API callers to pass parameters that would control query hints would provide better control over query execution, especially when there is no direct customer exposure. For example setting of parameters may be calculated via a call to a library in CSharp which is our main language for example. Forcing individuals to use query_rewrite to do it all might be a showstopper in this case.
An alternative on this latter part to avoid this low level hinting would be to use schema to help schematize the options available for a given datastore to pass to the query generation layer. So when you query a cube linked to a datastore, a range of options would be available.
Describe alternatives you've considered
Using raw SQL instead of table names in cube definitions to allow for injection of settings using filter params. This workaround forces the query to become a subquery, limiting the scope of the applied settings to the subquery rather than the entire query. This not only leads to inefficiencies but may also not be possible in languages where settings cannot be applied to subqueries.
Manually setting certain parameters at the server or session level, though this does not offer the flexibility needed for queries with varied workloads and priorities.
Another workaround is to ask cube for the raw SQL and then alter it as required and execute the query yourself. This is useful ofcourse for edge case scenarios where you might do things like alter the query and export the result to s3 but it totally defeats the purpose of cube for regular query scenarios.
Additional context
Some specific use cases where query hints/settings would be beneficial:
Distributed queries in ClickHouse Cloud require settings to use more than one replica.
Longer lookback queries could benefit from allocating more threads, ensuring better latency for less frequent, long-running queries.
Overriding the join algorithm for specific queries to optimize execution plans.
Setting a query ID for end-to-end query tracing and monitoring.
Providing this flexibility would improve query execution performance, especially for real-time applications.
If there is existing functionality to do this, happy to move this to a question so that others can find the info there.
Thank you!
The text was updated successfully, but these errors were encountered:
Korg95
changed the title
Allow for Query Hints / Settings to be accessible to query builder for a given driver
[Feature request] Allow for Query Hints / Settings to be accessible to query builder for a given driver
Oct 24, 2024
Thanks for a very elaborate proposal, I really appreciate that. That being said, I have marked this as ClickHouse-specific as we do not (currently?) get many similar requests regarding other data sources.
I'm also thankful for the list of specific use cases. May you please provide an example of parameter setting for each one of them? Some minimal yet working SQL for each example would really help—I don't want to misunderstand what is really required, hence my ask.
My current understanding is that most probably the solution might along the lines of allowing to initialize a database driver in Cube with an additional SQL that is executed once per connection. Further customization would be possible via defining multiple data sources and using the right one in each cube.
Setting a query ID for end-to-end query tracing and monitoring.
+1 specifically to this use case. On each query submitted to ClickHouse, the ability to send a custom setting allows for tracing a request from the client to a ClickHouse query.
Is your feature request related to a problem? Please describe.
In an example such as ClickHouse, there are many settings that can be applied at the server, session, or query level to optimize performance. However, the best setting for one query may not be ideal for others. Relying on server-level settings for query performance often leads to suboptimal results for certain queries, especially those with varying parameters. Many queries can benefit from runtime hints/settings to improve performance based on their specific needs (Telemetry might be another case).
Currently, Cube.js forces us to use SQL instead of table names in cube definitions as a workaround to apply settings. This causes subqueries to be formed where the setting is only applied to the subquery, rather than the entire query. As a result, the intended settings are not fully applied, leading to performance inefficiencies, particularly for real-time applications. There are also potentially databases where applying settings at the subquery level is not possible, making this workaround even more problematic.
Describe the solution you'd like
We would like Cube.js to support the ability to pass query hints or settings to the query builder, specific to a given driver. Ideally, these hints could be calculated dynamically during query_rewrite and then passed down to the query generation.
Additionally, allowing API callers to pass parameters that would control query hints would provide better control over query execution, especially when there is no direct customer exposure. For example setting of parameters may be calculated via a call to a library in CSharp which is our main language for example. Forcing individuals to use query_rewrite to do it all might be a showstopper in this case.
An alternative on this latter part to avoid this low level hinting would be to use schema to help schematize the options available for a given datastore to pass to the query generation layer. So when you query a cube linked to a datastore, a range of options would be available.
Describe alternatives you've considered
Using raw SQL instead of table names in cube definitions to allow for injection of settings using filter params. This workaround forces the query to become a subquery, limiting the scope of the applied settings to the subquery rather than the entire query. This not only leads to inefficiencies but may also not be possible in languages where settings cannot be applied to subqueries.
Manually setting certain parameters at the server or session level, though this does not offer the flexibility needed for queries with varied workloads and priorities.
Another workaround is to ask cube for the raw SQL and then alter it as required and execute the query yourself. This is useful ofcourse for edge case scenarios where you might do things like alter the query and export the result to s3 but it totally defeats the purpose of cube for regular query scenarios.
Additional context
Some specific use cases where query hints/settings would be beneficial:
Providing this flexibility would improve query execution performance, especially for real-time applications.
If there is existing functionality to do this, happy to move this to a question so that others can find the info there.
Thank you!
The text was updated successfully, but these errors were encountered: