Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature request] Allow for Query Hints / Settings to be accessible to query builder for a given driver #8838

Open
Korg95 opened this issue Oct 18, 2024 · 3 comments
Labels
driver:clickhouse Issues related to the ClickHouse driver enhancement New feature proposal

Comments

@Korg95
Copy link

Korg95 commented Oct 18, 2024

Is your feature request related to a problem? Please describe.

In an example such as ClickHouse, there are many settings that can be applied at the server, session, or query level to optimize performance. However, the best setting for one query may not be ideal for others. Relying on server-level settings for query performance often leads to suboptimal results for certain queries, especially those with varying parameters. Many queries can benefit from runtime hints/settings to improve performance based on their specific needs (Telemetry might be another case).

Currently, Cube.js forces us to use SQL instead of table names in cube definitions as a workaround to apply settings. This causes subqueries to be formed where the setting is only applied to the subquery, rather than the entire query. As a result, the intended settings are not fully applied, leading to performance inefficiencies, particularly for real-time applications. There are also potentially databases where applying settings at the subquery level is not possible, making this workaround even more problematic.

Describe the solution you'd like

We would like Cube.js to support the ability to pass query hints or settings to the query builder, specific to a given driver. Ideally, these hints could be calculated dynamically during query_rewrite and then passed down to the query generation.

Additionally, allowing API callers to pass parameters that would control query hints would provide better control over query execution, especially when there is no direct customer exposure. For example setting of parameters may be calculated via a call to a library in CSharp which is our main language for example. Forcing individuals to use query_rewrite to do it all might be a showstopper in this case.

An alternative on this latter part to avoid this low level hinting would be to use schema to help schematize the options available for a given datastore to pass to the query generation layer. So when you query a cube linked to a datastore, a range of options would be available.

Describe alternatives you've considered

Using raw SQL instead of table names in cube definitions to allow for injection of settings using filter params. This workaround forces the query to become a subquery, limiting the scope of the applied settings to the subquery rather than the entire query. This not only leads to inefficiencies but may also not be possible in languages where settings cannot be applied to subqueries.
Manually setting certain parameters at the server or session level, though this does not offer the flexibility needed for queries with varied workloads and priorities.

Another workaround is to ask cube for the raw SQL and then alter it as required and execute the query yourself. This is useful ofcourse for edge case scenarios where you might do things like alter the query and export the result to s3 but it totally defeats the purpose of cube for regular query scenarios.

Additional context
Some specific use cases where query hints/settings would be beneficial:

  • Distributed queries in ClickHouse Cloud require settings to use more than one replica.
  • Longer lookback queries could benefit from allocating more threads, ensuring better latency for less frequent, long-running queries.
  • Overriding the join algorithm for specific queries to optimize execution plans.
  • Setting a query ID for end-to-end query tracing and monitoring.

Providing this flexibility would improve query execution performance, especially for real-time applications.

If there is existing functionality to do this, happy to move this to a question so that others can find the info there.

Thank you!

@Korg95 Korg95 changed the title Allow for Query Hints / Settings to be accessible to query builder for a given driver [Feature request] Allow for Query Hints / Settings to be accessible to query builder for a given driver Oct 24, 2024
@igorlukanin igorlukanin added enhancement New feature proposal driver:clickhouse Issues related to the ClickHouse driver labels Nov 1, 2024
@igorlukanin
Copy link
Member

Hi @Korg95 👋

Thanks for a very elaborate proposal, I really appreciate that. That being said, I have marked this as ClickHouse-specific as we do not (currently?) get many similar requests regarding other data sources.

I'm also thankful for the list of specific use cases. May you please provide an example of parameter setting for each one of them? Some minimal yet working SQL for each example would really help—I don't want to misunderstand what is really required, hence my ask.

My current understanding is that most probably the solution might along the lines of allowing to initialize a database driver in Cube with an additional SQL that is executed once per connection. Further customization would be possible via defining multiple data sources and using the right one in each cube.

@Korg95
Copy link
Author

Korg95 commented Nov 4, 2024

Hi @igorlukanin thank you for your reply! Let me construct a proper response as per your asks to clarify

@alevene
Copy link

alevene commented Nov 7, 2024

Setting a query ID for end-to-end query tracing and monitoring.

+1 specifically to this use case. On each query submitted to ClickHouse, the ability to send a custom setting allows for tracing a request from the client to a ClickHouse query.

Example

SELECT
FROM cube.table
SETTINGS SQL_QUERY_ID = '33433698'

Once this is passed through as a setting, queries can be traced with the query_log

SELECT *
FROM clusterAllReplicas('{cluster}', system.query_log)
WHERE mapContains("Settings", 'SQL_QUERY_ID')
AND "Settings"['SQL_QUERY_ID'] = '33433698';

Without Cube having this capability, the client requires a post-processing step to tack this SETTINGS clause on to the end.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
driver:clickhouse Issues related to the ClickHouse driver enhancement New feature proposal
Projects
None yet
Development

No branches or pull requests

3 participants