Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize "LIMIT" queries for speed / memory with special TopK operator #7196

Closed
alamb opened this issue Aug 4, 2023 · 2 comments · Fixed by #7721
Closed

Optimize "LIMIT" queries for speed / memory with special TopK operator #7196

alamb opened this issue Aug 4, 2023 · 2 comments · Fixed by #7721
Assignees
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Aug 4, 2023

Is your feature request related to a problem or challenge?

This pattern is common:

SELECT c1, c2
FROM t
ORDER BY c3
LIMIT 10

For example we have queries in IOx like the following (this is the same pattern @NGA-TRAN describes on #7162)

SELECT tag, value1, ...
FROM t
WHERE other_column = 'foo'
ORDER BY time
LIMIT 10

Describe the solution you'd like

If the data IS NOT already sorted, what happens today is a plan like

LIMIT(fetch=10)
  SORT(sort_exprs=[c3] fetch=10)
    SCAN(...)

And the Sort can take partial advantage of the fetch -- and it will be better after @gruuya 's change in #7180

We can probably do better still with a special operator like the following that uses some specialized structure (perhaps some type of heap)

TOPK(fetch=10, sort_exprs=[c3])
    SCAN(...)

Describe alternatives you've considered

If the data is already sorted the right way, DataFusion can just read first N values and stop as described on #7162

Additional context

No response

@alamb alamb added the enhancement New feature or request label Aug 4, 2023
@alamb alamb changed the title Optimize "LIMIT" queries Optimize "LIMIT" queries for speed / memory with special TopK opeartor Aug 4, 2023
@alamb
Copy link
Contributor Author

alamb commented Aug 4, 2023

FWIW I am not sure how much better a special topK operator will be than #7180

@alamb alamb changed the title Optimize "LIMIT" queries for speed / memory with special TopK opeartor Optimize "LIMIT" queries for speed / memory with special TopK operator Aug 7, 2023
@alamb
Copy link
Contributor Author

alamb commented Aug 11, 2023

I have a PR up with a proposal for a top K implementation if anyone would like to look: #7250

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
1 participant