Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature(planner): Enable new planner by default #5649

Closed
wants to merge 2 commits into from

Conversation

leiysky
Copy link
Contributor

@leiysky leiysky commented May 28, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Change default value of setting item enable_planner_v2 to 1.

To achieve this, we have to modify some test cases, to make it compatible with new planner framework.

Changelog

  • New Feature

Related Issues

Close #3747

@vercel
Copy link

vercel bot commented May 28, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Updated
databend ⬜️ Ignored (Inspect) May 30, 2022 at 3:22PM (UTC)

@mergify
Copy link
Contributor

mergify bot commented May 28, 2022

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@mergify mergify bot added the pr-feature this PR introduces a new feature to the codebase label May 28, 2022
@leiysky leiysky force-pushed the enable-new-planner branch from bafab31 to 6233bd1 Compare May 28, 2022 16:14
@leiysky leiysky force-pushed the enable-new-planner branch from c1e439b to 26d5e5e Compare May 29, 2022 05:57
@leiysky
Copy link
Contributor Author

leiysky commented May 29, 2022

@BohuTANG Is there any difference between executing in standalone test and cluster test(e.g. with different connector)? Seems it won't use new planner in cluster test.

@sundy-li
Copy link
Member

sundy-li commented May 29, 2022

Seems it won't use new planner in cluster test.

@leiysky There is && context.get_cluster().is_empty() in mysql_interactive_worker.rs

@leiysky
Copy link
Contributor Author

leiysky commented May 29, 2022

@leiysky There is && context.get_cluster().is_empty() in mysql_interactive_worker.rs

Thanks, I didn't notice that.

@leiysky leiysky force-pushed the enable-new-planner branch from 6769d4b to ac4acbe Compare May 30, 2022 00:55
@leiysky leiysky marked this pull request as ready for review May 30, 2022 07:15
@leiysky leiysky requested review from xudong963 and zhang2014 May 30, 2022 07:15
@BohuTANG
Copy link
Member

Hi @leiysky

Found many test cases are modified for the new planner.
If the new planner settings are enabled by default, what's the SQL will be affected that the user who is running databend-query in production?

@leiysky
Copy link
Contributor Author

leiysky commented May 30, 2022

Hi @leiysky

Found many test cases are modified for the new planner. If the new planner settings are enabled by default, what's the SQL will be affected that the user who is running databend-query in production?

Actually, you can find that most of the disabled test cases are the cases expected to fail. The reason why we have to disable them is we don't parse DfHint in new parser thus we cannot define failure via DfHint. We have a plan to migrate the test suite of query statements to sqllogic test, so we can define failure test without DfHint then.

The major difference between new planner and old planner are:

  • Quoted identifiers are quoted with "ident", thus users cannot use double quote for string literal anymore
  • The explain result is much different with old planner(all the EXPLAIN tests are removed)

@BohuTANG
Copy link
Member

Ok, if some users who already used databend in production, and upgrade to the new version(enable the new planner), does it also works for them? For example: explain and distributed query

@leiysky
Copy link
Contributor Author

leiysky commented May 30, 2022

Ok, if some users who already used databend in production, and upgrade to the new version(enable the new planner), does it also works for them? For example: explain and distributed query

EXPLAIN will work, but distributed query won't. The new planner can work only with new processor enabled, and seems new processor hasn't supported distributed query yet.

However the new processor has been enabled by default, so it doesn't matter if we enable the new planner by default. Users can utilize with distributed query by disabling new processor, the planner will fallback to old planner as well.

@leiysky leiysky force-pushed the enable-new-planner branch from 0ffc669 to f96bbc5 Compare May 30, 2022 10:22
@BohuTANG BohuTANG requested a review from sundy-li May 30, 2022 10:24
@leiysky leiysky force-pushed the enable-new-planner branch from f96bbc5 to 09a93f5 Compare May 30, 2022 10:29
@leiysky leiysky force-pushed the enable-new-planner branch from 09a93f5 to 11106ef Compare May 30, 2022 13:02
@BohuTANG
Copy link
Member

I am testing this PR with Ontime query:
https://databend.rs/doc/performance/ec2-s3-performance

For Q4, it does not work:

mysql> SELECT
    ->     IATA_CODE_Reporting_Airline AS Carrier,
    ->     count()
    -> FROM ontime
    -> WHERE (DepDelay > 10) AND (Year = 2007)
    -> GROUP BY Carrier
    -> ORDER BY count() DESC;
ERROR 1105 (HY000): Code: 1065, displayText = error: 
  --> SQL:7:10
  |
4 | FROM ontime
5 | WHERE (DepDelay > 10) AND (Year = 2007)
6 | GROUP BY Carrier
7 | ORDER BY count() DESC
  |          ^^^^^^^ can only order by column

@leiysky
Copy link
Contributor Author

leiysky commented May 31, 2022

I am testing this PR with Ontime query:

https://databend.rs/doc/performance/ec2-s3-performance

For Q4, it does not work:


mysql> SELECT

    ->     IATA_CODE_Reporting_Airline AS Carrier,

    ->     count()

    -> FROM ontime

    -> WHERE (DepDelay > 10) AND (Year = 2007)

    -> GROUP BY Carrier

    -> ORDER BY count() DESC;

ERROR 1105 (HY000): Code: 1065, displayText = error: 

  --> SQL:7:10

  |

4 | FROM ontime

5 | WHERE (DepDelay > 10) AND (Year = 2007)

6 | GROUP BY Carrier

7 | ORDER BY count() DESC

  |          ^^^^^^^ can only order by column



This is by design, we only allow to order by a column for now, to keep it more readable. You can give count() an alias in SELECT clause.

@BohuTANG BohuTANG marked this pull request as draft May 31, 2022 03:53
@BohuTANG
Copy link
Member

I convert to the draft.
There are some issues found by @wubx , he will file them soon.

@BohuTANG
Copy link
Member

From some tests, I don't think it's a good time to set the new planner by default to main branch, we have some users running databend on production with our release build.
I suggest we set it by default in a separate branch named planner_v2 of databend repo, so we can develop on it and sync it with the main.

What do you think? @leiysky @sundy-li

@leiysky
Copy link
Contributor Author

leiysky commented May 31, 2022

From some tests, I don't think it's a good time to set the new planner by default to main branch, we have some users running databend on production with our release build.

I suggest we set it by default in a separate branch named planner_v2 of databend repo, so we can develop on it and sync it with the main.

What do you think? @leiysky @sundy-li

I think it's fine not to enable the new planner by default for now. And there is no need to open a new branch for it.

@wubx
Copy link
Member

wubx commented May 31, 2022

ontime performance test very easy OOM.

Serious performance degradation.

New Planner Q1: 50.206s
Old Planner Q1: 0.2295s

SELECT DayOfWeek, count(*) AS c FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;

@leiysky
Copy link
Contributor Author

leiysky commented May 31, 2022

ontime performance test very easy OOM.

Serious performance degradation.

New Planner Q1: 50.206s

Old Planner Q1: 0.2295s


SELECT DayOfWeek, count(*) AS c FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;

I'll take a look.

@sundy-li
Copy link
Member

We can have another stateless-test to test the new planner by default.
It might take time to validate the feature testing & performance.

@fkuner fkuner self-requested a review June 22, 2022 09:39
@mergify
Copy link
Contributor

mergify bot commented Jun 28, 2022

This pull request's description is not fulfill the requirements. @leiysky please update it 🙏.

The description should contain the following:

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

## Summary

  Summary about this PR

## Changelog

- New Feature
- Bug Fix
- Improvement
- Performance Improvement
- Build/Testing/CI
- Documentation
- Not for changelog (changelog entry is not required)

## Related Issues

Fixes #issue

@Xuanwo
Copy link
Member

Xuanwo commented Jul 6, 2022

Hi, is this PR still going on? How about closing it until we are ready to start again?

@TCeason
Copy link
Collaborator

TCeason commented Jul 6, 2022

Hi, is this PR still going on? How about closing it until we are ready to start again?

Looks like in #6440?

@Xuanwo
Copy link
Member

Xuanwo commented Jul 18, 2022

It should be resolved by #6644. Let's close.

@Xuanwo Xuanwo closed this Jul 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need-review pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate to new planner framework
10 participants