[feature](shuffle) enable strict consistency dml by default #32958
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
TPC-H: Total hot run time: 37802 ms
TPC-DS: Total hot run time: 182340 ms
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
TPC-H: Total hot run time: 38533 ms
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
LGTM
TPC-H: Total hot run time: 41764 ms
TPC-DS: Total hot run time: 186417 ms
@@ -636,6 +636,12 @@ public class Config extends ConfigBase {
            varType = VariableAnnotation.EXPERIMENTAL)
    public static boolean enable_single_replica_load = false;

    @ConfField(mutable = true, masterOnly = true, description = {
            "对于 DUPLICATE KEY 表启用 shuffle 的最小 tablet 数量",
(The description string translates to: "Minimum number of tablets for enabling shuffle on DUPLICATE KEY tables".)
Could you explain more about when a user should set this config?
Enabling shuffle has both positive and negative effects.
Shuffle must be enabled for UNIQ and AGG tables for data consistency reasons.
For DUP tables, however, we can make a trade-off:
If shuffle is disabled, the load is faster but uses more memory.
If shuffle is enabled, the load is slower but uses less memory.
Loading into a table with more buckets is more likely to be memory constrained,
so we enable shuffle only for DUP tables that have at least a certain number of buckets.
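One way to picture why more buckets makes the unshuffled load more memory hungry is the toy model below. It is not Doris code. It assumes (my assumption, not stated in this thread) that memory grows roughly with the number of distinct (sink instance, tablet) pairs that hold a write buffer. `ShuffleMemoryModel`, `openWriters`, and the round-robin row placement are all hypothetical names and simplifications made up for this sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: counts (instance, tablet) pairs that receive at least one row.
// Assumption: each such pair needs its own write buffer, so the count is a
// rough proxy for load memory. This is not how Doris is implemented.
public class ShuffleMemoryModel {

    static int openWriters(int instances, int tablets, int rows, boolean shuffle) {
        Set<Long> pairs = new HashSet<>();
        for (int row = 0; row < rows; row++) {
            int tablet = row % tablets; // toy bucketing: rows spread evenly over tablets
            // With shuffle, every row of a tablet is routed to one owning instance.
            // Without shuffle, a row stays on whichever instance happened to read it.
            int instance = shuffle ? tablet % instances : (row / tablets) % instances;
            pairs.add((long) instance * tablets + tablet);
        }
        return pairs.size();
    }

    public static void main(String[] args) {
        // 4 instances, 64 tablets, 100,000 rows
        System.out.println("no shuffle: " + openWriters(4, 64, 100_000, false)); // 256
        System.out.println("shuffle:    " + openWriters(4, 64, 100_000, true));  // 64
    }
}
```

In this model the unshuffled count scales with instances times tablets, while the shuffled count scales with tablets alone, which matches the direction of the trade-off described above. The extra routing step is what would make the shuffled load slower.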
TPC-DS: Total hot run time: 186949 ms
TPC-H: Total hot run time: 40818 ms
TPC-DS: Total hot run time: 185397 ms
TPC-DS: Total hot run time: 186939 ms
LGTM
LGTM
Proposed changes
Set enable_strict_consistency_dml to true by default.
Enabling shuffle has both positive and negative effects.
Shuffle must be enabled for UNIQ and AGG tables for data consistency reasons.
For DUP tables, however, we can make a trade-off:
If shuffle is disabled, the load is faster but uses more memory.
If shuffle is enabled, the load is slower but uses less memory.
Loading into a table with more buckets is more likely to be memory constrained,
so we enable shuffle only for DUP tables that have at least a certain number of buckets.
The threshold is set by min_tablets_for_dup_table_shuffle in be.conf.
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...
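The rule in the proposed changes can be sketched as a small decision function. This is an illustration, not the code from this PR. The class `ShuffleDecision`, the enum `KeysType`, the method `shouldShuffle`, the inclusive `>=` comparison, and the threshold value 64 used in `main` are all assumptions. Only the config name `min_tablets_for_dup_table_shuffle` and the UNIQ/AGG/DUP behavior come from the description.

```java
// Hypothetical sketch of the shuffle decision described in this PR.
public class ShuffleDecision {

    enum KeysType { DUP, UNIQ, AGG }

    /**
     * @param keysType                    table model
     * @param tabletCount                 number of tablets (buckets) being loaded into
     * @param minTabletsForDupTableShuffle value of min_tablets_for_dup_table_shuffle
     */
    static boolean shouldShuffle(KeysType keysType, int tabletCount,
                                 int minTabletsForDupTableShuffle) {
        // UNIQ and AGG tables always shuffle, for data consistency.
        if (keysType != KeysType.DUP) {
            return true;
        }
        // DUP tables shuffle only once the tablet count reaches the threshold.
        // Assumption: the comparison is inclusive.
        return tabletCount >= minTabletsForDupTableShuffle;
    }

    public static void main(String[] args) {
        int threshold = 64; // hypothetical value, chosen only for this example
        System.out.println(shouldShuffle(KeysType.UNIQ, 8, threshold));  // true
        System.out.println(shouldShuffle(KeysType.DUP, 8, threshold));   // false
        System.out.println(shouldShuffle(KeysType.DUP, 128, threshold)); // true
    }
}
```

Under these assumptions, raising the threshold favors load speed for DUP tables, and lowering it favors lower memory use.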