-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[opt](function)Optimize the performance of the pad function under UTF-8. #40162
Conversation
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
TPC-H: Total hot run time: 38528 ms
|
TPC-DS: Total hot run time: 192837 ms
|
ClickBench: Total hot run time: 32.31 s
|
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
TPC-H: Total hot run time: 38413 ms
|
TPC-DS: Total hot run time: 187491 ms
|
ClickBench: Total hot run time: 31.94 s
|
run buildall |
1 similar comment
run buildall |
TPC-H: Total hot run time: 38523 ms
|
TPC-DS: Total hot run time: 189125 ms
|
ClickBench: Total hot run time: 32.36 s
|
res_chars, res_offsets); | ||
auto [real_len, skip_chars] = simd::VStringFunctions::skip_leading_utf8( | ||
(const char*)str_data, (const char*)str_data + str_len, len); | ||
if (len <= skip_chars) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems wired. skip_chars
seems always <= len
be/src/util/simd/vstring_function.h
Outdated
@@ -187,6 +187,19 @@ class VStringFunctions { | |||
return p; | |||
} | |||
|
|||
static inline std::pair<size_t, size_t> skip_leading_utf8(const char* begin, const char* end, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment and change the func name. we should know what the functinon do. if just one place call the func, better be a lambda
be/src/util/simd/vstring_function.h
Outdated
@@ -187,6 +187,19 @@ class VStringFunctions { | |||
return p; | |||
} | |||
|
|||
static inline std::pair<size_t, size_t> skip_leading_utf8(const char* begin, const char* end, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment and change the func name. we should know what the functinon do. if just one place call the func, better be a lambda
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
} | ||
|
||
template <bool str_const, bool len_const, bool pad_const> | ||
void execute_utf8(const ColumnString::Offsets& strcol_offsets, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: function 'execute_utf8' exceeds recommended size/complexity thresholds [readability-function-size]
void execute_utf8(const ColumnString::Offsets& strcol_offsets,
^
Additional context
be/src/vec/functions/function_string.h:1574: 82 lines including whitespace and comments (threshold 80)
void execute_utf8(const ColumnString::Offsets& strcol_offsets,
^
TPC-H: Total hot run time: 37769 ms
|
TPC-DS: Total hot run time: 193079 ms
|
ClickBench: Total hot run time: 31.85 s
|
run p0 |
simd::VStringFunctions::iterate_utf8_with_limit_length( | ||
(const char*)str_data, (const char*)str_data + str_len, len); | ||
// If iterate_char_len is greater than len, it indicates that the string's length exceeds len, so truncation. | ||
if (iterate_char_len >= len) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not greater, only equal need to do truncate
run buildall |
TeamCity be ut coverage result: |
TPC-H: Total hot run time: 38095 ms
|
TPC-DS: Total hot run time: 192290 ms
|
ClickBench: Total hot run time: 32.44 s
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
PR approved by at least one committer and no changes requested. |
PR approved by anyone and no changes requested. |
…-8. (#40162) ## Proposed changes 1. Removed the calculation of str_index. 2. If pad is constant, calculate pad_index only once. 3. Do not insert res_chars inside the loop; instead, insert them all together after the loop completes. ``` mysql [test]>select count(lpad(Title, 100, "abc")) from hits_10m; +--------------------------------+ | count(lpad(Title, 100, 'abc')) | +--------------------------------+ | 10000000 | +--------------------------------+ 1 row in set (3.97 sec) mysql [test]>select count(lpad(Title, 100, "abc")) from hits_10m; +--------------------------------+ | count(lpad(Title, 100, 'abc')) | +--------------------------------+ | 10000000 | +--------------------------------+ 1 row in set (2.87 sec) ``` <!--Describe your changes.-->
Proposed changes