-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[refine](bits) refine bytes_mask_to_bits_mask code #38360
Conversation
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
run buildall |
clang-tidy review says "All clean, LGTM! 👍" |
TPC-H: Total hot run time: 40187 ms
|
TPC-DS: Total hot run time: 172284 ms
|
ClickBench: Total hot run time: 31.15 s
|
run p0 |
run buildall |
TPC-H: Total hot run time: 39783 ms
|
TPC-DS: Total hot run time: 171811 ms
|
ClickBench: Total hot run time: 31.14 s
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
#if defined(__ARM_NEON) && defined(__aarch64__) | ||
#include <arm_neon.h> | ||
#endif | ||
|
||
#include "util/sse_util.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: 'util/sse_util.hpp' file not found [clang-diagnostic-error]
#include "util/sse_util.hpp"
^
run buildall |
TPC-H: Total hot run time: 42211 ms
|
TPC-DS: Total hot run time: 169133 ms
|
ClickBench: Total hot run time: 30.05 s
|
run buildall |
TPC-H: Total hot run time: 41994 ms
|
TPC-DS: Total hot run time: 170285 ms
|
ClickBench: Total hot run time: 29.57 s
|
run p1 |
be/src/util/simd/bits.h
Outdated
/// todo(zeno) Compile add avx512 parameter, modify it to bytes64_mask_to_bits64_mask | ||
/// Transform 32-byte mask to 32-bit mask | ||
inline uint32_t bytes32_mask_to_bits32_mask(const uint8_t* data) { | ||
consteval inline auto bits_mask_length() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need add inline
keyword
run buildall |
TPC-H: Total hot run time: 41610 ms
|
TPC-DS: Total hot run time: 170398 ms
|
ClickBench: Total hot run time: 29.75 s
|
run buildall |
run buildall |
TPC-H: Total hot run time: 42257 ms
|
TPC-DS: Total hot run time: 170670 ms
|
ClickBench: Total hot run time: 30.42 s
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
PR approved by at least one committer and no changes requested. |
PR approved by anyone and no changes requested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
## Proposed changes The previous code only considered the x86 architecture, and _mm_movemask_epi8 does not have a corresponding instruction in ARM. According to the article below, we need to abstract the overall logic. For ARM, optimize using the content mentioned in the following article: filter function origin 0.711375 seconds 0.7154 seconds 0.71782 seconds 0.715296 seconds filter function arm opt 0.559854 seconds 0.559854 seconds 0.559854 seconds 0.559854 seconds [link](https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon?CommentId=af187ac6-ae00-4e4d-bbf0-e142187aa92e)
The previous code only considered the x86 architecture, and _mm_movemask_epi8 does not have a corresponding instruction in ARM. According to the article below, we need to abstract the overall logic. For ARM, optimize using the content mentioned in the following article: filter function origin 0.711375 seconds 0.7154 seconds 0.71782 seconds 0.715296 seconds filter function arm opt 0.559854 seconds 0.559854 seconds 0.559854 seconds 0.559854 seconds [link](https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon?CommentId=af187ac6-ae00-4e4d-bbf0-e142187aa92e)
#38360 The previous code only considered the x86 architecture, and _mm_movemask_epi8 does not have a corresponding instruction in ARM. According to the article below, we need to abstract the overall logic. For ARM, optimize using the content mentioned in the following article: filter function origin 0.711375 seconds 0.7154 seconds 0.71782 seconds 0.715296 seconds filter function arm opt 0.559854 seconds 0.559854 seconds 0.559854 seconds 0.559854 seconds
Proposed changes
The previous code only considered the x86 architecture, and _mm_movemask_epi8 does not have a corresponding instruction in ARM. According to the article below, we need to abstract the overall logic.
For ARM, optimize using the content mentioned in the following article:
filter function origin 0.711375 seconds 0.7154 seconds 0.71782 seconds 0.715296 seconds
filter function arm opt 0.559854 seconds 0.559854 seconds 0.559854 seconds 0.559854 seconds
link