perf(expr): add initial microbenchmark for expressions #6856
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Codecov Report
@@ Coverage Diff @@
## main #6856 +/- ##
==========================================
- Coverage 73.25% 73.24% -0.01%
==========================================
Files 1032 1032
Lines 164648 164648
==========================================
- Hits 120610 120600 -10
- Misses 44038 44048 +10
I was once thinking this can be done by the compiler. 🥵 However, considering the control flow (side effect) introduced by |
LGTM!
I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.
What's changed and what's your intention?
This PR introduces an initial microbenchmark for expressions.
Here are the bench results of some simple operations on two I32 chunks with size 1024:
... and these are the corresponding results on raw `Vec<i32>`s:

The comparison looks really terrible. Even considering that our evaluation performs null and overflow checks while the latter does not, the gap shouldn't be this large. This result shows that our implementation is far from optimal and has a lot of room for improvement.
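To make the two sides of this comparison concrete, here is a minimal sketch of what each kind of loop roughly looks like. The function names `raw_add` and `checked_add` and the `Option<i32>` representation of nullable values are assumptions for illustration only; they are not the expression framework's actual types or the benchmark code in this PR.

```rust
/// Raw element-wise add over plain `i32` slices.
/// No null handling and no overflow detection (wrapping semantics),
/// so the loop body is pure computation.
fn raw_add(a: &[i32], b: &[i32]) -> Vec<i32> {
    a.iter().zip(b).map(|(x, y)| x.wrapping_add(*y)).collect()
}

/// Null-aware, overflow-checked add (hypothetical shape).
/// Every element goes through a null check and a fallible add,
/// and the first overflow aborts the whole evaluation with an error.
fn checked_add(
    a: &[Option<i32>],
    b: &[Option<i32>],
) -> Result<Vec<Option<i32>>, String> {
    a.iter()
        .zip(b)
        .map(|(x, y)| match (x, y) {
            // Both inputs present: the add itself may still fail.
            (Some(x), Some(y)) => x
                .checked_add(*y)
                .map(Some)
                .ok_or_else(|| "numeric out of range".to_string()),
            // Any null input yields a null output.
            _ => Ok(None),
        })
        .collect()
}
```

The second function has two data-dependent branches and an early exit per element, which is the kind of control flow that tends to get in the way of vectorization.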
To further show where the overhead comes from, this PR introduces more benches based on the raw add:
We can see that the main costs come from:
What blows my mind is the cost from the identical type cast `i32.try_into::<i32>()`. It seems that the compiler did not realize that it is infallible, and so did not eliminate the error handling.

On the other hand, when there was pure computation, the compiler could auto-vectorize it with SIMD. That's why the raw add on `Vec<i32>` performed super fast.

This analysis points out the direction of the next optimization step: for primitive types, we can first apply the operation on all values without caring about null values or overflow. The benefit from SIMD far outweighs the computation saved by skipping null values or errors. This conclusion has also been proven in RisingLight: risinglightdb/risinglight#700.
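As a rough sketch of that direction (not the implementation planned for this codebase), the add can be split into branch-free passes: compute every slot with wrapping arithmetic, combine the validity masks, and only then decide whether any valid slot overflowed. The `I32Array` struct, its `Vec<bool>` validity mask, and the function name `add_all_then_mask` are hypothetical and exist only for this example.

```rust
/// Hypothetical columnar array: dense values plus a validity mask.
/// A slot with `valid[i] == false` is null and its value is ignored.
struct I32Array {
    values: Vec<i32>,
    valid: Vec<bool>,
}

/// Add two arrays by computing on all slots first and handling
/// nulls and overflow afterwards, without per-element branching.
fn add_all_then_mask(a: &I32Array, b: &I32Array) -> Result<I32Array, String> {
    assert_eq!(a.values.len(), b.values.len());
    assert_eq!(a.valid.len(), a.values.len());
    assert_eq!(b.valid.len(), b.values.len());

    // Pass 1: pure computation over every slot, null or not.
    let values: Vec<i32> = a
        .values
        .iter()
        .zip(&b.values)
        .map(|(x, y)| x.wrapping_add(*y))
        .collect();

    // Pass 2: a result slot is valid only if both inputs are valid.
    let valid: Vec<bool> = a.valid.iter().zip(&b.valid).map(|(x, y)| *x & *y).collect();

    // Pass 3: signed overflow happened iff both inputs have the same sign
    // and the wrapped sum has the opposite sign. Accumulate a single flag
    // and ignore overflow in null slots.
    let overflow = a
        .values
        .iter()
        .zip(&b.values)
        .zip(&values)
        .zip(&valid)
        .fold(false, |acc, (((x, y), s), v)| {
            acc | (*v & (((x ^ s) & (y ^ s)) < 0))
        });

    if overflow {
        Err("numeric out of range".to_string())
    } else {
        Ok(I32Array { values, valid })
    }
}
```

Each pass is a straight-line loop over contiguous data, which is the shape compilers can often auto-vectorize. The trade-off is that work is done on null slots and that an overflow is only reported after the whole chunk has been processed.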
Checklist
`./risedev check` (or alias, `./risedev c`)
Refer to a related PR or issue link (optional)
#3524