
perf(expr): add initial microbenchmark for expressions #6856

Merged · 8 commits · Dec 12, 2022

Conversation

wangrunji0408 (Contributor) commented on Dec 12, 2022

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR introduces an initial microbenchmark for expressions.

Here are the benchmark results of some simple operations on two I32 chunks of size 1024:

expr/add/i32            time:   [3.3661 µs 3.3726 µs 3.3803 µs]
expr/mul/i32            time:   [3.4544 µs 3.4628 µs 3.4707 µs]
expr/eq/i32             time:   [3.7170 µs 3.7255 µs 3.7346 µs]
expr/sum/i32            time:   [1.2756 µs 1.2800 µs 1.2860 µs]

... and these are the corresponding results on raw Vec<i32>s:

raw/add/i32             time:   [89.131 ns 89.174 ns 89.231 ns]
raw/sum/i32             time:   [54.860 ns 54.937 ns 55.032 ns]

The comparison looks terrible. Even accounting for the fact that our evaluation performs null and overflow checks while the raw version does not, the gap should not be this large. This result shows that our implementation is far from optimal and has a lot of room for improvement.
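For context, the raw baseline is just an element-wise add over two plain Vec<i32>s. The PR's benchmarks use criterion; the sketch below is a std-only stand-in (the raw_add name and timing harness are illustrative, not the PR's actual code), only to show what the raw path does:

```rust
use std::time::Instant;

// Raw baseline: element-wise add over two plain Vec<i32>s.
// The loop body is pure (no branches, no Option, no error handling),
// so the compiler is free to auto-vectorize it.
fn raw_add(lhs: &[i32], rhs: &[i32]) -> Vec<i32> {
    lhs.iter().zip(rhs).map(|(a, b)| a + b).collect()
}

fn main() {
    let lhs: Vec<i32> = (0..1024).collect();
    let rhs: Vec<i32> = (1..=1024).collect();
    let start = Instant::now();
    let out = raw_add(&lhs, &rhs);
    println!("raw/add/i32: {:?} (out[0] = {})", start.elapsed(), out[0]);
}
```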

To further show where the overhead comes from, this PR introduces more benchmarks layered on top of the raw add:

raw/add/i32             time:   [91.600 ns 93.113 ns 94.833 ns]
raw/add/i32/zip_eq      time:   [585.07 ns 586.54 ns 588.01 ns]
raw/add/Option<i32>/zip_eq
                        time:   [938.88 ns 940.36 ns 942.10 ns]
raw/add/Option<i32>/zip_eq,checked
                        time:   [2.1017 µs 2.1104 µs 2.1201 µs]
raw/add/Option<i32>/zip_eq,cast,checked
                        time:   [2.3200 µs 2.3598 µs 2.4268 µs]
raw/add/Option<i32>/zip_eq,checked,cast,collect_array
                        time:   [3.1290 µs 3.1444 µs 3.1605 µs]
expr/add/i32            time:   [3.3661 µs 3.3726 µs 3.3803 µs]

We can see that the main costs come from:

  • zip_eq
  • option
  • error handling
  • building array
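These layers can be reproduced outside the expression framework. The sketch below (hypothetical code, with itertools::zip_eq emulated by an explicit length check on top of std's zip) adds the listed costs one at a time:

```rust
// 1. zip with a length check: what zip_eq adds on top of plain zip.
fn add_zip_eq(lhs: &[i32], rhs: &[i32]) -> Vec<i32> {
    assert_eq!(lhs.len(), rhs.len(), "zip_eq panics on length mismatch");
    lhs.iter().zip(rhs).map(|(a, b)| a + b).collect()
}

// 2. + Option: every element may be null, as in a nullable array.
fn add_option(lhs: &[Option<i32>], rhs: &[Option<i32>]) -> Vec<Option<i32>> {
    assert_eq!(lhs.len(), rhs.len());
    lhs.iter()
        .zip(rhs)
        .map(|(a, b)| match (a, b) {
            (Some(a), Some(b)) => Some(a + b),
            _ => None,
        })
        .collect()
}

// 3. + error handling: overflow becomes a fallible branch per element.
fn add_checked(
    lhs: &[Option<i32>],
    rhs: &[Option<i32>],
) -> Result<Vec<Option<i32>>, &'static str> {
    assert_eq!(lhs.len(), rhs.len());
    lhs.iter()
        .zip(rhs)
        .map(|(a, b)| match (a, b) {
            (Some(a), Some(b)) => a.checked_add(*b).map(Some).ok_or("numeric overflow"),
            _ => Ok(None),
        })
        .collect()
}

fn main() {
    let lhs = [Some(1), None, Some(3)];
    let rhs = [Some(10), Some(20), Some(30)];
    println!("{:?}", add_checked(&lhs, &rhs));
}
```

Each step introduces control flow into the loop body, which is exactly what defeats auto-vectorization.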

What blows my mind is the cost of the identical type cast i32.try_into::<i32>(). It seems the compiler does not realize the cast is infallible and therefore fails to eliminate the error-handling path. On the other hand, when the loop body is pure computation, the compiler can auto-vectorize it with SIMD. That's why the raw add on Vec<i32> is so fast.

This analysis points out the direction for the next optimization: for primitive types, we can first apply the operation to all values without caring about null values or overflow. The benefit from SIMD far outweighs the computation saved by skipping null values or bailing out early on errors. This conclusion has also been validated in RisingLight: risinglightdb/risinglight#700.
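The proposed strategy can be sketched as follows: run the operation over all values with branch-free overflowing arithmetic so the loop can vectorize, accumulate an overflow flag on the side, and only fall back to a slow per-element check when that flag fires, ignoring overflow in null slots. This is a hypothetical illustration (simd_friendly_add and the bool-slice validity bitmap are assumptions, not the PR's implementation):

```rust
// Branch-free add over all values, nulls included; the hot loop has no
// control flow, so the compiler can auto-vectorize it.
// `valid` is a validity bitmap: false marks a null slot.
fn simd_friendly_add(
    lhs: &[i32],
    rhs: &[i32],
    valid: &[bool],
) -> Result<Vec<i32>, &'static str> {
    assert_eq!(lhs.len(), rhs.len());
    let mut overflow = false;
    let out: Vec<i32> = lhs
        .iter()
        .zip(rhs)
        .map(|(a, b)| {
            let (sum, o) = a.overflowing_add(*b);
            overflow |= o; // fold the flag instead of branching per element
            sum
        })
        .collect();
    // Slow path, taken only when some lane overflowed: an overflow in a
    // null slot does not matter, so re-check only the valid lanes.
    if overflow {
        for ((a, b), v) in lhs.iter().zip(rhs).zip(valid) {
            if *v && a.checked_add(*b).is_none() {
                return Err("numeric overflow");
            }
        }
    }
    Ok(out)
}

fn main() {
    println!("{:?}", simd_friendly_add(&[1, 2, 3], &[10, 20, 30], &[true; 3]));
}
```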

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

#3524

Signed-off-by: Runji Wang <wangrunji0408@163.com>
codecov bot commented on Dec 12, 2022

Codecov Report

Merging #6856 (91db0e5) into main (ae398b3) will decrease coverage by 0.00%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #6856      +/-   ##
==========================================
- Coverage   73.25%   73.24%   -0.01%     
==========================================
  Files        1032     1032              
  Lines      164648   164648              
==========================================
- Hits       120610   120600      -10     
- Misses      44038    44048      +10     
Flag Coverage Δ
rust 73.24% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/batch/src/executor/group_top_n.rs 68.42% <0.00%> (-6.44%) ⬇️
src/connector/src/source/filesystem/file_common.rs 80.44% <0.00%> (-0.45%) ⬇️
src/stream/src/executor/aggregation/minput.rs 96.49% <0.00%> (+0.10%) ⬆️
...frontend/src/scheduler/hummock_snapshot_manager.rs 58.79% <0.00%> (+0.50%) ⬆️


BugenZhao (Member) commented:
for primitive types, we can first apply the operation on all values without caring about null values or overflow. The benefits from SIMD are much more than saving computations from null values or errors

I once thought this could be done by the compiler. 🥵 However, considering the control flow (side effect) introduced by checked arithmetic, it sounds reasonable that it is not optimized.
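The control flow in question is visible in the two std intrinsics: checked_add returns an Option, forcing a branch per element, while wrapping_add is a plain instruction the vectorizer can handle. An illustrative comparison (not code from the PR):

```rust
// checked_add: Option forces a branch per element, which blocks
// auto-vectorization of the loop.
fn sum_checked(xs: &[i32]) -> Option<i32> {
    xs.iter().try_fold(0i32, |acc, &x| acc.checked_add(x))
}

// wrapping_add: branch-free, so the loop can be vectorized.
fn sum_wrapping(xs: &[i32]) -> i32 {
    xs.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

fn main() {
    println!("{:?} {}", sum_checked(&[1, 2, 3]), sum_wrapping(&[1, 2, 3]));
}
```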

(Two resolved review threads on src/expr/benches/expr.rs)
lmatz (Contributor) left a comment:

LGTM!

4 participants