Tracking: Optimize the performance of expressions #6868
Comments
Just curious, should we then expect I suppose yes, as we are evaluating the expression tree array-wise? (maybe we could add these microbench cases with `InputRef` to confirm)

Yes, as the overhead of `InputRef` itself is negligible (should be only an `Arc` clone).
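To illustrate why an input reference can be nearly free, here is a minimal sketch. It assumes columns are shared behind `Arc`; the `Column`, `Chunk`, and `InputRef` types below are simplified stand-ins invented for illustration, not RisingWave's actual definitions.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a shared column of values.
type Column = Arc<Vec<i32>>;

/// Hypothetical stand-in for a data chunk: a list of shared columns.
struct Chunk {
    columns: Vec<Column>,
}

/// An input reference only names a column index.
struct InputRef {
    index: usize,
}

impl InputRef {
    /// Evaluating the reference bumps a reference count; no element is copied.
    fn eval(&self, chunk: &Chunk) -> Column {
        Arc::clone(&chunk.columns[self.index])
    }
}

fn main() {
    let chunk = Chunk {
        columns: vec![Arc::new(vec![1, 2, 3]), Arc::new(vec![4, 5, 6])],
    };
    let out = InputRef { index: 1 }.eval(&chunk);
    // The result points at the same allocation as the input column.
    assert!(Arc::ptr_eq(&out, &chunk.columns[1]));
    println!("{:?}", out);
}
```

In this sketch the cost of `eval` does not depend on the column length, which is consistent with the claim that the overhead is negligible next to the array-wise work of the surrounding expression.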
This PR completes the micro-benchmark framework for expressions. It can be run with:

```sh
cd src/expr

# run all benches
cargo bench --bench expr -- --quick

# list all benches
cargo bench --bench expr -- --list

# run specified benches
cargo bench --bench expr -- --quick "add\(int32,int32\)"
```

The detailed bench results have been updated to #6868.

Here is a statistical overview of all expressions:

<img width="470" alt="Screenshot 2022-12-20 21 13 24" src="https://user-images.githubusercontent.com/15158738/208812177-529f62aa-4590-4dbc-b93b-9d8f1151418c.png">

To enumerate all valid expressions, we use the function signature maps defined in the frontend. We moved them into the expr crate to avoid a dependency on the frontend, which reduces the compilation time.

Approved-By: lmatz
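As a rough sketch of how bench ids like `add(int32,int32)` could be derived from a signature table, consider the following. The `FuncSign` struct and both helper functions are hypothetical and much simpler than the real signature maps. The parenthesis escaping mirrors the command above, which suggests the filter is treated as a regular expression.

```rust
/// Hypothetical, simplified function signature: a name plus argument type names.
struct FuncSign {
    name: &'static str,
    args: &'static [&'static str],
}

/// Builds a bench id such as `add(int32,int32)` from a signature.
fn bench_name(sig: &FuncSign) -> String {
    format!("{}({})", sig.name, sig.args.join(","))
}

/// Escapes parentheses so the id can be used as a regex-style filter.
fn bench_filter(name: &str) -> String {
    name.replace('(', "\\(").replace(')', "\\)")
}

fn main() {
    // Illustrative entries only; the real maps cover every valid expression.
    let signatures = [
        FuncSign { name: "add", args: &["int32", "int32"] },
        FuncSign { name: "lower", args: &["varchar"] },
    ];
    for sig in &signatures {
        let name = bench_name(sig);
        println!("{} -> {}", name, bench_filter(&name));
    }
}
```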
* fix: clean the verbose logs of "failed to send message to actor" (#6973)

As title.

```
// println!("{:?}", &chunk);
StreamChunk { cardinality: 4, capacity: 4, .. }

// println!("{:#?}", &chunk);
StreamChunk { cardinality: 4, capacity: 4, data:
+----+---+---+
|  + | 1 | 6 |
|  - | 2 |   |
| U- | 3 | 7 |
| U+ | 4 |   |
+----+---+---+
}
```

Approved-By: BugenZhao
Co-Authored-By: Eric Fu <eric@singularity-data.com>

* feat(stream): Make scale DAG aware (#7013)

**This section will be used as the commit message. Please do not leave this empty!**

Please explain **IN DETAIL** what the changes are in this PR and why they are needed:

- Let `Reschedule` accept more than one downstream fragment.
- Let the dispatcher id be calculated from the exchange operator id together with the upstream and downstream id.

Approved-By: BugenZhao
Co-Authored-By: Dylan Chen <zilin@singularity-data.com>

* chore: log AST in sqlparser (#7012)

Log AST in sqlparser for better debugging. Also make it easier to configure logs.

Approved-By: lmatz
Co-Authored-By: xxchan <xxchan22f@gmail.com>

* feat(streaming): do not backfill for empty table (#7009)

If the snapshot is empty, we don't need to backfill and can immediately finish the progress. This can speed up some tests.

```
dev=> create materialized view mv2 as select * from t;
CREATE_MATERIALIZED_VIEW
Time: 1033.834 ms (00:01.034)
dev=> delete from t;
DELETE 1
Time: 9.869 ms
dev=> create materialized view mv3 as select * from t;
CREATE_MATERIALIZED_VIEW
Time: 18.550 ms
```

Note that every executor requires a barrier for the first message. So if there are a few records in the table (that is, it is not empty), we cannot apply this optimization. A further plan might be to issue the next checkpoints more frequently for this case.

Approved-By: chenzl25
Co-Authored-By: Bugen Zhao <i@bugenzhao.com>

* refactor(logging): be aware of RUST_LOG env (#7016)

This PR supports overwriting the log filters predefined in `init_risingwave_logger` by specifying the `RUST_LOG` environment variable.
One use case is that I want to suppress certain logs in CI to avoid large log sizes.

Approved-By: BugenZhao
Approved-By: xxchan
Co-Authored-By: zwang28 <84491488@qq.com>

* perf(bitmap): change the buffer unit from `u8` to `usize` (#7030)

  * bitmap: use pointer-sized element for underlying buffer

    Signed-off-by: Runji Wang <wangrunji0408@163.com>

  * use `Box<[usize]>` to save 8 bytes

    Signed-off-by: Runji Wang <wangrunji0408@163.com>

  * rename functions and add docs

    Signed-off-by: Runji Wang <wangrunji0408@163.com>

  * add bench for bitmap

    Signed-off-by: Runji Wang <wangrunji0408@163.com>

Signed-off-by: Runji Wang <wangrunji0408@163.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* refactor(optimizer): rename optimizer rule filename. (#7038)

rename optimizer rule

* chore(connector): fix log level (#7043)

`info -> debug`; otherwise every record is printed.

Approved-By: tabVersion

* fix: check sink properties and validate them in advance to avoid panic or recovery (#7041)

Check whether sink properties are provided in the frontend, and do a simple validation when building executors, to avoid panic or recovery.

Approved-By: BugenZhao

* fix(batch): match `probe_row` once for `SemiJoin` with `non_equi` predicates (#7033)

### Problem

- For a probe key, after a chunk is spilled, we may continue appending rows for processing.
- This happens even though the probe key has already found a matching build row.
- These rows should be discarded rather than appended.
- As a result, when processing the next spilled chunk, they are also included, and finally returned in the results as duplicate rows.
- As you can see in the [expected results before fix](https://github.com/risingwavelabs/risingwave/pull/7033/commits/ec6eac5fbac700ed9c03662592a30886a198d248), we have 5 probe rows, but the results contain 8 output rows.

### Solution

- If the probe row matches, break and move on to matching the next probe row. Then we won't process duplicate matches for matched probe rows.
- If the probe row does not match, continue appending. The appended rows will be contained in the buffered chunk and processed later.

Approved-By: chenzl25

* perf(expr): complete expression benchmark framework (#6995)

This PR completes the micro-benchmark framework for expressions. It can be run with:

```sh
cd src/expr

# run all benches
cargo bench --bench expr -- --quick

# list all benches
cargo bench --bench expr -- --list

# run specified benches
cargo bench --bench expr -- --quick "add\(int32,int32\)"
```

The detailed bench results have been updated to #6868.

Here is a statistical overview of all expressions:

<img width="470" alt="Screenshot 2022-12-20 21 13 24" src="https://user-images.githubusercontent.com/15158738/208812177-529f62aa-4590-4dbc-b93b-9d8f1151418c.png">

To enumerate all valid expressions, we use the function signature maps defined in the frontend. We moved them into the expr crate to avoid a dependency on the frontend, which reduces the compilation time.

Approved-By: lmatz

* perf(expr): prebuild the AC for `to_char` (#7048)

As the title says, put it in a lazy static to avoid rebuilding it every time.

bench | Before time(us) | After time(us) | Change(%)
-- | -- | -- | --
to_char(timestamp,varchar) | 11060.000 | 283.900 | -97.4%

Approved-By: TennyZhuang

* fix(alloc): missing padding in realloc (#7046)

As title.

Approved-By: TennyZhuang

* fix(parser): fix parsing nested wildcard struct field access (#7024)

Fix #7011: nested wildcard struct field access panics when there are additional parentheses.

Also did some minor style refactoring and added some comments.

Approved-By: st1page
Approved-By: yezizp2012

* fix: fix parser test of wildcard struct field with additional parentheses (#7052)

Fix parser test of wildcard struct field with additional parentheses: https://buildkite.com/risingwavelabs/main-cron/builds/282#01854176-ffb0-4018-aa7b-fe128a411a41

Approved-By: xxchan

* feat(optimizer): support union merge rule (#7037)

**This section will be used as the commit message.
Please do not leave this empty!**

Please explain **IN DETAIL** what the changes are in this PR and why they are needed:

- Merge all binary unions into a multi-input union.

Approved-By: st1page

* fix: fix period non-zero panic in election (#7065)

Fix #7061

Approved-By: lmatz

* perf(expr): optimize lower/upper/trim/md5 (#7047)

This PR optimizes the `lower`/`upper`/`trim`/`md5` operations by avoiding generating a `String`.

bench | Before time(us) | After time(us) | Change(%)
-- | -- | -- | --
md5(varchar) | 447.040 | 338.540 | -24.3%
ltrim(varchar,varchar) | 38.893 | 20.666 | -46.9%
rtrim(varchar,varchar) | 38.870 | 22.311 | -42.6%
trim(varchar,varchar) | 38.327 | 20.831 | -45.6%
lower(varchar) | 36.172 | 10.851 | -70.0%
upper(varchar) | 34.607 | 10.946 | -68.4%

Approved-By: lmatz

* perf(expr): reduce format parsing in to_char (#7051)

Reduce duplicated format parsing during evaluation of `to_char`. Performance improves by about 20% with a constant format string "YYYY/MM/DD HH24:MI:SS", but regresses under the current benchmark framework due to #7050.

Approved-By: lmatz
Approved-By: wangrunji0408

* perf(expr): cover the fast path for `to_char(timestamp)` (#7056)

This PR changes the input of the bench `to_char(timestamp,varchar)`. It makes the second argument a constant format string to cover the fast path of evaluation.

```
to_char(timestamp,varchar)
        time:   [245.77 µs 245.80 µs 245.89 µs]
        change: [-70.241% -70.126% -70.010%] (p = 0.07 > 0.05)
```

Approved-By: lmatz

* perf(expr): vectorize infallible operations (10x speedup) (#7055)

This PR vectorizes the following infallible operations:

- `and/or/not`
- `bitwise_{and/or/xor/not}`
- `is[_not]_{true/false}`
- `eq/ne/gt/lt/ge/le`
- `is[_not]_distinct_from`
- `round/ceil/floor`

making them 10x faster on average, with up to a 50x speedup.
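A minimal sketch of what a vectorization-friendly kernel for an infallible operation such as `eq` might look like. It assumes a simplified array made of a value buffer plus per-slot validity flags. The `SimpleI32Array` and `SimpleBoolArray` types and the `equal_fast` function are invented for illustration and are not the PR's actual code.

```rust
/// Hypothetical simplified array: a value buffer plus a validity flag per slot.
struct SimpleI32Array {
    values: Vec<i32>,
    valid: Vec<bool>,
}

/// Hypothetical simplified boolean result array.
struct SimpleBoolArray {
    values: Vec<bool>,
    valid: Vec<bool>,
}

/// Branch-free `equal`: every slot is compared, including null slots,
/// and validity is combined in a separate pass.
fn equal_fast(a: &SimpleI32Array, b: &SimpleI32Array) -> SimpleBoolArray {
    assert_eq!(a.values.len(), b.values.len());
    // No `if valid { ... }` inside the loop, so the compiler is free to use SIMD.
    let values = a.values.iter().zip(&b.values).map(|(x, y)| x == y).collect();
    // Non-short-circuit `&` keeps the validity pass branch-free as well.
    let valid = a.valid.iter().zip(&b.valid).map(|(x, y)| *x & *y).collect();
    SimpleBoolArray { values, valid }
}

fn main() {
    let a = SimpleI32Array { values: vec![1, 2, 0], valid: vec![true, true, false] };
    let b = SimpleI32Array { values: vec![1, 3, 0], valid: vec![true, true, true] };
    let out = equal_fast(&a, &b);
    // The null slot still gets a computed value; it is masked by `valid`.
    assert_eq!(out.values, vec![true, false, true]);
    assert_eq!(out.valid, vec![true, true, false]);
}
```

Computing a result for null slots is wasted work per element, but the value is simply ignored wherever `valid` is false, and that is the trade that removes the branch from the loop.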
The distribution curve of all operation times before and after this PR: <img width="491" alt="perf-stat" src="https://user-images.githubusercontent.com/15158738/209499628-1631c779-b6d5-49be-b842-8dcf89a23e1a.png"> <details> <summary>Click to show full results</summary> bench | Before time(us) | After time(us) | Change(%) | Speedup -- | -- | -- | -- | -- and(boolean,boolean) | 11.110 | 0.417 | -96.2% | 25.6 bitwise_and(int16,int16) | 5.284 | 0.142 | -97.3% | 36.2 bitwise_and(int16,int32) | 4.540 | 0.170 | -96.3% | 25.7 bitwise_and(int16,int64) | 4.535 | 0.257 | -94.3% | 16.7 bitwise_and(int32,int16) | 4.509 | 0.183 | -95.9% | 23.6 bitwise_and(int32,int32) | 4.495 | 0.185 | -95.9% | 23.3 bitwise_and(int32,int64) | 4.549 | 0.232 | -94.9% | 18.6 bitwise_and(int64,int16) | 4.489 | 0.241 | -94.6% | 17.6 bitwise_and(int64,int32) | 4.500 | 0.254 | -94.4% | 16.7 bitwise_and(int64,int64) | 4.508 | 0.268 | -94.0% | 15.8 bitwise_not(int16) | 4.363 | 0.116 | -97.3% | 36.5 bitwise_not(int32) | 4.353 | 0.164 | -96.2% | 25.6 bitwise_not(int64) | 4.358 | 0.219 | -95.0% | 18.9 bitwise_or(int16,int16) | 5.278 | 0.142 | -97.3% | 36.3 bitwise_or(int16,int32) | 4.603 | 0.171 | -96.3% | 25.9 bitwise_or(int16,int64) | 4.556 | 0.242 | -94.7% | 17.8 bitwise_or(int32,int16) | 4.531 | 0.169 | -96.3% | 25.8 bitwise_or(int32,int32) | 4.495 | 0.182 | -95.9% | 23.7 bitwise_or(int32,int64) | 4.532 | 0.250 | -94.5% | 17.1 bitwise_or(int64,int16) | 4.493 | 0.237 | -94.7% | 17.9 bitwise_or(int64,int32) | 4.600 | 0.251 | -94.5% | 17.3 bitwise_or(int64,int64) | 4.495 | 0.254 | -94.3% | 16.7 bitwise_xor(int16,int16) | 5.292 | 0.137 | -97.4% | 37.6 bitwise_xor(int16,int32) | 4.561 | 0.172 | -96.2% | 25.5 bitwise_xor(int16,int64) | 4.526 | 0.238 | -94.7% | 18.0 bitwise_xor(int32,int16) | 4.486 | 0.179 | -96.0% | 24.0 bitwise_xor(int32,int32) | 4.508 | 0.200 | -95.6% | 21.6 bitwise_xor(int32,int64) | 4.574 | 0.266 | -94.2% | 16.2 bitwise_xor(int64,int16) | 4.596 | 0.254 | -94.5% | 17.1 
bitwise_xor(int64,int32) | 4.502 | 0.292 | -93.5% | 14.4 bitwise_xor(int64,int64) | 4.502 | 0.276 | -93.9% | 15.3 ceil(decimal) | 7.564 | 3.507 | -53.6% | 1.2 ceil(float64) | 4.358 | 0.208 | -95.2% | 20.0 equal(boolean,boolean) | 7.444 | 0.136 | -98.2% | 53.5 equal(date,date) | 7.079 | 0.491 | -93.1% | 13.4 equal(date,timestamp) | 7.572 | 1.033 | -86.4% | 6.3 equal(decimal,decimal) | 12.715 | 8.304 | -34.7% | 0.5 equal(decimal,float32) | 12.879 | 5.474 | -57.5% | 1.4 equal(decimal,float64) | 12.945 | 5.349 | -58.7% | 1.4 equal(decimal,int16) | 11.359 | 4.574 | -59.7% | 1.5 equal(decimal,int32) | 11.384 | 4.472 | -60.7% | 1.5 equal(decimal,int64) | 11.410 | 4.615 | -59.6% | 1.5 equal(float32,decimal) | 12.972 | 5.566 | -57.1% | 1.3 equal(float32,float32) | 7.219 | 0.678 | -90.6% | 9.6 equal(float32,float64) | 7.337 | 0.853 | -88.4% | 7.6 equal(float32,int16) | 6.858 | 0.841 | -87.7% | 7.2 equal(float32,int32) | 6.826 | 0.712 | -89.6% | 8.6 equal(float32,int64) | 7.003 | 0.667 | -90.5% | 9.5 equal(float64,decimal) | 12.835 | 5.507 | -57.1% | 1.3 equal(float64,float32) | 7.271 | 0.869 | -88.0% | 7.4 equal(float64,float64) | 7.220 | 0.922 | -87.2% | 6.8 equal(float64,int16) | 6.772 | 0.878 | -87.0% | 6.7 equal(float64,int32) | 6.752 | 0.757 | -88.8% | 7.9 equal(float64,int64) | 6.724 | 0.723 | -89.2% | 8.3 equal(int16,decimal) | 11.437 | 4.623 | -59.6% | 1.5 equal(int16,float32) | 6.785 | 0.746 | -89.0% | 8.1 equal(int16,float64) | 6.767 | 0.709 | -89.5% | 8.6 equal(int16,int16) | 7.911 | 0.476 | -94.0% | 15.6 equal(int16,int32) | 7.087 | 0.494 | -93.0% | 13.4 equal(int16,int64) | 7.105 | 0.565 | -92.0% | 11.6 equal(int32,decimal) | 11.105 | 4.449 | -59.9% | 1.5 equal(int32,float32) | 6.749 | 0.641 | -90.5% | 9.5 equal(int32,float64) | 6.714 | 0.596 | -91.1% | 10.3 equal(int32,int16) | 7.086 | 0.492 | -93.1% | 13.4 equal(int32,int32) | 7.094 | 0.489 | -93.1% | 13.5 equal(int32,int64) | 7.118 | 0.546 | -92.3% | 12.0 equal(int64,decimal) | 11.331 | 4.622 | -59.2% | 1.5 
equal(int64,float32) | 6.732 | 0.593 | -91.2% | 10.3 equal(int64,float64) | 6.808 | 0.567 | -91.7% | 11.0 equal(int64,int16) | 7.103 | 0.572 | -92.0% | 11.4 equal(int64,int32) | 7.089 | 0.550 | -92.2% | 11.9 equal(int64,int64) | 7.086 | 0.556 | -92.1% | 11.7 equal(interval,interval) | 8.289 | 2.381 | -71.3% | 2.5 equal(interval,time) | 7.969 | 1.874 | -76.5% | 3.3 equal(time,interval) | 7.669 | 2.039 | -73.4% | 2.8 equal(time,time) | 7.523 | 0.546 | -92.7% | 12.8 equal(timestamp,date) | 7.622 | 1.040 | -86.4% | 6.3 equal(timestamp,timestamp) | 8.041 | 1.159 | -85.6% | 5.9 equal(timestampz,timestampz) | 7.099 | 0.545 | -92.3% | 12.0 equal(varchar,varchar) | 12.215 | 12.335 | 1.0% | -0.0 floor(decimal) | 7.532 | 2.890 | -61.6% | 1.6 floor(float64) | 4.357 | 0.211 | -95.2% | 19.6 greater_than_or_equal(boolean,boolean) | 7.376 | 0.174 | -97.6% | 41.3 greater_than_or_equal(date,date) | 7.082 | 0.509 | -92.8% | 12.9 greater_than_or_equal(date,timestamp) | 7.970 | 2.087 | -73.8% | 2.8 greater_than_or_equal(decimal,decimal) | 12.622 | 8.342 | -33.9% | 0.5 greater_than_or_equal(decimal,float32) | 13.727 | 6.046 | -56.0% | 1.3 greater_than_or_equal(decimal,float64) | 13.906 | 5.957 | -57.2% | 1.3 greater_than_or_equal(decimal,int16) | 12.191 | 4.627 | -62.0% | 1.6 greater_than_or_equal(decimal,int32) | 12.040 | 4.524 | -62.4% | 1.7 greater_than_or_equal(decimal,int64) | 12.128 | 4.636 | -61.8% | 1.6 greater_than_or_equal(float32,decimal) | 13.212 | 6.129 | -53.6% | 1.2 greater_than_or_equal(float32,float32) | 7.594 | 0.697 | -90.8% | 9.9 greater_than_or_equal(float32,float64) | 7.631 | 0.874 | -88.6% | 7.7 greater_than_or_equal(float32,int16) | 7.751 | 1.160 | -85.0% | 5.7 greater_than_or_equal(float32,int32) | 7.620 | 1.021 | -86.6% | 6.5 greater_than_or_equal(float32,int64) | 7.632 | 0.984 | -87.1% | 6.8 greater_than_or_equal(float64,decimal) | 13.328 | 5.996 | -55.0% | 1.2 greater_than_or_equal(float64,float32) | 7.758 | 0.963 | -87.6% | 7.1 
greater_than_or_equal(float64,float64) | 7.607 | 0.930 | -87.8% | 7.2 greater_than_or_equal(float64,int16) | 7.764 | 1.217 | -84.3% | 5.4 greater_than_or_equal(float64,int32) | 7.673 | 1.102 | -85.6% | 6.0 greater_than_or_equal(float64,int64) | 7.621 | 1.035 | -86.4% | 6.4 greater_than_or_equal(int16,decimal) | 11.404 | 4.960 | -56.5% | 1.3 greater_than_or_equal(int16,float32) | 6.930 | 0.832 | -88.0% | 7.3 greater_than_or_equal(int16,float64) | 6.948 | 0.860 | -87.6% | 7.1 greater_than_or_equal(int16,int16) | 7.886 | 0.458 | -94.2% | 16.2 greater_than_or_equal(int16,int32) | 7.125 | 0.491 | -93.1% | 13.5 greater_than_or_equal(int16,int64) | 7.102 | 0.569 | -92.0% | 11.5 greater_than_or_equal(int32,decimal) | 11.355 | 4.701 | -58.6% | 1.4 greater_than_or_equal(int32,float32) | 6.858 | 0.707 | -89.7% | 8.7 greater_than_or_equal(int32,float64) | 6.852 | 0.761 | -88.9% | 8.0 greater_than_or_equal(int32,int16) | 7.129 | 0.509 | -92.9% | 13.0 greater_than_or_equal(int32,int32) | 7.176 | 0.506 | -93.0% | 13.2 greater_than_or_equal(int32,int64) | 7.121 | 0.566 | -92.1% | 11.6 greater_than_or_equal(int64,decimal) | 11.442 | 4.946 | -56.8% | 1.3 greater_than_or_equal(int64,float32) | 6.861 | 0.692 | -89.9% | 8.9 greater_than_or_equal(int64,float64) | 6.861 | 0.728 | -89.4% | 8.4 greater_than_or_equal(int64,int16) | 7.163 | 0.578 | -91.9% | 11.4 greater_than_or_equal(int64,int32) | 7.094 | 0.563 | -92.1% | 11.6 greater_than_or_equal(int64,int64) | 7.131 | 0.560 | -92.1% | 11.7 greater_than_or_equal(interval,interval) | 8.229 | 2.400 | -70.8% | 2.4 greater_than_or_equal(interval,time) | 8.928 | 2.687 | -69.9% | 2.3 greater_than_or_equal(time,interval) | 7.858 | 2.796 | -64.4% | 1.8 greater_than_or_equal(time,time) | 7.745 | 0.669 | -91.4% | 10.6 greater_than_or_equal(timestamp,date) | 7.252 | 1.433 | -80.2% | 4.1 greater_than_or_equal(timestamp,timestamp) | 8.461 | 2.235 | -73.6% | 2.8 greater_than_or_equal(timestampz,timestampz) | 7.088 | 0.547 | -92.3% | 12.0 
greater_than_or_equal(varchar,varchar) | 12.385 | 12.439 | 0.4% | -0.0 greater_than(boolean,boolean) | 6.948 | 0.175 | -97.5% | 38.6 greater_than(date,date) | 6.702 | 0.488 | -92.7% | 12.7 greater_than(date,timestamp) | 7.562 | 2.103 | -72.2% | 2.6 greater_than(decimal,decimal) | 11.878 | 8.252 | -30.5% | 0.4 greater_than(decimal,float32) | 12.946 | 6.008 | -53.6% | 1.2 greater_than(decimal,float64) | 12.897 | 6.000 | -53.5% | 1.1 greater_than(decimal,int16) | 12.214 | 4.701 | -61.5% | 1.6 greater_than(decimal,int32) | 12.067 | 4.527 | -62.5% | 1.7 greater_than(decimal,int64) | 12.238 | 4.724 | -61.4% | 1.6 greater_than(float32,decimal) | 12.488 | 6.309 | -49.5% | 1.0 greater_than(float32,float32) | 7.017 | 0.818 | -88.3% | 7.6 greater_than(float32,float64) | 7.145 | 1.071 | -85.0% | 5.7 greater_than(float32,int16) | 7.701 | 1.176 | -84.7% | 5.5 greater_than(float32,int32) | 7.635 | 1.028 | -86.5% | 6.4 greater_than(float32,int64) | 7.552 | 0.989 | -86.9% | 6.6 greater_than(float64,decimal) | 12.434 | 6.205 | -50.1% | 1.0 greater_than(float64,float32) | 7.115 | 1.114 | -84.3% | 5.4 greater_than(float64,float64) | 7.162 | 1.161 | -83.8% | 5.2 greater_than(float64,int16) | 7.722 | 1.217 | -84.2% | 5.3 greater_than(float64,int32) | 7.700 | 1.095 | -85.8% | 6.0 greater_than(float64,int64) | 7.598 | 1.052 | -86.1% | 6.2 greater_than(int16,decimal) | 11.414 | 5.041 | -55.8% | 1.3 greater_than(int16,float32) | 6.936 | 0.848 | -87.8% | 7.2 greater_than(int16,float64) | 6.858 | 0.879 | -87.2% | 6.8 greater_than(int16,int16) | 7.271 | 0.469 | -93.6% | 14.5 greater_than(int16,int32) | 6.766 | 0.496 | -92.7% | 12.6 greater_than(int16,int64) | 6.682 | 0.570 | -91.5% | 10.7 greater_than(int32,decimal) | 11.305 | 4.732 | -58.1% | 1.4 greater_than(int32,float32) | 6.886 | 0.714 | -89.6% | 8.6 greater_than(int32,float64) | 6.820 | 0.745 | -89.1% | 8.2 greater_than(int32,int16) | 6.668 | 0.493 | -92.6% | 12.5 greater_than(int32,int32) | 6.983 | 0.491 | -93.0% | 13.2 
greater_than(int32,int64) | 6.699 | 0.547 | -91.8% | 11.2 greater_than(int64,decimal) | 11.416 | 5.031 | -55.9% | 1.3 greater_than(int64,float32) | 6.898 | 0.679 | -90.2% | 9.2 greater_than(int64,float64) | 6.808 | 0.730 | -89.3% | 8.3 greater_than(int64,int16) | 6.772 | 0.573 | -91.5% | 10.8 greater_than(int64,int32) | 6.692 | 0.551 | -91.8% | 11.1 greater_than(int64,int64) | 6.719 | 0.547 | -91.9% | 11.3 greater_than(interval,interval) | 7.674 | 2.447 | -68.1% | 2.1 greater_than(interval,time) | 8.831 | 2.754 | -68.8% | 2.2 greater_than(time,interval) | 7.853 | 3.016 | -61.6% | 1.6 greater_than(time,time) | 7.229 | 0.778 | -89.2% | 8.3 greater_than(timestamp,date) | 7.123 | 1.456 | -79.6% | 3.9 greater_than(timestamp,timestamp) | 7.808 | 2.193 | -71.9% | 2.6 greater_than(timestampz,timestampz) | 6.691 | 0.561 | -91.6% | 10.9 greater_than(varchar,varchar) | 11.620 | 11.953 | 2.9% | -0.0 is_distinct_from(boolean,boolean) | 7.967 | 0.293 | -96.3% | 26.2 is_distinct_from(date,date) | 6.830 | 0.546 | -92.0% | 11.5 is_distinct_from(date,timestamp) | 7.031 | 1.068 | -84.8% | 5.6 is_distinct_from(decimal,decimal) | 12.050 | 8.189 | -32.0% | 0.5 is_distinct_from(decimal,float32) | 12.670 | 6.270 | -50.5% | 1.0 is_distinct_from(decimal,float64) | 12.565 | 6.312 | -49.8% | 1.0 is_distinct_from(decimal,int16) | 12.300 | 4.674 | -62.0% | 1.6 is_distinct_from(decimal,int32) | 12.231 | 4.553 | -62.8% | 1.7 is_distinct_from(decimal,int64) | 12.417 | 4.640 | -62.6% | 1.7 is_distinct_from(float32,decimal) | 12.468 | 6.309 | -49.4% | 1.0 is_distinct_from(float32,float32) | 6.885 | 0.707 | -89.7% | 8.7 is_distinct_from(float32,float64) | 6.894 | 0.899 | -87.0% | 6.7 is_distinct_from(float32,int16) | 7.597 | 0.927 | -87.8% | 7.2 is_distinct_from(float32,int32) | 7.529 | 0.804 | -89.3% | 8.4 is_distinct_from(float32,int64) | 7.413 | 0.768 | -89.6% | 8.7 is_distinct_from(float64,decimal) | 12.303 | 6.279 | -49.0% | 1.0 is_distinct_from(float64,float32) | 6.850 | 0.899 | -86.9% | 6.6 
is_distinct_from(float64,float64) | 6.917 | 0.944 | -86.4% | 6.3 is_distinct_from(float64,int16) | 7.578 | 0.992 | -86.9% | 6.6 is_distinct_from(float64,int32) | 7.427 | 0.866 | -88.3% | 7.6 is_distinct_from(float64,int64) | 7.443 | 0.835 | -88.8% | 7.9 is_distinct_from(int16,decimal) | 12.005 | 4.642 | -61.3% | 1.6 is_distinct_from(int16,float32) | 7.492 | 0.839 | -88.8% | 7.9 is_distinct_from(int16,float64) | 7.403 | 0.801 | -89.2% | 8.2 is_distinct_from(int16,int16) | 7.458 | 0.500 | -93.3% | 13.9 is_distinct_from(int16,int32) | 6.834 | 0.549 | -92.0% | 11.5 is_distinct_from(int16,int64) | 6.749 | 0.653 | -90.3% | 9.3 is_distinct_from(int32,decimal) | 11.828 | 4.436 | -62.5% | 1.7 is_distinct_from(int32,float32) | 7.353 | 0.727 | -90.1% | 9.1 is_distinct_from(int32,float64) | 7.265 | 0.683 | -90.6% | 9.6 is_distinct_from(int32,int16) | 6.729 | 0.556 | -91.7% | 11.1 is_distinct_from(int32,int32) | 6.771 | 0.550 | -91.9% | 11.3 is_distinct_from(int32,int64) | 6.755 | 0.653 | -90.3% | 9.3 is_distinct_from(int64,decimal) | 11.955 | 4.716 | -60.6% | 1.5 is_distinct_from(int64,float32) | 7.365 | 0.671 | -90.9% | 10.0 is_distinct_from(int64,float64) | 7.292 | 0.654 | -91.0% | 10.2 is_distinct_from(int64,int16) | 6.727 | 0.672 | -90.0% | 9.0 is_distinct_from(int64,int32) | 6.734 | 0.624 | -90.7% | 9.8 is_distinct_from(int64,int64) | 6.767 | 0.614 | -90.9% | 10.0 is_distinct_from(interval,interval) | 7.889 | 2.420 | -69.3% | 2.3 is_distinct_from(interval,time) | 8.556 | 1.927 | -77.5% | 3.4 is_distinct_from(time,interval) | 8.243 | 2.084 | -74.7% | 3.0 is_distinct_from(time,time) | 6.954 | 0.600 | -91.4% | 10.6 is_distinct_from(timestamp,date) | 7.219 | 1.087 | -84.9% | 5.6 is_distinct_from(timestamp,timestamp) | 7.202 | 1.205 | -83.3% | 5.0 is_distinct_from(timestampz,timestampz) | 6.762 | 0.617 | -90.9% | 10.0 is_distinct_from(varchar,varchar) | 11.326 | 10.983 | -3.0% | 0.0 is_false(boolean) | 6.597 | 0.161 | -97.6% | 39.9 is_not_distinct_from(boolean,boolean) | 9.049 
| 0.306 | -96.6% | 28.6 is_not_distinct_from(date,date) | 7.295 | 0.545 | -92.5% | 12.4 is_not_distinct_from(date,timestamp) | 7.507 | 1.080 | -85.6% | 6.0 is_not_distinct_from(decimal,decimal) | 12.880 | 8.241 | -36.0% | 0.6 is_not_distinct_from(decimal,float32) | 13.258 | 6.351 | -52.1% | 1.1 is_not_distinct_from(decimal,float64) | 13.361 | 6.318 | -52.7% | 1.1 is_not_distinct_from(decimal,int16) | 11.600 | 4.708 | -59.4% | 1.5 is_not_distinct_from(decimal,int32) | 11.458 | 4.586 | -60.0% | 1.5 is_not_distinct_from(decimal,int64) | 11.601 | 4.656 | -59.9% | 1.5 is_not_distinct_from(float32,decimal) | 13.335 | 6.294 | -52.8% | 1.1 is_not_distinct_from(float32,float32) | 7.374 | 0.717 | -90.3% | 9.3 is_not_distinct_from(float32,float64) | 7.293 | 0.905 | -87.6% | 7.1 is_not_distinct_from(float32,int16) | 7.000 | 0.945 | -86.5% | 6.4 is_not_distinct_from(float32,int32) | 6.987 | 0.827 | -88.2% | 7.4 is_not_distinct_from(float32,int64) | 7.081 | 0.785 | -88.9% | 8.0 is_not_distinct_from(float64,decimal) | 13.475 | 6.267 | -53.5% | 1.2 is_not_distinct_from(float64,float32) | 7.395 | 0.946 | -87.2% | 6.8 is_not_distinct_from(float64,float64) | 7.324 | 0.962 | -86.9% | 6.6 is_not_distinct_from(float64,int16) | 7.040 | 1.003 | -85.8% | 6.0 is_not_distinct_from(float64,int32) | 6.928 | 0.881 | -87.3% | 6.9 is_not_distinct_from(float64,int64) | 7.086 | 0.868 | -87.8% | 7.2 is_not_distinct_from(int16,decimal) | 11.357 | 4.619 | -59.3% | 1.5 is_not_distinct_from(int16,float32) | 6.800 | 0.846 | -87.6% | 7.0 is_not_distinct_from(int16,float64) | 6.768 | 0.812 | -88.0% | 7.3 is_not_distinct_from(int16,int16) | 8.060 | 0.515 | -93.6% | 14.6 is_not_distinct_from(int16,int32) | 7.237 | 0.558 | -92.3% | 12.0 is_not_distinct_from(int16,int64) | 7.135 | 0.657 | -90.8% | 9.9 is_not_distinct_from(int32,decimal) | 11.236 | 4.422 | -60.6% | 1.5 is_not_distinct_from(int32,float32) | 6.788 | 0.725 | -89.3% | 8.4 is_not_distinct_from(int32,float64) | 6.731 | 0.683 | -89.9% | 8.9 
is_not_distinct_from(int32,int16) | 7.208 | 0.554 | -92.3% | 12.0 is_not_distinct_from(int32,int32) | 7.249 | 0.550 | -92.4% | 12.2 is_not_distinct_from(int32,int64) | 7.185 | 0.653 | -90.9% | 10.0 is_not_distinct_from(int64,decimal) | 11.405 | 4.683 | -58.9% | 1.4 is_not_distinct_from(int64,float32) | 6.803 | 0.690 | -89.9% | 8.9 is_not_distinct_from(int64,float64) | 6.781 | 0.660 | -90.3% | 9.3 is_not_distinct_from(int64,int16) | 7.143 | 0.655 | -90.8% | 9.9 is_not_distinct_from(int64,int32) | 7.235 | 0.635 | -91.2% | 10.4 is_not_distinct_from(int64,int64) | 7.289 | 0.625 | -91.4% | 10.7 is_not_distinct_from(interval,interval) | 8.328 | 2.402 | -71.2% | 2.5 is_not_distinct_from(interval,time) | 7.876 | 1.927 | -75.5% | 3.1 is_not_distinct_from(time,interval) | 7.637 | 2.105 | -72.4% | 2.6 is_not_distinct_from(time,time) | 7.426 | 0.632 | -91.5% | 10.7 is_not_distinct_from(timestamp,date) | 7.630 | 1.090 | -85.7% | 6.0 is_not_distinct_from(timestamp,timestamp) | 8.216 | 1.266 | -84.6% | 5.5 is_not_distinct_from(timestampz,timestampz) | 7.269 | 0.616 | -91.5% | 10.8 is_not_distinct_from(varchar,varchar) | 12.216 | 11.491 | -5.9% | 0.1 is_not_false(boolean) | 6.718 | 0.164 | -97.6% | 40.0 is_not_true(boolean) | 6.574 | 0.127 | -98.1% | 50.7 is_true(boolean) | 6.666 | 0.123 | -98.2% | 53.2 less_than_or_equal(boolean,boolean) | 7.450 | 0.177 | -97.6% | 41.0 less_than_or_equal(date,date) | 7.148 | 0.497 | -93.0% | 13.4 less_than_or_equal(date,timestamp) | 7.991 | 2.113 | -73.6% | 2.8 less_than_or_equal(decimal,decimal) | 12.874 | 8.402 | -34.7% | 0.5 less_than_or_equal(decimal,float32) | 13.990 | 6.012 | -57.0% | 1.3 less_than_or_equal(decimal,float64) | 13.920 | 6.017 | -56.8% | 1.3 less_than_or_equal(decimal,int16) | 11.602 | 4.707 | -59.4% | 1.5 less_than_or_equal(decimal,int32) | 11.361 | 4.481 | -60.6% | 1.5 less_than_or_equal(decimal,int64) | 11.618 | 4.653 | -59.9% | 1.5 less_than_or_equal(float32,decimal) | 13.420 | 6.116 | -54.4% | 1.2 
less_than_or_equal(float32,float32) | 7.716 | 0.806 | -89.6% | 8.6 less_than_or_equal(float32,float64) | 7.734 | 1.062 | -86.3% | 6.3 less_than_or_equal(float32,int16) | 7.127 | 1.099 | -84.6% | 5.5 less_than_or_equal(float32,int32) | 7.178 | 0.974 | -86.4% | 6.4 less_than_or_equal(float32,int64) | 7.202 | 0.937 | -87.0% | 6.7 less_than_or_equal(float64,decimal) | 13.405 | 6.104 | -54.5% | 1.2 less_than_or_equal(float64,float32) | 7.756 | 1.101 | -85.8% | 6.0 less_than_or_equal(float64,float64) | 7.655 | 1.146 | -85.0% | 5.7 less_than_or_equal(float64,int16) | 7.175 | 1.170 | -83.7% | 5.1 less_than_or_equal(float64,int32) | 7.156 | 1.059 | -85.2% | 5.8 less_than_or_equal(float64,int64) | 7.170 | 1.017 | -85.8% | 6.1 less_than_or_equal(int16,decimal) | 12.477 | 4.948 | -60.3% | 1.5 less_than_or_equal(int16,float32) | 7.668 | 1.053 | -86.3% | 6.3 less_than_or_equal(int16,float64) | 7.630 | 0.956 | -87.5% | 7.0 less_than_or_equal(int16,int16) | 7.952 | 0.464 | -94.2% | 16.1 less_than_or_equal(int16,int32) | 7.172 | 0.486 | -93.2% | 13.8 less_than_or_equal(int16,int64) | 7.159 | 0.564 | -92.1% | 11.7 less_than_or_equal(int32,decimal) | 12.198 | 4.726 | -61.3% | 1.6 less_than_or_equal(int32,float32) | 7.588 | 0.854 | -88.7% | 7.9 less_than_or_equal(int32,float64) | 7.596 | 0.812 | -89.3% | 8.4 less_than_or_equal(int32,int16) | 7.163 | 0.495 | -93.1% | 13.5 less_than_or_equal(int32,int32) | 7.173 | 0.489 | -93.2% | 13.7 less_than_or_equal(int32,int64) | 7.208 | 0.555 | -92.3% | 12.0 less_than_or_equal(int64,decimal) | 12.139 | 4.885 | -59.8% | 1.5 less_than_or_equal(int64,float32) | 7.583 | 0.809 | -89.3% | 8.4 less_than_or_equal(int64,float64) | 7.609 | 0.778 | -89.8% | 8.8 less_than_or_equal(int64,int16) | 7.210 | 0.569 | -92.1% | 11.7 less_than_or_equal(int64,int32) | 7.174 | 0.549 | -92.3% | 12.1 less_than_or_equal(int64,int64) | 7.111 | 0.546 | -92.3% | 12.0 less_than_or_equal(interval,interval) | 8.402 | 2.372 | -71.8% | 2.5 less_than_or_equal(interval,time) | 
8.266 | 2.701 | -67.3% | 2.1 less_than_or_equal(time,interval) | 8.573 | 2.868 | -66.5% | 2.0 less_than_or_equal(time,time) | 7.855 | 0.796 | -89.9% | 8.9 less_than_or_equal(timestamp,date) | 7.834 | 1.416 | -81.9% | 4.5 less_than_or_equal(timestamp,timestamp) | 8.426 | 2.240 | -73.4% | 2.8 less_than_or_equal(timestampz,timestampz) | 7.180 | 0.553 | -92.3% | 12.0 less_than_or_equal(varchar,varchar) | 12.270 | 12.479 | 1.7% | -0.0 less_than(boolean,boolean) | 7.098 | 0.180 | -97.5% | 38.4 less_than(date,date) | 6.805 | 0.492 | -92.8% | 12.8 less_than(date,timestamp) | 7.420 | 2.114 | -71.5% | 2.5 less_than(decimal,decimal) | 12.030 | 8.266 | -31.3% | 0.5 less_than(decimal,float32) | 12.915 | 5.947 | -54.0% | 1.2 less_than(decimal,float64) | 13.107 | 5.917 | -54.9% | 1.2 less_than(decimal,int16) | 11.521 | 4.635 | -59.8% | 1.5 less_than(decimal,int32) | 11.229 | 4.622 | -58.8% | 1.4 less_than(decimal,int64) | 11.420 | 4.744 | -58.5% | 1.4 less_than(float32,decimal) | 12.660 | 6.060 | -52.1% | 1.1 less_than(float32,float32) | 7.281 | 0.716 | -90.2% | 9.2 less_than(float32,float64) | 7.183 | 0.892 | -87.6% | 7.1 less_than(float32,int16) | 7.036 | 1.183 | -83.2% | 4.9 less_than(float32,int32) | 7.286 | 1.036 | -85.8% | 6.0 less_than(float32,int64) | 7.204 | 1.002 | -86.1% | 6.2 less_than(float64,decimal) | 12.515 | 5.981 | -52.2% | 1.1 less_than(float64,float32) | 7.029 | 1.009 | -85.6% | 6.0 less_than(float64,float64) | 7.042 | 0.995 | -85.9% | 6.1 less_than(float64,int16) | 6.965 | 1.280 | -81.6% | 4.4 less_than(float64,int32) | 7.003 | 1.137 | -83.8% | 5.2 less_than(float64,int64) | 7.010 | 1.091 | -84.4% | 5.4 less_than(int16,decimal) | 12.217 | 4.980 | -59.2% | 1.5 less_than(int16,float32) | 7.573 | 0.850 | -88.8% | 7.9 less_than(int16,float64) | 7.505 | 0.914 | -87.8% | 7.2 less_than(int16,int16) | 7.270 | 0.468 | -93.6% | 14.5 less_than(int16,int32) | 6.732 | 0.494 | -92.7% | 12.6 less_than(int16,int64) | 6.798 | 0.561 | -91.7% | 11.1 less_than(int32,decimal) | 
11.969 | 4.819 | -59.7% | 1.5 less_than(int32,float32) | 7.480 | 0.730 | -90.2% | 9.2 less_than(int32,float64) | 7.452 | 0.791 | -89.4% | 8.4 less_than(int32,int16) | 6.776 | 0.488 | -92.8% | 12.9 less_than(int32,int32) | 6.739 | 0.492 | -92.7% | 12.7 less_than(int32,int64) | 6.785 | 0.567 | -91.6% | 11.0 less_than(int64,decimal) | 11.980 | 5.072 | -57.7% | 1.4 less_than(int64,float32) | 7.400 | 0.705 | -90.5% | 9.5 less_than(int64,float64) | 7.475 | 0.759 | -89.8% | 8.8 less_than(int64,int16) | 6.747 | 0.580 | -91.4% | 10.6 less_than(int64,int32) | 6.773 | 0.558 | -91.8% | 11.1 less_than(int64,int64) | 6.725 | 0.544 | -91.9% | 11.4 less_than(interval,interval) | 7.719 | 2.367 | -69.3% | 2.3 less_than(interval,time) | 8.346 | 2.622 | -68.6% | 2.2 less_than(time,interval) | 8.451 | 2.687 | -68.2% | 2.1 less_than(time,time) | 7.207 | 0.636 | -91.2% | 10.3 less_than(timestamp,date) | 6.849 | 1.414 | -79.4% | 3.8 less_than(timestamp,timestamp) | 7.881 | 2.070 | -73.7% | 2.8 less_than(timestampz,timestampz) | 6.719 | 0.550 | -91.8% | 11.2 less_than(varchar,varchar) | 11.779 | 11.933 | 1.3% | -0.0 not_equal(boolean,boolean) | 7.161 | 0.132 | -98.2% | 53.2 not_equal(date,date) | 6.731 | 0.511 | -92.4% | 12.2 not_equal(date,timestamp) | 7.162 | 1.032 | -85.6% | 5.9 not_equal(decimal,decimal) | 12.252 | 8.220 | -32.9% | 0.5 not_equal(decimal,float32) | 12.396 | 5.577 | -55.0% | 1.2 not_equal(decimal,float64) | 12.360 | 5.417 | -56.2% | 1.3 not_equal(decimal,int16) | 12.200 | 4.613 | -62.2% | 1.6 not_equal(decimal,int32) | 12.079 | 4.486 | -62.9% | 1.7 not_equal(decimal,int64) | 12.280 | 4.547 | -63.0% | 1.7 not_equal(float32,decimal) | 12.211 | 5.471 | -55.2% | 1.2 not_equal(float32,float32) | 6.836 | 0.666 | -90.3% | 9.3 not_equal(float32,float64) | 6.852 | 0.844 | -87.7% | 7.1 not_equal(float32,int16) | 7.538 | 0.890 | -88.2% | 7.5 not_equal(float32,int32) | 7.495 | 0.766 | -89.8% | 8.8 not_equal(float32,int64) | 7.502 | 0.730 | -90.3% | 9.3 not_equal(float64,decimal) | 
12.281 | 5.416 | -55.9% | 1.3 not_equal(float64,float32) | 6.858 | 0.858 | -87.5% | 7.0 not_equal(float64,float64) | 6.806 | 0.903 | -86.7% | 6.5 not_equal(float64,int16) | 7.421 | 0.938 | -87.4% | 6.9 not_equal(float64,int32) | 7.378 | 0.821 | -88.9% | 8.0 not_equal(float64,int64) | 7.384 | 0.785 | -89.4% | 8.4 not_equal(int16,decimal) | 12.429 | 4.587 | -63.1% | 1.7 not_equal(int16,float32) | 7.415 | 0.792 | -89.3% | 8.4 not_equal(int16,float64) | 7.339 | 0.753 | -89.7% | 8.7 not_equal(int16,int16) | 7.355 | 0.473 | -93.6% | 14.6 not_equal(int16,int32) | 6.783 | 0.508 | -92.5% | 12.4 not_equal(int16,int64) | 6.795 | 0.605 | -91.1% | 10.2 not_equal(int32,decimal) | 11.881 | 4.394 | -63.0% | 1.7 not_equal(int32,float32) | 7.334 | 0.668 | -90.9% | 10.0 not_equal(int32,float64) | 7.339 | 0.620 | -91.5% | 10.8 not_equal(int32,int16) | 6.722 | 0.510 | -92.4% | 12.2 not_equal(int32,int32) | 6.797 | 0.506 | -92.5% | 12.4 not_equal(int32,int64) | 6.793 | 0.587 | -91.4% | 10.6 not_equal(int64,decimal) | 12.196 | 4.537 | -62.8% | 1.7 not_equal(int64,float32) | 7.398 | 0.621 | -91.6% | 10.9 not_equal(int64,float64) | 7.314 | 0.606 | -91.7% | 11.1 not_equal(int64,int16) | 6.778 | 0.598 | -91.2% | 10.3 not_equal(int64,int32) | 6.762 | 0.594 | -91.2% | 10.4 not_equal(int64,int64) | 6.821 | 0.569 | -91.7% | 11.0 not_equal(interval,interval) | 7.768 | 2.430 | -68.7% | 2.2 not_equal(interval,time) | 8.638 | 1.908 | -77.9% | 3.5 not_equal(time,interval) | 8.337 | 2.062 | -75.3% | 3.0 not_equal(time,time) | 7.151 | 0.574 | -92.0% | 11.5 not_equal(timestamp,date) | 7.156 | 1.053 | -85.3% | 5.8 not_equal(timestamp,timestamp) | 7.566 | 1.186 | -84.3% | 5.4 not_equal(timestampz,timestampz) | 6.760 | 0.574 | -91.5% | 10.8 not_equal(varchar,varchar) | 12.022 | 12.334 | 2.6% | -0.0 or(boolean,boolean) | 11.085 | 0.265 | -97.6% | 40.8 round_digit(decimal,int32) | 9.480 | 2.505 | -73.6% | 2.8 round(decimal) | 7.697 | 3.512 | -54.4% | 1.2 round(float64) | 4.430 | 0.201 | -95.5% | 21.1 
</details>

To apply this optimization, we introduced several expression templates in the new `template_fast` module. They are specific to `PrimitiveArray` or `BoolArray`. Operations are applied to array elements one by one, regardless of the null bitmap and without any branching, so the compiler can automatically vectorize them using SIMD instructions. But given the no-branch requirement, we cannot apply this technique to fallible operations such as arithmetic and most type casts. Although some of them can be addressed with pre- or post-checks (e.g. pre-checking for 0 to catch divide-by-zero errors, post-checking overflow for addition), these checks are highly operation-specific and hard to generalize. We'll explore ways to vectorize these fallible operations in the future.

Approved-By: soundOfDestiny
Approved-By: BugenZhao

* refactor(batch): refine visibility and dml executors (#7040)

- Be aware of visibility.
- Split batch chunks with chunk builder before inserting them into the streaming jobs.

Also refactor the implementation of `append_chunk` (`trunc_data_chunk`).

Approved-By: liurenjie1024

* perf(expr): optimize casting to varchar (#7066)

This PR optimizes the performance of casting values to varchar. It introduces a write API for `ToText`, so that strings can be written directly to array buffers without generating a `String`. The display functions of interval and timestampz were also optimized.
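The idea of a write-style text conversion can be sketched as follows. This is a minimal illustration, not the project's actual `ToText` trait: the trait name `WriteText`, its methods, and the helper `cast_all` are hypothetical names chosen for this example. It shows values being formatted straight into one shared buffer, so no intermediate `String` is allocated per value.

```rust
use std::fmt::{self, Write};

/// Hypothetical stand-in for a text-conversion trait (not the real `ToText`).
trait WriteText {
    /// Write the textual form of `self` directly into `out`.
    fn write_text(&self, out: &mut dyn Write) -> fmt::Result;

    /// Convenience path that allocates a fresh `String` per value.
    fn to_text(&self) -> String {
        let mut s = String::new();
        self.write_text(&mut s)
            .expect("writing to a String does not fail");
        s
    }
}

impl WriteText for i32 {
    fn write_text(&self, out: &mut dyn Write) -> fmt::Result {
        write!(out, "{}", self)
    }
}

impl WriteText for bool {
    fn write_text(&self, out: &mut dyn Write) -> fmt::Result {
        out.write_str(if *self { "true" } else { "false" })
    }
}

/// Cast a whole column to text: every value is appended to one shared
/// buffer, and the end offset of each value is recorded.
fn cast_all<T: WriteText>(values: &[T]) -> (String, Vec<usize>) {
    let mut data = String::new();
    let mut ends = Vec::with_capacity(values.len());
    for v in values {
        v.write_text(&mut data)
            .expect("writing to a String does not fail");
        ends.push(data.len());
    }
    (data, ends)
}
```

Calling `cast_all(&[1, -20, 300])` fills a single buffer and one offsets vector, whereas calling `to_text` per value would allocate three separate strings.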
<img width="581" alt="perf-cast" src="https://user-images.githubusercontent.com/15158738/209610088-859f0f77-5272-4cb8-bbe3-f743bc0cbe97.png">

<details>
<summary>Click to show full results</summary>

bench | Before time(us) | After time(us) | Change(%) | Speedup
-- | -- | -- | -- | --
cast(timestampz->varchar) | 508.640 | 121.600 | -76.1% | 3.2
cast(timestamp->varchar) | 166.200 | 58.245 | -65.0% | 1.9
cast(float64->varchar) | 78.386 | 57.597 | -26.5% | 0.4
cast(float32->varchar) | 57.903 | 37.384 | -35.4% | 0.5
cast(date->varchar) | 86.896 | 32.669 | -62.4% | 1.7
cast(time->varchar) | 47.508 | 28.428 | -40.2% | 0.7
cast(decimal->varchar) | 67.682 | 28.317 | -58.2% | 1.4
cast(int16->varchar) | 29.532 | 12.337 | -58.2% | 1.4
cast(int64->varchar) | 52.043 | 12.319 | -76.3% | 3.2
cast(int32->varchar) | 28.863 | 12.258 | -57.5% | 1.4
cast(boolean->varchar) | 26.826 | 6.396 | -76.2% | 3.2
bool_out(boolean) | 25.480 | 5.126 | -79.9% | 4.0

</details>

The `writer` argument of string functions was also changed from `StringWriter<'_>` to `&mut dyn Write`, which decouples them from the array type. I tried to use `&mut impl Write` but was blocked by lifetime issues.
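A rough sketch of what this decoupling buys, under stated assumptions: the `rtrim` signature below is illustrative rather than the project's real function, and `Utf8Builder` is a hypothetical flat string column invented for this example. The function only sees a `&mut dyn Write` sink, so the same code can write into a plain `String` or into an array-like builder.

```rust
use std::fmt::{self, Write};

/// A string function that knows only about a text sink, not about arrays.
/// (Illustrative signature; trims trailing whitespace.)
fn rtrim(s: &str, writer: &mut dyn Write) -> fmt::Result {
    writer.write_str(s.trim_end())
}

/// Hypothetical flat string column: one shared buffer plus end offsets.
#[derive(Default)]
struct Utf8Builder {
    data: String,
    offsets: Vec<usize>,
}

impl Utf8Builder {
    /// Mark the end of the value currently being written.
    fn finish_value(&mut self) {
        self.offsets.push(self.data.len());
    }

    /// Read back the i-th finished value.
    fn get(&self, i: usize) -> &str {
        let start = if i == 0 { 0 } else { self.offsets[i - 1] };
        &self.data[start..self.offsets[i]]
    }
}

/// Implementing `fmt::Write` is all the builder needs to act as a sink.
impl Write for Utf8Builder {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.data.push_str(s);
        Ok(())
    }
}
```

The trade-off of `dyn Write` is a virtual call per write; the benchmark table that follows suggests this did not hurt in practice.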
Anyway, the performance of these operations is still slightly improved:

<img width="600" alt="perf-string-ops" src="https://user-images.githubusercontent.com/15158738/209610928-8036e4d1-e994-4178-8ce4-ff1340877e47.png">

<details>
<summary>Click to show full results</summary>

bench | Before time(us) | After time(us) | Change(%) | Speedup
-- | -- | -- | -- | --
rtrim(varchar,varchar) | 21.780 | 15.768 | -27.6% | 0.4
substr(varchar,int32,int32) | 11.126 | 8.090 | -27.3% | 0.4
rtrim(varchar) | 10.537 | 7.712 | -26.8% | 0.4
substr(varchar,int32) | 9.198 | 7.111 | -22.7% | 0.3
ltrim(varchar) | 9.661 | 8.010 | -17.1% | 0.2
trim(varchar) | 11.308 | 9.618 | -14.9% | 0.2
overlay(varchar,varchar,int32,int32) | 17.107 | 14.697 | -14.1% | 0.2
overlay(varchar,varchar,int32) | 13.408 | 12.007 | -10.4% | 0.1
ltrim(varchar,varchar) | 21.198 | 19.021 | -10.3% | 0.1
trim(varchar,varchar) | 20.876 | 19.205 | -8.0% | 0.1
split_part(varchar,varchar,int32) | 30.708 | 29.293 | -4.6% | 0.0
md5(varchar) | 346.010 | 331.670 | -4.1% | 0.0

</details>

Approved-By: BowenXiao1999
Approved-By: BugenZhao

* feat(optimizer): support share operator (#6956)

- The `LogicalShare` operator is used to represent the reuse of existing operators. It can have multiple parents, which makes it different from other operators.
- Because most of our optimizations assume that the plan is a tree, representing a DAG-structured plan requires modifying those optimizations so that they do not accidentally break the DAG back into a tree.
- Optimizations including predicate pushdown, column pruning, the heuristic optimizer, stream rewrite, and to-stream conversion can all break a DAG plan back into a tree plan if we are not careful.
- Let me take predicate pushdown as an example to illustrate how to implement it for `LogicalShare`. We use a context for `LogicalShare` to keep track of how many times a predicate has been pushed down to it.
Once the pushdown count equals the number of parents of the `LogicalShare`, we can merge all the previous predicates into one and then push it down to the input of `LogicalShare`.
- The heuristic optimizer's existing rules won't match any `LogicalShare`, so `LogicalShare` does not affect their correctness.
- At the end of optimization for a batch query, we currently convert the DAG back to a tree by removing `LogicalShare` (the rule named `DagToTreeRule`), because our batch executors do not yet support executing a DAG plan directly.
- This PR also supports reusing sources via `ShareSourceRewriter`, which replaces every source that occurs more than once in a streaming query with a share operator.

Approved-By: st1page
Co-Authored-By: Dylan Chen <zilin@singularity-data.com>
Co-Authored-By: Dylan <chenzl25@mail2.sysu.edu.cn>

* perf(expr): vectorize infallible casts (#7079)

Similar to #7055, this PR vectorizes infallible casts.

<img width="516" alt="perf-infallible-cast" src="https://user-images.githubusercontent.com/15158738/209652449-61dc1513-7255-436c-aa36-e5a0d1dec384.png">

<details>
<summary>Click to show full results</summary>

bench | Before time(us) | After time(us) | Change(%) | Speedup
-- | -- | -- | -- | --
cast(int16->float32) | 4.434 | 0.146 | -96.7% | 29.3
cast(int16->int32) | 4.408 | 0.154 | -96.5% | 27.7
cast(float32->float64) | 4.432 | 0.187 | -95.8% | 22.7
cast(int32->int64) | 4.415 | 0.192 | -95.7% | 22.0
cast(int32->float64) | 4.422 | 0.194 | -95.6% | 21.8
cast(int16->int64) | 4.412 | 0.212 | -95.2% | 19.8
cast(timestamp->date) | 4.409 | 0.226 | -94.9% | 18.5
cast(timestamp->time) | 5.443 | 0.300 | -94.5% | 17.1
cast(date->timestamp) | 5.504 | 0.304 | -94.5% | 17.1
cast(int16->float64) | 4.430 | 0.298 | -93.3% | 13.9
cast(int32->decimal) | 5.582 | 0.592 | -89.4% | 8.4
cast(time->interval) | 5.511 | 0.727 | -86.8% | 6.6
cast(int64->decimal) | 5.739 | 0.766 | -86.7% | 6.5
cast(int16->decimal) | 5.760 | 0.845 | -85.3% | 5.8
cast(interval->time) | 5.903 | 
1.289 | -78.2% | 3.6
cast(float32->decimal) | 21.970 | 18.170 | -17.3% | 0.2
cast(float64->decimal) | 40.131 | 36.049 | -10.2% | 0.1

</details>

Approved-By: BowenXiao1999

* fix: clean states in local barrier manager after actor dropped (#7082)

Trying to fix the continuous recovery found in longevity and chaos tests. I found two problems that might be its root cause:

1. Fixed: unnecessary recovery triggered as described in #6989. In my local tests, when the workload was very high, there were many in-flight barrier collect responses (up to 80+) at the time of recovery. After recovery finished, each response would trigger another recovery, because the whole cluster had already been reset to the previous committed epoch.
2. Before this PR, when force-stopping actors in the CN, the local manager would clean all states and then abort all actors. The problem is that, between cleaning states and aborting actors, the actors could still report collected epochs or error statuses to the local barrier manager, especially when the number of actors is high. This would cause a chain reaction in recovery.

I tested it locally and the recovery became normal. Besides, it could also be the cause of #6639 and #6715.

Approved-By: fuyufjh
Approved-By: BugenZhao

* fix: return error if source executor failed to receive the first barrier (#7086)

This is a temporary fix for #6931, which I believe can happen in rare cases. According to the log in the issue, a failover occurred before the panic. I think some part of meta failed to recover and the barrier channel was closed for some reason. This shows that the meta node can fail to recover, and the compute node should be robust enough to handle that rather than panicking.
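The general shape of such a fix can be sketched as below. This is only an illustration under assumptions: it uses the standard library's synchronous `mpsc` channel to stay self-contained (the real executor presumably uses an async channel), and `Barrier`, `ExecError`, and `recv_first_barrier` are hypothetical names. The point is that a closed channel becomes an error value the caller can handle instead of an `unwrap` panic.

```rust
use std::sync::mpsc::{channel, Receiver};

/// Hypothetical barrier message carrying an epoch.
#[derive(Debug, PartialEq)]
enum Barrier {
    Epoch(u64),
}

/// Hypothetical executor error type.
#[derive(Debug, PartialEq)]
enum ExecError {
    /// The sender side was dropped before the first barrier arrived.
    BarrierChannelClosed,
}

/// Wait for the first barrier; report a closed channel as an error
/// instead of panicking.
fn recv_first_barrier(rx: &Receiver<Barrier>) -> Result<Barrier, ExecError> {
    rx.recv().map_err(|_| ExecError::BarrierChannelClosed)
}

/// Small demonstration: one barrier is delivered, then the sender goes away.
fn demo() -> (Result<Barrier, ExecError>, Result<Barrier, ExecError>) {
    let (tx, rx) = channel();
    tx.send(Barrier::Epoch(1)).expect("receiver is alive");
    let first = recv_first_barrier(&rx);
    drop(tx); // simulate the barrier sender disappearing
    let second = recv_first_barrier(&rx);
    (first, second)
}
```

In `demo`, the first call yields the barrier and the second yields `ExecError::BarrierChannelClosed`, which an executor could propagate upward to trigger recovery.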
Approved-By: waruto210 Approved-By: xx01cyx * fix(optimizer): fix hop window column pruning (#7085) - Fix hop window column pruning. Approved-By: st1page * fix(streaming): fix memory leaks in streaming hash join (#7089) Fix #6942. See the detailed discussions there. The bug is inside the BTreeMap. For now, I will just remove that part of code because we don't rely on Allocator API to get memory usage now. ![image](https://user-images.githubusercontent.com/10192522/209688807-84ae0f84-9e17-44ae-8498-a378ee6b951e.png) Approved-By: yuhao-su Approved-By: BugenZhao * fix: remove redundant `append_only: true` in explain (#7119) fix: remove redundant `append_only: true` in explain result, since it has been expressed by the name "StreamAppendOnlyHashJoin". Approved-By: chenzl25 * feat: Failover follower to leader (#6937) https://github.com/risingwavelabs/risingwave/issues/6936 Approved-By: yezizp2012 * feat(meta): validate CDC connector properties during create source (#6938) Validate connector properties on Meta (in `DebeziumSplitEnumerator`) when creating CDC source. As the conclusion of https://github.com/risingwavelabs/rfcs/pull/29, we will deploy a sidecar connector node colocated with Meta on the cloud to validate the connector properties. Examples: 1. Wrong password ``` create materialized source products ( id INT, name STRING, description STRING, PRIMARY KEY (id) ) with ( connector = 'mysql-cdc', hostname = '127.0.0.1', port = '3306', username = 'root', password = '12346', database.name = 'mydb', table.name = 'prodts', server.id = '5085', debezium.a.b = 'test' ) row format debezium_json; ERROR: QueryError: internal error: gRPC error (Client specified an invalid argument): Access denied for user 'root'@'localhost' (using password: YES) ``` 2. 
Wrong table name

```
dev=> create materialized source products ( id INT, name STRING, description STRING, PRIMARY KEY (id) ) with ( connector = 'mysql-cdc', hostname = '127.0.0.1', port = '3306', username = 'root', password = '123456', database.name = 'mydb', table.name = 'prodts', server.id = '5085', debezium.a.b = 'test' ) row format debezium_json;
ERROR: QueryError: internal error: gRPC error (Client specified an invalid argument): table doesn't exist
```

Approved-By: tabVersion

* refactor: decouple memory management from stream, make it accessible for both batch and streaming (#7004)

Main idea:

1. Rename `LruManager` to `GlobalMemoryManager` and move it from the `stream` crate to the `compute` crate. It cannot be moved to `common` because it depends on `risingwave_stream` and `risingwave_batch`. This follows what we discussed in the memory management RFC: https://github.com/risingwavelabs/rfcs/pull/26.
2. Fully decouple `risingwave_stream` from the memory manager. Before this PR, streaming executors accessed the LRU manager to create caches. However, this would cause a cyclic reference if we moved `LruManager` out of `risingwave_stream`. What executors really need is the watermark epoch, so instead of letting `risingwave_stream` access the memory manager, we store the watermark epoch in the `LocalStreamManager`; when executors are built, they read this value and create caches on their own. Personally I think this is cleaner: the memory manager has access to both the stream and batch components, but not vice versa.
3. Currently the memory manager ref is not stored anywhere. Thinking of where to store it. 🤔

Approved-By: liurenjie1024

* feat: support plan generating & execution for new DDL & DML design (#6836)

This PR applies `SourceExecutorV2`, `DmlExecutor`, `RowIdGenExecutor`, and `DmlManager` to query execution.
For example, the query plan for `CREATE TABLE t (v int)` will be: ```SQL StreamMaterialize { columns: [v, _row_id(hidden)], pk_columns: [_row_id] } └─StreamExchange { dist: HashShard(_row_id) } └─StreamRowIdGen { row_id_index: 1 } └─StreamDml { columns: [v, _row_id] } └─StreamSource ``` Some explanations: - `StreamSource` here contains no actual external streaming source. It is only responsible for receiving barriers. - `StreamDml` will receive data from `InsertExecutor`, `DeleteExecutor`, and `UpdateExecutor`. - `StreamRowIdGen` will generate row id for the data. In this case, the primary key is not defined by the user, so we internally add a `_row_id` column as the primary key. If the table has a user-defined primary key, then this executor can be eliminated. Note that now **"source" stands for streaming source only**. There is **NO table source** now. Though `CREATE TABLE` will create a `StreamSource`, it actually contains nothing related to a source (catalog). Approved-By: st1page Approved-By: BugenZhao Approved-By: yezizp2012 Co-Authored-By: xx01cyx <caoyuanxin0531@outlook.com> Co-Authored-By: st1page <1245835950@qq.com> * feat(frontend): avoid pk duplication (#7095) avoid pk duplication for streaming executors, and still allow join key duplication. Approved-By: st1page * feat(optimizer): improve column pruning for share operator and perform share source at the beginning (#7111) - Improve column pruning for share operator. We need 2 round column pruning for DAG plan. - Perform share source at the beginning so that we can benefit from predicate pushdown and column pruning. Approved-By: st1page Approved-By: fuyufjh * fix(streaming): handle scaling for row id gen executor (#7122) Correctly handle the scaling for RowIdGen executor. The logic used to work fine, but was lost in the refactoring in #6529. 
Approved-By: yezizp2012
Approved-By: xx01cyx

* fix(optimizer): fix logical join o2i_col_mapping (#7108)

- Fix logical join `o2i_col_mapping` by using `output_indices` directly instead of the inverse of `i2o_col_mapping`.

Approved-By: fuyufjh
Co-Authored-By: Dylan Chen <zilin@singularity-data.com>
Co-Authored-By: xxchan <xxchan22f@gmail.com>

* fix(frontend): hash join do not deduplicate input pk (#7123)

Previously we wanted to deduplicate the pk for streaming executors. However:

- agg does a prefix scan by group key, so we cannot deduplicate the group key
- hash join does a prefix scan by join key, so we cannot deduplicate the join key
- hash join needs to be aware of the input pk, and there might be an inconsistency between the pk of the hash join state table and the input pk seen by the hash join executor

So we decided not to handle deduplicated input pk for now. We may complete the dedup task case by case, as for agg, instead of adding a general method in the catalog builder.

Approved-By: yuhao-su
Co-Authored-By: congyi <15605187270@163.com>
Co-Authored-By: congyi wang <58715567+wcy-fdu@users.noreply.github.com>

* fix: fix NULL regexp capture group (#7129)

If a particular capture group didn't participate in the match, we should return `NULL` instead of skipping it. Fixes https://github.com/risingwavelabs/risingwave/issues/7126

Approved-By: TennyZhuang

* chore(test): compress the test data (#7007)

Reduce size from 16MB to 400KB.

Approved-By: tabVersion

* fix: support kafka sink for struct and list type (#7098)

Fix type matching for struct and list in the Kafka sink, add struct and list test cases to the unit tests, and add a script command for compressing test cases into a zip file.

Approved-By: lmatz
Co-Authored-By: tabVersion <tabvision@bupt.icu>
Co-Authored-By: lmatz <lmatz823@gmail.com>

Signed-off-by: Runji Wang <wangrunji0408@163.com>
Co-authored-by: Eric Fu <eric@singularity-data.com>
Co-authored-by: Dylan <chenzl25@mail2.sysu.edu.cn>
Co-authored-by: Dylan Chen <zilin@singularity-data.com>
Co-authored-by: xxchan <xxchan22f@gmail.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: zwang28 <70626450+zwang28@users.noreply.github.com>
Co-authored-by: zwang28 <84491488@qq.com>
Co-authored-by: Runji Wang <wangrunji0408@163.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: August <pin@singularity-data.com>
Co-authored-by: Noel Kwan <47273164+kwannoel@users.noreply.github.com>
Co-authored-by: Yuhao Su <31772373+yuhao-su@users.noreply.github.com>
Co-authored-by: TennyZhuang <zty0826@gmail.com>
Co-authored-by: Bohan Zhang <tabvision@bupt.icu>
Co-authored-by: CAJan93 <jan.mensch@gmx.net>
Co-authored-by: StrikeW <wangsiyuanse@gmail.com>
Co-authored-by: Bowen <36908971+BowenXiao1999@users.noreply.github.com>
Co-authored-by: Yuanxin Cao <60498509+xx01cyx@users.noreply.github.com>
Co-authored-by: xx01cyx <caoyuanxin0531@outlook.com>
Co-authored-by: st1page <1245835950@qq.com>
Co-authored-by: congyi wang <58715567+wcy-fdu@users.noreply.github.com>
Co-authored-by: congyi <15605187270@163.com>
Initial microbenchmarks in #6856 show that the performance of expression evaluation is far from optimal, especially for primitive types and simple operations. We open this issue to track the progress of further optimizations.
Benchmark Results
Operates on chunks of size 1024.
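For intuition about what one benchmarked operation does on such a chunk, here is a minimal sketch contrasting two ways to evaluate `less_than(int32,int32)`. It is not the project's implementation: `I32Column` and `BoolColumn` are hypothetical types, and validity is stored as one `bool` per row rather than a packed bitmap. The second function computes every row regardless of nulls and combines validity separately, which leaves plain loops that a compiler may auto-vectorize (whether it does depends on the target and optimization level).

```rust
/// Hypothetical primitive column: dense values plus one validity flag per row.
struct I32Column {
    values: Vec<i32>,
    valid: Vec<bool>,
}

/// Hypothetical boolean result column with the same layout.
struct BoolColumn {
    values: Vec<bool>,
    valid: Vec<bool>,
}

/// Row-at-a-time evaluation with a null check (a branch) on every element.
fn less_than_branchy(a: &I32Column, b: &I32Column) -> BoolColumn {
    let n = a.values.len();
    let mut values = Vec::with_capacity(n);
    let mut valid = Vec::with_capacity(n);
    for i in 0..n {
        if a.valid[i] && b.valid[i] {
            values.push(a.values[i] < b.values[i]);
            valid.push(true);
        } else {
            values.push(false);
            valid.push(false);
        }
    }
    BoolColumn { values, valid }
}

/// Branch-free evaluation: compare all rows, ignoring nulls, then AND the
/// validity flags in a separate pass. Values at null rows are unspecified
/// and must be masked by `valid` when read.
fn less_than_branch_free(a: &I32Column, b: &I32Column) -> BoolColumn {
    let values = a.values.iter().zip(&b.values).map(|(x, y)| x < y).collect();
    let valid = a.valid.iter().zip(&b.valid).map(|(x, y)| *x & *y).collect();
    BoolColumn { values, valid }
}
```

Both functions agree on every valid row; they may differ only in the value slot of null rows, which readers must ignore. This approach only works for infallible operations, since a fallible one would need a per-element check.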
Results are collected on M1 Pro with the following commands:
The distribution curve of all operation times before and after optimization:
PRs
- `u8` to `usize` #7030
- `to_char` #7048