This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

[NSE-857] Fill destination buffer by reducer #880

Merged
merged 3 commits into from
May 1, 2022


FelixYBW
Collaborator

What changes were proposed in this pull request?

The solution is to create an offset array that lists the source row offsets for each reducer, as below:
| Reducer # | Vector index | Vector value (source row offset) |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 4 |
| 0 | 2 | 6 |
| 1 | 3 | 1 |
| 1 | 4 | 5 |
| 1 | 5 | 8 |
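As a rough illustration, an offset array like the one in the table can be built with a counting sort over each row's reducer id. This is a minimal C++ sketch, not the code merged in this PR: `ReducerOffsets`, `buildReducerOffsets`, and the per-row `partitionId` input are hypothetical names, and the usage below assumes the rows missing from the table (2, 3, 7) belong to a third reducer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: source row offsets grouped by reducer, plus the
// start of each reducer's group (reducer r owns [groupStart[r], groupStart[r+1])).
struct ReducerOffsets {
  std::vector<uint32_t> rowIndex;
  std::vector<uint32_t> groupStart;
};

// Hypothetical helper: counting sort of row indices by reducer id.
ReducerOffsets buildReducerOffsets(const std::vector<uint32_t>& partitionId,
                                   uint32_t numReducers) {
  ReducerOffsets out;
  out.groupStart.assign(numReducers + 1, 0);
  // Pass 1: count the rows that go to each reducer.
  for (uint32_t pid : partitionId) {
    assert(pid < numReducers);
    ++out.groupStart[pid + 1];
  }
  // Prefix sum turns the counts into start positions.
  for (uint32_t r = 0; r < numReducers; ++r) {
    out.groupStart[r + 1] += out.groupStart[r];
  }
  // Pass 2: place each row index in its reducer's group, keeping row order.
  out.rowIndex.resize(partitionId.size());
  std::vector<uint32_t> cursor(out.groupStart.begin(), out.groupStart.end() - 1);
  for (std::size_t row = 0; row < partitionId.size(); ++row) {
    out.rowIndex[cursor[partitionId[row]]++] = static_cast<uint32_t>(row);
  }
  return out;
}
```

For `partitionId = {0, 1, 2, 2, 0, 1, 0, 2, 1}` and three reducers, `rowIndex` begins `0, 4, 6, 1, 5, 8`, matching the table. Two passes over the reducer ids suffice, and the second pass keeps each reducer's rows in their original order.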

Then we can read the source column randomly and fill the destination column sequentially, one reducer at a time. The source data must be small enough to fit in the L1/L2 cache, so that each source and destination cache line is read only once; otherwise performance will be very bad. Currently the record batch size is 32K rows, so a double column is 32K × 8 bytes = 256 KB.
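The fill step can be sketched in the same hedged way: reads from the source column jump around, but each reducer's destination buffer is appended strictly in order, one reducer after another. `fillByReducer` is a hypothetical helper, and the sketch assumes the offset array is passed as `rowIndex` (source row offsets grouped by reducer) plus `groupStart` (where each reducer's group begins), both hypothetical names, with one destination buffer per reducer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: fill each reducer's destination buffer in turn.
// Reads from `src` are random (driven by rowIndex); writes to each
// destination are sequential. Assumes one batch of `src` stays in L1/L2.
template <typename T>
void fillByReducer(const std::vector<T>& src,
                   const std::vector<uint32_t>& rowIndex,
                   const std::vector<uint32_t>& groupStart,
                   std::vector<std::vector<T>>& dst) {
  assert(groupStart.size() == dst.size() + 1);
  for (std::size_t r = 0; r + 1 < groupStart.size(); ++r) {
    std::vector<T>& out = dst[r];
    for (uint32_t i = groupStart[r]; i < groupStart[r + 1]; ++i) {
      out.push_back(src[rowIndex[i]]);  // random read, sequential append
    }
  }
}
```

Visiting reducers in order means each destination buffer is written as one contiguous run per batch, which is what lets the source column be the only randomly accessed data.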
On the write side, we can use non-temporal stores (NTStore) to bypass RFO (read for ownership), which also avoids cache pollution. But when the reducer count is very large, NTStore doesn't work well because each reducer receives only a little data: with a 32K-row batch and 4,000 reducers, each reducer is written only about 8 values.
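A minimal sketch of an NTStore write path, assuming an x86-64 target built with GCC or Clang: `_mm_stream_si64` (MOVNTI) and `_mm_sfence` are standard SSE2/SSE intrinsics, while `gatherNonTemporal` is a hypothetical helper, not the PR's code. Non-temporal stores go through write-combining buffers, which pay off most when whole 64-byte lines are written; with only a handful of values per reducer per batch there is little to combine.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__x86_64__)
#include <immintrin.h>
#endif

// Hypothetical sketch: gather doubles for one reducer and write them with
// non-temporal stores, bypassing the cache (no RFO on fully written lines).
// Falls back to ordinary stores on targets other than x86-64.
void gatherNonTemporal(const double* src, const uint32_t* rowIndex,
                       uint32_t count, double* dst) {
  assert(count == 0 || (src != nullptr && rowIndex != nullptr && dst != nullptr));
  for (uint32_t i = 0; i < count; ++i) {
    int64_t bits;
    std::memcpy(&bits, &src[rowIndex[i]], sizeof(bits));  // double -> raw 64-bit pattern
#if defined(__x86_64__)
    _mm_stream_si64(reinterpret_cast<long long*>(dst + i), bits);
#else
    std::memcpy(dst + i, &bits, sizeof(bits));
#endif
  }
#if defined(__x86_64__)
  _mm_sfence();  // order the streaming stores before later stores
#endif
}
```

The `_mm_sfence` at the end matters if another thread will read the destination buffer, since streaming stores are weakly ordered.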

How was this patch tested?

Benchmark data show that the solution partially solves the reducer-count scaling issue. The chart below shows that 4096 and 512 reducers have the same performance.
[Chart: benchmark results comparing 512 and 4096 reducers]

Remaining work

The AVX implementation doesn't perform better than the INT solution. NTStore doesn't show better performance either.

@github-actions

#857

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
@FelixYBW
Collaborator Author

FelixYBW commented May 1, 2022

Jenkins TPC-H run done. The 8x partition case doesn't show any regression now.
tpch_2022_05_01_application_1647347981137_0322.html

Note that it doesn't completely solve the reducer-count scaling issue. With an even larger reducer count, each reducer gets only a few rows, which don't fill a 64-byte cache line. The ratio of memory traffic to split data size then increases again, and performance soon drops. Another solution is to convert to a row-based format during split and convert back to columnar format on the reducer side.
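The row-based alternative can be sketched for a hypothetical fixed schema of one `int64_t` key and one `double` value (16 bytes per row). `splitToRows` and `rowsToColumns` are illustrative names, not part of this PR or of Arrow. During split, each row is appended as one contiguous record to its reducer's byte buffer, so four rows for a reducer already add up to 64 bytes, a full cache line's worth; on the reducer side, the records are converted back into columns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixed schema: [int64 key][double value] per row.
constexpr std::size_t kRowBytes = sizeof(int64_t) + sizeof(double);

// Split side: append each row contiguously to its reducer's byte buffer.
void splitToRows(const std::vector<int64_t>& key, const std::vector<double>& value,
                 const std::vector<uint32_t>& partitionId,
                 std::vector<std::vector<uint8_t>>& reducerBuf) {
  assert(key.size() == value.size() && key.size() == partitionId.size());
  for (std::size_t row = 0; row < key.size(); ++row) {
    std::vector<uint8_t>& buf = reducerBuf.at(partitionId[row]);
    std::size_t pos = buf.size();
    buf.resize(pos + kRowBytes);
    std::memcpy(buf.data() + pos, &key[row], sizeof(int64_t));
    std::memcpy(buf.data() + pos + sizeof(int64_t), &value[row], sizeof(double));
  }
}

// Reducer side: convert the row buffer back into columnar form.
void rowsToColumns(const std::vector<uint8_t>& buf,
                   std::vector<int64_t>& key, std::vector<double>& value) {
  assert(buf.size() % kRowBytes == 0);
  for (std::size_t pos = 0; pos < buf.size(); pos += kRowBytes) {
    int64_t k;
    double v;
    std::memcpy(&k, buf.data() + pos, sizeof(k));
    std::memcpy(&v, buf.data() + pos + sizeof(int64_t), sizeof(v));
    key.push_back(k);
    value.push_back(v);
  }
}
```

A real implementation would also need variable-length fields and null bitmaps; this sketch only shows why rows keep each reducer's writes contiguous.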

@FelixYBW FelixYBW merged commit f8e51f4 into oap-project:master May 1, 2022
@FelixYBW FelixYBW deleted the shuffle_fillbyreducer branch May 1, 2022 09:17
@FelixYBW
Collaborator Author

FelixYBW commented May 1, 2022

For the record, the original performance:
tpch_2021_12_01_application_1638156867030_0038.html

zhouyuan added a commit to zhouyuan/native-sql-engine that referenced this pull request May 9, 2022
2 participants