This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

[NSE-928] Add ARROW_CHECK for batch_size check #973

Merged (4 commits) on Jun 28, 2022

Conversation

@zhixingheyi-tian (Collaborator) commented Jun 15, 2022

What changes were proposed in this pull request?


  • Add ARROW_CHECK for batch_size per RecordBatch
  • Add ARROW_CHECK for the number of RecordBatches
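The two checks above can be illustrated with a self-contained sketch. `CHECK_LE` below is a stand-in for Arrow's `ARROW_CHECK_LE` (which lives in the Arrow headers), and `WithinBatchLimits` is a hypothetical helper, not a function from this PR:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Stand-in for Arrow's ARROW_CHECK_LE: abort with a message if lhs > rhs.
#define CHECK_LE(lhs, rhs)                                          \
  do {                                                              \
    if (!((lhs) <= (rhs))) {                                        \
      std::fprintf(stderr, "Check failed: %s <= %s\n", #lhs, #rhs); \
      std::abort();                                                 \
    }                                                               \
  } while (0)

// Row indices and batch ids are stored in uint16_t fields, so both the
// rows per RecordBatch and the number of RecordBatches must stay <= 64K.
constexpr int64_t kMaxPerBatch = 64 * 1024;

bool WithinBatchLimits(int64_t num_rows, int64_t num_batches) {
  CHECK_LE(num_rows, kMaxPerBatch);     // batch_size per RecordBatch
  CHECK_LE(num_batches, kMaxPerBatch);  // number of RecordBatches
  return true;
}
```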

How was this patch tested?


@github-actions

#928

@zhixingheyi-tian (Collaborator, Author)

CC @FelixYBW @zhouyuan

@FelixYBW (Collaborator) left a comment


Is there any other place where we use int16_t but the value may exceed 64K?
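The hazard behind this question can be demonstrated in isolation: a 16-bit unsigned index silently wraps at 64K, so an unchecked value past the limit aliases an earlier entry. `Truncate` is a hypothetical helper for illustration only:

```cpp
#include <cassert>
#include <cstdint>

// uint16_t holds 0..65535; 65536 wraps to 0, 65537 to 1, and so on.
// Without an ARROW_CHECK, an index past 64K silently points at the
// wrong row instead of failing fast.
uint16_t Truncate(uint32_t v) { return static_cast<uint16_t>(v); }
```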

@@ -854,7 +854,7 @@ Splitter::row_offset_type Splitter::CalculateSplitBatchSize(

 arrow::Status Splitter::DoSplit(const arrow::RecordBatch& rb) {
   // buffer is allocated less than 64K
-  // ARROW_CHECK_LE(rb.num_rows(),64*1024);
+  ARROW_CHECK_LE(rb.num_rows(), 64 * 1024);

Adding the check here makes the TPC-DS power test fail. We need to fix the max batch size issue before we can enable the check here.

@@ -1399,7 +1399,9 @@ class SortOnekeyKernel : public SortArraysToIndicesKernel::Impl {
   if (nulls_total_ == 0) {
     // if all batches have no null value,
     // we do not need to check whether the value is null
+    ARROW_CHECK_LE(num_batches_, 64 * 1024);

num_batches_ is ultimately controlled by the partition size. A 64K limit on the maximal number of record batches looks too small.

@zhixingheyi-tian (Collaborator, Author)

Using uint32_t for ArrayItemIndex::array_id caused many problems: crashes and result mismatches, as noted in #989. So we retain uint16_t and will troubleshoot later.
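The constraint being discussed can be sketched as follows. This is a hypothetical layout inferred from the comments above, not the actual header from the repository:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sketch of why array_id caps the batch count at 64K:
// a uint16_t can address only 65536 distinct batches (ids 0..65535).
struct ArrayItemIndex {
  uint16_t array_id;  // which RecordBatch the row lives in
  uint16_t id;        // row offset within that batch
};

constexpr uint32_t MaxAddressableBatches() {
  // 65535 is the largest representable array_id, so 65536 batches fit.
  return static_cast<uint32_t>(std::numeric_limits<uint16_t>::max()) + 1;
}
```

Widening array_id to uint32_t would double the size of every index entry (4 bytes to 8 with alignment), which may explain why the switch was not a drop-in change.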

@zhixingheyi-tian (Collaborator, Author)

cc @zhouyuan
All Jenkins workloads have passed.

@zhixingheyi-tian (Collaborator, Author)

Passed Jenkins workloads again.

@zhouyuan zhouyuan merged commit c4d4a65 into oap-project:main Jun 28, 2022
zhouyuan added a commit to zhouyuan/native-sql-engine that referenced this pull request Jul 27, 2022
zhouyuan added a commit that referenced this pull request Aug 10, 2022
* remove sort limit

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>

* s/uint16/uint32

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>

* Revert "s/uint16/uint32"

This reverts commit 35c0909.

* Revert "remove sort limit"

This reverts commit 6b5738a.

* change sort limit

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>

* Revert "[NSE-928] Add ARROW_CHECK for batch_size check (#973)"

This reverts commit c4d4a65.

* fix window sort

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>