-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-40872: [C++][Parquet] Encoding: Optimize DecodeArrow/Decode(bitmap) for PlainBooleanDecoder #40876
Conversation
|
4099c1b
to
962dcfa
Compare
962dcfa
to
b03728b
Compare
cpp/src/parquet/encoding.cc
Outdated
} else { | ||
int64_t previous_offset = 0; | ||
int64_t previous_value_offset = 0; | ||
PARQUET_THROW_NOT_OK(::arrow::internal::VisitSetBitRuns( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the data is interleaving "null, non-null, null, non-null, ...". Don't know that would performance be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the best performance might getting a BitRun scanner here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can start with this. Are there benchmarks for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pasted this in the pr, PTAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The benchmarks in this PR always have null_probability = 0, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Aha previously all null_probability is 0...
Before:
After:
|
@@ -1214,18 +1234,15 @@ inline int PlainBooleanDecoder::DecodeArrow( | |||
|
|||
int PlainBooleanDecoder::Decode(uint8_t* buffer, int max_values) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it's ok.
f8eaf1f
to
47d6a1b
Compare
After:
Before:
|
} | ||
}; | ||
|
||
class BM_DecodeArrowBooleanRle : public BenchmarkDecodeArrowBoolean { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need this, you just need some member variable with Encoding::PLAIN
or Encoding::RLE
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done yet, PTAL again
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit a4acb64. There was 1 benchmark result with an error:
There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 7 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…oding (#41037) ### Rationale for this change Previous patches ( #40876 , #40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: #41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…ow decoding (apache#41037) ### Rationale for this change Previous patches ( apache#40876 , apache#40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: apache#41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…ow decoding (apache#41037) ### Rationale for this change Previous patches ( apache#40876 , apache#40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: apache#41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…bitmap) for PlainBooleanDecoder (apache#40876) ### Rationale for this change This is for enhance boolean decoding. I optimized the `DecodeArrow` for PlainBoolean ### What changes are included in this PR? Optimize DecodeArrow/Decode(bitmap) for PlainBooleanDecoder, and add benchmarks ### Are these changes tested? Yes ### Are there any user-facing changes? Minor optimization. And `Decode` boolean will change the syntax * GitHub Issue: apache#40872 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: mwish <maplewish117@gmail.com>
…ow decoding (apache#41037) ### Rationale for this change Previous patches ( apache#40876 , apache#40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: apache#41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…bitmap) for PlainBooleanDecoder (apache#40876) ### Rationale for this change This is for enhance boolean decoding. I optimized the `DecodeArrow` for PlainBoolean ### What changes are included in this PR? Optimize DecodeArrow/Decode(bitmap) for PlainBooleanDecoder, and add benchmarks ### Are these changes tested? Yes ### Are there any user-facing changes? Minor optimization. And `Decode` boolean will change the syntax * GitHub Issue: apache#40872 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: mwish <maplewish117@gmail.com>
…ow decoding (apache#41037) ### Rationale for this change Previous patches ( apache#40876 , apache#40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: apache#41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…bitmap) for PlainBooleanDecoder (apache#40876) ### Rationale for this change This is for enhance boolean decoding. I optimized the `DecodeArrow` for PlainBoolean ### What changes are included in this PR? Optimize DecodeArrow/Decode(bitmap) for PlainBooleanDecoder, and add benchmarks ### Are these changes tested? Yes ### Are there any user-facing changes? Minor optimization. And `Decode` boolean will change the syntax * GitHub Issue: apache#40872 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: mwish <maplewish117@gmail.com>
…ow decoding (apache#41037) ### Rationale for this change Previous patches ( apache#40876 , apache#40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: apache#41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…bitmap) for PlainBooleanDecoder (apache#40876) ### Rationale for this change This is for enhance boolean decoding. I optimized the `DecodeArrow` for PlainBoolean ### What changes are included in this PR? Optimize DecodeArrow/Decode(bitmap) for PlainBooleanDecoder, and add benchmarks ### Are these changes tested? Yes ### Are there any user-facing changes? Minor optimization. And `Decode` boolean will change the syntax * GitHub Issue: apache#40872 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: mwish <maplewish117@gmail.com>
…ow decoding (apache#41037) ### Rationale for this change Previous patches ( apache#40876 , apache#40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: apache#41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…bitmap) for PlainBooleanDecoder (apache#40876) ### Rationale for this change This is for enhance boolean decoding. I optimized the `DecodeArrow` for PlainBoolean ### What changes are included in this PR? Optimize DecodeArrow/Decode(bitmap) for PlainBooleanDecoder, and add benchmarks ### Are these changes tested? Yes ### Are there any user-facing changes? Minor optimization. And `Decode` boolean will change the syntax * GitHub Issue: apache#40872 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: mwish <maplewish117@gmail.com>
…ow decoding (apache#41037) ### Rationale for this change Previous patches ( apache#40876 , apache#40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: apache#41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
Rationale for this change
This is for enhance boolean decoding. I optimized the
DecodeArrow
for PlainBooleanWhat changes are included in this PR?
Optimize DecodeArrow/Decode(bitmap) for PlainBooleanDecoder, and add benchmarks
Are these changes tested?
Yes
Are there any user-facing changes?
Minor optimization. And
Decode
boolean will change the syntax