-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-40994: [C++][Parquet] RleBooleanDecoder supports DecodeArrow with nulls #40995
Conversation
|
34d3c72
to
8a33bea
Compare
benchmark number wouldn't be great, because RleDecoder need to decode to |
Well, it seems the benchmarks fail:
|
Testing on My PC ( Amd 3800X, WSL2, gcc11.4, Release(-O2) ) RLE
PLAIN:
For all-nulls seems that Rle is faster because it hacks the logic without any checking |
5e31baa
to
cb1a842
Compare
@@ -963,8 +963,6 @@ TYPED_TEST(EncodingAdHocTyped, ByteStreamSplitArrowDirectPut) { | |||
} | |||
|
|||
TYPED_TEST(EncodingAdHocTyped, RleArrowDirectPut) { | |||
// TODO: test with nulls once RleBooleanDecoder::DecodeArrow supports them | |||
this->null_probability_ = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have coverage yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm what does this mean? The case is covered here
@github-actions crossbow submit -g cpp |
Revision: cb1a842 Submitted crossbow builds: ursacomputing/crossbow @ actions-35133d7bec |
…row into rle-boolean-impl-decode-arrow
cpp/src/parquet/encoding.cc
Outdated
current_index_in_batch = 0; | ||
}; | ||
if (null_count == 0) { | ||
int sum_decode_count = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why introduce a separate variable? We know that we're going to return the original value of num_values
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed it, it's copy-pasted from origin impl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please simplify the code a bit. There's a bunch of local variables which are mutually redundant.
This is a nice improvement, thank you @mapleFU ! |
…oding (#41037) ### Rationale for this change Previous patches ( #40876 , #40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: #41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…ow decoding (apache#41037) ### Rationale for this change Previous patches ( apache#40876 , apache#40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: apache#41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…ow decoding (apache#41037) ### Rationale for this change Previous patches ( apache#40876 , apache#40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: apache#41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
… with nulls (apache#40995) ### Rationale for this change Supports DecodeArrow with nulls in RleBooleanDecoder ### What changes are included in this PR? Supports DecodeArrow with nulls in RleBooleanDecoder ### Are these changes tested? Yes ### Are there any user-facing changes? currently not * GitHub Issue: apache#40994 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…ow decoding (apache#41037) ### Rationale for this change Previous patches ( apache#40876 , apache#40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: apache#41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
… with nulls (apache#40995) ### Rationale for this change Supports DecodeArrow with nulls in RleBooleanDecoder ### What changes are included in this PR? Supports DecodeArrow with nulls in RleBooleanDecoder ### Are these changes tested? Yes ### Are there any user-facing changes? currently not * GitHub Issue: apache#40994 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…ow decoding (apache#41037) ### Rationale for this change Previous patches ( apache#40876 , apache#40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: apache#41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
… with nulls (apache#40995) ### Rationale for this change Supports DecodeArrow with nulls in RleBooleanDecoder ### What changes are included in this PR? Supports DecodeArrow with nulls in RleBooleanDecoder ### Are these changes tested? Yes ### Are there any user-facing changes? currently not * GitHub Issue: apache#40994 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…ow decoding (apache#41037) ### Rationale for this change Previous patches ( apache#40876 , apache#40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: apache#41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
… with nulls (apache#40995) ### Rationale for this change Supports DecodeArrow with nulls in RleBooleanDecoder ### What changes are included in this PR? Supports DecodeArrow with nulls in RleBooleanDecoder ### Are these changes tested? Yes ### Are there any user-facing changes? currently not * GitHub Issue: apache#40994 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…ow decoding (apache#41037) ### Rationale for this change Previous patches ( apache#40876 , apache#40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: apache#41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
… with nulls (apache#40995) ### Rationale for this change Supports DecodeArrow with nulls in RleBooleanDecoder ### What changes are included in this PR? Supports DecodeArrow with nulls in RleBooleanDecoder ### Are these changes tested? Yes ### Are there any user-facing changes? currently not * GitHub Issue: apache#40994 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…ow decoding (apache#41037) ### Rationale for this change Previous patches ( apache#40876 , apache#40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: apache#41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit b99b00d. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
Supports DecodeArrow with nulls in RleBooleanDecoder
What changes are included in this PR?
Supports DecodeArrow with nulls in RleBooleanDecoder
Are these changes tested?
Yes
Are there any user-facing changes?
currently not