Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bool column corruption with ORC Reader #7483

Merged
merged 4 commits into from
Mar 22, 2021

Conversation

rgsl888prabhu
Copy link
Contributor

@rgsl888prabhu rgsl888prabhu commented Mar 2, 2021

Since there was a possibility that only partial set of bits would be read in case the values are buffered, the max_vals had to be rounded-off to higher next byte.

closes #7345

@rgsl888prabhu rgsl888prabhu added bug Something isn't working 2 - In Progress Currently a work in progress 4 - Needs cuIO Reviewer non-breaking Non-breaking change labels Mar 2, 2021
@rgsl888prabhu rgsl888prabhu self-assigned this Mar 2, 2021
@rgsl888prabhu rgsl888prabhu requested a review from a team as a code owner March 2, 2021 12:27
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Mar 2, 2021
@codecov
Copy link

codecov bot commented Mar 2, 2021

Codecov Report

Merging #7483 (b3470bf) into branch-0.19 (43b44e1) will increase coverage by 0.45%.
The diff coverage is 88.02%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.19    #7483      +/-   ##
===============================================
+ Coverage        81.80%   82.26%   +0.45%     
===============================================
  Files              101      101              
  Lines            16695    17244     +549     
===============================================
+ Hits             13658    14185     +527     
- Misses            3037     3059      +22     
Impacted Files Coverage Δ
python/cudf/cudf/utils/docutils.py 97.36% <50.00%> (-2.64%) ⬇️
python/cudf/cudf/testing/testing.py 80.00% <57.14%> (-1.04%) ⬇️
python/cudf/cudf/core/column/timedelta.py 88.57% <75.00%> (+0.33%) ⬆️
python/cudf/cudf/core/column/categorical.py 91.74% <77.14%> (-0.49%) ⬇️
python/cudf/cudf/core/column/datetime.py 89.63% <78.57%> (+0.58%) ⬆️
python/dask_cudf/dask_cudf/core.py 71.67% <79.41%> (-2.60%) ⬇️
python/cudf/cudf/core/multiindex.py 82.90% <90.90%> (+0.73%) ⬆️
python/cudf/cudf/core/tools/datetimes.py 84.53% <93.33%> (+0.89%) ⬆️
python/cudf/cudf/core/column_accessor.py 95.47% <95.65%> (+2.53%) ⬆️
python/cudf/cudf/core/column/column.py 87.86% <96.42%> (+0.43%) ⬆️
... and 65 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f4f4d87...03f0486. Read the comment docs.

@vuule
Copy link
Contributor

vuule commented Mar 5, 2021

Looks like generating a test for the fix is more trouble than it's worth. Approving without the test.

@rgsl888prabhu rgsl888prabhu added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Mar 6, 2021
@@ -1556,7 +1556,7 @@ __global__ void __launch_bounds__(block_size)
if (t == 0) { s->top.data.buffered_count = n; }
}

numvals = min(numvals * 8, is_last_set ? s->top.data.max_vals : blockDim.x);
numvals = min(numvals * 8, is_last_set ? (s->top.data.max_vals + 7) & (~0x7) : blockDim.x);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how common this is, we should have a ceil to next byte function rather than have a +7)&~7 everywhere.

@vuule
Copy link
Contributor

vuule commented Mar 22, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 5d7767e into rapidsai:branch-0.19 Mar 22, 2021
@vyasr vyasr added 4 - Needs Review Waiting for reviewer to review or respond and removed 4 - Needs cuIO Reviewer labels Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team 4 - Needs Review Waiting for reviewer to review or respond bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Data Corruption while reading a boolean column using orc reader
4 participants