Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ORC reader issue with reading empty string columns #7656

Merged

Conversation

rgsl888prabhu
Copy link
Contributor

There was a condition in reader where if the data size is zero, then stream pointer was not getting updated.
But in case of ["", ""] where it is a valid data with 0 size, it was reading it as [null, null], so the condition has been removed which caused this issue.

I have also added test cases to validate.

closes #7620

@rgsl888prabhu rgsl888prabhu added bug Something isn't working 3 - Ready for Review Ready for review by team 4 - Needs cuDF (Python) Reviewer non-breaking Non-breaking change labels Mar 19, 2021
@rgsl888prabhu rgsl888prabhu self-assigned this Mar 19, 2021
@rgsl888prabhu rgsl888prabhu requested review from a team as code owners March 19, 2021 18:08
@github-actions github-actions bot added Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. labels Mar 19, 2021
@rgsl888prabhu rgsl888prabhu requested review from vuule and devavret and removed request for karthikeyann and davidwendt March 19, 2021 18:08
Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR does not show how hard it is to find a fix for an issue like this... great work!

@devavret
Copy link
Contributor

This PR does not show how hard it is to find a fix for an issue like this... great work!

Anyone who's worked on cuIO knows. The issue that took me a month had a 2 line fix.

@vuule
Copy link
Contributor

vuule commented Mar 19, 2021

Anyone who's worked on cuIO knows. The issue that took me a month had a 2 line fix.

Ouch. Do you remember the PR # ?

@devavret
Copy link
Contributor

Anyone who's worked on cuIO knows. The issue that took me a month had a 2 line fix.

Ouch. Do you remember the PR # ?

I misremember. It was #5473 but it was 16 days. It felt like an eternity though.

@kkraus14 kkraus14 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 4 - Needs cuIO Reviewer labels Mar 19, 2021
@kkraus14
Copy link
Collaborator

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 217d702 into rapidsai:branch-0.19 Mar 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] ORC writer incorrectly encodes a column of empty or null strings
4 participants