Propagate errors from Parquet reader kernels back to host #14167

vuule · 2023-09-21T21:47:55Z

Description

Pass the error code to the host when a kernel detects invalid input.
If multiple error types are detected, they are combined using a bitwise OR, so the caller gets an aggregate error code that covers every type of error that occurred.

Does not change the kernel-side checks.
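
As a rough illustration of the aggregation described above, here is a minimal host-side C++ sketch. The flag names and values (`DATA_OVERRUN`, `LEVEL_OVERRUN`, `BAD_ENCODING`) and the helpers `combine` and `throw_on_error` are hypothetical and are not taken from this PR; the only assumption carried over from the description is that each error type owns one bit and that a non-zero aggregate means failure.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical flag values (not the PR's actual codes): one bit per error type,
// so that several errors can share a single integer.
constexpr std::int32_t DATA_OVERRUN  = 0x1;
constexpr std::int32_t LEVEL_OVERRUN = 0x2;
constexpr std::int32_t BAD_ENCODING  = 0x4;

// Combine two reported codes into one aggregate with a bitwise OR.
constexpr std::int32_t combine(std::int32_t lhs, std::int32_t rhs) { return lhs | rhs; }

// Host-side check: zero means success, any set bit is surfaced to the caller.
inline void throw_on_error(std::int32_t aggregate)
{
  if (aggregate != 0) {
    throw std::runtime_error("Parquet decode failed, error mask: " + std::to_string(aggregate));
  }
}
```

Calling `throw_on_error(combine(DATA_OVERRUN, BAD_ENCODING))` would throw with a mask of 5, while `throw_on_error(0)` returns normally.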

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

vuule · 2023-09-25T16:53:56Z

CC @nvdbaranec @etseidl
Please let me know if you see any issues with this approach to error reporting in the reader.

etseidl

I can't think of a better way to do this. Do we want to define some constants for the error codes?

cpp/src/io/parquet/page_data.cu

vuule · 2023-09-25T19:05:07Z

> Do we want to define some constants for the error codes?

Definitely, just not sure if it should be in this PR. Related - should we return the error code as a bitmask? Would returning multiple errors even be useful?

etseidl · 2023-09-25T19:25:16Z

> Definitely, just not sure if it should be in this PR. Related - should we return the error code as a bitmask? Would returning multiple errors even be useful?

I think a bitmask might be a bit much, and limits us to 32 errors. There will probably be more ways to fail than that, esp if we also return errors from the preprocessing kernels.

nvdbaranec · 2023-09-26T18:43:33Z

cpp/src/io/parquet/page_decode.cuh

+    cuda::atomic_ref<int32_t, cuda::thread_scope_block> ref{const_cast<int&>(error)};
+    ref.store(err, cuda::std::memory_order_relaxed);


Is this atomic necessary? I didn't see any places where anything other than thread 0 (of the block) sets the error code. I suppose that may not be the case in the future. Based on how this is called, I wonder if an atomic OR is better here so we can stash multiple error types as individual bits.

vuule · 2023-09-26

I made it atomic since we probably don't need to worry about performance when failing. This seemed like a safe option for future checks as well.

About the error code as mask - Ed is concerned about the limit on the number of errors that this would impose. I could be convinced to go either way; I don't expect the trade-off to be relevant in practice.

etseidl · 2023-09-26

TBH the most common error condition is going to be a buffer overrun detected somewhere. We could probably get away without codes at all and have a single error bit. The host code calling the kernel can report which kernel failed. It just comes down to how fine-grained you want the error reporting to be.

nvdbaranec · 2023-09-26

I could see it either way. It's so hard to even know which thread failed and the context of why (possibly because some other thread did something wrong) that having a set of bits could act as breadcrumbs leading you to where things really went wrong. But on the other hand, you're a lot more limited in what you can report. I'm fine either way. Parallel error reporting is amusing in any case.
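
The store-versus-OR question raised in this thread can be illustrated with a small host-side analog. This is only a sketch: it uses `std::atomic` rather than the `cuda::atomic_ref` from the reviewed snippet, on the assumption that the device-side type offers the same `store` and `fetch_or` members, and the function names `report_with_store` and `report_with_fetch_or` are made up for the example.

```cpp
#include <atomic>
#include <cstdint>

// Overwrite semantics: the last writer wins and earlier codes are lost.
inline std::int32_t report_with_store(std::int32_t first, std::int32_t second)
{
  std::atomic<std::int32_t> error{0};
  error.store(first, std::memory_order_relaxed);
  error.store(second, std::memory_order_relaxed);
  return error.load(std::memory_order_relaxed);
}

// Accumulate semantics: each report sets its own bits and keeps the others.
inline std::int32_t report_with_fetch_or(std::int32_t first, std::int32_t second)
{
  std::atomic<std::int32_t> error{0};
  error.fetch_or(first, std::memory_order_relaxed);
  error.fetch_or(second, std::memory_order_relaxed);
  return error.load(std::memory_order_relaxed);
}
```

With inputs `0x1` and `0x4`, the store variant ends up holding only `0x4`, whereas the OR variant holds `0x5`, which is the property that makes a mask usable for aggregating errors reported by different threads.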

vuule · 2023-09-26T21:18:34Z

Looks like we're leaning towards a mask to aggregate errors. I'll make the changes.

cpp/src/io/parquet/page_decode.cuh

nvdbaranec

Yeah I like this mechanism. The explicit names also remove some of the mystery when reading the code itself too.
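
To show how named bits can serve as the breadcrumbs discussed earlier in the review, here is a hedged sketch of a host-side helper that turns an aggregate mask into readable names. The `flag_name` struct, the `describe_errors` function, and the three flag values and descriptions are all hypothetical; the PR's actual enum may use different names and values.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Pairs one bit of the mask with a human-readable description.
struct flag_name {
  std::int32_t bit;
  char const* name;
};

// Returns the description of every flag that is set in `mask`,
// in ascending bit order. The table below is illustrative only.
inline std::vector<std::string> describe_errors(std::int32_t mask)
{
  static constexpr flag_name known[] = {
    {0x1, "data stream overrun"},
    {0x2, "level stream overrun"},
    {0x4, "unsupported encoding"},
  };
  std::vector<std::string> out;
  for (auto const& f : known) {
    if ((mask & f.bit) != 0) { out.emplace_back(f.name); }
  }
  return out;
}
```

For a mask of `0x5` this returns two entries, "data stream overrun" and "unsupported encoding"; a mask of `0` returns an empty vector.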

Co-authored-by: Yunsong Wang <yunsongw@nvidia.com>

etseidl

Looking good, just a few naming nits :D

cpp/src/io/parquet/page_decode.cuh

PointKernel

LGTM

vuule · 2023-09-28T00:33:54Z

/merge

Fixes #13656. Uses the error reporting introduced in #14167 to report errors in header parsing.

Authors:
- Ed Seidl (https://github.com/etseidl)
- Vukasin Milovanovic (https://github.com/vuule)

Approvers:
- Vukasin Milovanovic (https://github.com/vuule)
- Bradley Dice (https://github.com/bdice)

URL: #14237

vuule added 6 commits September 20, 2023 14:33

pass error code back to host

49efa51

Merge branch 'branch-23.10' of https://github.com/rapidsai/cudf into fea-read_parquet-error-report

903261a

fix conditions

519f37f

atomic shared error code

c2e8ff0

atomic global

da12224

revert test error

57e0bc7

vuule added the feature request, cuIO, and non-breaking labels Sep 21, 2023

vuule self-assigned this Sep 21, 2023

github-actions bot added the libcudf label Sep 21, 2023

Merge branch 'branch-23.10' into fea-read_parquet-error-report

e9f8acf

vuule marked this pull request as ready for review September 23, 2023 01:38

vuule requested a review from a team as a code owner September 23, 2023 01:38

vuule requested review from bdice and divyegala September 23, 2023 01:38

etseidl reviewed Sep 25, 2023

cpp/src/io/parquet/page_data.cu (outdated)

GregoryKimball requested a review from nvdbaranec September 26, 2023 18:39

divyegala reviewed Sep 26, 2023

cpp/src/io/parquet/page_decode.cuh (outdated)

nvdbaranec reviewed Sep 26, 2023

Merge branch 'branch-23.10' of https://github.com/rapidsai/cudf into fea-read_parquet-error-report

70c4a7e

GregoryKimball mentioned this pull request Sep 27, 2023

[BUG] Malformed fixed length byte array Parquet file loads corrupted data instead of error #14104

Open

vuule added 3 commits September 27, 2023 00:02

Merge branch 'branch-23.10' of https://github.com/rapidsai/cudf into fea-read_parquet-error-report

31b2111

codes as masks; enum

3831bc8

Merge branch 'fea-read_parquet-error-report' of https://github.com/vuule/cudf into fea-read_parquet-error-report

5b1e332

etseidl mentioned this pull request Sep 27, 2023

[BUG] libcudf fails to recognized malformed dictionary during Parquet read #13656

Closed

vuule requested review from nvdbaranec, etseidl and divyegala September 27, 2023 16:41

vuule commented Sep 27, 2023

cpp/src/io/parquet/page_decode.cuh (outdated)

etseidl reviewed Sep 27, 2023

cpp/src/io/parquet/page_decode.cuh (outdated)

nvdbaranec approved these changes Sep 27, 2023

vuule and others added 6 commits September 27, 2023 11:47

Merge branch 'branch-23.10' into fea-read_parquet-error-report

a5250f9

memory order

4c2a235

Co-authored-by: Yunsong Wang <yunsongw@nvidia.com>

move enum; docs

0d15e49

Merge branch 'branch-23.10' into fea-read_parquet-error-report

e2531c8

Merge branch 'fea-read_parquet-error-report' of https://github.com/vuule/cudf into fea-read_parquet-error-report

d87abbf

style

575f40c

etseidl reviewed Sep 27, 2023

cpp/src/io/parquet/page_decode.cuh (outdated)

cpp/src/io/parquet/page_decode.cuh (outdated)

rename some codes

718c166

vuule requested a review from PointKernel September 27, 2023 21:25

divyegala approved these changes Sep 27, 2023

vuule added the 5 - Ready to Merge label Sep 27, 2023

PointKernel approved these changes Sep 27, 2023

bdice approved these changes Sep 27, 2023

rapids-bot bot merged commit 2c19bf3 into rapidsai:branch-23.10 Sep 28, 2023
57 of 58 checks passed

etseidl mentioned this pull request Sep 29, 2023

Detect and report errors in Parquet header parsing #14237

Merged

3 tasks

GregoryKimball mentioned this pull request Nov 15, 2023

[BUG] Resolve parquet reader performance regression on V100 from #14167 #14415

Closed

hyperbolic2346 mentioned this pull request Nov 30, 2023

[FEA] Add a Parquet reader benchmark that uses multiple CUDA streams #12700

Closed

etseidl mentioned this pull request Jan 4, 2024

Potential fix for peformance regression in #14415 #14706

Merged

3 tasks

This was referenced Feb 23, 2024

[BUG] parquet reader::impl::decode_page_data error_code checking slowness #15122

Closed

Use hostdevice_vector in kernel_error to avoid the pageable copy #15140

Merged

hyperbolic2346 mentioned this pull request Apr 24, 2024

Add multithreaded parquet reader benchmarks. #15585

Merged

3 tasks
