[BUG] libcudf JSON reader crash with compressed data #16248

Open
lithomas1 opened this issue Jul 10, 2024 · 2 comments
Labels
bug Something isn't working cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code.

Comments

@lithomas1

lithomas1 commented Jul 10, 2024

Describe the bug

The libcudf JSON reader is "crashing" (I'm not sure if it's technically a crash, but I'm getting a CUDA error):

RuntimeError: CUDA error encountered at: /home/coder/cudf/cpp/src/io/json/read_json.cu:142: 1 cudaErrorInvalidValue invalid argument

Steps/Code to reproduce bug

import cudf
import pandas as pd

cudf.read_json("baddf.json.gz", orient="records", lines=True, engine="cudf")  # Raises the RuntimeError above
pd.read_json("baddf.json.gz", orient="records", lines=True)  # OK

Expected behavior
Successful read, like with pandas.

Environment details

I'm using the latest cudf, built from main.

Additional context

I think the issue might be with the specific data values (they are all integers, even the string/floating columns). I'm pretty sure libcudf can write all the data types (even the nested struct/list ones).

baddf.json.gz

Also, if you decompress the file by hand, cudf is able to read it.
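
A minimal sketch of that workaround, assuming the file is gzip-compressed JSON Lines. The helper name `decompress_gzip_to_buffer` and the sample file are hypothetical stand-ins (the real `baddf.json.gz` attachment is not reproduced here); only the Python standard library is used for the decompression step.

```python
import gzip
import io
import json
import os
import tempfile


def decompress_gzip_to_buffer(path):
    """Read a gzip-compressed file and return its decompressed bytes in a BytesIO.

    Hypothetical helper for illustration; it is not part of cudf.
    """
    with gzip.open(path, "rb") as f:
        return io.BytesIO(f.read())


# Build a small stand-in for baddf.json.gz (assumed shape: one JSON record per line).
records = [{"a": 1, "b": "2"}, {"a": 3, "b": "4"}]
payload = "\n".join(json.dumps(r) for r in records) + "\n"

tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.json.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as f:
    f.write(payload)

# Decompress on the host, so the reader never sees compressed bytes.
buf = decompress_gzip_to_buffer(sample_path)
decompressed_text = buf.getvalue().decode("utf-8")
```

The resulting buffer could then be handed to the reader, for example `cudf.read_json(buf, orient="records", lines=True, engine="cudf")`. That this sidesteps the error for the original file is taken from the report above, not verified independently.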

@lithomas1 lithomas1 added bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. labels Jul 10, 2024
@lithomas1 lithomas1 changed the title [BUG] libcudf JSON reader crash [BUG] libcudf JSON reader crash with compressed data Jul 11, 2024
@wence-

wence- commented Jul 11, 2024

Compute-sanitizer:

========= COMPUTE-SANITIZER
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaMemcpyAsync.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame: [0x445b06]
=========                in /usr/lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame:cudaMemcpyAsync [0x6dabf]
=========                in /home/coder/.conda/envs/rapids/lib/libcudart.so.12
=========     Host Frame:cudf::io::json::detail::ingest_raw_input(cudf::device_span<char, 18446744073709551615ul>, cudf::host_span<std::unique_ptr<cudf::io::datasource, std::default_delete<cudf::io::datasource> >, 18446744073709551615ul>, cudf::io::compression_type, unsigned long, unsigned long, rmm::cuda_stream_view) [0x1decfdf]
=========                in /home/coder/cudf/cpp/build/conda/cuda-12.2/release/libcudf.so
=========     Host Frame:cudf::io::json::detail::get_record_range_raw_input(cudf::host_span<std::unique_ptr<cudf::io::datasource, std::default_delete<cudf::io::datasource> >, 18446744073709551615ul>, cudf::io::json_reader_options const&, rmm::cuda_stream_view) [0x1dee514]
=========                in /home/coder/cudf/cpp/build/conda/cuda-12.2/release/libcudf.so
=========     Host Frame:cudf::io::json::detail::read_batch(cudf::host_span<std::unique_ptr<cudf::io::datasource, std::default_delete<cudf::io::datasource> >, 18446744073709551615ul>, cudf::io::json_reader_options const&, rmm::cuda_stream_view, cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible>) [0x1deeb75]
=========                in /home/coder/cudf/cpp/build/conda/cuda-12.2/release/libcudf.so
=========     Host Frame:cudf::io::json::detail::read_json(cudf::host_span<std::unique_ptr<cudf::io::datasource, std::default_delete<cudf::io::datasource> >, 18446744073709551615ul>, cudf::io::json_reader_options const&, rmm::cuda_stream_view, cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible>) [0x1df030a]
=========                in /home/coder/cudf/cpp/build/conda/cuda-12.2/release/libcudf.so
=========     Host Frame:cudf::io::read_json(cudf::io::json_reader_options, rmm::cuda_stream_view, cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible>) [0x1d2de18]
=========                in /home/coder/cudf/cpp/build/conda/cuda-12.2/release/libcudf.so

@lithomas1 lithomas1 added the cuIO cuIO issue label Jul 12, 2024
@GregoryKimball

Thank you @lithomas1 for sharing this issue. We haven't done much testing with compressed JSON inputs. There could be a straightforward solution here, and we will take a closer look as soon as we can.

@GregoryKimball GregoryKimball added this to the Nested JSON reader milestone Jul 29, 2024
Projects
Status: In Progress
Development

No branches or pull requests

3 participants