Basic validation in reader benchmarks #14647
Conversation
```cpp
size_t const chunk_size = cudf::util::div_rounding_up_safe(source_sink.size(), num_chunks);
auto const chunk_row_cnt =
  cudf::util::div_rounding_up_safe(view.num_rows(), static_cast<cudf::size_type>(num_chunks));
```
The old approach rounded down and dropped some rows; adding the shape check uncovered the issue.
Rounding up here also simplified some of the logic in the loop.
Some non-blocking nits. Otherwise LGTM.
Co-authored-by: Yunsong Wang <yunsongw@nvidia.com>
A couple of questions...
/merge
Description
Check the output table shape in the CSV, JSON, ORC, and Parquet reader benchmarks.
Other changes:
Fixed some chunking logic in the CSV reader benchmark.
Shortened the lifetime of the original table to reduce peak memory use (adopted the pattern from the JSON reader benchmark).
Checklist