Commit

Merge branch 'branch-22.08' of https://github.com/rapidsai/cudf into json-tree
karthikeyann committed Jul 26, 2022
2 parents 12cf0be + 2d214ea commit 2b59b04
Showing 97 changed files with 2,105 additions and 4,834 deletions.
65 changes: 9 additions & 56 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -1,56 +1,9 @@
<!--
Thank you for contributing to cuDF :)
Here are some guidelines to help the review process go smoothly.
1. Please write a description in this text box of the changes that are being
made.
2. Please ensure that you have written units tests for the changes made/features
added.
3. There are CI checks in place to enforce that committed code follows our style
and syntax standards. Please see our contribution guide in `CONTRIBUTING.MD`
in the project root for more information about the checks we perform and how
you can run them locally.
4. If you are closing an issue please use one of the automatic closing words as
noted here: https://help.github.com/articles/closing-issues-using-keywords/
5. If your pull request is not ready for review but you want to make use of the
continuous integration testing facilities please mark your pull request as Draft.
https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/changing-the-stage-of-a-pull-request#converting-a-pull-request-to-a-draft
6. If your pull request is ready to be reviewed without requiring additional
work on top of it, then remove it from "Draft" and make it "Ready for Review".
https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/changing-the-stage-of-a-pull-request#marking-a-pull-request-as-ready-for-review
If assistance is required to complete the functionality, for example when the
C/C++ code of a feature is complete but Python bindings are still required,
then add the label `help wanted` so that others can triage and assist.
The additional changes then can be implemented on top of the same PR.
If the assistance is done by members of the rapidsAI team, then no
additional actions are required by the creator of the original PR for this,
otherwise the original author of the PR needs to give permission to the
person(s) assisting to commit to their personal fork of the project. If that
doesn't happen then a new PR based on the code of the original PR can be
opened by the person assisting, which then will be the PR that will be
merged.
7. Once all work has been done and review has taken place please do not add
features or make changes out of the scope of those requested by the reviewer
(doing this just add delays as already reviewed code ends up having to be
re-reviewed/it is hard to tell what is new etc!). Further, please do not
rebase your branch on the target branch, force push, or rewrite history.
Doing any of these causes the context of any comments made by reviewers to be lost.
If conflicts occur against the target branch they should be resolved by
merging the target branch into the branch used for making the pull request.
8. Pull requests that modify cpp source that are marked ready for review
will automatically be assigned two cudf-cpp-codeowners reviewers.
Ensure at least two approvals from cudf-cpp-codeowners before merging.
Many thanks in advance for your cooperation!
-->
## Description
<!-- Provide a standalone description of changes in this PR. -->
<!-- Reference any issues closed by this PR with "closes #1234". -->
<!-- Note: The pull request title will be included in the CHANGELOG. -->

## Checklist
- [ ] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md).
- [ ] New or existing tests cover these changes.
- [ ] The documentation is up to date with these changes.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -34,7 +34,7 @@ conduct. More information can be found at:
describes your planned work. For example, `fix-documentation`.
5. Write code to address the issue or implement the feature.
6. Add unit tests and unit benchmarks.
7. [Create your pull request](https://github.com/rapidsai/cudf/compare).
7. [Create your pull request](https://github.com/rapidsai/cudf/compare). To run continuous integration (CI) tests without requesting review, open a draft pull request.
8. Verify that CI passes all [status checks](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/collaborating-on-repositories-with-code-quality-features/about-status-checks).
Fix if needed.
9. Wait for other developers to review your code and update code as needed.
2 changes: 1 addition & 1 deletion README.md
@@ -21,7 +21,7 @@ cuDF provides a pandas-like API that will be familiar to data engineers & data s

For example, the following snippet downloads a CSV, then uses the GPU to parse it into rows and columns and run calculations:
```python
import cudf, io, requests
import cudf, requests
from io import StringIO

url = "https://github.com/plotly/datasets/raw/master/tips.csv"
4 changes: 2 additions & 2 deletions conda/environments/cudf_dev_cuda11.5.yml
@@ -21,7 +21,7 @@ dependencies:
- numba>=0.54
- numpy
- pandas>=1.0,<1.5.0dev0
- pyarrow=8.0.0=*cuda
- pyarrow=8.0.0
- fastavro>=0.22.9
- python-snappy>=0.6.0
- notebook>=0.5.0
@@ -53,7 +53,6 @@ dependencies:
- streamz
- arrow-cpp=8.0.0
- dlpack>=0.5,<0.6.0a0
- arrow-cpp-proc * cuda
- double-conversion
- rapidjson
- hypothesis
@@ -77,6 +76,7 @@ dependencies:
- botocore>=1.24.21
- aiobotocore>=2.2.0
- s3fs>=2022.3.0
- pytorch<1.12.0
- pip:
- git+https://github.com/python-streamz/streamz.git@master
- pyorc
2 changes: 1 addition & 1 deletion conda/recipes/cudf/meta.yaml
@@ -41,7 +41,7 @@ requirements:
- setuptools
- numba >=0.54
- dlpack>=0.5,<0.6.0a0
- pyarrow =8.0.0 *cuda
- pyarrow =8.0.0
- libcudf ={{ version }}
- rmm ={{ minor_version }}
- cudatoolkit ={{ cuda_version }}
9 changes: 2 additions & 7 deletions conda/recipes/libcudf/meta.yaml
@@ -35,8 +35,7 @@ requirements:
host:
- librmm {{ minor_version }}.*
- cudatoolkit {{ cuda_version }}.*
- arrow-cpp {{ arrow_cpp_version }} *cuda
- arrow-cpp-proc * cuda
- arrow-cpp {{ arrow_cpp_version }}
- dlpack {{ dlpack_version }}
- librdkafka {{ librdkafka_version }}

@@ -57,8 +56,7 @@ outputs:
run:
- cudatoolkit {{ cuda_spec }}
- librmm {{ minor_version }}.*
- arrow-cpp {{ arrow_cpp_version }} *cuda
- arrow-cpp-proc * cuda
- arrow-cpp {{ arrow_cpp_version }}
- dlpack {{ dlpack_version }}
test:
commands:
@@ -159,7 +157,6 @@ outputs:
- test -f $PREFIX/include/cudf/io/text/detail/trie.hpp
- test -f $PREFIX/include/cudf/io/text/multibyte_split.hpp
- test -f $PREFIX/include/cudf/io/types.hpp
- test -f $PREFIX/include/cudf/ipc.hpp
- test -f $PREFIX/include/cudf/join.hpp
- test -f $PREFIX/include/cudf/labeling/label_bins.hpp
- test -f $PREFIX/include/cudf/lists/combine.hpp
@@ -169,13 +166,11 @@ outputs:
- test -f $PREFIX/include/cudf/lists/detail/concatenate.hpp
- test -f $PREFIX/include/cudf/lists/detail/contains.hpp
- test -f $PREFIX/include/cudf/lists/detail/copying.hpp
- test -f $PREFIX/include/cudf/lists/detail/drop_list_duplicates.hpp
- test -f $PREFIX/include/cudf/lists/detail/extract.hpp
- test -f $PREFIX/include/cudf/lists/detail/interleave_columns.hpp
- test -f $PREFIX/include/cudf/lists/detail/scatter_helper.cuh
- test -f $PREFIX/include/cudf/lists/detail/sorting.hpp
- test -f $PREFIX/include/cudf/lists/detail/stream_compaction.hpp
- test -f $PREFIX/include/cudf/lists/drop_list_duplicates.hpp
- test -f $PREFIX/include/cudf/lists/explode.hpp
- test -f $PREFIX/include/cudf/lists/extract.hpp
- test -f $PREFIX/include/cudf/lists/filling.hpp
26 changes: 10 additions & 16 deletions cpp/CMakeLists.txt
@@ -196,26 +196,26 @@ add_library(
src/ast/expression_parser.cpp
src/ast/expressions.cpp
src/binaryop/binaryop.cpp
src/binaryop/compiled/Add.cu
src/binaryop/compiled/ATan2.cu
src/binaryop/compiled/Add.cu
src/binaryop/compiled/BitwiseAnd.cu
src/binaryop/compiled/BitwiseOr.cu
src/binaryop/compiled/BitwiseXor.cu
src/binaryop/compiled/Less.cu
src/binaryop/compiled/Greater.cu
src/binaryop/compiled/LessEqual.cu
src/binaryop/compiled/GreaterEqual.cu
src/binaryop/compiled/Div.cu
src/binaryop/compiled/equality_ops.cu
src/binaryop/compiled/FloorDiv.cu
src/binaryop/compiled/Greater.cu
src/binaryop/compiled/GreaterEqual.cu
src/binaryop/compiled/IntPow.cu
src/binaryop/compiled/Less.cu
src/binaryop/compiled/LessEqual.cu
src/binaryop/compiled/LogBase.cu
src/binaryop/compiled/LogicalAnd.cu
src/binaryop/compiled/LogicalOr.cu
src/binaryop/compiled/Mod.cu
src/binaryop/compiled/Mul.cu
src/binaryop/compiled/NullEquals.cu
src/binaryop/compiled/NullLogicalOr.cu
src/binaryop/compiled/NullLogicalAnd.cu
src/binaryop/compiled/NullLogicalOr.cu
src/binaryop/compiled/NullMax.cu
src/binaryop/compiled/NullMin.cu
src/binaryop/compiled/PMod.cu
@@ -227,6 +227,7 @@ add_library(
src/binaryop/compiled/Sub.cu
src/binaryop/compiled/TrueDiv.cu
src/binaryop/compiled/binary_ops.cu
src/binaryop/compiled/equality_ops.cu
src/binaryop/compiled/util.cpp
src/labeling/label_bins.cu
src/bitmask/null_mask.cu
@@ -236,7 +237,6 @@ add_library(
src/column/column_factories.cpp
src/column/column_factories.cu
src/column/column_view.cpp
src/comms/ipc/ipc.cpp
src/copying/concatenate.cu
src/copying/contiguous_split.cu
src/copying/copy.cpp
@@ -379,7 +379,6 @@ add_library(
src/lists/copying/segmented_gather.cu
src/lists/copying/scatter_helper.cu
src/lists/count_elements.cu
src/lists/drop_list_duplicates.cu
src/lists/explode.cu
src/lists/extract.cu
src/lists/interleave_columns.cu
@@ -711,6 +710,8 @@ add_library(cudf::cudftestutil ALIAS cudftestutil)
if(CUDF_BUILD_TESTS)
# include CTest module -- automatically calls enable_testing()
include(CTest)
# Always print verbose output when tests fail if run using `make test`.
list(APPEND CMAKE_CTEST_ARGUMENTS "--output-on-failure")
add_subdirectory(tests)
endif()

@@ -807,13 +808,6 @@ endif()
]=]
)

set(install_code_string
[=[
set(ArrowCUDA_DIR "${Arrow_DIR}")
find_dependency(ArrowCUDA)
]=]
)

if(CUDF_ENABLE_ARROW_PARQUET)
string(
APPEND
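In the `cpp/CMakeLists.txt` hunks, the `src/binaryop/compiled/` sources are re-sorted: `ATan2.cu` now comes before `Add.cu`, and `equality_ops.cu` moves below `binary_ops.cu`. Both moves are consistent with case-sensitive, byte-wise ordering, in which every uppercase ASCII letter sorts before every lowercase one. The commit does not state which rule or tool produced the new order, so treat the following C++ sketch only as an illustration of that ordering. `sorted_case_sensitive` and `is_case_sensitive_sorted` are hypothetical helpers; the file names come from the hunk.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns a copy of `names` sorted with std::string's operator<, which compares
// characters by code value, so "ATan2.cu" < "Add.cu" because 'T' (0x54) < 'd' (0x64).
std::vector<std::string> sorted_case_sensitive(std::vector<std::string> names)
{
  std::sort(names.begin(), names.end());
  return names;
}

// True if `names` is already in that case-sensitive, byte-wise order.
bool is_case_sensitive_sorted(std::vector<std::string> const& names)
{
  return std::is_sorted(names.begin(), names.end());
}
```

Under a case-insensitive comparison, `Add.cu` would precede `ATan2.cu`, which matches the old order; the new order only holds when case is significant.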
18 changes: 6 additions & 12 deletions cpp/benchmarks/join/join.cu
@@ -44,12 +44,10 @@ void nvbench_inner_join(nvbench::state& state,

auto join = [](cudf::table_view const& left_input,
cudf::table_view const& right_input,
std::vector<cudf::size_type> const& left_on,
std::vector<cudf::size_type> const& right_on,
cudf::null_equality compare_nulls,
rmm::cuda_stream_view stream) {
cudf::hash_join hj_obj(left_input.select(left_on), compare_nulls, stream);
return hj_obj.inner_join(right_input.select(right_on), std::nullopt, stream);
cudf::hash_join hj_obj(left_input, compare_nulls, stream);
return hj_obj.inner_join(right_input, std::nullopt, stream);
};

BM_join<key_type, payload_type, Nullable>(state, join);
@@ -66,12 +64,10 @@ void nvbench_left_join(nvbench::state& state,

auto join = [](cudf::table_view const& left_input,
cudf::table_view const& right_input,
std::vector<cudf::size_type> const& left_on,
std::vector<cudf::size_type> const& right_on,
cudf::null_equality compare_nulls,
rmm::cuda_stream_view stream) {
cudf::hash_join hj_obj(left_input.select(left_on), compare_nulls, stream);
return hj_obj.left_join(right_input.select(right_on), std::nullopt, stream);
cudf::hash_join hj_obj(left_input, compare_nulls, stream);
return hj_obj.left_join(right_input, std::nullopt, stream);
};

BM_join<key_type, payload_type, Nullable>(state, join);
@@ -88,12 +84,10 @@ void nvbench_full_join(nvbench::state& state,

auto join = [](cudf::table_view const& left_input,
cudf::table_view const& right_input,
std::vector<cudf::size_type> const& left_on,
std::vector<cudf::size_type> const& right_on,
cudf::null_equality compare_nulls,
rmm::cuda_stream_view stream) {
cudf::hash_join hj_obj(left_input.select(left_on), compare_nulls, stream);
return hj_obj.full_join(right_input.select(right_on), std::nullopt, stream);
cudf::hash_join hj_obj(left_input, compare_nulls, stream);
return hj_obj.full_join(right_input, std::nullopt, stream);
};

BM_join<key_type, payload_type, Nullable>(state, join);
11 changes: 5 additions & 6 deletions cpp/benchmarks/join/join_common.hpp
@@ -143,17 +143,16 @@ static void BM_join(state_type& state, Join JoinFunc)
for (auto _ : state) {
cuda_event_timer raii(state, true, cudf::default_stream_value);

auto result = JoinFunc(
probe_table, build_table, columns_to_join, columns_to_join, cudf::null_equality::UNEQUAL);
auto result = JoinFunc(probe_table.select(columns_to_join),
build_table.select(columns_to_join),
cudf::null_equality::UNEQUAL);
}
}
if constexpr (std::is_same_v<state_type, nvbench::state> and (not is_conditional)) {
state.exec(nvbench::exec_tag::sync, [&](nvbench::launch& launch) {
rmm::cuda_stream_view stream_view{launch.get_stream()};
auto result = JoinFunc(probe_table,
build_table,
columns_to_join,
columns_to_join,
auto result = JoinFunc(probe_table.select(columns_to_join),
build_table.select(columns_to_join),
cudf::null_equality::UNEQUAL,
stream_view);
});
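The join benchmark changes move key-column selection from the join callables to their caller: `BM_join` now passes `probe_table.select(columns_to_join)` and `build_table.select(columns_to_join)`, and each lambda hands those views directly to `cudf::hash_join`, dropping the `left_on`/`right_on` index vectors. The C++ sketch below mimics that calling pattern without libcudf. `ToyTable`, its `select` member, and `toy_inner_join_count` are hypothetical stand-ins, and the nested-loop match count exists only to keep the example self-contained; it is not how `cudf::hash_join` works.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a table view: equal-length int64 columns, stored by value.
struct ToyTable {
  std::vector<std::vector<std::int64_t>> columns;

  std::size_t num_rows() const { return columns.empty() ? 0 : columns.front().size(); }

  // Same idea as table_view::select: keep only the listed columns, in the given order.
  ToyTable select(std::vector<std::size_t> const& indices) const
  {
    ToyTable out;
    for (auto i : indices) {
      assert(i < columns.size());
      out.columns.push_back(columns[i]);
    }
    return out;
  }
};

// True when row `li` of `left` equals row `ri` of `right` in every column.
bool rows_equal(ToyTable const& left, std::size_t li, ToyTable const& right, std::size_t ri)
{
  for (std::size_t c = 0; c < left.columns.size(); ++c) {
    if (left.columns[c][li] != right.columns[c][ri]) { return false; }
  }
  return true;
}

// New-style join callable: it receives key tables only, with no left_on/right_on
// vectors. Counts matching row pairs with a nested loop (illustration only).
std::size_t toy_inner_join_count(ToyTable const& left_keys, ToyTable const& right_keys)
{
  assert(left_keys.columns.size() == right_keys.columns.size());
  std::size_t matches = 0;
  for (std::size_t i = 0; i < left_keys.num_rows(); ++i) {
    for (std::size_t j = 0; j < right_keys.num_rows(); ++j) {
      if (rows_equal(left_keys, i, right_keys, j)) { ++matches; }
    }
  }
  return matches;
}
```

A caller then writes `toy_inner_join_count(probe.select(columns_to_join), build.select(columns_to_join))`, mirroring the new `JoinFunc(probe_table.select(columns_to_join), build_table.select(columns_to_join), ...)` call in `join_common.hpp`. Keeping selection at the call site lets one callable signature serve any choice of key columns.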
44 changes: 20 additions & 24 deletions cpp/benchmarks/join/left_join.cu
@@ -20,37 +20,33 @@ template <typename key_type, typename payload_type>
class Join : public cudf::benchmark {
};

#define LEFT_ANTI_JOIN_BENCHMARK_DEFINE(name, key_type, payload_type, nullable) \
BENCHMARK_TEMPLATE_DEFINE_F(Join, name, key_type, payload_type) \
(::benchmark::State & st) \
{ \
auto join = [](cudf::table_view const& left, \
cudf::table_view const& right, \
std::vector<cudf::size_type> const& left_on, \
std::vector<cudf::size_type> const& right_on, \
cudf::null_equality compare_nulls) { \
return cudf::left_anti_join(left, right, left_on, right_on, compare_nulls); \
}; \
BM_join<key_type, payload_type, nullable>(st, join); \
#define LEFT_ANTI_JOIN_BENCHMARK_DEFINE(name, key_type, payload_type, nullable) \
BENCHMARK_TEMPLATE_DEFINE_F(Join, name, key_type, payload_type) \
(::benchmark::State & st) \
{ \
auto join = [](cudf::table_view const& left, \
cudf::table_view const& right, \
cudf::null_equality compare_nulls) { \
return cudf::left_anti_join(left, right, compare_nulls); \
}; \
BM_join<key_type, payload_type, nullable>(st, join); \
}

LEFT_ANTI_JOIN_BENCHMARK_DEFINE(left_anti_join_32bit, int32_t, int32_t, false);
LEFT_ANTI_JOIN_BENCHMARK_DEFINE(left_anti_join_64bit, int64_t, int64_t, false);
LEFT_ANTI_JOIN_BENCHMARK_DEFINE(left_anti_join_32bit_nulls, int32_t, int32_t, true);
LEFT_ANTI_JOIN_BENCHMARK_DEFINE(left_anti_join_64bit_nulls, int64_t, int64_t, true);

#define LEFT_SEMI_JOIN_BENCHMARK_DEFINE(name, key_type, payload_type, nullable) \
BENCHMARK_TEMPLATE_DEFINE_F(Join, name, key_type, payload_type) \
(::benchmark::State & st) \
{ \
auto join = [](cudf::table_view const& left, \
cudf::table_view const& right, \
std::vector<cudf::size_type> const& left_on, \
std::vector<cudf::size_type> const& right_on, \
cudf::null_equality compare_nulls) { \
return cudf::left_semi_join(left, right, left_on, right_on, compare_nulls); \
}; \
BM_join<key_type, payload_type, nullable>(st, join); \
#define LEFT_SEMI_JOIN_BENCHMARK_DEFINE(name, key_type, payload_type, nullable) \
BENCHMARK_TEMPLATE_DEFINE_F(Join, name, key_type, payload_type) \
(::benchmark::State & st) \
{ \
auto join = [](cudf::table_view const& left, \
cudf::table_view const& right, \
cudf::null_equality compare_nulls) { \
return cudf::left_semi_join(left, right, compare_nulls); \
}; \
BM_join<key_type, payload_type, nullable>(st, join); \
}

LEFT_SEMI_JOIN_BENCHMARK_DEFINE(left_semi_join_32bit, int32_t, int32_t, false);
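In `left_join.cu`, `cudf::left_anti_join` and `cudf::left_semi_join` are now called with just the two key tables and a `cudf::null_equality` value. For reference, a left semi join keeps the left rows whose keys have at least one match in the right table, and a left anti join keeps the left rows with no match; with `null_equality::UNEQUAL`, as passed by `BM_join`, null keys never match. The sketch below models this on a single nullable `int64_t` key column using `std::optional`. `toy_null_equality`, `toy_left_semi_join`, and `toy_left_anti_join` are hypothetical names. They return left row indices in ascending order, and the sketch makes no claim about libcudf's return types or output ordering.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical mirror of cudf::null_equality: do two null keys compare equal?
enum class toy_null_equality { EQUAL, UNEQUAL };

// One nullable key column; std::nullopt plays the role of a null.
using key_column = std::vector<std::optional<std::int64_t>>;

// Keys match if both are valid and equal, or both are null and nulls compare EQUAL.
bool keys_match(std::optional<std::int64_t> const& a,
                std::optional<std::int64_t> const& b,
                toy_null_equality nulls)
{
  if (a.has_value() && b.has_value()) { return *a == *b; }
  return !a.has_value() && !b.has_value() && nulls == toy_null_equality::EQUAL;
}

// Collects the indices of left rows whose "has a match in right" flag equals keep_matched.
std::vector<std::size_t> filter_left_rows(key_column const& left,
                                          key_column const& right,
                                          toy_null_equality nulls,
                                          bool keep_matched)
{
  std::vector<std::size_t> result;
  for (std::size_t i = 0; i < left.size(); ++i) {
    bool matched = false;
    for (auto const& key : right) {
      if (keys_match(left[i], key, nulls)) {
        matched = true;
        break;
      }
    }
    if (matched == keep_matched) { result.push_back(i); }
  }
  return result;
}

// Semi join: left rows with at least one match in `right`.
std::vector<std::size_t> toy_left_semi_join(key_column const& left,
                                            key_column const& right,
                                            toy_null_equality nulls)
{
  return filter_left_rows(left, right, nulls, true);
}

// Anti join: left rows with no match in `right`.
std::vector<std::size_t> toy_left_anti_join(key_column const& left,
                                            key_column const& right,
                                            toy_null_equality nulls)
{
  return filter_left_rows(left, right, nulls, false);
}
```

With `left = {1, null, 3, 4}` and `right = {3, null, 5}`, the semi join yields `{2}` under `UNEQUAL` and `{1, 2}` under `EQUAL`; the anti join yields the complementary sets `{0, 1, 3}` and `{0, 3}`.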