string scalar support in AST - proof of concept #6
Commits on Mar 30, 2023
Commit 982af8a
Commits on Apr 4, 2023
Commit 0a9eb86
Commit 50ee55d
Commit 9735d51
Merge branch 'branch-23.06' of github.com:rapidsai/cudf into fea-string_scalar_ast_compare
Commit 8653e61
Commits on Apr 5, 2023
Fix OOB memory access in CSV reader when reading without NA values (rapidsai#13011)
The CSV reader uses a trie to read fields with special values as nulls. Trie creation does not work correctly when there are no special values. This can happen when the NA filter is enabled but the default NA values are removed and the user does not specify custom values. In that case, using the trie leads to an out-of-bounds (OOB) memory access. This PR fixes trie creation so that it creates an empty trie when there are no special values to look for. It includes a C++ test that crashes without the fix.
Authors:
- Vukasin Milovanovic (https://github.com/vuule)
Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Mike Wilson (https://github.com/hyperbolic2346)
- Nghia Truong (https://github.com/ttnghia)
URL: rapidsai#13011
Commit 9a770f6
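The failure mode in the CSV reader fix above (rapidsai#13011) can be illustrated with a minimal trie sketch in Python. This is not the libcudf trie; `build_trie` and `contains` are hypothetical names, and the nested-dict layout is an assumption made for readability. The only point is that building from an empty list of NA values should yield a valid, empty trie that matches nothing.

```python
def build_trie(words):
    """Build a nested-dict trie; an empty word list yields an empty root."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$end"] = True  # multi-character key, so it cannot clash with a char
    return root


def contains(trie, field):
    """Return True only if `field` is exactly one of the stored values."""
    node = trie
    for ch in field:
        node = node.get(ch)
        if node is None:
            return False
    return node.get("$end", False)


na_trie = build_trie(["NA", "null"])
empty_trie = build_trie([])  # NA filter enabled, but no NA values configured
```

Lookups against `empty_trie` walk an empty root and return `False` for every field, instead of touching storage that was never created.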
Add except declaration in Cython interface for regex_program::create (rapidsai#13054)
Add the `except +` declaration to the `cudf::strings::regex_program::create()` function in the Cython `regex_program.pxd` interface, since this call throws an exception for invalid regex patterns. This allows the normal Cython exception handling to pass the exception to the Python logic without aborting the process. Closes rapidsai#13052
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Bradley Dice (https://github.com/bdice)
- Ashwin Srinath (https://github.com/shwina)
URL: rapidsai#13054
Commit 7a739ce
Commit 7268b5f
Fix tests/identify_stream_usage.cpp (rapidsai#13066)
The identify_stream_usage test uses `strcmp` but does not include `<cstring>`. This PR fixes that. The missing include was surfaced by rapidsai#13064, showing that the test relied on headers in `spdlog` to include `cstring`.
Authors:
- Allard Hendriksen (https://github.com/ahendriksen)
- Bradley Dice (https://github.com/bdice)
Approvers:
- Bradley Dice (https://github.com/bdice)
- Nghia Truong (https://github.com/ttnghia)
URL: rapidsai#13066
Commit da7fe2a
Commit 6563440
Commit 54e7889
Commit 46a8016
Commit 1d95f75
Commit a3ed98a
Commit 5179b8e
Commit 241b560
Commit d1a0114
Add algorithm include in data_sink.hpp (rapidsai#13068)
`data_sink.hpp` uses `std::transform` but does not include `<algorithm>`. This PR fixes that. The missing include was surfaced by rapidsai#13064.
Authors:
- Allard Hendriksen (https://github.com/ahendriksen)
Approvers:
- Bradley Dice (https://github.com/bdice)
- David Wendt (https://github.com/davidwendt)
- Nghia Truong (https://github.com/ttnghia)
URL: rapidsai#13068
Commit 0cf8c91
Commits on Apr 6, 2023
Merge pull request rapidsai#13070 from galipremsagar/pin_dask
[REVIEW] Pin `dask` and `distributed` for release
Commit 2c3b2ab
Update `join` to use experimental row hasher and comparator (rapidsai#12787)
Part of rapidsai#11844. I will create a separate PR for `mixed_join`.
Compilation times:
- `main` (rapidsai@94bbc82): `16m47.513s`
- This PR (rapidsai@5d75db8): `16m47.520s`
Benchmarks: rapidsai#12787 (comment)
Authors:
- Divye Gala (https://github.com/divyegala)
Approvers:
- Yunsong Wang (https://github.com/PointKernel)
- Nghia Truong (https://github.com/ttnghia)
URL: rapidsai#12787
Commit d5aad2f
Commit 0f4ea41
Merge pull request rapidsai#13080 from galipremsagar/branch-23.06-merge-23.04
Resolved automerger from `branch-23.04` to `branch-23.06`
Commit d82f97c
Commits on Apr 7, 2023
Adding `hostdevice_span` that is a span creatable from `hostdevice_vector` (rapidsai#12981)
I ran into a need for a span-like view into a `hostdevice_vector`. I was chopping it up into pieces to pass into a function to process portions at a time, but that function still wanted to do things like host-to-device transfers on the spans. This class is a result of that need.
Authors:
- Mike Wilson (https://github.com/hyperbolic2346)
- Nghia Truong (https://github.com/ttnghia)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Vukasin Milovanovic (https://github.com/vuule)
URL: rapidsai#12981
Commit e28c9c5
Fix column selection `read_parquet` benchmarks (rapidsai#13082)
The helper function `get_col_names` in the Parquet reader benchmarks throws with nested columns. It should instead ignore the child columns and return only the top-level column names. Also renamed the function to better reflect what it does.
Authors:
- Vukasin Milovanovic (https://github.com/vuule)
Approvers:
- https://github.com/nvdbaranec
- Yunsong Wang (https://github.com/PointKernel)
URL: rapidsai#13082
Commit 46b5900
Compute column sizes in Parquet preprocess with single kernel (rapidsai#12931)
Addresses rapidsai#11922
Currently in Parquet preprocessing, a `thrust::reduce()` and a `thrust::exclusive_scan_by_key()` are performed to compute the column size and offsets for each nested column. For complicated schemas this results in a large number of kernel invocations. This PR calculates the sizes and offsets of all columns in single calls to `thrust::reduce_by_key()` and `thrust::exclusive_scan_by_key()`. This change results in around a 1.3x speedup when reading a complicated schema.
Before: ![image](https://user-images.githubusercontent.com/26264495/224823213-ae998654-274c-450a-8ad7-ea854541335e.png)
After: ![image](https://user-images.githubusercontent.com/26264495/224823108-cb91c380-5e35-4c77-a6f9-6703e321be05.png)
Authors:
- Srikar Vanavasam (https://github.com/SrikarVanavasam)
Approvers:
- Yunsong Wang (https://github.com/PointKernel)
- Nghia Truong (https://github.com/ttnghia)
- Vukasin Milovanovic (https://github.com/vuule)
URL: rapidsai#12931
Commit f328b64
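To make the batching idea in rapidsai#12931 concrete, here is a small CPU sketch in Python of what keyed reduction and keyed exclusive scan compute. It mirrors the semantics of `thrust::reduce_by_key()` and `thrust::exclusive_scan_by_key()` over runs of equal consecutive keys. The data layout (one size entry per page, keyed by column id) is an assumption for illustration and is not taken from the PR.

```python
from itertools import groupby


def reduce_by_key(keys, values):
    """Sum each run of consecutive equal keys."""
    out_keys, out_sums = [], []
    for key, group in groupby(zip(keys, values), key=lambda kv: kv[0]):
        out_keys.append(key)
        out_sums.append(sum(v for _, v in group))
    return out_keys, out_sums


def exclusive_scan_by_key(keys, values):
    """Exclusive prefix sum that restarts whenever the key changes."""
    out, running, prev = [], 0, object()  # sentinel never equals a real key
    for key, value in zip(keys, values):
        if key != prev:
            running, prev = 0, key
        out.append(running)
        running += value
    return out


# Hypothetical input: one entry per page, tagged with its column id.
column_ids = [0, 0, 0, 1, 1, 2]
page_sizes = [4, 2, 3, 5, 1, 7]

cols, col_sizes = reduce_by_key(column_ids, page_sizes)
page_offsets = exclusive_scan_by_key(column_ids, page_sizes)
```

One pass over the keyed data yields the total size of every column and the per-page offsets within each column, which is why a single pair of calls can replace one pair per nested column.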
Reduce shared memory usage in gpuComputePageSizes by 50% (rapidsai#13047)
In a multithreaded, multi-stream environment (Spark) we were experiencing a performance regression on some benchmark queries. The culprit was GPU scheduling issues related to the `gpuComputePageSizes` kernel. Dependent kernels (`gpuDecodePages`) were getting serialized because `gpuComputePageSizes` wasn't running well alongside other streams. The fix was reducing shared memory usage in `gpuComputePageSizes`. The kernel shares a lot of code and data structures with `gpuDecodePages` but doesn't actually use several of the large buffers that are stored in shared memory. This PR refactors those buffers out so that they are only declared in the `gpuDecodePages` kernel, reducing the shared memory usage by 50% (3kb). This clears up the performance issue on Spark. I am currently experiencing build issues with cudf benchmarks so I'm marking this as do-not-merge until I can verify with them.
Authors:
- https://github.com/nvdbaranec
- Vukasin Milovanovic (https://github.com/vuule)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Vukasin Milovanovic (https://github.com/vuule)
URL: rapidsai#13047
Commit c4a34eb
Add empty test files for test reorganization (rapidsai#12288)
This PR adds empty test modules that match the "Test Organization" guidelines outlined in the [developer guide](https://github.com/rapidsai/cudf/blob/branch-23.02/docs/cudf/source/developer_guide/testing.md#test-organization). Follow-up PRs will move existing tests into these test modules. While I have attempted to match the structure of our API reference as much as possible, there are small differences. For example, the API reference lumps together [Reshaping, Sorting, and Transposing](https://docs.rapids.ai/api/cudf/stable/api_docs/dataframe.html#reshaping-sorting-transposing), while I opted to include two different modules for reshaping and sorting. There are only a couple of instances where I needed to deviate from the structure though.
Authors:
- Ashwin Srinath (https://github.com/shwina)
Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
URL: rapidsai#12288
Commit 5a703d0
Commits on Apr 8, 2023
Raise `NotImplementedError` when attempting to construct cuDF objects from timezone-aware datetimes (rapidsai#13086)
Closes rapidsai#13077
Authors:
- Ashwin Srinath (https://github.com/shwina)
- GALI PREM SAGAR (https://github.com/galipremsagar)
Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)
URL: rapidsai#13086
Commit 52e8b5e
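The behaviour described in rapidsai#13086 can be sketched with the standard library alone. `check_naive` is a hypothetical helper, not a cuDF function, and the error message is made up; the sketch only shows how timezone-aware values can be detected and rejected before construction.

```python
from datetime import datetime, timezone


def check_naive(values):
    """Hypothetical guard: reject timezone-aware datetimes up front."""
    for value in values:
        # A datetime is "aware" when utcoffset() returns something other than None.
        if isinstance(value, datetime) and value.utcoffset() is not None:
            raise NotImplementedError(
                "timezone-aware datetimes are not supported"
            )
    return list(values)


naive_values = [datetime(2023, 4, 8, 12, 0)]
aware_values = [datetime(2023, 4, 8, 12, 0, tzinfo=timezone.utc)]
```

Raising early gives the caller a clear signal rather than silently dropping the timezone information.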
Commits on Apr 10, 2023
Remove deprecated regex functions from libcudf (rapidsai#13067)
Removes the libcudf regex APIs that were deprecated in 23.04. All calls to these functions within the repo had already been removed. Marking this breaking since APIs are being removed.
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Bradley Dice (https://github.com/bdice)
URL: rapidsai#13067
Commit ebe4757
Fix unused variable error/warning in page_data.cu (rapidsai#13093)
Fixes minor compile errors/warnings for unused variables in the `cpp/src/io/parquet/page_data.cu` source file.
```
cudf/cpp/src/io/parquet/page_data.cu -o CMakeFiles/cudf.dir/src/io/parquet/page_data.cu.o
/cudf/cpp/src/io/parquet/page_data.cu(636): error #177-D: parameter "s" was declared but never referenced
/cudf/cpp/src/io/parquet/page_data.cu(343): error #177-D: parameter "sb" was declared but never referenced
          detected during instantiation of "cuda::std::__4::pair<int, int> cudf::io::parquet::gpu::<unnamed>::gpuDecodeDictionaryIndices<sizes_only>(volatile cudf::io::parquet::gpu::<unnamed>::page_state_s *, volatile cudf::io::parquet::gpu::<unnamed>::page_state_buffers_s *, int, int) [with sizes_only=true]"
(1720): here
/cudf/cpp/src/io/parquet/page_data.cu(527): error #177-D: parameter "sb" was declared but never referenced
          detected during instantiation of "cudf::size_type cudf::io::parquet::gpu::<unnamed>::gpuInitStringDescriptors<sizes_only>(volatile cudf::io::parquet::gpu::<unnamed>::page_state_s *, volatile cudf::io::parquet::gpu::<unnamed>::page_state_buffers_s *, int, int) [with sizes_only=true]"
(1724): here
3 errors detected in the compilation of "/cudf/cpp/src/io/parquet/page_data.cu".
```
Found these with a Debug build using nvcc 11.5 and gcc 9.5.
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Divye Gala (https://github.com/divyegala)
- Vukasin Milovanovic (https://github.com/vuule)
- Karthikeyan (https://github.com/karthikeyann)
URL: rapidsai#13093
Commit cf26353
Fix missing confluent kafka version (rapidsai#13101)
This PR fixes missing `python-confluent-kafka` version changes in `custreamz` and removes `python-confluent-kafka` from `cudf_kafka` because I don't see any usage of `confluent_kafka` in the Python code.
Authors:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Bradley Dice (https://github.com/bdice)
Approvers:
- Ray Douglass (https://github.com/raydouglass)
- Vyas Ramasubramani (https://github.com/vyasr)
- Bradley Dice (https://github.com/bdice)
URL: rapidsai#13101
Commit 5e41c1f
Commit 14214db
Remove uses-setup-env-vars (rapidsai#13105)
This setting now matches the default behavior of the shared-action-workflows repo.
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- AJ Schmidt (https://github.com/ajschmidt8)
URL: rapidsai#13105
Commit 8c9a8c4
Support structs of lists in row lexicographic comparator (rapidsai#13005)
This fixes the lexicographic comparator, which cannot handle input containing structs of lists. The new implementation mainly changes the helper function `decompose_structs`. In particular:
* If a structs column's first child is a lists column, the first column of the result table will no longer be `Struct<Struct<...<List<SomeType>...>` (i.e., nested structs ultimately having one child).
* Instead, the first output column will be nested empty structs: `Struct<...Struct<>>...>`. The innermost child column `List<SomeType>` is output as the second column in the result table.
Depends on:
* rapidsai#12995
Closes rapidsai#11672.
Authors:
- Nghia Truong (https://github.com/ttnghia)
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Divye Gala (https://github.com/divyegala)
- Bradley Dice (https://github.com/bdice)
URL: rapidsai#13005
Commit f357892
Optimize set-like operations (rapidsai#12769)
Set-like operations such as `intersect_distinct` and `difference_distinct` call `purge_nonempty_nulls` when the input is nullable. This PR optimizes these set APIs by checking for the existence of non-empty nulls (using `has_nonempty_nulls`) before calling `purge_nonempty_nulls`.
Authors:
- Nghia Truong (https://github.com/ttnghia)
Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Yunsong Wang (https://github.com/PointKernel)
URL: rapidsai#12769
Commit 30411b5
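The check-before-purge pattern from rapidsai#12769 can be sketched on a toy list column in Python. The representation (an `offsets` list plus a boolean `validity` list) and the `sanitize_if_needed` wrapper are assumptions for illustration; only the names `has_nonempty_nulls` and `purge_nonempty_nulls` come from the commit message, and the real functions operate on libcudf column views.

```python
def has_nonempty_nulls(offsets, validity):
    """True if any null row still spans one or more child elements."""
    return any(
        not valid and offsets[i + 1] > offsets[i]
        for i, valid in enumerate(validity)
    )


def sanitize_if_needed(offsets, validity, purge):
    """Run the costly purge only when the cheap check finds work to do."""
    if has_nonempty_nulls(offsets, validity):
        return purge(offsets, validity)
    return offsets, validity


# Row 1 is null in both columns; only the second still owns child elements.
clean = ([0, 2, 2, 3], [True, False, True])
dirty = ([0, 2, 4, 5], [True, False, True])
```

When the nulls are already empty, the input is returned untouched and no rebuild of the column takes place.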
Replace unnecessary uses of `UNKNOWN_NULL_COUNT` (rapidsai#13102)
This PR replaces uses of `cudf::UNKNOWN_NULL_COUNT` where the null count is either already known or trivially computed. Contributes to rapidsai#11968
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- David Wendt (https://github.com/davidwendt)
- Nghia Truong (https://github.com/ttnghia)
URL: rapidsai#13102
Commit cab6522
Commits on Apr 11, 2023
Adding ifdefs around nvcc-specific pragmas (rapidsai#13110)
This change wraps the NVCC-specific `#pragma` macros inside an `ifdef` to prevent compilation warnings as described in issue rapidsai#13106. Closes rapidsai#13106
Authors:
- Mike Wilson (https://github.com/hyperbolic2346)
Approvers:
- Bradley Dice (https://github.com/bdice)
- Nghia Truong (https://github.com/ttnghia)
URL: rapidsai#13110
Commit e9e86f4
Fixes sliced list and struct column bug in JSON chunked writer (rapidsai#13108)
Fixes the OOM access error while using the chunked JSON writer on list columns. The issue is also present in struct columns and is fixed in this change as well. Fixes rapidsai#13030
Authors:
- Karthikeyan (https://github.com/karthikeyann)
- GALI PREM SAGAR (https://github.com/galipremsagar)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- David Wendt (https://github.com/davidwendt)
URL: rapidsai#13108
Commit 5638d44
Remove using namespace cudf; from libcudf gtests source (rapidsai#13089)
Removes `using namespace cudf;` from gtests source code to make it easier to read and to find where utilities and function calls are implemented. Also removed a few `using namespace cudf::test;` usages, which by extension include namespace `cudf`. Found these while working on rapidsai#13081. Reference rapidsai#11734
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Vyas Ramasubramani (https://github.com/vyasr)
- Bradley Dice (https://github.com/bdice)
URL: rapidsai#13089
Commit 73c8d16
Fix GPU_ARCHS setting in Java CMake build and CMAKE_CUDA_ARCHITECTURES in Python package build. (rapidsai#13117)
Changes the `GPU_ARCHS` setting in `pom.xml` from `ALL` to `RAPIDS` per the recent change in rapidsai/rapids-cmake/pull/397. The Python package build requires CUDA as of the addition of string UDFs, which added compilation of PTX code to the Python build. Therefore `CMAKE_CUDA_ARCHITECTURES` must be set appropriately even when only the Python package is being built. rapidsai/rapids-cmake#397 requires a nonempty string value to be used if the variable is set at all. This PR updates build.sh to include the appropriate default when only the Python build is requested.
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Robert (Bobby) Evans (https://github.com/revans2)
- Bradley Dice (https://github.com/bdice)
URL: rapidsai#13117
Commit 50718e6
Commits on Apr 12, 2023
Cleanup ORC chunked writer (rapidsai#13091)
This changes the internal variables of the ORC chunked writer:
* Rename them to have a `_` prefix consistently.
* Add a `const` qualifier to some variables that are writer parameters.
* Regroup them.
No new implementation is added. However, the unused parameter `mr` is removed from its interface, so this is flagged as a `breaking` change.
Closes:
* rapidsai#12973
Authors:
- Nghia Truong (https://github.com/ttnghia)
Approvers:
- Karthikeyan (https://github.com/karthikeyann)
- Vyas Ramasubramani (https://github.com/vyasr)
URL: rapidsai#13091
Commit ecadda5
Use make_empty_lists_column instead of make_empty_column(type_id::LIST) (rapidsai#13099)
Fixes a bug where `cudf::make_empty_column(type_id::LIST)` is called, and adds a gtest to check for this error. `make_empty_column` cannot accept a nested type because it requires a child type. The internal `make_empty_lists_column` is moved to the `lists_column_factories.hpp` header, which is itself moved to the `cpp/include/cudf/lists/detail` directory since it only contains detail functions. Closes rapidsai#13096
Authors:
- David Wendt (https://github.com/davidwendt)
- Nghia Truong (https://github.com/ttnghia)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- AJ Schmidt (https://github.com/ajschmidt8)
- Vyas Ramasubramani (https://github.com/vyasr)
URL: rapidsai#13099
Commit 9a9f718
Refactor `cudf::detail::sorted_order` (rapidsai#13062)
This PR does some cleanup for the `src/sort/sort_impl.cuh` file and the related header/source files:
* Moving some `include<header>` from there to the source files that use them directly.
* Adding `constexpr` for the `if/else` statements.
* Adding a missing doxygen tag.
* Removing code duplication by extracting the common code into a lambda.
No new implementation is added in this PR.
Authors:
- Nghia Truong (https://github.com/ttnghia)
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- David Wendt (https://github.com/davidwendt)
URL: rapidsai#13062
Commit 1d77984
Cleanup Parquet chunked writer (rapidsai#13094)
Similar to rapidsai#13091, this changes the internal variables of the Parquet chunked writer:
* Rename them to have a `_` prefix consistently.
* Add a `const` qualifier to some variables that are writer parameters.
* Regroup them.
No new implementation is added. However, the unused parameter `mr` is removed from its interface, so this is flagged as a breaking change.
Closes:
* rapidsai#13079
Authors:
- Nghia Truong (https://github.com/ttnghia)
Approvers:
- Mike Wilson (https://github.com/hyperbolic2346)
- Karthikeyan (https://github.com/karthikeyann)
URL: rapidsai#13094
Commit 2bf0b44
Pin curand version (rapidsai#13127)
Merging the conda-forge curand recipe and building conda-forge packages has caused conda to choose a newer version of curand than what cudf currently supports (we cannot use the version from CUDA 12). Closes rapidsai#13126
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Ray Douglass (https://github.com/raydouglass)
- Robert Maynard (https://github.com/robertmaynard)
- Bradley Dice (https://github.com/bdice)
Commit ed9385b
Commit 2d311d7
Merge pull request rapidsai#13131 from vyasr/branch-23.06-merge-23.04
Branch 23.06 merge 23.04
Commit cae6132
Commits on Apr 13, 2023
Adds checks to make sure json reader won't overflow (rapidsai#13115)
The JSON reader currently uses 32-bit offsets to index into the input's characters, both to lower the memory footprint and for performance reasons. Hence, if an input larger than `UINT_MAX` is read, the parser may return incorrect data. This PR adds a check that fails for inputs that could overflow. The longer-term plan is to make the finite-state transducer stage reentrant and split inputs larger than `UINT_MAX` into smaller chunks.
Authors:
- Elias Stehle (https://github.com/elstehle)
- Vukasin Milovanovic (https://github.com/vuule)
Approvers:
- Vukasin Milovanovic (https://github.com/vuule)
- Karthikeyan (https://github.com/karthikeyann)
URL: rapidsai#13115
Commit 3069f1e
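A guard of the kind described in rapidsai#13115 might look like the following Python sketch. `check_input_size` is a hypothetical name, and the exact threshold and exception type used by the real reader are not stated in the commit message; the sketch assumes the limit is the largest value a 32-bit unsigned offset can hold.

```python
UINT32_MAX = 2**32 - 1  # largest value a 32-bit unsigned offset can hold


def check_input_size(num_bytes):
    """Hypothetical guard: fail early instead of returning wrong data."""
    if num_bytes > UINT32_MAX:
        raise OverflowError(
            f"input of {num_bytes} bytes exceeds the 32-bit offset range"
        )
    return num_bytes


largest_ok = check_input_size(UINT32_MAX)
```

Failing loudly at the boundary is preferable to letting offsets wrap around and index the wrong characters.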
Fix hash join when the input tables have nulls on only one side (rapidsai#13120)
This is very similar to rapidsai#11284, which fixes a bug that occurs when only one input table has nulls while the other doesn't. The bug is due to the new experimental hasher producing different hash values depending on an input flag `has_nulls`. To use it properly, `has_nulls` must be computed by checking all the possible input tables, or set to a constant value (`true`).
Closes:
* rapidsai#13109
Authors:
- Nghia Truong (https://github.com/ttnghia)
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Divye Gala (https://github.com/divyegala)
- Yunsong Wang (https://github.com/PointKernel)
URL: rapidsai#13120
Commit d415ffe
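The mismatch behind rapidsai#13120 can be reproduced with a toy hasher in Python. Everything here is hypothetical (`make_row_hasher`, `any_nulls`, and the tuple-based "hash" are illustrative stand-ins, not the libcudf row hasher); the sketch only shows why a flag that changes the hash must be computed once across both sides of the join.

```python
def make_row_hasher(has_nulls):
    """Toy hasher whose output depends on the `has_nulls` flag."""
    def hash_row(row):
        if has_nulls:
            # Nullable path: fold each value's validity into the key.
            return tuple((value is not None, value) for value in row)
        # Fast path: assumes the row contains no nulls.
        return tuple(row)
    return hash_row


def any_nulls(*tables):
    """Compute a single flag across every table taking part in the join."""
    return any(value is None for table in tables for row in table for value in row)


build = [(1, "a"), (2, "b")]     # no nulls
probe = [(1, "a"), (None, "b")]  # contains a null

shared_flag = any_nulls(build, probe)
hash_build = make_row_hasher(shared_flag)
hash_probe = make_row_hasher(shared_flag)
```

With a shared flag, the equal rows `(1, "a")` on both sides produce equal keys. If each side computed its own flag, the same row would hash differently on the two sides and the match would be missed.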
Prevent overflow with `skip_rows` in ORC and Parquet readers (rapidsai#13063)
Use `int64_t` for `skip_rows`, since a source or combined sources can have more than two billion rows, and we should be able to read a range of rows even in that case.
Store `num_rows` as `std::optional` instead of using a special value (`-1`).
Reuse code with error-prone logic between ORC and Parquet.
Added unit tests for the tricky code above.
Converted in/out `select_stripes` parameters to input parameters plus return values.
Authors:
- Vukasin Milovanovic (https://github.com/vuule)
- GALI PREM SAGAR (https://github.com/galipremsagar)
Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Robert Maynard (https://github.com/robertmaynard)
- https://github.com/brandon-b-miller
- Yunsong Wang (https://github.com/PointKernel)
- Vyas Ramasubramani (https://github.com/vyasr)
URL: rapidsai#13063
Commit f77403e
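The row-range logic touched by rapidsai#13063 can be sketched in Python, using `Optional[int]` in place of `std::optional` and in place of the old `-1` sentinel. `select_row_range` is a hypothetical helper, not the shared libcudf function. Python integers do not overflow, so the example only demonstrates that the values involved exceed the 32-bit signed range.

```python
from typing import Optional, Tuple

INT32_MAX = 2**31 - 1


def select_row_range(
    total_rows: int, skip_rows: int, num_rows: Optional[int] = None
) -> Tuple[int, int]:
    """Return (first_row, row_count), clamped to the rows that exist."""
    if skip_rows < 0 or (num_rows is not None and num_rows < 0):
        raise ValueError("skip_rows and num_rows must be non-negative")
    first = min(skip_rows, total_rows)
    remaining = total_rows - first
    # None means "read everything after the skipped rows".
    count = remaining if num_rows is None else min(num_rows, remaining)
    return first, count


# Five billion rows in total, skipping the first three billion.
big_range = select_row_range(5_000_000_000, 3_000_000_000)
```

Using an explicit "no value" state removes the need to special-case `-1` in every comparison, which is where this kind of arithmetic tends to go wrong.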
Purge nonempty nulls from byte_cast list outputs. (rapidsai#11971)
Resolves rapidsai#11754. The `byte_cast` function is creating unsanitized lists from null inputs, which is a bug. [This logic](https://github.com/rapidsai/cudf/blob/9c06330363db4da99803a3728b8bf44f9829f0b9/cpp/src/reshape/byte_cast.cu#L66-L81) copies nonzero bytes even if the input element is null. The input's null mask is copied onto the output parent list column, but the null children are nonempty. This PR fixes the bug by calling `cudf::purge_nonempty_nulls` on the result before returning, if there are any nulls to be purged.
Depends on:
* rapidsai#13099
Authors:
- Bradley Dice (https://github.com/bdice)
- Nghia Truong (https://github.com/ttnghia)
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- David Wendt (https://github.com/davidwendt)
- Vyas Ramasubramani (https://github.com/vyasr)
- Mike Wilson (https://github.com/hyperbolic2346)
URL: rapidsai#11971
Commit 4b34831
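What "purging nonempty nulls" means for the `byte_cast` output in rapidsai#11971 can be shown on a toy list-of-bytes column in Python. The representation (`offsets`, `validity`, flat `child` list) and the function body are an illustrative sketch written for this example, not the implementation of `cudf::purge_nonempty_nulls`.

```python
def purge_nonempty_nulls(offsets, validity, child):
    """Rebuild offsets and child data so every null row has zero length."""
    new_offsets, new_child = [0], []
    for i, valid in enumerate(validity):
        if valid:
            new_child.extend(child[offsets[i]:offsets[i + 1]])
        # Null rows contribute nothing, so their start and end offsets match.
        new_offsets.append(len(new_child))
    return new_offsets, new_child


# Hypothetical byte_cast output for three 2-byte values where row 1 is null
# but its bytes (9, 9) were still copied into the child column.
offsets = [0, 2, 4, 6]
validity = [True, False, True]
child = [0, 1, 9, 9, 0, 2]

clean_offsets, clean_child = purge_nonempty_nulls(offsets, validity, child)
```

After the purge, the null row occupies no child elements, so the column no longer carries stale bytes behind a null.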
Fix `null_count` of columns returned by `chunked_parquet_reader` (rapidsai#13111)
The chunked Parquet reader returns columns with incorrect null counts: the counts are cumulative sums that include all previous chunks. The root cause is that `nesting_decode_cache` is not copied back to `nesting_decode` when `gpuDecodePageData` returns early, so previously computed null counts are only reset in the cache. With this PR, we use RAII to make sure cached decode info is always copied back in `gpuDecodePageData`. Also fixed `column_buffer::empty_like` to return a zero null count and an empty null mask.
Authors:
- Vukasin Milovanovic (https://github.com/vuule)
Approvers:
- https://github.com/nvdbaranec
- Vyas Ramasubramani (https://github.com/vyasr)
URL: rapidsai#13111
Commit 5764ba5
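The RAII idea in rapidsai#13111 (always write the cached state back, even on an early return) has a close Python analogue in a context manager with `try`/`finally`. The sketch below is not the CUDA code; `cached`, `decode_page`, and the dictionary layout are hypothetical stand-ins for the cache and the kernel.

```python
from contextlib import contextmanager


@contextmanager
def cached(state, key):
    """Work on a local copy and always write it back, even on early exit."""
    cache = dict(state[key])
    try:
        yield cache
    finally:
        state[key] = cache  # runs on normal exit, early return, or exception


def decode_page(state, has_data):
    with cached(state, "nesting") as nesting:
        nesting["null_count"] = 0   # reset for this chunk
        if not has_data:
            return                  # early return still triggers the write-back
        nesting["null_count"] += 3  # pretend three nulls were decoded


state = {"nesting": {"null_count": 7}}  # stale count left by a previous chunk
decode_page(state, has_data=False)
count_after_early_return = state["nesting"]["null_count"]
decode_page(state, has_data=True)
count_after_decode = state["nesting"]["null_count"]
```

Without the guaranteed write-back, the early-return path would leave the stale count of 7 in place, which is the cumulative-count symptom the commit describes.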
Remove more instances of `UNKNOWN_NULL_COUNT` (rapidsai#13134)
Contributes to rapidsai#11968.
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Jason Lowe (https://github.com/jlowe)
- Nghia Truong (https://github.com/ttnghia)
- David Wendt (https://github.com/davidwendt)
- Bradley Dice (https://github.com/bdice)
URL: rapidsai#13134
Commit 6ae591f
Explicitly compute null count in concatenate APIs (rapidsai#13104)
The total number of nulls in the output can be computed by summing the nulls in the input columns. Contributes to rapidsai#11968
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- David Wendt (https://github.com/davidwendt)
URL: rapidsai#13104
Commit 4f0c46e
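The observation in rapidsai#13104 is simple enough to check in a few lines of Python. The `(values, null_count)` pair used to model a column is an assumption for illustration, with `None` standing in for a null.

```python
def concatenate(columns):
    """Concatenate (values, null_count) pairs without rescanning the data."""
    values = [v for col_values, _ in columns for v in col_values]
    null_count = sum(count for _, count in columns)
    return values, null_count


col_a = ([1, None], 1)
col_b = ([None, None, 3], 2)
merged_values, merged_nulls = concatenate([col_a, col_b])
```

Because each input already knows its own null count, the output count is available without a second pass over the concatenated data.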
Commits on Apr 14, 2023
-
Fix
Series
andDataFrame
constructors to validate index lengths (r……apidsai#13122) Fixes: rapidsai#12999, rapidsai#13056 This PR fixes the `Series` and `DataFrame` constructors to validate the `data` & `index` lengths. This also contains fixes where `index` was being ignored in certain cases. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: rapidsai#13122
Commit 2d70331
-
Use CTAD instead of functions in ProtobufReader (rapidsai#13135)
Replaced `std::make_tuple` with the `std::tuple` constructor. Removed `std::make_field_reader`; the `field_reader` constructor is now called directly.
Authors:
- Vukasin Milovanovic (https://github.com/vuule)
Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Nghia Truong (https://github.com/ttnghia)
URL: rapidsai#13135
Commit a562a7e
-
Compute null-count in cudf::detail::slice (rapidsai#13124)
Calculates the null-count in the `cudf::detail::slice()` function. This requires adding a stream parameter to the function and updating the callers to pass the stream. Also moved the function definition to the `slice.cu` file since there are only two possible values for the template parameter. Labeling this as non-breaking since it is a detail function.
Contributes to: rapidsai#11968
Authors:
- David Wendt (https://github.com/davidwendt)
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Divye Gala (https://github.com/divyegala)
- Vyas Ramasubramani (https://github.com/vyasr)
URL: rapidsai#13124
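As a rough illustration of what computing a null count per slice involves, here is a pure-Python sketch. It assumes an Arrow-style validity bitmask (bit set means valid, least-significant bit first) and slice boundaries given as flat `[begin, end)` pairs; both function names are hypothetical, and the real implementation runs on the GPU with a stream rather than in a Python loop.

```python
def count_unset_bits(mask: bytes, begin: int, end: int) -> int:
    """Count null rows (unset validity bits) in the half-open range [begin, end)."""
    nulls = 0
    for row in range(begin, end):
        is_valid = (mask[row // 8] >> (row % 8)) & 1
        if not is_valid:
            nulls += 1
    return nulls


def slice_null_counts(mask: bytes, indices):
    """Return one null count per (begin, end) pair in the flat `indices` list."""
    return [
        count_unset_bits(mask, indices[i], indices[i + 1])
        for i in range(0, len(indices), 2)
    ]


# Rows 0..7 valid?  T F T F T T F T  (read the literal from right to left)
mask = bytes([0b10110101])
print(slice_null_counts(mask, [0, 4, 4, 8]))  # [2, 1]
```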
Commit 891698d
-
Set null-count in linked_column_view conversion operator (rapidsai#13121)
Removes the `UNKNOWN_NULL_COUNT` usage in the `linked_column_view::column_view()` conversion operator. The null-count is copied from the parent instance. The `linked_column_view` class was reworked to move the C++ function definitions from the header file to a new .cpp file.
Contributes to: rapidsai#11968
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Nghia Truong (https://github.com/ttnghia)
URL: rapidsai#13121
Commit e6cb2d0
-
Use `.element()` instead of `.data()` for window range calculations (rapidsai#13095)
In the staging step for executing window range queries, the boundaries of each row's window are calculated. This involves subtracting/adding the `preceding`/`following` values from each order-by column row, and then searching backwards/forwards for the boundary values.
The staging step has been using `column_device_view.data()` for accessing the order-by rows, an acceptable approach when the order-by columns are numeric (e.g. `INT32`). This approach fails when the order-by column is a `STRING`, because `.data()` is not defined for such columns.
A better approach is to use `.element()` to directly access the rows, because it has special handling for `STRING`, among other types, while continuing to work for numeric primitives.
## Future
In a followup to this change, support for `STRING` order-by columns will be added.
Authors:
- MithunR (https://github.com/mythrocks)
Approvers:
- Yunsong Wang (https://github.com/PointKernel)
- Nghia Truong (https://github.com/ttnghia)
URL: rapidsai#13095
Commit 5c93b44
-
Change cudf::test::make_null_mask to also return null-count (rapidsai#13081)
Change `cudf::test::make_null_mask` to return both the null-mask and the null-count. Callers can then use this null-count instead of `UNKNOWN_NULL_COUNT`. These changes include removing `UNKNOWN_NULL_COUNT` usage from the libcudf C++ test source code.
One side effect was found: a strings column with all nulls can technically have no children, but using `UNKNOWN_NULL_COUNT` allowed the check for this to be bypassed. Therefore, many utilities started to fail when `UNKNOWN_NULL_COUNT` was removed. The factory was modified to remove the check, which results in an offsets column and an empty chars column as children. More code will likely need to be changed when `UNKNOWN_NULL_COUNT` is no longer used as a default parameter for factories and other column functions.
No behavior is changed. Since `cudf::test::make_null_mask` is technically a public API, this PR could be marked as a breaking change as well.
Contributes to: rapidsai#11968
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- MithunR (https://github.com/mythrocks)
- Vyas Ramasubramani (https://github.com/vyasr)
URL: rapidsai#13081
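A small Python sketch of the "return the mask and the count together" pattern described above. The function below is a hypothetical analogue written for illustration, not the C++ `cudf::test::make_null_mask` itself, and it assumes a bitmask in which a set bit means valid, stored least-significant bit first.

```python
def make_null_mask(validity):
    """Pack an iterable of booleans into a bitmask and count the nulls in one pass.

    Returns (mask_bytes, null_count) so callers never need a placeholder
    value for an unknown null count.
    """
    flags = list(validity)
    mask = bytearray((len(flags) + 7) // 8)
    null_count = 0
    for row, is_valid in enumerate(flags):
        if is_valid:
            mask[row // 8] |= 1 << (row % 8)
        else:
            null_count += 1
    return bytes(mask), null_count


mask, null_count = make_null_mask([True, False, True, True, False])
print(bin(mask[0]), null_count)  # 0b1101 2
```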
Commit 4481142
-
Deprecate `pad` and `backfill` methods (rapidsai#13140)
This PR deprecates the `pad` and `backfill` methods in favor of the `ffill` and `bfill` methods. Pandas recently deprecated these: pandas-dev/pandas#51221, pandas-dev/pandas#45076
Authors:
- GALI PREM SAGAR (https://github.com/galipremsagar)
Approvers:
- Matthew Roeschke (https://github.com/mroeschke)
URL: rapidsai#13140
Commit daf3ac0
Commits on Apr 15, 2023
-
Enable binary operations between scalars and columns of differing decimal types (rapidsai#13034)
Closes rapidsai#12958
This PR enables some previously xfailing tests.
Authors:
- Ashwin Srinath (https://github.com/shwina)
- GALI PREM SAGAR (https://github.com/galipremsagar)
Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Bradley Dice (https://github.com/bdice)
- Vyas Ramasubramani (https://github.com/vyasr)
URL: rapidsai#13034
Commit 0b59fda
-
Update clang-format to 16.0.1. (rapidsai#13133)
This PR updates the clang-format version used by pre-commit.
Authors:
- Bradley Dice (https://github.com/bdice)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Elias Stehle (https://github.com/elstehle)
URL: rapidsai#13133
Commit 580ee40
Commits on Apr 17, 2023
-
Fix a few clang-format style check errors (rapidsai#13146)
Fixes some build errors occurring after rapidsai#13133 was merged. It looks like a couple of files may have been mismerged. This should unblock several current PRs.
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Yunsong Wang (https://github.com/PointKernel)
- Divye Gala (https://github.com/divyegala)
URL: rapidsai#13146
Commit a6fb6a2
-
Add null-count parameter to json experimental parse_data utility (rapidsai#13107)
Add a `null_count` parameter to the `cudf::io::json::experimental::detail::parse_data` function, which already accepts a `null_mask`. Normally, the callers already know the count. This function can use the parameter to help build the output column.
Found while working on rapidsai#13081
Contributes to: rapidsai#11968
Authors:
- David Wendt (https://github.com/davidwendt)
- GALI PREM SAGAR (https://github.com/galipremsagar)
Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Nghia Truong (https://github.com/ttnghia)
URL: rapidsai#13107
Commit 7c3a34e
-
Add Python bindings for time zone data (TZiF) reader (rapidsai#12826)
This PR adds bindings to the TZiF reader that was added in the libcudf API in rapidsai#12805. No tests are being added as these bindings are just for internal use. In follow-up PRs, I will add a timezone-aware datetime type and timezone-aware operations to the public API, along with tests for those operations.
The bindings can be used as follows:
```python
>>> transition_times, offsets = make_timezone_transition_table("/usr/share/zoneinfo", "America/New_York")
>>> transition_times
<cudf.core.column.datetime.DatetimeColumn object at 0x7f95cd6ac840>
[
  1883-11-18 17:00:00,
  1883-11-18 17:00:00,
  1918-03-31 07:00:00,
  1918-10-27 06:00:00,
  1919-03-30 07:00:00,
  1919-10-26 06:00:00,
  1920-03-28 07:00:00,
  1920-10-31 06:00:00,
  1921-04-24 07:00:00,
  1921-09-25 06:00:00,
  ...
  2365-03-14 07:00:00,
  2365-11-07 06:00:00,
  2366-03-13 07:00:00,
  2366-11-06 06:00:00,
  2367-03-12 07:00:00,
  2367-11-05 06:00:00,
  2368-03-10 07:00:00,
  2368-11-03 06:00:00,
  2369-03-09 07:00:00,
  2369-11-02 06:00:00
]
dtype: datetime64[s]
>>> offsets
<cudf.core.column.timedelta.TimeDeltaColumn object at 0x7f94e69bad40>
[
  -18000,
  -18000,
  -14400,
  -18000,
  -14400,
  -18000,
  -14400,
  -18000,
  -14400,
  -18000,
  ...
  -14400,
  -18000,
  -14400,
  -18000,
  -14400,
  -18000,
  -14400,
  -18000,
  -14400,
  -18000
]
dtype: timedelta64[s]
```
Authors:
- Ashwin Srinath (https://github.com/shwina)
- Vukasin Milovanovic (https://github.com/vuule)
Approvers:
- Bradley Dice (https://github.com/bdice)
URL: rapidsai#12826
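One plausible way such a transition table could be consumed is sketched below in plain Python. This is an assumption for illustration, not something the PR states: it treats the i-th offset as applying from the i-th transition time (in UTC) until the next one, and clamps timestamps earlier than the first transition to the first entry. The function name `utc_to_local` is hypothetical, and only a few rows from the output above are used.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# A few rows copied from the America/New_York output above (UTC instants
# and the UTC offset, in seconds, assumed to apply from each instant onward).
transition_times = [
    datetime(1883, 11, 18, 17, 0, 0),
    datetime(1918, 3, 31, 7, 0, 0),
    datetime(1918, 10, 27, 6, 0, 0),
    datetime(1919, 3, 30, 7, 0, 0),
]
offsets = [-18000, -14400, -18000, -14400]


def utc_to_local(ts_utc: datetime) -> datetime:
    """Shift a UTC timestamp by the offset of the last transition at or before it."""
    idx = bisect_right(transition_times, ts_utc) - 1
    idx = max(idx, 0)  # assumption: clamp timestamps before the first transition
    return ts_utc + timedelta(seconds=offsets[idx])


print(utc_to_local(datetime(1918, 7, 1, 12, 0, 0)))   # 1918-07-01 08:00:00
print(utc_to_local(datetime(1918, 12, 1, 12, 0, 0)))  # 1918-12-01 07:00:00
```

Because the transition times are sorted, the lookup is a binary search, which maps naturally onto a vectorized search over a whole datetime column.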
Commit b05d5e7
-
Use ARC V2 self-hosted runners for GPU jobs (rapidsai#13123)
This PR updates the runner labels to use ARC V2 self-hosted runners for GPU jobs. This is needed to resolve the auto-scaling issues.
Authors:
- Jordan Jacobelli (https://github.com/jjacobelli)
Approvers:
- AJ Schmidt (https://github.com/ajschmidt8)
URL: rapidsai#13123
Commit b8ab63d
-
Fix read_avro() skip_rows and num_rows. (rapidsai#12912)
This PR fixes the avro reader (`cudf.read_avro()`) so that it honors the values passed to the `skip_rows` and `num_rows` parameters. In implementing this new logic, we also revamp the reader's ability to handle multi-block avro files, which we test extensively with a new `test_avro_reader_multiblock()` test that features some 1300 permutations of various block size combinations.
Closes rapidsai#6529.
Authors:
- Trent Nelson (https://github.com/tpn)
Approvers:
- Lawrence Mitchell (https://github.com/wence-)
- Vukasin Milovanovic (https://github.com/vuule)
- Nghia Truong (https://github.com/ttnghia)
URL: rapidsai#12912
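To make the interaction between `skip_rows`, `num_rows`, and multiple blocks concrete, here is a pure-Python sketch of how a requested row window could be mapped onto per-block reads. The function `plan_block_reads` and its return shape are hypothetical and written only for illustration; this is not the reader's actual code.

```python
def plan_block_reads(block_row_counts, skip_rows=0, num_rows=None):
    """Map a global row window onto blocks.

    Returns a list of (block_index, first_row_in_block, rows_to_take) tuples
    covering rows [skip_rows, skip_rows + num_rows) of the whole file.
    A num_rows of None means "read to the end".
    """
    total = sum(block_row_counts)
    start = min(skip_rows, total)
    stop = total if num_rows is None else min(start + num_rows, total)

    plan = []
    block_start = 0
    for index, count in enumerate(block_row_counts):
        block_end = block_start + count
        lo = max(start, block_start)  # first wanted row inside this block
        hi = min(stop, block_end)     # one past the last wanted row
        if lo < hi:
            plan.append((index, lo - block_start, hi - lo))
        block_start = block_end
    return plan


# Three blocks holding 3, 4 and 5 rows; skip 4 rows, then read 5.
print(plan_block_reads([3, 4, 5], skip_rows=4, num_rows=5))  # [(1, 1, 3), (2, 0, 2)]
```

Blocks that fall entirely outside the window produce no entry, which is the case a multi-block test matrix needs to exercise for many block size combinations.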
Commit 62e02c6
Commits on Apr 18, 2023
-
Commit e5ea7df
-
Improve performance of slice_strings for long strings (rapidsai#13057)
Improves performance for longer strings in the `cudf::strings::slice_strings()` API. The `cudf::string_view::substr` function was reworked to minimize counting characters, and the gather version of `make_strings_children` is used to build the resulting strings column. This version is already optimized for small and large strings. Additionally, the code was refactored so the common case of `step==1 and start < stop` can also make use of the gather approach. Common code was also grouped closer together to make the source file easier to navigate.
The `slice.cpp` benchmark was updated to better measure large strings with comparable slice boundaries. The benchmark showed a performance improvement of up to 9x for larger strings, with no significant degradation for smaller strings.
Reference rapidsai#13048 and rapidsai#12445
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Elias Stehle (https://github.com/elstehle)
URL: rapidsai#13057
Commit feea040
-
Allow compilation with any GTest version 1.11+ (rapidsai#13153)
GTest max support for `Types` was removed in 1.11, so we remove the workarounds in cudf_gtest. Since we need to support both our custom `Types` and the GTest 1.11+ version, we rework the type_list_utilities to be generic and not depend on specific traits.
Also corrected the `<<` overloads for GTest printing so that they work with GTest 1.11.
Authors:
- Robert Maynard (https://github.com/robertmaynard)
- Vukasin Milovanovic (https://github.com/vuule)
Approvers:
- Bradley Dice (https://github.com/bdice)
- Nghia Truong (https://github.com/ttnghia)
URL: rapidsai#13153
Commit 1750bff
-
Commit a4febb6
-
Merge branch 'fea-string_scalar_ast_compare' of github.com:karthikeyann/cudf into fea-string_scalar_ast_compare
Commit 972b9fa
-
Commit 8354982