Memory Profiling #15866

madsbk · 2024-05-27T15:13:03Z

Use RMM's new memory profiler to profile all functions already decorated with _cudf_nvtx_annotate.

Example

```python
import cudf
from cudf.utils.performance_tracking import print_memory_report

cudf.set_option("memory_profiling", True)

df1 = cudf.DataFrame({"a": [1, 2, 3]})
df2 = cudf.DataFrame({"a": [2, 2, 3]})
df3 = df1.merge(df2)

print_memory_report()
```

Output:

```
Memory Profiling
================

Ordered by: memory_peak

ncalls     memory_peak    memory_total  filename:lineno(function)
     1             272             688  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:4072(DataFrame.merge)
     2              32              64  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:1043(DataFrame._init_from_dict_like)
     2              32              64  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:690(DataFrame.__init__)
     2               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:1131(DataFrame._align_input_series_indices)
     7               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:214(RangeIndex.__init__)
     6               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:424(RangeIndex.__len__)
     4               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/frame.py:271(Frame.__len__)
     2               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:3195(DataFrame._insert)
     2               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:270(RangeIndex.name)
     2               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:369(RangeIndex.copy)
     5               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/frame.py:134(Frame._from_data)
     2               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/frame.py:1039(Frame._copy_type_metadata)
     2               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/indexed_frame.py:315(IndexedFrame._from_columns_like_self)
```
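The report keeps one record per decorated function: ncalls counts invocations, memory_peak appears to be the largest peak seen in any single call, and memory_total the sum of bytes allocated across all calls (consistent with, e.g., DataFrame.__init__ above: 2 calls, peak 32, total 64). Below is a minimal pure-Python sketch of that bookkeeping under stated assumptions: AllocationCounter, allocate, deallocate, tracked, and format_report are hypothetical names that simulate allocations instead of going through RMM, and the decorator tracks unconditionally rather than checking the memory_profiling option.

```python
import functools
from collections import defaultdict


class AllocationCounter:
    """Hypothetical per-call counter: bytes currently held, peak, and total allocated."""

    def __init__(self):
        self.current = 0
        self.peak = 0
        self.total = 0


_active = []  # counters of the tracked calls currently on the call stack
records = defaultdict(lambda: [0, 0, 0])  # name -> [ncalls, memory_peak, memory_total]


def allocate(nbytes):
    # Hypothetical allocation hook: every active tracked call sees the allocation.
    for c in _active:
        c.current += nbytes
        c.peak = max(c.peak, c.current)
        c.total += nbytes


def deallocate(nbytes):
    for c in _active:
        c.current -= nbytes


def tracked(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        counter = AllocationCounter()
        _active.append(counter)
        try:
            return func(*args, **kwargs)
        finally:
            _active.pop()
            rec = records[func.__qualname__]
            rec[0] += 1                         # ncalls
            rec[1] = max(rec[1], counter.peak)  # memory_peak: max over calls
            rec[2] += counter.total             # memory_total: sum over calls
    return wrapper


def format_report():
    # Ordered by memory_peak, largest first.
    rows = sorted(records.items(), key=lambda kv: kv[1][1], reverse=True)
    lines = ["ncalls  memory_peak  memory_total  function"]
    for name, (ncalls, peak, total) in rows:
        lines.append(f"{ncalls:>6}  {peak:>11}  {total:>12}  {name}")
    return "\n".join(lines)


@tracked
def make_column(n):
    allocate(8 * n)  # e.g. n int64 values, kept alive by the caller


@tracked
def merge():
    make_column(3)
    make_column(3)
    allocate(100)    # temporary buffer
    deallocate(100)


merge()
print(format_report())
```

Running it lists merge first (ncalls 1, memory_peak 148, memory_total 148) and then make_column (ncalls 2, memory_peak 24, memory_total 48): each make_column call peaks at 24 bytes while the two calls total 48, and the 100-byte temporary only counts toward merge.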

Checklist

- I am familiar with the Contributing Guidelines.
- New or existing tests cover these changes.
- The documentation is up to date with these changes.
- Rename _cudf_nvtx_annotate to _performance_tracking

See <rapidsai/rmm#1563>

madsbk · 2024-06-07T12:01:47Z

python/cudf/cudf/utils/nvtx_annotation.py

```diff
@@ -17,12 +19,13 @@ def _get_color_for_nvtx(name):


 def _cudf_nvtx_annotate(func, domain="cudf_python"):
```


Should we rename _cudf_nvtx_annotate to something like _performance_tracking?
And nvtx_annotation.py => performance_tracking.py

I think that would be a more generically clear name. +1

I will do the rename when the PR is ready to be merged. It results in 591 function renames, which might be hard to keep free of merge conflicts :)

harrism · 2024-06-10T23:08:25Z

I requested this in rapidsai/rmm#1563, but it seems not to have made it in. The current output shown above provides no way of knowing the units of the peak and total memory columns. I suggest changing the headings to make this explicit; otherwise usability suffers.

```
ncalls     memory_peak (MiB)    memory_total (MiB)  filename:lineno(function)
...
```

madsbk · 2024-06-11T06:24:13Z

@harrism, I added a Legends section, which includes the units:

```
Memory Profiling
================

Legends:
  ncalls       - number of times the function or code block was called
  memory_peak  - peak memory allocated in function or code block (in bytes)
  memory_total - total memory allocated in function or code block (in bytes)
```


bdice · 2024-06-11T17:42:01Z

@madsbk Is there any measurable overhead to using the statistics profiler? The NVTX overhead is very small, which is why we have enabled it across cuDF. We may want to run benchmarks to verify this before merging. If it has a cost, perhaps use an environment variable such as CUDF_PROFILE_MEMORY_STATISTICS to control the behavior.
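One way to answer this is a micro-benchmark of the per-call cost. A minimal sketch, assuming a pure-Python stand-in: noop_decorator is hypothetical and only forwards the call, so this measures bare decorator overhead; to measure the real cost, wrap the function with the tracking decorator under test instead (with memory profiling enabled). Absolute numbers depend on the machine.

```python
import functools
import timeit


def noop_decorator(func):
    """Hypothetical decorator that only forwards the call (no tracking)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def tiny(x):
    return x


wrapped = noop_decorator(tiny)


def per_call_us(fn, number=200_000, repeat=5):
    """Best-of-`repeat` time per call of fn(1), in microseconds."""
    timings = timeit.repeat(lambda: fn(1), number=number, repeat=repeat)
    return min(timings) / number * 1e6


def decorator_overhead_us(number=200_000):
    """Extra microseconds per call added by the decorator."""
    return per_call_us(wrapped, number) - per_call_us(tiny, number)


if __name__ == "__main__":
    print(f"decorator overhead: {decorator_overhead_us():.3f} us per call")
```

Taking the minimum over repeats reduces noise from other activity on the machine; the lambda adds the same cost to both measurements, so it cancels in the difference.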

madsbk · 2024-06-12T08:28:23Z

Good point: it costs about 1.7 microseconds per call.
The overhead goes from 0.18us to 1.9us. The extra overhead is dominated by two calls to rmm.mr.get_current_device_resource(), which together take 1.5us.

I think this is small enough to ignore?

bdice · 2024-06-13T02:13:43Z

> The overhead goes from 0.18us to 1.9us.

@vyasr @galipremsagar Do you think this is small enough to enable memory statistics tracking on every NVTX-annotated function in cudf Python? I am pretty hesitant to enable this by default but don't want to be the only person weighing in.

harrism · 2024-06-13T10:12:02Z

Given that the overhead is constant, it will be a drop in the bucket when memory sizes are large. When they are small, it may be significant for some functions.

Can we just make it really easy to enable, rather than enabled by default?

My strong opinion, weakly held: nonzero cost profiling should not be enabled by default.
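To make the constant-overhead argument concrete, here is a back-of-the-envelope calculation using the per-call figures quoted above (0.18 us with NVTX only, 1.9 us with memory statistics). The function runtimes in the loop are illustrative, not measured.

```python
# Per-call figures quoted in the thread, in microseconds.
BASELINE_US = 0.18    # NVTX annotation only
WITH_STATS_US = 1.9   # NVTX annotation + memory statistics
EXTRA_US = WITH_STATS_US - BASELINE_US  # about 1.72 us per call


def relative_overhead(function_runtime_us):
    """Extra time from statistics tracking as a fraction of the function's own runtime."""
    return EXTRA_US / function_runtime_us


def total_extra_seconds(ncalls):
    """Accumulated extra time for `ncalls` decorated calls, in seconds."""
    return ncalls * EXTRA_US * 1e-6


for runtime_us in (10, 100, 1_000, 10_000):
    print(f"{runtime_us:>6} us function -> {relative_overhead(runtime_us):.2%} slower")
print(f"1,000,000 calls -> {total_extra_seconds(1_000_000):.2f} s extra")
```

For a function that itself takes 10 us, the extra ~1.72 us is about 17% of its runtime; for a 1 ms function it is under 0.2%.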


madsbk · 2024-06-14T13:16:02Z

Alright, I have added a memory_profiling option :)

It also makes it a bit more user-friendly. To enable memory profiling, do:

```python
import cudf
from cudf.utils.performance_tracking import print_memory_report

cudf.set_option("memory_profiling", True)

# my code

print_memory_report()
```

NB: before merging this PR, I will remove nvtx_annotation.py and use @_performance_tracking throughout.


Looks like walk_stack() uses an incorrect current frame
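For context, Python's traceback.walk_stack(f) follows f.f_back starting from the frame it is given; per the Python docs, with f=None "the current stack is used", so the first frame yielded depends on walk_stack's own notion of the current frame. A minimal sketch of passing an explicit frame instead; calling_function_names and outer are hypothetical helpers, and this is not necessarily how the PR fixed it.

```python
import inspect
import traceback


def calling_function_names(depth=3):
    """Return up to `depth` function names, starting with this function's own frame."""
    frame = inspect.currentframe()
    try:
        # Start explicitly from our own frame, then follow f_back outward.
        return [f.f_code.co_name for f, _ in traceback.walk_stack(frame)][:depth]
    finally:
        del frame  # drop the frame reference to avoid a reference cycle


def outer():
    return calling_function_names()
```

Calling outer() returns a list beginning with "calling_function_names" and then "outer", regardless of how walk_stack would resolve None.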

wence-

I think we want to figure out a way to be able to disable the statistics MR, which is currently not possible.

python/cudf/cudf/utils/performance_tracking.py

python/cudf/cudf/utils/utils.py

wence- · 2024-06-24T08:45:48Z

python/cudf/cudf/utils/performance_tracking.py

```diff
+            if get_option("memory_profiling"):
+                rmm.statistics.enable_statistics()
```


issue: This doesn't behave like a normal context manager (one that leaves the state unchanged after exit) because it doesn't unwind the change to the current memory resource that rmm.statistics.enable_statistics enacts. This means that the following code does something "unexpected": it changes the MR as a side effect:

```python
# statistics not enabled...
with cudf.option_context("memory_profiling", True):
    # code that profiles statistics
    ...
# expect the statistics memory resource to be popped, but it is not
```

It seems like the call to enable statistics should be paired with a matching disable statistics call. But this seems bad if it were done at every level of the call stack.

It feels like the right place for modification of the MR is in the set_option call. set_option("memory_profiling", True) would store the old MR and push the statistics MR, and a later set_option("memory_profiling", False) would pop the statistics MR if it still matches.

This, however, would preclude use of the environment variable default (because that is set on import, and we don't want to have to call RMM on import).

Does this suggest that we need a specific enable_memory_profiling that can be used as a context manager that sets the option and sets up the MR?

If this is an issue, RMM already has rmm.statistics.statistics() and rmm.statistics.enable_statistics().
Thus the user code would look something like:

```python
import cudf
import rmm.statistics

# The user must first enable the cudf option
cudf.set_option("memory_profiling", True)

# Then enable statistics within a context
with rmm.statistics.statistics():
    # my code
    ...

# Or globally
rmm.statistics.enable_statistics()
```

Of course, we could wrap/alias the rmm functions in cudf-native functions. However, I am not sure whether the disadvantage of the "unexpected" side effect outweighs the user-friendliness of just setting CUDF_MEMORY_PROFILING="1".

Alternatively, we could implement lazily evaluated environment variables in options.py. It might be worth streamlining the options module in any case.

In terms of solutions, it depends a little bit on how end-user-friendly we want this to be. If this is mostly for developer use, I am happy with the programmatic set_option + use RMM calls. If we want this as a debugging tool for end users ("where is my code allocating lots of VRAM?"), then we likely need something a little more ergonomic.

As much as I am allergic to environment variables from a reproducibility point of view, I can see the attraction.

I am not sure but I guess we could start with the set_option + use RMM calls and then later consider a more end-user-friendly solution?

I think that's a reasonable compromise.

Co-authored-by: Lawrence Mitchell <wence@gmx.li>

harrism

Doc comments.

docs/cudf/source/user_guide/memory-profiling.md

Co-authored-by: Mark Harris <783069+harrism@users.noreply.github.com>


docs/cudf/source/user_guide/memory-profiling.md

Co-authored-by: Mark Harris <783069+harrism@users.noreply.github.com>

wence-

Looks good, tiny typographic suggestions!

docs/cudf/source/user_guide/memory-profiling.md

vyasr

One question, otherwise LGTM.

python/cudf/cudf/utils/nvtx_annotation.py

Co-authored-by: Lawrence Mitchell <wence@gmx.li>


wence-

Approving dask-cudf changes

madsbk · 2024-06-28T10:31:03Z

Thanks all

madsbk · 2024-06-28T10:31:13Z

/merge

commit 60287e1 Author: Thomas Li <thomasli1234567890@gmail.com> Date: Mon Jul 1 17:56:34 2024 +0000 address more comments commit 25c25d4 Merge: 7806ce4 51fb873 Author: Thomas Li <thomasli1234567890@gmail.com> Date: Mon Jul 1 17:31:44 2024 +0000 Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers commit 51fb873 Merge: 599ce95 e932fbd Author: gpuCI <38199262+GPUtester@users.noreply.github.com> Date: Mon Jul 1 12:17:38 2024 -0400 Merge pull request rapidsai#16145 from rapidsai/branch-24.06 Forward-merge branch-24.06 into branch-24.08 commit e932fbd Author: Vyas Ramasubramani <vyasr@nvidia.com> Date: Mon Jul 1 09:17:32 2024 -0700 Add patch for incorrect cuco noexcept clauses (rapidsai#16077) [cuco previously marked a number of methods as noexcept that can in fact throw exceptions](NVIDIA/cuCollections#510). This causes problems for cudf functions that call these methods. The issue [was fixed in cuco upstream](NVIDIA/cuCollections#511), but we cannot easily update to the latest commit of cuco, especially in a patch fix for 24.06. This PR instead adds a rapids-cmake patch for the cuco clone to address this issue. The patch may be removed once we update to a commit of cuco that contains the necessary fix. Resolves rapidsai#16059 commit 599ce95 Author: Lawrence Mitchell <lmitchell@nvidia.com> Date: Mon Jul 1 09:35:35 2024 +0100 Implement handlers for series literal in cudf-polars (rapidsai#16113) A query plan can contain a "literal" polars Series. Often, for example, when calling a contains-like function. To translate these, introduce a new `LiteralColumn` node to capture the concept and add an evaluation rule (converting from arrow). Since list-dtype Series need the same casting treatment as in dataframe scan case, factor the casting out into a utility, and take the opportunity to handled casting of nested lists correctly. 
Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Thomas Li (https://github.com/lithomas1) - Vyas Ramasubramani (https://github.com/vyasr) URL: rapidsai#16113 commit 7806ce4 Author: Thomas Li <thomasli1234567890@gmail.com> Date: Sat Jun 29 00:47:53 2024 +0000 simplify again commit e57a677 Merge: e940e30 3c3edfe Author: Thomas Li <thomasli1234567890@gmail.com> Date: Sat Jun 29 00:26:03 2024 +0000 Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers commit 3c3edfe Author: Yunsong Wang <yunsongw@nvidia.com> Date: Fri Jun 28 13:58:22 2024 -0700 Update implementations to build with the latest cuco (rapidsai#15938) This PR updates existing libcudf to accommodate a cuco breaking change introduced in NVIDIA/cuCollections#479. It helps avoid breaking cudf when bumping the cuco version in `rapids-cmake`. Redundant equal/hash overloads will be removed once the version bump is done on the `rapids-cmake` end. Authors: - Yunsong Wang (https://github.com/PointKernel) Approvers: - David Wendt (https://github.com/davidwendt) - Nghia Truong (https://github.com/ttnghia) URL: rapidsai#15938 commit df88cf5 Author: Bradley Dice <bdice@bradleydice.com> Date: Fri Jun 28 15:40:52 2024 -0500 Use size_t to allow large conditional joins (rapidsai#16127) The conditional join kernels were using `cudf::size_type` where `std::size_t` was needed. This PR fixes that bug, which caused `cudaErrorIllegalAddress` as shown in rapidsai#16115. This closes rapidsai#16115. I did not add tests because we typically do not test very large workloads. However, I committed the test and reverted it in this PR, so there is a record of my validation code. 
Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - https://github.com/nvdbaranec - Yunsong Wang (https://github.com/PointKernel) URL: rapidsai#16127 commit fb12d98 Author: Robert Maynard <rmaynard@nvidia.com> Date: Fri Jun 28 12:14:58 2024 -0400 Installed cudf header use cudf::allocate_like (rapidsai#16087) Remove usage of non public cudf::allocate_like from implementations in headers we install Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Nghia Truong (https://github.com/ttnghia) URL: rapidsai#16087 commit 78f4a8a Author: Robert Maynard <rmaynard@nvidia.com> Date: Fri Jun 28 11:26:27 2024 -0400 Move common string utilities to public api (rapidsai#16070) As part of rapidsai#15982 a subset of the strings utility functions have been identified as being worth expsosing as part of the cudf public API. The `create_string_vector_from_column`, `get_offset64_threshold`, and `is_large_strings_enabled` are now made part of the public `cudf::strings` api. Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - MithunR (https://github.com/mythrocks) - David Wendt (https://github.com/davidwendt) - Jayjeet Chakraborty (https://github.com/JayjeetAtGithub) - Lawrence Mitchell (https://github.com/wence-) URL: rapidsai#16070 commit a4b951a Author: nvdbaranec <56695930+nvdbaranec@users.noreply.github.com> Date: Fri Jun 28 10:20:42 2024 -0500 Templatization of fixed-width parquet decoding kernels. (rapidsai#15911) This PR merges all of the fixed-width parquet decoding kernels into a single templatized kernel that can be selectively instantiated with desired features (dictionary/no-dictionary, nested/non-nested, etc). It also adds support for (non-list) nested columns in this path. So structs do not have to use the much slower general decode kernel any more. 
A new benchmark was added specific to structs containing only fixed width columns. I added this because the performance improvement is fairly high (+20%) but we don't see it in the normal struct benchmarks because they include (and are dominated by) string decode times. The new benchmark shows: Before this PR: ``` | data_type | io_type | cardinality | run_length | bytes_per_second | peak_memory_usage | encoded_file_size | |-----------|---------------|-------------|------------|------------------|-------------------|-------------------| | STRUCT | DEVICE_BUFFER | 0 | 1 | 21071216823 | 1.047 GiB | 511.675 MiB | | STRUCT | DEVICE_BUFFER | 1000 | 1 | 18974392387 | 821.312 MiB | 128.884 MiB | | STRUCT | DEVICE_BUFFER | 0 | 32 | 20429356824 | 621.787 MiB | 28.141 MiB | | STRUCT | DEVICE_BUFFER | 1000 | 32 | 20572327813 | 598.421 MiB | 16.475 MiB | ``` After this PR: ``` | data_type | io_type | cardinality | run_length | bytes_per_second | peak_memory_usage | encoded_file_size | |-----------|---------------|-------------|------------|------------------|-------------------|-------------------| | STRUCT | DEVICE_BUFFER | 0 | 1 | 25805996399 | 1.047 GiB | 511.675 MiB | | STRUCT | DEVICE_BUFFER | 1000 | 1 | 22422306660 | 821.312 MiB | 128.884 MiB | | STRUCT | DEVICE_BUFFER | 0 | 32 | 24460694014 | 621.787 MiB | 28.141 MiB | | STRUCT | DEVICE_BUFFER | 1000 | 32 | 24674861214 | 598.421 MiB | 16.475 MiB | ``` Split-page decoding for fixed-width types + structs are also going through this new path. New test added. This brings us closer to eliminating the "general" kernel. The only things left that run through it are lists and booleans. This is PR 1 of 2, with the followup moving a lot of code around. At this point, I think it makes sense to start consolidating our files a bit. I also left some breadcrumbs (a few small commented out code blocks) in the core kernel `gpuDecodePageDataGeneric` for the next step of adding list support. They can be removed if people don't like them. 
Authors: - https://github.com/nvdbaranec Approvers: - Mike Wilson (https://github.com/hyperbolic2346) - Vukasin Milovanovic (https://github.com/vuule) - Muhammad Haseeb (https://github.com/mhaseeb123) URL: rapidsai#15911 commit e434fdb Author: David Wendt <45795991+davidwendt@users.noreply.github.com> Date: Fri Jun 28 10:57:01 2024 -0400 Update libcudf compiler requirements in contributing doc (rapidsai#16103) Updates the compiler requirements in the contributing document. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Karthikeyan (https://github.com/karthikeyann) URL: rapidsai#16103 commit 565c0d1 Author: Matthew Murray <41342305+Matt711@users.noreply.github.com> Date: Fri Jun 28 10:16:55 2024 -0400 Migrate lists/contains to pylibcudf (rapidsai#15981) Part of rapidsai#15162. Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: rapidsai#15981 commit c40e0cc Author: Matthew Murray <41342305+Matt711@users.noreply.github.com> Date: Fri Jun 28 10:10:31 2024 -0400 Add support for proxy `np.flatiter` objects (rapidsai#16107) Closes rapidsai#15388 Authors: - Matthew Murray (https://github.com/Matt711) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: rapidsai#16107 commit 673d766 Author: Paul Mattione <156858817+pmattione-nvidia@users.noreply.github.com> Date: Fri Jun 28 09:38:57 2024 -0400 Make binary operators work between fixed-point and floating args (rapidsai#16116) Some of the binary operators in cuDF don't work between fixed_point and floating-point numbers after [this earlier PR](rapidsai#15438) removed the ability to construct and implicitly cast fixed_point numbers from floating point numbers. This PR restores that functionality by detecting and performing the necessary explicit casts, and adds tests for the supported operators. 
Note that the `binary_op_has_common_type` code is modeled after `has_common_type` found in traits.hpp. This closes [issue 16090](rapidsai#16090) Authors: - Paul Mattione (https://github.com/pmattione-nvidia) Approvers: - Jayjeet Chakraborty (https://github.com/JayjeetAtGithub) - Karthikeyan (https://github.com/karthikeyann) URL: rapidsai#16116 commit 224ac5b Author: David Wendt <45795991+davidwendt@users.noreply.github.com> Date: Fri Jun 28 09:26:37 2024 -0400 Add libcudf public/detail API pattern to developer guide (rapidsai#16086) Adds specific description for the public API to detail API function pattern to the libcudf developer guide. Also fixes some formatting issues and broken link. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Shruti Shivakumar (https://github.com/shrshi) - Karthikeyan (https://github.com/karthikeyann) URL: rapidsai#16086 commit 2b547dc Author: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> Date: Fri Jun 28 03:11:01 2024 -1000 Add ensure_index to not unnecessarily shallow copy cudf.Index (rapidsai#16117) The `cudf.Index` constructor will shallow copy a `cudf.Index` input. Sometimes, we just need to make sure an input is a `cudf.Index`, so created `ensure_index` (pandas has something similar) so we don't shallow copy these inputs unnecessarily Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) URL: rapidsai#16117 commit 57862a3 Author: Robert Maynard <rmaynard@nvidia.com> Date: Fri Jun 28 08:43:12 2024 -0400 stable_distinct public api now has a stream parameter (rapidsai#16068) As part of rapidsai#15982 we determined that the cudf `stable_distinct` public API needs to be updated so that a user provided stream can be provided. 
Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Nghia Truong (https://github.com/ttnghia) - Srinivas Yadav (https://github.com/srinivasyadav18) - Bradley Dice (https://github.com/bdice) URL: rapidsai#16068 commit 6b04fd3 Author: Mads R. B. Kristensen <madsbk@gmail.com> Date: Fri Jun 28 12:31:18 2024 +0200 Memory Profiling (rapidsai#15866) Use [RMM's new memory profiler](rapidsai/rmm#1563) to profile all functions already decorated with `_cudf_nvtx_annotate`. Example ```python import cudf from cudf.utils.performance_tracking import print_memory_report cudf.set_option("memory_profiling", True) df1 = cudf.DataFrame({"a": [1, 2, 3]}) df2 = cudf.DataFrame({"a": [2, 2, 3]}) df3 = df1.merge(df2) print_memory_report() ``` Output: ``` Memory Profiling ================ Ordered by: memory_peak ncalls memory_peak memory_total filename:lineno(function) 1 272 688 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:4072(DataFrame.merge) 2 32 64 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:1043(DataFrame._init_from_dict_like) 2 32 64 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:690(DataFrame.__init__) 2 0 0 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:1131(DataFrame._align_input_series_indices) 7 0 0 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:214(RangeIndex.__init__) 6 0 0 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:424(RangeIndex.__len__) 4 0 0 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/frame.py:271(Frame.__len__) 2 0 0 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:3195(DataFrame._insert) 2 0 0 
/home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:270(RangeIndex.name) 2 0 0 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:369(RangeIndex.copy) 5 0 0 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/frame.py:134(Frame._from_data) 2 0 0 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/frame.py:1039(Frame._copy_type_metadata) 2 0 0 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/indexed_frame.py:315(IndexedFrame._from_columns_like_self) ``` Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Mark Harris (https://github.com/harrism) - Lawrence Mitchell (https://github.com/wence-) - Vyas Ramasubramani (https://github.com/vyasr) URL: rapidsai#15866 commit e35da6b Author: Lawrence Mitchell <lmitchell@nvidia.com> Date: Fri Jun 28 09:54:03 2024 +0100 Implement Ternary copy_if_else (rapidsai#16114) A straightforward evaluation using `copy_if_else`. Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - https://github.com/brandon-b-miller URL: rapidsai#16114 commit e940e30 Author: Thomas Li <thomasli1234567890@gmail.com> Date: Thu Jun 27 21:44:41 2024 +0000 Address code review Co-authored-by: Vyas Ramasubramani <vyasr@nvidia.com> commit c847b98 Author: Lawrence Mitchell <lmitchell@nvidia.com> Date: Thu Jun 27 21:33:29 2024 +0100 Finish implementation of cudf-polars boolean function handlers (rapidsai#16098) The missing nodes were `is_in`, `not` (both easy), `is_finite` and `is_infinite` (obtained by translating to `contains` calls). While here, remove the implementation of `IsBetween` and just translate to an expression with binary operations. This removes the need for special-casing scalar arguments to `IsBetween` and reproducing the code for binop evaluation. 
Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: rapidsai#16098 commit 2ed69c9 Author: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com> Date: Thu Jun 27 10:11:09 2024 -1000 Ensure MultiIndex.to_frame deep copies columns (rapidsai#16110) Additionally, this allows simplification in `MultiIndex.__repr__` which avoids a shallow copy and also caught a bug where `NaT` was not supposed to be quoted Authors: - Matthew Roeschke (https://github.com/mroeschke) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: rapidsai#16110 commit a71c249 Author: GALI PREM SAGAR <sagarprem75@gmail.com> Date: Thu Jun 27 14:29:31 2024 -0500 Fix dtype errors in `StringArrays` (rapidsai#16111) This PR adds proxy classes for `ArrowStringArray` and `ArrowStringArrayNumpySemantics` that will increase the pandas test pass rate by 1%. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: rapidsai#16111 commit 8fc139f Merge: 79c1dfd f7cd9e6 Author: Thomas Li <thomasli1234567890@gmail.com> Date: Thu Jun 27 18:33:52 2024 +0000 Merge branch 'pylibcudf-io-writers' of github.com:lithomas1/cudf into pylibcudf-io-writers commit 79c1dfd Author: Thomas Li <thomasli1234567890@gmail.com> Date: Thu Jun 27 18:33:40 2024 +0000 clean source_or_sink commit c5a3fbe Merge: aff6178 5d49fe6 Author: Thomas Li <thomasli1234567890@gmail.com> Date: Thu Jun 27 18:25:42 2024 +0000 Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers commit f7cd9e6 Author: Thomas Li <47963215+lithomas1@users.noreply.github.com> Date: Wed Jun 26 09:15:50 2024 -0700 cleanup utils commit aff6178 Author: Thomas Li <thomasli1234567890@gmail.com> Date: Tue Jun 25 20:45:47 2024 +0000 small test fixes commit 0ed9af6 Author: Thomas Li <thomasli1234567890@gmail.com> Date: Tue Jun 25 19:27:14 2024 +0000 Fix error in testing utils Co-authored-by: Lawrence 
Mitchell <lmitchell@nvidia.com> commit 9a6a896 Merge: 186a2fb cdfb550 Author: Thomas Li <thomasli1234567890@gmail.com> Date: Tue Jun 25 19:12:37 2024 +0000 Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers commit 186a2fb Merge: 53b821c 0c6b828 Author: Thomas Li <thomasli1234567890@gmail.com> Date: Mon Jun 24 17:19:39 2024 +0000 Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers commit 53b821c Merge: 624d444 604c16d Author: Thomas Li <thomasli1234567890@gmail.com> Date: Mon Jun 24 17:19:12 2024 +0000 Merge branch 'pylibcudf-io-writers' of github.com:lithomas1/cudf into pylibcudf-io-writers commit 624d444 Author: Thomas Li <thomasli1234567890@gmail.com> Date: Mon Jun 24 17:17:27 2024 +0000 fix all nested struct cases commit e6c3ec7 Author: Thomas Li <thomasli1234567890@gmail.com> Date: Mon Jun 24 16:57:29 2024 +0000 address more comments commit 604c16d Author: Thomas Li <thomasli1234567890@gmail.com> Date: Mon Jun 24 16:57:29 2024 +0000 address more comments commit d22953f Merge: e0901dd dcc153b Author: Thomas Li <47963215+lithomas1@users.noreply.github.com> Date: Tue Jun 18 10:19:24 2024 -0700 Merge branch 'branch-24.08' into pylibcudf-io-writers commit e0901dd Author: Thomas Li <47963215+lithomas1@users.noreply.github.com> Date: Mon Jun 17 09:45:19 2024 -0700 fix bad merge commit 564358f Merge: e242182 87f6a7e Author: Thomas Li <47963215+lithomas1@users.noreply.github.com> Date: Mon Jun 17 09:44:11 2024 -0700 Merge branch 'branch-24.08' into pylibcudf-io-writers commit e242182 Author: Thomas Li <thomasli1234567890@gmail.com> Date: Thu Jun 13 20:52:23 2024 +0000 address more comments commit 699efd3 Author: Thomas Li <thomasli1234567890@gmail.com> Date: Thu Jun 13 20:09:43 2024 +0000 cleanup tests commit 1228569 Author: Thomas Li <thomasli1234567890@gmail.com> Date: Thu Jun 13 18:20:03 2024 +0000 update following feedback commit b1951d0 Author: Thomas Li <thomasli1234567890@gmail.com> Date: Thu Jun 13 
03:01:19 2024 +0000 try fix commit 9150a6c Author: Thomas Li <thomasli1234567890@gmail.com> Date: Wed Jun 12 23:48:18 2024 +0000 try something else commit 63358e9 Merge: 8c4c4e4 b35991c Author: Thomas Li <thomasli1234567890@gmail.com> Date: Wed Jun 12 23:30:56 2024 +0000 Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers commit 8c4c4e4 Author: Thomas Li <thomasli1234567890@gmail.com> Date: Wed Jun 12 18:31:54 2024 +0000 address comments commit dc93356 Merge: c54316e 0891c5d Author: Thomas Li <thomasli1234567890@gmail.com> Date: Wed Jun 12 17:49:26 2024 +0000 Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers commit c54316e Author: Thomas Li <thomasli1234567890@gmail.com> Date: Tue Jun 11 20:41:18 2024 +0000 update commit cd6df5e Merge: 2b3853f 8efa64e Author: Thomas Li <thomasli1234567890@gmail.com> Date: Tue Jun 11 17:00:05 2024 +0000 Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers commit 2b3853f Author: Thomas Li <thomasli1234567890@gmail.com> Date: Tue Jun 11 16:49:14 2024 +0000 add some tests commit 8c88c7c Merge: c24664c 719a8a6 Author: Thomas Li <thomasli1234567890@gmail.com> Date: Tue Jun 11 00:19:28 2024 +0000 Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers commit c24664c Author: Thomas Li <thomasli1234567890@gmail.com> Date: Fri Jun 7 18:25:06 2024 +0000 update and start writing tests commit 72204f1 Merge: 15daaaa 9bd16bb Author: Thomas Li <thomasli1234567890@gmail.com> Date: Fri Jun 7 16:02:25 2024 +0000 Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcudf-io-writers commit 15daaaa Author: Thomas Li <thomasli1234567890@gmail.com> Date: Fri Jun 7 16:02:10 2024 +0000 update docs commit 591cdd2 Author: Thomas Li <thomasli1234567890@gmail.com> Date: Thu Jun 6 23:54:58 2024 +0000 Start migrating I/O writers to pylibcudf (starting with JSON)

commit 1a4c2aa Author: Thomas Li <47963215+lithomas1@users.noreply.github.com> Date: Tue Jul 2 07:38:18 2024 -0700 Start migrating I/O writers to pylibcudf (starting with JSON) (rapidsai#15952) Switches the JSON writer to use pylibcudf. xref rapidsai#15162 Authors: - Thomas Li (https://github.com/lithomas1) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Lawrence Mitchell (https://github.com/wence-) - Vyas Ramasubramani (https://github.com/vyasr) URL: rapidsai#15952 commit a1447c7 Author: Robert Maynard <rmaynard@nvidia.com> Date: Tue Jul 2 09:34:29 2024 -0400 Promote has_nested_columns to cudf public API (rapidsai#16131) The `has_nested_columns` functionality is used in numerous tests. It looks like it should be part of our stable public API. Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Muhammad Haseeb (https://github.com/mhaseeb123) - Yunsong Wang (https://github.com/PointKernel) URL: rapidsai#16131 commit a4be7bd Author: Vyas Ramasubramani <vyasr@nvidia.com> Date: Tue Jul 2 00:50:42 2024 -0700 Use Arrow C Data Interface functions for Python interop (rapidsai#15904) This PR replaces the internals of `from_arrow` in pylibcudf with an implementation that uses the [Arrow C Data Interface](https://arrow.apache.org/docs/format/CDataInterface.html) using the [Python Capsule interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html). This allows us to decouple our Python builds from using pyarrow Cython (partially, we haven't replaced the `to_arrow` conversion yet) and it will also allow us to support any other Python package that is a producer of the data interface. To support the above functionality, the following additional changes were needed in this PR: - Added the ability to produce cudf tables from `ArrowArrayStream` objects since that is what `pyarrow.Table` produces. This function is a simple wrapper around the existing `from_arrrow(ArrowArray)` API. 
- Added support for the large strings type, for which support has improved throughout cudf since the `from_arrow_host` API was added and for which we now require a basic overload for tests to pass. I did not add corresponding support for `from_arrow_device` to avoid ballooning the scope of this PR, so that work can be done in a follow-up.
- Proper handling of `type_id::EMPTY` in concatenate, because the most natural implementation of the ArrowArrayStream processing is to run `from_arrow` on each chunk and then concatenate the outputs, and from the Python side we can produce chunks of all-null arrays from arrow.

Contributes to rapidsai#14926

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Matthew Roeschke (https://github.com/mroeschke)
- Robert Maynard (https://github.com/robertmaynard)
- David Wendt (https://github.com/davidwendt)

URL: rapidsai#15904

commit 08552f8
Author: Lawrence Mitchell <lmitchell@nvidia.com>
Date: Tue Jul 2 03:12:50 2024 +0100

Update cudf-polars for v1 release of polars (rapidsai#16149)

Minor changes to the IR, which we adapt to, and request `polars>=1.0` in dependencies.

Authors:
- Lawrence Mitchell (https://github.com/wence-)
- Thomas Li (https://github.com/lithomas1)
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#16149

commit 760c15c
Author: Kyle Edwards <kyedwards@nvidia.com>
Date: Mon Jul 1 14:27:30 2024 -0400

Use verify-alpha-spec hook (rapidsai#16144)

With the deployment of rapids-build-backend, we need to make sure our dependencies have alpha specs.

Authors:
- Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: rapidsai#16144

commit b691b1c
Author: David Wendt <45795991+davidwendt@users.noreply.github.com>
Date: Mon Jul 1 14:25:11 2024 -0400

Add stream parameter to cudf::io::text::multibyte_split (rapidsai#16034)

Adds stream support to the `cudf::io::text::multibyte_split` API.
Also adds a stream test and deprecates an overloaded API.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Mark Harris (https://github.com/harrism)
- Karthikeyan (https://github.com/karthikeyann)

URL: rapidsai#16034

commit 5efd72f
Author: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Date: Mon Jul 1 07:37:12 2024 -1000

Ensure cudf objects can astype to any type when empty (rapidsai#16106)

pandas allows objects to `astype` to any other type if the object is empty. This PR mirrors that behavior for cudf. It also uses `astype` more consistently instead of `as_*_column`, and fixes a bug in `IntervalDtype.__eq__` discovered when writing a unit test for this bug.

Authors:
- Matthew Roeschke (https://github.com/mroeschke)
- GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)

URL: rapidsai#16106

commit 51fb873
Merge: 599ce95 e932fbd
Author: gpuCI <38199262+GPUtester@users.noreply.github.com>
Date: Mon Jul 1 12:17:38 2024 -0400

Merge pull request rapidsai#16145 from rapidsai/branch-24.06

Forward-merge branch-24.06 into branch-24.08

commit e932fbd
Author: Vyas Ramasubramani <vyasr@nvidia.com>
Date: Mon Jul 1 09:17:32 2024 -0700

Add patch for incorrect cuco noexcept clauses (rapidsai#16077)

[cuco previously marked a number of methods as noexcept that can in fact throw exceptions](NVIDIA/cuCollections#510). This causes problems for cudf functions that call these methods. The issue [was fixed in cuco upstream](NVIDIA/cuCollections#511), but we cannot easily update to the latest commit of cuco, especially in a patch fix for 24.06. This PR instead adds a rapids-cmake patch for the cuco clone to address this issue. The patch may be removed once we update to a commit of cuco that contains the necessary fix.
Resolves rapidsai#16059

commit 599ce95
Author: Lawrence Mitchell <lmitchell@nvidia.com>
Date: Mon Jul 1 09:35:35 2024 +0100

Implement handlers for series literal in cudf-polars (rapidsai#16113)

A query plan can contain a "literal" polars Series. This often happens, for example, when calling a contains-like function. To translate these, introduce a new `LiteralColumn` node to capture the concept and add an evaluation rule (converting from arrow). Since list-dtype Series need the same casting treatment as in the dataframe scan case, factor the casting out into a utility, and take the opportunity to handle casting of nested lists correctly.

Authors:
- Lawrence Mitchell (https://github.com/wence-)

Approvers:
- Thomas Li (https://github.com/lithomas1)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#16113

commit 3c3edfe
Author: Yunsong Wang <yunsongw@nvidia.com>
Date: Fri Jun 28 13:58:22 2024 -0700

Update implementations to build with the latest cuco (rapidsai#15938)

This PR updates existing libcudf code to accommodate a cuco breaking change introduced in NVIDIA/cuCollections#479. It helps avoid breaking cudf when bumping the cuco version in `rapids-cmake`. Redundant equal/hash overloads will be removed once the version bump is done on the `rapids-cmake` end.

Authors:
- Yunsong Wang (https://github.com/PointKernel)

Approvers:
- David Wendt (https://github.com/davidwendt)
- Nghia Truong (https://github.com/ttnghia)

URL: rapidsai#15938

commit df88cf5
Author: Bradley Dice <bdice@bradleydice.com>
Date: Fri Jun 28 15:40:52 2024 -0500

Use size_t to allow large conditional joins (rapidsai#16127)

The conditional join kernels were using `cudf::size_type` where `std::size_t` was needed. This PR fixes that bug, which caused `cudaErrorIllegalAddress` as shown in rapidsai#16115. This closes rapidsai#16115. I did not add tests because we typically do not test very large workloads. However, I committed the test and reverted it in this PR, so there is a record of my validation code.
Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- https://github.com/nvdbaranec
- Yunsong Wang (https://github.com/PointKernel)

URL: rapidsai#16127

commit fb12d98
Author: Robert Maynard <rmaynard@nvidia.com>
Date: Fri Jun 28 12:14:58 2024 -0400

Installed cudf header use cudf::allocate_like (rapidsai#16087)

Remove usage of non-public cudf::allocate_like from implementations in headers we install.

Authors:
- Robert Maynard (https://github.com/robertmaynard)

Approvers:
- Yunsong Wang (https://github.com/PointKernel)
- Nghia Truong (https://github.com/ttnghia)

URL: rapidsai#16087

commit 78f4a8a
Author: Robert Maynard <rmaynard@nvidia.com>
Date: Fri Jun 28 11:26:27 2024 -0400

Move common string utilities to public api (rapidsai#16070)

As part of rapidsai#15982, a subset of the strings utility functions have been identified as worth exposing as part of the cudf public API. The `create_string_vector_from_column`, `get_offset64_threshold`, and `is_large_strings_enabled` functions are now part of the public `cudf::strings` API.

Authors:
- Robert Maynard (https://github.com/robertmaynard)

Approvers:
- MithunR (https://github.com/mythrocks)
- David Wendt (https://github.com/davidwendt)
- Jayjeet Chakraborty (https://github.com/JayjeetAtGithub)
- Lawrence Mitchell (https://github.com/wence-)

URL: rapidsai#16070

commit a4b951a
Author: nvdbaranec <56695930+nvdbaranec@users.noreply.github.com>
Date: Fri Jun 28 10:20:42 2024 -0500

Templatization of fixed-width parquet decoding kernels. (rapidsai#15911)

This PR merges all of the fixed-width parquet decoding kernels into a single templatized kernel that can be selectively instantiated with desired features (dictionary/no-dictionary, nested/non-nested, etc). It also adds support for (non-list) nested columns in this path, so structs no longer have to use the much slower general decode kernel.
A new benchmark was added specifically for structs containing only fixed-width columns. I added this because the performance improvement is fairly high (+20%), but we don't see it in the normal struct benchmarks because they include (and are dominated by) string decode times. The new benchmark shows:

Before this PR:

```
| data_type | io_type       | cardinality | run_length | bytes_per_second | peak_memory_usage | encoded_file_size |
|-----------|---------------|-------------|------------|------------------|-------------------|-------------------|
| STRUCT    | DEVICE_BUFFER | 0           | 1          | 21071216823      | 1.047 GiB         | 511.675 MiB       |
| STRUCT    | DEVICE_BUFFER | 1000        | 1          | 18974392387      | 821.312 MiB       | 128.884 MiB       |
| STRUCT    | DEVICE_BUFFER | 0           | 32         | 20429356824      | 621.787 MiB       | 28.141 MiB        |
| STRUCT    | DEVICE_BUFFER | 1000        | 32         | 20572327813      | 598.421 MiB       | 16.475 MiB        |
```

After this PR:

```
| data_type | io_type       | cardinality | run_length | bytes_per_second | peak_memory_usage | encoded_file_size |
|-----------|---------------|-------------|------------|------------------|-------------------|-------------------|
| STRUCT    | DEVICE_BUFFER | 0           | 1          | 25805996399      | 1.047 GiB         | 511.675 MiB       |
| STRUCT    | DEVICE_BUFFER | 1000        | 1          | 22422306660      | 821.312 MiB       | 128.884 MiB       |
| STRUCT    | DEVICE_BUFFER | 0           | 32         | 24460694014      | 621.787 MiB       | 28.141 MiB        |
| STRUCT    | DEVICE_BUFFER | 1000        | 32         | 24674861214      | 598.421 MiB       | 16.475 MiB        |
```

Split-page decoding for fixed-width types and structs also goes through this new path. New test added.

This brings us closer to eliminating the "general" kernel. The only things left that run through it are lists and booleans. This is PR 1 of 2, with the follow-up moving a lot of code around. At this point, I think it makes sense to start consolidating our files a bit. I also left some breadcrumbs (a few small commented-out code blocks) in the core kernel `gpuDecodePageDataGeneric` for the next step of adding list support. They can be removed if people don't like them.
Authors:
- https://github.com/nvdbaranec

Approvers:
- Mike Wilson (https://github.com/hyperbolic2346)
- Vukasin Milovanovic (https://github.com/vuule)
- Muhammad Haseeb (https://github.com/mhaseeb123)

URL: rapidsai#15911

commit e434fdb
Author: David Wendt <45795991+davidwendt@users.noreply.github.com>
Date: Fri Jun 28 10:57:01 2024 -0400

Update libcudf compiler requirements in contributing doc (rapidsai#16103)

Updates the compiler requirements in the contributing document.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Karthikeyan (https://github.com/karthikeyann)

URL: rapidsai#16103

commit 565c0d1
Author: Matthew Murray <41342305+Matt711@users.noreply.github.com>
Date: Fri Jun 28 10:16:55 2024 -0400

Migrate lists/contains to pylibcudf (rapidsai#15981)

Part of rapidsai#15162.

Authors:
- Matthew Murray (https://github.com/Matt711)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#15981

commit c40e0cc
Author: Matthew Murray <41342305+Matt711@users.noreply.github.com>
Date: Fri Jun 28 10:10:31 2024 -0400

Add support for proxy `np.flatiter` objects (rapidsai#16107)

Closes rapidsai#15388

Authors:
- Matthew Murray (https://github.com/Matt711)

Approvers:
- Matthew Roeschke (https://github.com/mroeschke)

URL: rapidsai#16107

commit 673d766
Author: Paul Mattione <156858817+pmattione-nvidia@users.noreply.github.com>
Date: Fri Jun 28 09:38:57 2024 -0400

Make binary operators work between fixed-point and floating args (rapidsai#16116)

Some of the binary operators in cuDF don't work between fixed_point and floating-point numbers after [this earlier PR](rapidsai#15438) removed the ability to construct and implicitly cast fixed_point numbers from floating-point numbers. This PR restores that functionality by detecting and performing the necessary explicit casts, and adds tests for the supported operators.
Note that the `binary_op_has_common_type` code is modeled after `has_common_type` found in traits.hpp.

This closes [issue 16090](rapidsai#16090).

Authors:
- Paul Mattione (https://github.com/pmattione-nvidia)

Approvers:
- Jayjeet Chakraborty (https://github.com/JayjeetAtGithub)
- Karthikeyan (https://github.com/karthikeyann)

URL: rapidsai#16116

commit 224ac5b
Author: David Wendt <45795991+davidwendt@users.noreply.github.com>
Date: Fri Jun 28 09:26:37 2024 -0400

Add libcudf public/detail API pattern to developer guide (rapidsai#16086)

Adds a specific description of the public API to detail API function pattern to the libcudf developer guide. Also fixes some formatting issues and a broken link.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Shruti Shivakumar (https://github.com/shrshi)
- Karthikeyan (https://github.com/karthikeyann)

URL: rapidsai#16086

commit 2b547dc
Author: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Date: Fri Jun 28 03:11:01 2024 -1000

Add ensure_index to not unnecessarily shallow copy cudf.Index (rapidsai#16117)

The `cudf.Index` constructor will shallow copy a `cudf.Index` input. Sometimes we just need to make sure an input is a `cudf.Index`, so this PR creates `ensure_index` (pandas has something similar) so that we don't shallow copy these inputs unnecessarily.

Authors:
- Matthew Roeschke (https://github.com/mroeschke)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)

URL: rapidsai#16117

commit 57862a3
Author: Robert Maynard <rmaynard@nvidia.com>
Date: Fri Jun 28 08:43:12 2024 -0400

stable_distinct public api now has a stream parameter (rapidsai#16068)

As part of rapidsai#15982, we determined that the cudf `stable_distinct` public API needs to be updated so that it accepts a user-provided stream.
Authors:
- Robert Maynard (https://github.com/robertmaynard)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Srinivas Yadav (https://github.com/srinivasyadav18)
- Bradley Dice (https://github.com/bdice)

URL: rapidsai#16068

commit 6b04fd3
Author: Mads R. B. Kristensen <madsbk@gmail.com>
Date: Fri Jun 28 12:31:18 2024 +0200

Memory Profiling (rapidsai#15866)

Use [RMM's new memory profiler](rapidsai/rmm#1563) to profile all functions already decorated with `_cudf_nvtx_annotate`.

Example

```python
import cudf
from cudf.utils.performance_tracking import print_memory_report

cudf.set_option("memory_profiling", True)

df1 = cudf.DataFrame({"a": [1, 2, 3]})
df2 = cudf.DataFrame({"a": [2, 2, 3]})
df3 = df1.merge(df2)

print_memory_report()
```

Output:

```
Memory Profiling
================

Ordered by: memory_peak

ncalls     memory_peak    memory_total  filename:lineno(function)
     1             272             688  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:4072(DataFrame.merge)
     2              32              64  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:1043(DataFrame._init_from_dict_like)
     2              32              64  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:690(DataFrame.__init__)
     2               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:1131(DataFrame._align_input_series_indices)
     7               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:214(RangeIndex.__init__)
     6               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:424(RangeIndex.__len__)
     4               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/frame.py:271(Frame.__len__)
     2               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:3195(DataFrame._insert)
     2               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:270(RangeIndex.name)
     2               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:369(RangeIndex.copy)
     5               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/frame.py:134(Frame._from_data)
     2               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/frame.py:1039(Frame._copy_type_metadata)
     2               0               0  /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/indexed_frame.py:315(IndexedFrame._from_columns_like_self)
```

Authors:
- Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
- Mark Harris (https://github.com/harrism)
- Lawrence Mitchell (https://github.com/wence-)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#15866

commit e35da6b
Author: Lawrence Mitchell <lmitchell@nvidia.com>
Date: Fri Jun 28 09:54:03 2024 +0100

Implement Ternary copy_if_else (rapidsai#16114)

A straightforward evaluation using `copy_if_else`.

Authors:
- Lawrence Mitchell (https://github.com/wence-)

Approvers:
- https://github.com/brandon-b-miller

URL: rapidsai#16114

commit c847b98
Author: Lawrence Mitchell <lmitchell@nvidia.com>
Date: Thu Jun 27 21:33:29 2024 +0100

Finish implementation of cudf-polars boolean function handlers (rapidsai#16098)

The missing nodes were `is_in`, `not` (both easy), `is_finite` and `is_infinite` (obtained by translating to `contains` calls). While here, remove the implementation of `IsBetween` and just translate to an expression with binary operations. This removes the need for special-casing scalar arguments to `IsBetween` and reproducing the code for binop evaluation.
Authors:
- Lawrence Mitchell (https://github.com/wence-)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#16098

commit 2ed69c9
Author: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Date: Thu Jun 27 10:11:09 2024 -1000

Ensure MultiIndex.to_frame deep copies columns (rapidsai#16110)

Additionally, this allows simplification in `MultiIndex.__repr__`, which avoids a shallow copy and also caught a bug where `NaT` was quoted when it should not have been.

Authors:
- Matthew Roeschke (https://github.com/mroeschke)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#16110

commit a71c249
Author: GALI PREM SAGAR <sagarprem75@gmail.com>
Date: Thu Jun 27 14:29:31 2024 -0500

Fix dtype errors in `StringArrays` (rapidsai#16111)

This PR adds proxy classes for `ArrowStringArray` and `ArrowStringArrayNumpySemantics` that will increase the pandas test pass rate by 1%.

Authors:
- GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
- Matthew Roeschke (https://github.com/mroeschke)

URL: rapidsai#16111

github-actions bot added the Python label (Affects Python cuDF API.) May 27, 2024

madsbk added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels May 27, 2024

madsbk mentioned this pull request May 27, 2024

Add a stack to the statistics resource rapidsai/rmm#1563

Merged
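The linked RMM change adds a stack to the statistics resource, which lets nested profiled calls each report their own numbers. The sketch below is a pure-Python model of that idea, not RMM's implementation. `Counter`, `CounterStack`, and the rule in `merge_child` are assumptions made for illustration. That rule says the parent's peak becomes at least the parent's current bytes plus the child's peak.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    """Byte counters for one level of a hypothetical statistics stack."""
    value: int = 0  # bytes currently allocated at this level
    peak: int = 0   # highest `value` observed at this level
    total: int = 0  # bytes allocated in total at this level

    def allocate(self, nbytes: int) -> None:
        self.value += nbytes
        self.total += nbytes
        self.peak = max(self.peak, self.value)

    def deallocate(self, nbytes: int) -> None:
        self.value -= nbytes

    def merge_child(self, child: "Counter") -> None:
        # The child's peak was measured on top of the parent's value at push
        # time (the parent is untouched while the child is active).
        self.peak = max(self.peak, self.value + child.peak)
        self.value += child.value
        self.total += child.total


class CounterStack:
    """Assumed semantics: allocations update only the top counter."""

    def __init__(self) -> None:
        self._stack = [Counter()]

    @property
    def top(self) -> Counter:
        return self._stack[-1]

    def push(self) -> None:
        self._stack.append(Counter())

    def pop(self) -> Counter:
        child = self._stack.pop()
        self.top.merge_child(child)
        return child


stats = CounterStack()
stats.top.allocate(100)   # outer code holds 100 bytes
stats.push()              # enter a profiled function
stats.top.allocate(50)
stats.top.deallocate(50)
stats.top.allocate(20)
inner = stats.pop()       # leave the profiled function
print(inner)              # Counter(value=20, peak=50, total=70)
print(stats.top)          # Counter(value=120, peak=150, total=170)
```

Allocations made inside the inner call are charged only to the top counter. As a result, the inner call reports its own peak of 50 bytes, while the outer level still sees the combined high-water mark of 150 bytes.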

_cudf_nvtx_annotate: add RMM's memory profiler

cf15a02

See <rapidsai/rmm#1563>

madsbk force-pushed the memory_statistics branch from 61093a4 to cf15a02 June 7, 2024 07:42

madsbk commented Jun 7, 2024

madsbk marked this pull request as ready for review June 7, 2024 12:02

madsbk requested a review from a team as a code owner June 7, 2024 12:02

madsbk requested review from mroeschke and lithomas1 June 7, 2024 12:02

Merge branch 'branch-24.08' of github.com:rapidsai/cudf into memory_statistics

f1ff8d4

beckernick mentioned this pull request Jun 14, 2024

[FEA] Implement (configurable) RMM-based memory profiling for all NVTX annotated functions rapidsai/cuml#5932

Open

madsbk added 3 commits June 14, 2024 10:23

get_option("memory_profiling")

32ca877

Merge branch 'branch-24.08' of github.com:rapidsai/cudf into memory_statistics

aeff5c7

_dask_cudf_performance_tracking

b323b28

madsbk added 5 commits June 14, 2024 18:11

doc

386292b

remove {py:mod}

3625db6

Merge branch 'branch-24.08' of github.com:rapidsai/cudf into memory_statistics

2a0c77a

fix _external_only_api

7cae9d1

use traceback.extract_stack(limit=2)[0]

a4e4fc6

Looks like walk_stack() uses an incorrect current frame
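The switch to `traceback.extract_stack(limit=2)[0]` depends on how `extract_stack` orders frames. When it is called inside a wrapper, `limit=2` keeps the two innermost frames, oldest first, so index 0 is whoever called the wrapper. Here is a minimal sketch with a hypothetical `record_caller` decorator. It is not cudf's `_external_only_api`.

```python
import functools
import traceback

CALLERS = []  # (function name, line number) of each caller seen


def record_caller(func):
    """Hypothetical decorator that notes who called `func`."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # extract_stack(limit=2) returns the two innermost frames, oldest
        # first: [caller of wrapper, wrapper]. Index 0 is the caller.
        caller = traceback.extract_stack(limit=2)[0]
        CALLERS.append((caller.name, caller.lineno))
        return func(*args, **kwargs)

    return wrapper


@record_caller
def api():
    return 42


def user_code():
    return api()


user_code()
print(CALLERS[0][0])  # user_code
```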

wence- requested changes Jun 24, 2024

typo

69a54f3

Co-authored-by: Lawrence Mitchell <wence@gmx.li>

don't call rmm.statistics.enable_statistics()

46d9471

madsbk requested a review from wence- June 25, 2024 13:56

test

ffa6360

madsbk added the 5 - DO NOT MERGE label (Hold off on merging; see PR for details) Jun 25, 2024

harrism requested changes Jun 26, 2024

docs/cudf/source/user_guide/memory-profiling.md (outdated, resolved)

docs/cudf/source/user_guide/memory-profiling.md (outdated, resolved)

madsbk and others added 3 commits June 26, 2024 09:39

Update docs/cudf/source/user_guide/memory-profiling.md

6cf2ca3

Co-authored-by: Mark Harris <783069+harrism@users.noreply.github.com>

doc

339fa3e

Merge branch 'memory_statistics' of github.com:madsbk/cudf into memory_statistics

124b179

madsbk requested a review from harrism June 26, 2024 08:04

harrism reviewed Jun 26, 2024

docs/cudf/source/user_guide/memory-profiling.md (outdated, resolved)

Update docs/cudf/source/user_guide/memory-profiling.md

2fe290b

Co-authored-by: Mark Harris <783069+harrism@users.noreply.github.com>

madsbk requested a review from harrism June 26, 2024 11:22

lithomas1 removed their request for review June 26, 2024 14:19

harrism approved these changes Jun 27, 2024

wence- approved these changes Jun 27, 2024

docs/cudf/source/user_guide/memory-profiling.md (outdated, resolved)

docs/cudf/source/user_guide/memory-profiling.md (outdated, resolved)

vyasr requested changes Jun 27, 2024

python/cudf/cudf/utils/nvtx_annotation.py (outdated, resolved)

vyasr approved these changes Jun 27, 2024

madsbk and others added 4 commits June 28, 2024 08:22

Apply suggestions from code review

3f2ae60

Co-authored-by: Lawrence Mitchell <wence@gmx.li>

Merge branch 'branch-24.08' of github.com:rapidsai/cudf into memory_statistics

e3e3d73

remove nvtx_annotation.py

944901a

remove _cudf_nvtx_annotate

3510e27

madsbk requested a review from a team as a code owner June 28, 2024 07:24

wence- approved these changes Jun 28, 2024

madsbk removed the 5 - DO NOT MERGE label (Hold off on merging; see PR for details) Jun 28, 2024

rapids-bot bot merged commit 6b04fd3 into rapidsai:branch-24.08 Jun 28, 2024
80 checks passed

madsbk deleted the memory_statistics branch June 28, 2024 10:37


madsbk commented May 27, 2024

madsbk Jun 7, 2024

mroeschke Jun 7, 2024

madsbk Jun 10, 2024

harrism commented Jun 10, 2024

madsbk commented Jun 11, 2024

bdice commented Jun 11, 2024

madsbk commented Jun 12, 2024

bdice commented Jun 13, 2024

harrism commented Jun 13, 2024

madsbk commented Jun 14, 2024

wence- left a comment

wence- Jun 24, 2024

madsbk Jun 24, 2024

madsbk Jun 24, 2024

wence- Jun 24, 2024

madsbk Jun 24, 2024

wence- Jun 25, 2024

madsbk Jun 25, 2024

harrism left a comment

wence- left a comment

vyasr left a comment

wence- left a comment

madsbk commented Jun 28, 2024

madsbk commented Jun 28, 2024

@@ -17,12 +19,13 @@ def _get_color_for_nvtx(name):

def _cudf_nvtx_annotate(func, domain="cudf_python"):
    if get_option("memory_profiling"):
        rmm.statistics.enable_statistics()
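In the hunk above, `get_option("memory_profiling")` is read in the body of `_cudf_nvtx_annotate` itself. As far as the snippet shows, that code runs once when the decorator is applied, typically at import time. One alternative is to read the option on every call and enter the profiler only when the option is set. The sketch below uses `contextlib.ExitStack` to make the context manager optional. `_OPTIONS`, `get_option`, `set_option`, `fake_profiler`, and `performance_tracking` are hypothetical stand-ins, not the cudf or RMM APIs.

```python
import contextlib
import functools

# Hypothetical stand-ins for an option registry.
_OPTIONS = {"memory_profiling": False}


def get_option(name):
    return _OPTIONS[name]


def set_option(name, value):
    _OPTIONS[name] = value


PROFILED = []  # names of calls that ran under the (fake) profiler


@contextlib.contextmanager
def fake_profiler(name):
    """Placeholder for a real memory-profiler context manager."""
    PROFILED.append(name)
    yield


def performance_tracking(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with contextlib.ExitStack() as stack:
            # Read the option at call time, not at decoration time.
            if get_option("memory_profiling"):
                stack.enter_context(fake_profiler(func.__qualname__))
            return func(*args, **kwargs)

    return wrapper


@performance_tracking
def add(a, b):
    return a + b


add(1, 2)                             # option off: not profiled
set_option("memory_profiling", True)
add(3, 4)                             # option on: profiled
print(PROFILED)                       # ['add']
```

Checking the option per call costs one dictionary lookup per invocation. In exchange, profiling can be toggled at runtime after the decorated functions have been imported.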
