Memory Profiling #15866

Merged 21 commits on Jun 28, 2024
Changes from 13 commits
1 change: 1 addition & 0 deletions docs/cudf/source/user_guide/api_docs/index.rst
@@ -24,3 +24,4 @@ This page provides a list of all publicly accessible modules, methods and classe
options
extension_dtypes
pylibcudf/index.rst
performance_tracking
12 changes: 12 additions & 0 deletions docs/cudf/source/user_guide/api_docs/performance_tracking.rst
@@ -0,0 +1,12 @@
.. _api.performance_tracking:

====================
Performance Tracking
====================

.. currentmodule:: cudf.utils.performance_tracking
.. autosummary::
:toctree: api/

get_memory_records
print_memory_report
1 change: 1 addition & 0 deletions docs/cudf/source/user_guide/index.md
@@ -16,5 +16,6 @@ options
performance-comparisons/index
PandasCompat
copy-on-write
memory-profiling
pandas-2.0-breaking-changes
```
59 changes: 59 additions & 0 deletions docs/cudf/source/user_guide/memory-profiling.md
@@ -0,0 +1,59 @@
(memory-profiling-user-doc)=

# Memory Profiling

Peak memory usage is a common concern in GPU programming since the available GPU memory is typically smaller than the available CPU memory. To easily identify memory hotspots, cudf provides a memory profiler. It comes with an overhead, so avoid using it in performance-sensitive code.

## Enabling memory profiling

First, we need to enable memory profiling in [RMM](https://docs.rapids.ai/api/rmm/stable/guide/). One way to do this is by calling {py:func}`rmm.statistics.enable_statistics()`. This will add a statistics resource adaptor to the current RMM memory resource, which enables cudf to access memory profiling information. See the RMM documentation for more details.

Second, we need to enable memory profiling in cudf by using either of the following:

1. Use {py:func}`cudf.set_option`:

```python
>>> import cudf
>>> cudf.set_option("memory_profiling", True)
```

2. Set the environment variable ``CUDF_MEMORY_PROFILING`` to ``1`` prior to the
launch of the Python interpreter:

```
CUDF_MEMORY_PROFILING="1" python -c "import cudf"
```

To get the result of the profiling, use {py:func}`cudf.utils.performance_tracking.print_memory_report`.

In the following example, we enable profiling, do some work, and then print the profiling results:

```python
>>> import cudf
>>> from cudf.utils.performance_tracking import print_memory_report
>>> from rmm.statistics import enable_statistics
>>> enable_statistics()
>>> cudf.set_option("memory_profiling", True)
>>> cudf.DataFrame({"a": [1, 2, 3]}) # Some work
a
0 1
1 2
2 3
>>> print_memory_report() # Pretty print the result of the profiling
Memory Profiling
================

Legends:
ncalls - number of times the function or code block was called
memory_peak - peak memory allocated in function or code block (in bytes)
memory_total - total memory allocated in function or code block (in bytes)

Ordered by: memory_peak

ncalls memory_peak memory_total filename:lineno(function)
1 32 32 cudf/core/dataframe.py:690(DataFrame.__init__)
2 0 0 cudf/core/index.py:214(RangeIndex.__init__)
6 0 0 cudf/core/index.py:424(RangeIndex.__len__)
```

It is also possible to access the raw profiling data by calling {py:func}`cudf.utils.performance_tracking.get_memory_records`.
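As a sketch of what working with the raw records might look like: `get_memory_records()` returns a dict mapping a descriptive name (`filename:lineno(function)`) to a memory record. The `MemoryRecord` dataclass and its field names (`num_calls`, `memory_peak`, `memory_total`) below are stand-ins inferred from the report legend above, not the real `rmm` objects:

```python
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    # Stand-in for rmm.statistics.ProfilerRecords.MemoryRecord;
    # field names are assumed from the report legend above.
    num_calls: int
    memory_peak: int
    memory_total: int


# Shape of get_memory_records(): descriptive name -> record.
records = {
    "cudf/core/dataframe.py:690(DataFrame.__init__)": MemoryRecord(1, 32, 32),
    "cudf/core/index.py:214(RangeIndex.__init__)": MemoryRecord(2, 0, 0),
}

# Order by peak memory, as print_memory_report does.
for name, rec in sorted(
    records.items(), key=lambda kv: kv[1].memory_peak, reverse=True
):
    print(f"{rec.num_calls:>6} {rec.memory_peak:>12} {rec.memory_total:>13}  {name}")
```

This mirrors the `Ordered by: memory_peak` layout of the pretty-printed report.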
4 changes: 2 additions & 2 deletions python/cudf/cudf/core/buffer/spill_manager.py
@@ -18,14 +18,14 @@
import rmm.mr

from cudf.options import get_option
from cudf.utils.nvtx_annotation import _cudf_nvtx_annotate
from cudf.utils.performance_tracking import _performance_tracking
from cudf.utils.string import format_bytes

if TYPE_CHECKING:
from cudf.core.buffer.spillable_buffer import SpillableBufferOwner

_spill_cudf_nvtx_annotate = partial(
_cudf_nvtx_annotate, domain="cudf_python-spill"
_performance_tracking, domain="cudf_python-spill"
)


7 changes: 4 additions & 3 deletions python/cudf/cudf/core/buffer/spillable_buffer.py
@@ -10,6 +10,7 @@
from typing import TYPE_CHECKING, Any, Dict, List, Literal, Optional, Tuple

import numpy
import nvtx
from typing_extensions import Self

import rmm
@@ -21,7 +22,7 @@
host_memory_allocation,
)
from cudf.core.buffer.exposure_tracked_buffer import ExposureTrackedBuffer
from cudf.utils.nvtx_annotation import _get_color_for_nvtx, annotate
from cudf.utils.performance_tracking import _get_color_for_nvtx
from cudf.utils.string import format_bytes

if TYPE_CHECKING:
@@ -200,7 +201,7 @@ def spill(self, target: str = "cpu") -> None:
)

if (ptr_type, target) == ("gpu", "cpu"):
with annotate(
with nvtx.annotate(
message="SpillDtoH",
color=_get_color_for_nvtx("SpillDtoH"),
domain="cudf_python-spill",
@@ -218,7 +219,7 @@ def spill(self, target: str = "cpu") -> None:
# trigger a new call to this buffer's `spill()`.
# Therefore, it is important that spilling-on-demand doesn't
# try to unspill an already locked buffer!
with annotate(
with nvtx.annotate(
message="SpillHtoD",
color=_get_color_for_nvtx("SpillHtoD"),
domain="cudf_python-spill",
14 changes: 14 additions & 0 deletions python/cudf/cudf/options.py
@@ -308,6 +308,20 @@ def _integer_and_none_validator(val):
_make_contains_validator([False, True]),
)

_register_option(
"memory_profiling",
_env_get_bool("CUDF_MEMORY_PROFILING", False),
textwrap.dedent(
"""
If set to `False`, disables memory profiling.
If set to `True`, enables memory profiling.
Read more at: :ref:`memory-profiling-user-doc`
\tValid values are True or False. Default is False.
"""
),
_make_contains_validator([False, True]),
)
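The registered default comes from the `CUDF_MEMORY_PROFILING` environment variable. A standalone sketch of that env-backed-boolean pattern follows; the `env_get_bool` helper and its accepted truthy spellings are assumptions for illustration, not cudf's actual `_env_get_bool` implementation:

```python
import os


def env_get_bool(name: str, default: bool) -> bool:
    # Hypothetical stand-in for cudf's _env_get_bool: read an
    # environment variable and interpret common truthy spellings.
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


os.environ["CUDF_MEMORY_PROFILING"] = "1"
print(env_get_bool("CUDF_MEMORY_PROFILING", False))  # True
```

Unset variables fall back to the supplied default, matching the `False` default registered above.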


class option_context(ContextDecorator):
"""
41 changes: 41 additions & 0 deletions python/cudf/cudf/tests/test_performance_tracking.py
@@ -0,0 +1,41 @@
# Copyright (c) 2024, NVIDIA CORPORATION.

from io import StringIO

import pytest

import rmm.mr
import rmm.statistics

import cudf
from cudf.utils.performance_tracking import (
get_memory_records,
print_memory_report,
)


@pytest.fixture
def rmm_reset():
"""Fixture to reset the RMM resource before and after the test"""
mr = rmm.mr.get_current_device_resource()
try:
rmm.mr.set_current_device_resource(rmm.mr.CudaMemoryResource())
yield
finally:
rmm.mr.set_current_device_resource(mr)


def test_memory_profiling(rmm_reset):
df1 = cudf.DataFrame({"a": [1, 2, 3]})
assert len(get_memory_records()) == 0

rmm.statistics.enable_statistics()
cudf.set_option("memory_profiling", True)

df1.merge(df1)

assert len(get_memory_records()) > 0

out = StringIO()
print_memory_report(file=out)
assert "DataFrame.merge" in out.getvalue()
35 changes: 8 additions & 27 deletions python/cudf/cudf/utils/nvtx_annotation.py
@@ -1,30 +1,11 @@
# Copyright (c) 2023, NVIDIA CORPORATION.
# Copyright (c) 2023-2024, NVIDIA CORPORATION.

import hashlib
from functools import partial

from nvtx import annotate

_NVTX_COLORS = ["green", "blue", "purple", "rapids"]


def _get_color_for_nvtx(name):
m = hashlib.sha256()
m.update(name.encode())
hash_value = int(m.hexdigest(), 16)
idx = hash_value % len(_NVTX_COLORS)
return _NVTX_COLORS[idx]


def _cudf_nvtx_annotate(func, domain="cudf_python"):
"""Decorator for applying nvtx annotations to methods in cudf."""
return annotate(
message=func.__qualname__,
color=_get_color_for_nvtx(func.__qualname__),
domain=domain,
)(func)


_dask_cudf_nvtx_annotate = partial(
_cudf_nvtx_annotate, domain="dask_cudf_python"
from cudf.utils.performance_tracking import (
_dask_cudf_performance_tracking,
_performance_tracking,
)

# TODO: will remove this file and use _performance_tracking before merging
_cudf_nvtx_annotate = _performance_tracking
_dask_cudf_nvtx_annotate = _dask_cudf_performance_tracking
81 changes: 81 additions & 0 deletions python/cudf/cudf/utils/performance_tracking.py
@@ -0,0 +1,81 @@
# Copyright (c) 2023-2024, NVIDIA CORPORATION.

import contextlib
import functools
import hashlib
import sys
from typing import Dict

import nvtx

import rmm.statistics

from cudf.options import get_option

_NVTX_COLORS = ["green", "blue", "purple", "rapids"]


def _get_color_for_nvtx(name):
m = hashlib.sha256()
m.update(name.encode())
hash_value = int(m.hexdigest(), 16)
idx = hash_value % len(_NVTX_COLORS)
return _NVTX_COLORS[idx]


def _performance_tracking(func, domain="cudf_python"):
"""Decorator for applying performance tracking (if enabled)."""

@functools.wraps(func)
def wrapper(*args, **kwargs):
with contextlib.ExitStack() as stack:
if get_option("memory_profiling"):
# NB: the user still needs to call `rmm.statistics.enable_statistics()`
# to enable memory profiling.
stack.enter_context(
rmm.statistics.profiler(
name=rmm.statistics._get_descriptive_name_of_object(
func
)
)
)
if nvtx.enabled():
stack.enter_context(
nvtx.annotate(
message=func.__qualname__,
color=_get_color_for_nvtx(func.__qualname__),
domain=domain,
)
)
return func(*args, **kwargs)

return wrapper


_dask_cudf_performance_tracking = functools.partial(
_performance_tracking, domain="dask_cudf_python"
)


def get_memory_records() -> (
Dict[str, rmm.statistics.ProfilerRecords.MemoryRecord]
):
"""Get the memory records from the memory profiling

Returns
-------
Dict that maps function names to memory records. Empty if
memory profiling is disabled
"""
return rmm.statistics.default_profiler_records.records


def print_memory_report(file=sys.stdout) -> None:
"""Pretty print the result of the memory profiling

Parameters
----------
file
The output stream
"""
print(rmm.statistics.default_profiler_records.report(), file=file)
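The core pattern in `_performance_tracking` — conditionally entering zero or more context managers through `contextlib.ExitStack` — can be sketched with stdlib pieces only. The `profiling_enabled` flag and `record_call` context manager below are illustrative stand-ins for `get_option("memory_profiling")` and `rmm.statistics.profiler()`/`nvtx.annotate()`, not cudf APIs:

```python
import contextlib
import functools

calls = []
profiling_enabled = True  # stand-in for get_option("memory_profiling")


@contextlib.contextmanager
def record_call(name):
    # Stand-in for rmm.statistics.profiler() / nvtx.annotate():
    # records that the named function was entered.
    calls.append(name)
    yield


def tracked(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with contextlib.ExitStack() as stack:
            # Each context is entered only if its feature is enabled;
            # ExitStack unwinds whatever was entered on exit.
            if profiling_enabled:
                stack.enter_context(record_call(func.__qualname__))
            return func(*args, **kwargs)
    return wrapper


@tracked
def merge():
    return "merged"


print(merge())  # merged
print(calls)    # ['merge']
```

With the flag off, `ExitStack` enters nothing and the wrapper adds essentially no overhead, which is why the real decorator can stay applied unconditionally.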
5 changes: 3 additions & 2 deletions python/cudf/cudf/utils/utils.py
@@ -159,8 +159,9 @@ def _external_only_api(func, alternative=""):
@functools.wraps(func)
def wrapper(*args, **kwargs):
# Check the immediately preceding frame to see if it's in cudf.
frame, lineno = next(traceback.walk_stack(None))
fn = frame.f_code.co_filename
pre_frame = traceback.extract_stack(limit=2)[0]
fn = pre_frame.filename
lineno = pre_frame.lineno
if _cudf_root in fn and _tests_root not in fn:
raise RuntimeError(
f"External-only API called in {fn} at line {lineno}. "
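The `utils.py` change swaps raw frame walking for `traceback.extract_stack(limit=2)[0]`, which yields a `FrameSummary` for the caller. A minimal stdlib sketch of that lookup (function names here are illustrative):

```python
import traceback


def where_called():
    # extract_stack(limit=2) returns the last two frames, oldest
    # first: [caller, where_called]. Index 0 is the caller.
    pre_frame = traceback.extract_stack(limit=2)[0]
    return pre_frame.filename, pre_frame.lineno, pre_frame.name


def caller():
    return where_called()


fn, lineno, name = caller()
print(name)  # caller
```

`FrameSummary` exposes `filename` and `lineno` directly, which is what the wrapper uses to decide whether the call originated inside cudf.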