[BUG] Memory leak in read_region in generator mode (using a list of locations) #598

Closed
diricxbart opened this issue Aug 3, 2023 · 2 comments · Fixed by #620
diricxbart commented Aug 3, 2023

Describe the bug
I observe a significant memory leak when using cucim read_region in generator mode (using a list of locations).

Steps/Code to reproduce bug
The following snippet reproduces the memory leak:

import cucim
import numpy as np
import random
import os
import psutil

MEMORY_LEAK = True  # Set to False to avoid the memory leak using a workaround...
slide_path = 'some_svs_file.svs'

level = 0
size = (256, 256)
batch_size = 64
num_workers = 16
num_batches = 4.3  # This may be a non-integer number, to have the last iteration return a number of patches < batch_size
locations = [(random.randint(0, 1000), random.randint(2000, 3000)) for _ in range(int(batch_size * num_batches))]
print(f'Number of locations: {len(locations)}, batch size: {batch_size}, number of workers: {num_workers}')

def get_total_mem_usage():
    current_process = psutil.Process(os.getpid())
    mem = current_process.memory_info().rss
    for child in current_process.children(recursive=True):
        mem += child.memory_info().rss
    return mem

with cucim.CuImage(slide_path) as slide:
    start_mem = None
    for i in range(101):
        gen = slide.read_region(locations, size, level, batch_size=batch_size, num_workers=num_workers)
        if MEMORY_LEAK:
            # The regular cucim read_region results in a memory leak when iterating over it using `for x in gen`
            for x in gen:
                _ = np.asarray(x)
        else:
            # When iterating over the read_region generator using `next()`, no memory leak is observed
            num_reads = len(locations) // batch_size + min(1, len(locations) % batch_size)
            for _ in range(num_reads):
                _ = np.asarray(next(gen))

        if i % 10 == 0:
            if start_mem is None:
                start_mem = get_total_mem_usage()
                print(f'starting with: {(start_mem) // (1024 * 1024)} MB of memory consumption')
            else:
                print(f'mem increase (iteration: {i:3d}): {(get_total_mem_usage() - start_mem) // (1024 * 1024):4d} MB')

Setting MEMORY_LEAK to False enables a workaround that avoids the memory leak.
Output on my system with the memory leak:

Number of locations: 275, batch size: 64, number of workers: 16
starting with: 166 MB of memory consumption
mem increase (iteration:  10):  378 MB
mem increase (iteration:  20):  737 MB
mem increase (iteration:  30): 1099 MB
mem increase (iteration:  40): 1458 MB
mem increase (iteration:  50): 1819 MB
mem increase (iteration:  60): 2178 MB
mem increase (iteration:  70): 2539 MB
mem increase (iteration:  80): 2898 MB
mem increase (iteration:  90): 3259 MB
mem increase (iteration: 100): 3618 MB

And without:

Number of locations: 275, batch size: 64, number of workers: 16
starting with: 164 MB of memory consumption
mem increase (iteration:  10):   81 MB
mem increase (iteration:  20):   81 MB
mem increase (iteration:  30):   83 MB
mem increase (iteration:  40):   81 MB
mem increase (iteration:  50):   84 MB
mem increase (iteration:  60):   82 MB
mem increase (iteration:  70):   84 MB
mem increase (iteration:  80):   82 MB
mem increase (iteration:  90):   84 MB
mem increase (iteration: 100):   82 MB

In the example above, the same slide is read from repeatedly.
Note that the issue is also present when each iteration opens a slide, reads from it, and then closes it.
This can be observed by replacing the last part of the snippet above with:

start_mem = None
for i in range(101):
    slide = cucim.CuImage(slide_path)
    gen = slide.read_region(locations, size, level, batch_size=batch_size, num_workers=num_workers)
    if MEMORY_LEAK:
        # The regular cucim read_region results in a memory leak when iterating over it using `for x in gen`
        for x in gen:
            _ = np.asarray(x)
    else:
        # When iterating over the read_region generator using `next()`, no memory leak is observed
        num_reads = len(locations) // batch_size + min(1, len(locations) % batch_size)
        for _ in range(num_reads):
            _ = np.asarray(next(gen))

    if i % 10 == 0:
        if start_mem is None:
            start_mem = get_total_mem_usage()
            print(f'starting with: {(start_mem) // (1024 * 1024)} MB of memory consumption')
        else:
            print(f'mem increase (iteration: {i:3d}): {(get_total_mem_usage() - start_mem) // (1024 * 1024):4d} MB')
    slide.close()

Environment details (please complete the following information):

  • Environment location: Docker, starting from nvcr.io/nvidia/tensorflow:23.06-tf2-py3
  • Method of cuCIM install: pip, v23.06.00
  • CUDA: 12.4, NVIDIA driver: 525.125.06
diricxbart added the bug (Something isn't working) label on Aug 3, 2023
diricxbart commented Aug 3, 2023

Some additional information: the difference between `for x in gen` and `for _ in range(num_reads)` is that the former performs one additional iteration on the generator, which triggers a StopIteration exception.

This means that the following will NOT produce the memory leak:

num_reads = len(locations) // batch_size + min(1, len(locations) % batch_size)
for _ in range(num_reads):
    _ = np.asarray(next(gen))

but the following WILL produce the memory leak (note the + 1):

for _ in range(num_reads + 1):
    try:
        _ = np.asarray(next(gen))
    except StopIteration:
        break

which is equivalent to the originally posted

for x in gen:
    _ = np.asarray(x)
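
For reference, this is essentially what the for loop does under the hood (a plain Python generator stands in for the read_region result here):

def fake_batches(n):
    # A plain Python generator standing in for the read_region generator
    for i in range(n):
        yield i  # stands in for one batch of patches

gen = fake_batches(5)
it = iter(gen)  # for a generator, iter() returns the generator itself
while True:
    try:
        x = next(it)  # the final call raises StopIteration
    except StopIteration:
        break  # the for loop swallows the exception and exits
    _ = x  # loop body (np.asarray(x) in the original snippet)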

gigony commented Oct 30, 2023

Kudos to @diricxbart for identifying and reporting the bug, along with supplying test cases that can be reliably replicated!
Appreciate your help! Fingers crossed that the problem is resolved in PR #620.

rapids-bot closed this as completed in #620 on Oct 30, 2023
rapids-bot pushed a commit that referenced this issue on Oct 30, 2023:
MERGE after #618

### Fix memory corruption issue in ThreadBatchDataLoader

Fix #592    

- Return a shared pointer to CuImageIterator (taking ownership of the CuImageIterator instance) instead of returning a copy of CuImageIterator
- Do not create a new CuImageIterator instance when `__iter__` is called; return self instead
- Wait until all threads have finished in the ThreadBatchDataLoader destructor

### Fix memory leak in ThreadBatchDataLoader

Fix #598
    
- Throw the StopIteration exception early
- Do not set `raster_data_` to nullptr in `ThreadBatchDataLoader::next_data()`
  - Setting it to nullptr prevents the memory from being freed in the ThreadBatchDataLoader destructor

### Fix fatal errors for cache object

- The GIL was not acquired when using a Python object

### Add iterator-related test cases for memory issue

Authors:
  - Gigon Bae (https://github.com/gigony)

Approvers:
  - Gregory Lee (https://github.com/grlee77)
  - https://github.com/jakirkham

URL: #620
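
For illustration, a minimal Python sketch of the iterator behaviour described in the fix notes above (the actual fix lives in the C++ ThreadBatchDataLoader / CuImageIterator code; the class and method names below are hypothetical):

class DummyLoader:
    # Hypothetical stand-in for the native batch loader; it owns the buffers.
    def __init__(self, num_batches):
        self._buffers = [bytearray(16) for _ in range(num_batches)]

    def next_batch(self, index):
        return self._buffers[index]

class BatchIterator:
    def __init__(self, loader, num_batches):
        self._loader = loader  # the loader remains the single owner of the buffers
        self._num_batches = num_batches
        self._index = 0

    def __iter__(self):
        return self  # return self instead of creating a new iterator instance

    def __next__(self):
        if self._index >= self._num_batches:
            raise StopIteration  # raised early, before touching any batch state
        batch = self._loader.next_batch(self._index)
        self._index += 1
        # The buffer is not detached from the loader here, so the loader can
        # still free everything it allocated when it is destroyed.
        return batch

loader = DummyLoader(3)
for batch in BatchIterator(loader, 3):
    pass  # the extra, final iteration only raises StopIteration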