[BUG] Memory leak in read_region in generator mode (using a list of locations) #598

Closed
diricxbart opened this issue Aug 3, 2023 · 2 comments · Fixed by #620
diricxbart commented Aug 3, 2023

Describe the bug
I observe a significant memory leak when using cucim read_region in generator mode (using a list of locations).

Steps/Code to reproduce bug
The following snippet reproduces the memory leak:

import cucim
import numpy as np
import random
import os
import psutil

MEMORY_LEAK = True  # Set to False to avoid the memory leak using a workaround...
slide_path = 'some_svs_file.svs'

level = 0
size = (256, 256)
batch_size = 64
num_workers = 16
num_batches = 4.3  # This may be a non-integer number, to have the last iteration return a number of patches < batch_size
locations = [(random.randint(0, 1000), random.randint(2000, 3000)) for _ in range(int(batch_size * num_batches))]
print(f'Number of locations: {len(locations)}, batch size: {batch_size}, number of workers: {num_workers}')

def get_total_mem_usage():
    current_process = psutil.Process(os.getpid())
    mem = current_process.memory_info().rss
    for child in current_process.children(recursive=True):
        mem += child.memory_info().rss
    return mem

with cucim.CuImage(slide_path) as slide:
    start_mem = None
    for i in range(101):
        gen = slide.read_region(locations, size, level, batch_size=batch_size, num_workers=num_workers)
        if MEMORY_LEAK:
            # The regular cucim read_region results in a memory leak when iterating over it using `for x in gen`
            for x in gen:
                _ = np.asarray(x)
        else:
            # When iterating over the read_region generator using `next()`, no memory leak is observed
            num_reads = len(locations) // batch_size + min(1, len(locations) % batch_size)
            for _ in range(num_reads):
                _ = np.asarray(next(gen))

        if i % 10 == 0:
            if start_mem is None:
                start_mem = get_total_mem_usage()
                print(f'starting with: {(start_mem) // (1024 * 1024)} MB of memory consumption')
            else:
                print(f'mem increase (iteration: {i:3d}): {(get_total_mem_usage() - start_mem) // (1024 * 1024):4d} MB')

Setting MEMORY_LEAK to False enables a workaround that avoids the memory leak.
Output on my system with the memory leak:

Number of locations: 275, batch size: 64, number of workers: 16
starting with: 166 MB of memory consumption
mem increase (iteration:  10):  378 MB
mem increase (iteration:  20):  737 MB
mem increase (iteration:  30): 1099 MB
mem increase (iteration:  40): 1458 MB
mem increase (iteration:  50): 1819 MB
mem increase (iteration:  60): 2178 MB
mem increase (iteration:  70): 2539 MB
mem increase (iteration:  80): 2898 MB
mem increase (iteration:  90): 3259 MB
mem increase (iteration: 100): 3618 MB

And without:

Number of locations: 275, batch size: 64, number of workers: 16
starting with: 164 MB of memory consumption
mem increase (iteration:  10):   81 MB
mem increase (iteration:  20):   81 MB
mem increase (iteration:  30):   83 MB
mem increase (iteration:  40):   81 MB
mem increase (iteration:  50):   84 MB
mem increase (iteration:  60):   82 MB
mem increase (iteration:  70):   84 MB
mem increase (iteration:  80):   82 MB
mem increase (iteration:  90):   84 MB
mem increase (iteration: 100):   82 MB

In the example above, the same slide is read from repeatedly.
Note that the issue is also present when each iteration opens a slide, reads from it, and then closes it.
This can be observed by replacing the last part of the snippet above with:

start_mem = None
for i in range(101):
    slide = cucim.CuImage(slide_path)
    gen = slide.read_region(locations, size, level, batch_size=batch_size, num_workers=num_workers)
    if MEMORY_LEAK:
        # The regular cucim read_region results in a memory leak when iterating over it using `for x in gen`
        for x in gen:
            _ = np.asarray(x)
    else:
        # When iterating over the read_region generator using `next()`, no memory leak is observed
        num_reads = len(locations) // batch_size + min(1, len(locations) % batch_size)
        for _ in range(num_reads):
            _ = np.asarray(next(gen))

    if i % 10 == 0:
        if start_mem is None:
            start_mem = get_total_mem_usage()
            print(f'starting with: {(start_mem) // (1024 * 1024)} MB of memory consumption')
        else:
            print(f'mem increase (iteration: {i:3d}): {(get_total_mem_usage() - start_mem) // (1024 * 1024):4d} MB')
    slide.close()

Environment details (please complete the following information):

  • Environment location: Docker, starting from nvcr.io/nvidia/tensorflow:23.06-tf2-py3
  • Method of cuCIM install: pip, v23.06.00
  • CUDA: 12.4, NVIDIA driver: 525.125.06
diricxbart added the bug (Something isn't working) label on Aug 3, 2023
diricxbart commented Aug 3, 2023

Some additional information: the difference between `for x in gen` and `for _ in range(num_reads)` is that the former performs one additional iteration on the generator, which triggers a StopIteration exception.

This means that the following will NOT produce the memory leak:

num_reads = len(locations) // batch_size + min(1, len(locations) % batch_size)
for _ in range(num_reads):
    _ = np.asarray(next(gen))

but the following WILL produce the memory leak (note the + 1):

for _ in range(num_reads + 1):
    try:
        _ = np.asarray(next(gen))
    except StopIteration:
        break

which is equivalent to the originally posted

for x in gen:
    _ = np.asarray(x)
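
For reference, this is essentially what the for loop does under the hood (a plain Python generator stands in for the read_region result here):

def fake_batches(n):
    # A plain Python generator standing in for the read_region generator
    for i in range(n):
        yield i  # stands in for one batch of patches

gen = fake_batches(5)
it = iter(gen)  # for a generator, iter() returns the generator itself
while True:
    try:
        x = next(it)  # the final call raises StopIteration
    except StopIteration:
        break  # the for loop swallows the exception and exits
    _ = x  # loop body (np.asarray(x) in the original snippet)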

gigony commented Oct 30, 2023

Kudos to @diricxbart for identifying and reporting the bug, along with supplying test cases that can be reliably replicated!
Appreciate your help! Fingers crossed that the problem is resolved in PR #620.

rapids-bot closed this as completed in #620 on Oct 30, 2023
rapids-bot pushed a commit that referenced this issue on Oct 30, 2023:
MERGE after #618

### Fix memory corruption issue in ThreadBatchDataLoader

Fix #592    

- Return a shared pointer to CuImageIterator (taking ownership of the CuImageIterator instance) instead of returning a copy of CuImageIterator
- Do not create a new CuImageIterator instance when `__iter__` is called; return self instead
- Wait until all threads have finished in the ThreadBatchDataLoader destructor

### Fix memory leak in ThreadBatchDataLoader

Fix #598
    
- Throw the StopIteration exception early
- Do not set `raster_data_` to nullptr in `ThreadBatchDataLoader::next_data()`
  - Setting it to nullptr prevents the memory from being freed in the ThreadBatchDataLoader destructor

### Fix fatal errors for cache object

- The GIL was not acquired when using a Python object

### Add iterator-related test cases for memory issue

Authors:
  - Gigon Bae (https://github.com/gigony)

Approvers:
  - Gregory Lee (https://github.com/grlee77)
  - https://github.com/jakirkham

URL: #620
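
For illustration, a minimal Python sketch of the iterator behaviour described in the fix notes above (the actual fix lives in the C++ ThreadBatchDataLoader / CuImageIterator code; the class and method names below are hypothetical):

class DummyLoader:
    # Hypothetical stand-in for the native batch loader; it owns the buffers.
    def __init__(self, num_batches):
        self._buffers = [bytearray(16) for _ in range(num_batches)]

    def next_batch(self, index):
        return self._buffers[index]

class BatchIterator:
    def __init__(self, loader, num_batches):
        self._loader = loader  # the loader remains the single owner of the buffers
        self._num_batches = num_batches
        self._index = 0

    def __iter__(self):
        return self  # return self instead of creating a new iterator instance

    def __next__(self):
        if self._index >= self._num_batches:
            raise StopIteration  # raised early, before touching any batch state
        batch = self._loader.next_batch(self._index)
        self._index += 1
        # The buffer is not detached from the loader here, so the loader can
        # still free everything it allocated when it is destroyed.
        return batch

loader = DummyLoader(3)
for batch in BatchIterator(loader, 3):
    pass  # the extra, final iteration only raises StopIteration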