Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement lru_cache for get_variables_by_attributes #1253

Merged
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 6 additions & 2 deletions src/netCDF4/_netCDF4.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -1226,6 +1226,7 @@ from cpython.bytes cimport PyBytes_FromStringAndSize
from .utils import (_StartCountStride, _quantize, _find_dim, _walk_grps,
_out_array_shape, _sortbylist, _tostr, _safecast, _is_int)
import sys
import functools

__version__ = "1.6.3"

Expand Down Expand Up @@ -2617,7 +2618,9 @@ Is the Dataset open or closed?
return bool(self._isopen)

def __dealloc__(self):
# close file when there are no references to object left
# close file when there are no references to object left and clear the cache.
if self.get_variables_by_attributes:
self.get_variables_by_attributes.cache_clear()
Copy link
Collaborator Author

@ocefpaf ocefpaf May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do need this but this alone did not fix the failing test. In fact, the __dealloc__ test fails when I add a function object in the class __init__ that is a reference to the defined one, no relation to caching or not. I don't know enough about cython to understand what is wrong there.

if self._isopen:
self._close(False)

Expand Down Expand Up @@ -3333,9 +3336,10 @@ of existing (sub-) groups and their variables.
for group in groups:
group.set_ncstring_attrs(value) # recurse into subgroups...

@functools.lru_cache(maxsize=128)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the decorator here makes things simpler but can lead to memory leaks, see python/cpython#64058. However, b/c I could not make the most common fix for this work here and I believe with the cache clear in the __dealloc__ we cover the cases of opening and closing multiple datasets, I don't think this would be a problem. (I may be wrong here! I'm not an expert and I'm envisioning a subset of the use cases I know.)

What do you think @jswhit?

The alternative would be to factor this function out into a private one that takes self.variables and self._vars, cache that one, and use it in the class.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ocefpaf I'm OK with this approach unless there are significant memory leaks. I guess you could create a test for memory leaks by repeatedly opening and closing the same file and invoking get_variables_by_attributes multiple times to see what happens to memory usage.

def get_variables_by_attributes(self, **kwargs):
"""
**`get_variables_by_attribute(self, **kwargs)`**
**`get_variables_by_attributes(self, **kwargs)`**

Returns a list of variables that match specific conditions.

Expand Down
5 changes: 5 additions & 0 deletions test/run_all.py
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,11 @@
test_files.remove('tst_cdl.py');
sys.stdout.write('not running tst_cdl.py ...\n')

# Don't run computationally intensive test
if os.getenv('MEMORY_LEAK_TEST'):
Copy link
Collaborator

@jswhit jswhit May 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want if not os.getenv('MEMORY_LEAK_TEST'): so the test doesn't run unless the env var is set? Looks like the test did run (and pass) in the latest github actions.

test_files.remove('tst_multiple_open_close.py');
sys.stdout.write('not running tst_multiple_open_close.py ...\n')

# Build the test suite from the tests found in the test files.
testsuite = unittest.TestSuite()
for f in test_files:
Expand Down
50 changes: 50 additions & 0 deletions test/tst_multiple_open_close.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
import os
import tracemalloc
Copy link
Collaborator Author

@ocefpaf ocefpaf May 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I'm using this correctly. I'm out of my depth here. Also, it fails in the CIs but works locally. Not sure why.


Edit: moving it before the __main__ works and the reported memory seems OK.

import unittest

import netCDF4

class MultipleVariablesByAttributesCallsTests(unittest.TestCase):


def test_multiple_calls(self):
netcdf_file = os.path.join(os.path.dirname(__file__), "netcdf_dummy_file.nc")
tracemalloc.start()
snapshot = tracemalloc.take_snapshot()

k_times = 10
for _k in range(k_times):
nc = netCDF4.Dataset(netcdf_file)

vs = nc.get_variables_by_attributes(axis='Z')
self.assertEqual(len(vs), 1)

vs = nc.get_variables_by_attributes(units='m/s')
self.assertEqual(len(vs), 4)

vs = nc.get_variables_by_attributes(axis='Z', units='m')
self.assertEqual(len(vs), 1)

vs = nc.get_variables_by_attributes(axis=lambda v: v in ['X', 'Y', 'Z', 'T'])
self.assertEqual(len(vs), 1)

vs = nc.get_variables_by_attributes(grid_mapping=lambda v: v is not None)
self.assertEqual(len(vs), 12)

vs = nc.get_variables_by_attributes(grid_mapping=lambda v: v is not None, long_name=lambda v: v is not None and 'Upward (w) velocity' in v)
self.assertEqual(len(vs), 1)

vs = nc.get_variables_by_attributes(units='m/s', grid_mapping=lambda v: v is not None)
self.assertEqual(len(vs), 4)

vs = nc.get_variables_by_attributes(grid_mapping=lambda v: v is not None, long_name='Upward (w) velocity')
self.assertEqual(len(vs), 1)
nc.close()
stats = tracemalloc.take_snapshot().compare_to(snapshot, 'filename')
tracemalloc.stop()
print("[ Top 10 differences ]")
for stat in stats[:10]:
print(stat)

if __name__ == '__main__':
unittest.main()