Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement lru_cache for get_variables_by_attributes #1253

Merged

Conversation

ocefpaf
Copy link
Collaborator

@ocefpaf ocefpaf commented May 18, 2023

Still a work in progress and should be reviewed after #1252.

I'm not sure this is the best approach b/c the Dataset class will get a nice cached version of it without the memory leak caused if we use the decorator directly. However, that means the MFDataset will get the uncached one. We can do the same we did for Dataset in this PR to implement a cached version for MFDataset but I'm not sure that is worth the extra code (burden).

@ocefpaf
Copy link
Collaborator Author

ocefpaf commented May 18, 2023

My changes caused the error in the serial variant but I'm not sure how to fix it.

ERROR: runTest (tst_refcount.RefCountTestCase.runTest)
testing garbage collection (issue 218)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/runner/work/netcdf4-python/netcdf4-python/test/tst_refcount.py", line 25, in runTest
    nc = netCDF4.Dataset(self.file, mode='w', format='NETCDF4')
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/netCDF4/_netCDF4.pyx", line 2454, in netCDF4._netCDF4.Dataset.__init__
  File "src/netCDF4/_netCDF4.pyx", line [20](https://github.com/Unidata/netcdf4-python/actions/runs/5015979728/jobs/8992262365?pr=1253#step:6:21)13, in netCDF4._netCDF4._ensure_nc_success
PermissionError: [Errno 13] Permission denied: '/var/folders/[24](https://github.com/Unidata/netcdf4-python/actions/runs/5015979728/jobs/8992262365?pr=1253#step:6:25)/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmpajh6quvt.nc'

@jswhit
Copy link
Collaborator

jswhit commented May 20, 2023

My changes caused the error in the serial variant but I'm not sure how to fix it.

ERROR: runTest (tst_refcount.RefCountTestCase.runTest)
testing garbage collection (issue 218)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/runner/work/netcdf4-python/netcdf4-python/test/tst_refcount.py", line 25, in runTest
    nc = netCDF4.Dataset(self.file, mode='w', format='NETCDF4')
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/netCDF4/_netCDF4.pyx", line 2454, in netCDF4._netCDF4.Dataset.__init__
  File "src/netCDF4/_netCDF4.pyx", line [20](https://github.com/Unidata/netcdf4-python/actions/runs/5015979728/jobs/8992262365?pr=1253#step:6:21)13, in netCDF4._netCDF4._ensure_nc_success
PermissionError: [Errno 13] Permission denied: '/var/folders/[24](https://github.com/Unidata/netcdf4-python/actions/runs/5015979728/jobs/8992262365?pr=1253#step:6:25)/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmpajh6quvt.nc'

Maybe clear the LRU cache in the __dealloc__ Dataset method?

@ocefpaf ocefpaf force-pushed the lru_cache_4_get_variables_by_attributes branch from ad8df0a to b739ccc Compare May 22, 2023 13:46
# close file when there are no references to object left
# close file when there are no references to object left and clear the cache.
if self.get_variables_by_attributes:
self.get_variables_by_attributes.cache_clear()
Copy link
Collaborator Author

@ocefpaf ocefpaf May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do need this but this alone did not fix the failing test. In fact, the __dealloc__ test fails when I add a function object in the class __init__ that is a reference to the defined one, no relation to caching or not. I don't know enough about cython to understand what is wrong there.

@@ -3333,9 +3336,10 @@ of existing (sub-) groups and their variables.
for group in groups:
group.set_ncstring_attrs(value) # recurse into subgroups...

@functools.lru_cache(maxsize=128)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the decorator here makes things simpler but can lead to memory leaks, see python/cpython#64058. However, b/c I could not make the most common fix for this work here and I believe with the cache clear in the __dealloc__ we cover the cases of opening and closing multiple datasets, I don't think this would be a problem. (I may be wrong here! I'm not an expert and I'm envisioning a subset of the use cases I know.)

What do you think @jswhit?

The alternative would be to factor this function out into a private one that takes self.variables and self._vars, cache that one, and use it in the class.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ocefpaf I'm OK with this approach unless there are significant memory leaks. I guess you could create a test for memory leaks by repeatedly opening and closing the same file and invoking get_variables_by_attributes multiple times to see what happens to memory usage.

@ocefpaf ocefpaf force-pushed the lru_cache_4_get_variables_by_attributes branch from 1618ea5 to 91af6c5 Compare May 22, 2023 17:30
@ocefpaf ocefpaf marked this pull request as ready for review May 22, 2023 17:38
@ocefpaf ocefpaf force-pushed the lru_cache_4_get_variables_by_attributes branch from 2512ed3 to 412fbb3 Compare May 24, 2023 13:43
@ocefpaf ocefpaf marked this pull request as draft May 24, 2023 14:29
@@ -0,0 +1,52 @@
import os
import tracemalloc
Copy link
Collaborator Author

@ocefpaf ocefpaf May 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I'm using this correctly. I'm out of my depth here. Also, it fails in the CIs but works locally. Not sure why.


Edit: moving it before the __main__ works and the reported memory seems OK.

netcdf_file = os.path.join(os.path.dirname(__file__), "netcdf_dummy_file.nc")
snapshot = tracemalloc.take_snapshot()

# k_times = 1000_000
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely don't want a large number on the CIs, or maybe we don't even want this test to be run. With that said, I ran this locally with 1000_000 and it worked out fine. Here are the stats for it:

/home/filipe/Dropbox/pymodules/01-forks/netcdf4-python/test/tst_multiple_open_close.py:0: size=1555 KiB (+1554 KiB), count=15380 (+15377), average=104 B
/home/filipe/micromamba/envs/NETCDF4/lib/python3.11/argparse.py:0: size=12.7 KiB (-4240 B), count=123 (-44), average=105 B
/home/filipe/micromamba/envs/NETCDF4/lib/python3.11/tracemalloc.py:0: size=896 B (+840 B), count=9 (+8), average=100 B
/home/filipe/micromamba/envs/NETCDF4/lib/python3.11/re/_compiler.py:0: size=1448 B (-224 B), count=5 (-4), average=290 B
/home/filipe/micromamba/envs/NETCDF4/lib/python3.11/contextlib.py:0: size=592 B (-128 B), count=3 (-2), average=197 B
/home/filipe/micromamba/envs/NETCDF4/lib/python3.11/re/_parser.py:0: size=0 B (-112 B), count=0 (-2)
<frozen posixpath>:0: size=129 B (-96 B), count=1 (-2), average=129 B
/home/filipe/micromamba/envs/NETCDF4/lib/python3.11/unittest/case.py:0: size=864 B (-64 B), count=7 (-1), average=123 B
/home/filipe/micromamba/envs/NETCDF4/lib/python3.11/unittest/main.py:0: size=1140 B (+0 B), count=9 (+0), average=127 B
/home/filipe/micromamba/envs/NETCDF4/lib/python3.11/unittest/runner.py:0: size=1088 B (+0 B), count=8 (+0), average=136 B

It doesn't seem to produce a leak.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to run this on the CIs - it's enough to know that there is no significant leak now. Thanks for creating that test, will be very useful to have. I think this is now ready to merge.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to keep this test but not run it? Or should I just remove it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep the test but just don't run it

@ocefpaf ocefpaf marked this pull request as ready for review May 24, 2023 15:39
test/run_all.py Outdated
@@ -57,6 +57,11 @@
test_files.remove('tst_cdl.py');
sys.stdout.write('not running tst_cdl.py ...\n')

# Don't run computationally intensive test
if os.getenv('MEMORY_LEAK_TEST'):
Copy link
Collaborator

@jswhit jswhit May 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want if not os.getenv('MEMORY_LEAK_TEST'): so the test doesn't run unless the env var is set? Looks like the test did run (and pass) in the latest github actions.

@jswhit jswhit merged commit a910646 into Unidata:master May 25, 2023
@ocefpaf ocefpaf deleted the lru_cache_4_get_variables_by_attributes branch May 25, 2023 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants