Implement lru_cache for get_variables_by_attributes #1253

ocefpaf · 2023-05-18T16:17:51Z

Still a work in progress and should be reviewed after #1252.

I'm not sure this is the best approach b/c the Dataset class will get a nice cached version of it without the memory leak caused if we use the decorator directly. However, that means the MFDataset will get the uncached one. We can do the same we did for Dataset in this PR to implement a cached version for MFDataset but I'm not sure that is worth the extra code (burden).

ocefpaf · 2023-05-18T16:32:34Z

My changes caused the error in the serial variant but I'm not sure how to fix it.

ERROR: runTest (tst_refcount.RefCountTestCase.runTest)
testing garbage collection (issue 218)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/runner/work/netcdf4-python/netcdf4-python/test/tst_refcount.py", line 25, in runTest
    nc = netCDF4.Dataset(self.file, mode='w', format='NETCDF4')
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/netCDF4/_netCDF4.pyx", line 2454, in netCDF4._netCDF4.Dataset.__init__
  File "src/netCDF4/_netCDF4.pyx", line [20](https://github.com/Unidata/netcdf4-python/actions/runs/5015979728/jobs/8992262365?pr=1253#step:6:21)13, in netCDF4._netCDF4._ensure_nc_success
PermissionError: [Errno 13] Permission denied: '/var/folders/[24](https://github.com/Unidata/netcdf4-python/actions/runs/5015979728/jobs/8992262365?pr=1253#step:6:25)/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmpajh6quvt.nc'

jswhit · 2023-05-20T14:09:53Z

My changes caused the error in the serial variant but I'm not sure how to fix it.

ERROR: runTest (tst_refcount.RefCountTestCase.runTest)
testing garbage collection (issue 218)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/runner/work/netcdf4-python/netcdf4-python/test/tst_refcount.py", line 25, in runTest
    nc = netCDF4.Dataset(self.file, mode='w', format='NETCDF4')
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/netCDF4/_netCDF4.pyx", line 2454, in netCDF4._netCDF4.Dataset.__init__
  File "src/netCDF4/_netCDF4.pyx", line [20](https://github.com/Unidata/netcdf4-python/actions/runs/5015979728/jobs/8992262365?pr=1253#step:6:21)13, in netCDF4._netCDF4._ensure_nc_success
PermissionError: [Errno 13] Permission denied: '/var/folders/[24](https://github.com/Unidata/netcdf4-python/actions/runs/5015979728/jobs/8992262365?pr=1253#step:6:25)/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmpajh6quvt.nc'

Maybe clear the LRU cache in the __dealloc__ Dataset method?

ocefpaf · 2023-05-22T16:15:51Z

src/netCDF4/_netCDF4.pyx

-        # close file when there are no references to object left
+        # close file when there are no references to object left and clear the cache.
+        if self.get_variables_by_attributes:
+            self.get_variables_by_attributes.cache_clear()


we do need this but this alone did not fix the failing test. In fact, the __dealloc__ test fails when I add a function object in the class __init__ that is a reference to the defined one, no relation to caching or not. I don't know enough about cython to understand what is wrong there.

ocefpaf · 2023-05-22T16:26:07Z

src/netCDF4/_netCDF4.pyx

@@ -3333,9 +3336,10 @@ of existing (sub-) groups and their variables.
            for group in groups:
                group.set_ncstring_attrs(value) # recurse into subgroups...

+    @functools.lru_cache(maxsize=128)


Adding the decorator here makes things simpler but can lead to memory leaks, see python/cpython#64058. However, b/c I could not make the most common fix for this work here and I believe with the cache clear in the __dealloc__ we cover the cases of opening and closing multiple datasets, I don't think this would be a problem. (I may be wrong here! I'm not an expert and I'm envisioning a subset of the use cases I know.)

What do you think @jswhit?

The alternative would be to factor this function out into a private one that takes self.variables and self._vars, cache that one, and use it in the class.

@ocefpaf I'm OK with this approach unless there are significant memory leaks. I guess you could create a test for memory leaks by repeatedly opening and closing the same file and invoking get_variables_by_attributes multiple times to see what happens to memory usage.

ocefpaf · 2023-05-24T13:37:08Z

test/tst_multiple_open_close.py

@@ -0,0 +1,52 @@
+import os
+import tracemalloc


I'm not sure if I'm using this correctly. I'm out of my depth here. ~~Also, it fails in the CIs but works locally. Not sure why.~~

Edit: moving it before the __main__ works and the reported memory seems OK.

ocefpaf · 2023-05-24T13:38:09Z

test/tst_multiple_open_close.py

+        netcdf_file = os.path.join(os.path.dirname(__file__), "netcdf_dummy_file.nc")
+        snapshot = tracemalloc.take_snapshot()
+
+        # k_times = 1000_000


We definitely don't want a large number on the CIs, or maybe we don't even want this test to be run. With that said, I ran this locally with 1000_000 and it worked out fine. Here are the stats for it:

/home/filipe/Dropbox/pymodules/01-forks/netcdf4-python/test/tst_multiple_open_close.py:0: size=1555 KiB (+1554 KiB), count=15380 (+15377), average=104 B /home/filipe/micromamba/envs/NETCDF4/lib/python3.11/argparse.py:0: size=12.7 KiB (-4240 B), count=123 (-44), average=105 B /home/filipe/micromamba/envs/NETCDF4/lib/python3.11/tracemalloc.py:0: size=896 B (+840 B), count=9 (+8), average=100 B /home/filipe/micromamba/envs/NETCDF4/lib/python3.11/re/_compiler.py:0: size=1448 B (-224 B), count=5 (-4), average=290 B /home/filipe/micromamba/envs/NETCDF4/lib/python3.11/contextlib.py:0: size=592 B (-128 B), count=3 (-2), average=197 B /home/filipe/micromamba/envs/NETCDF4/lib/python3.11/re/_parser.py:0: size=0 B (-112 B), count=0 (-2) <frozen posixpath>:0: size=129 B (-96 B), count=1 (-2), average=129 B /home/filipe/micromamba/envs/NETCDF4/lib/python3.11/unittest/case.py:0: size=864 B (-64 B), count=7 (-1), average=123 B /home/filipe/micromamba/envs/NETCDF4/lib/python3.11/unittest/main.py:0: size=1140 B (+0 B), count=9 (+0), average=127 B /home/filipe/micromamba/envs/NETCDF4/lib/python3.11/unittest/runner.py:0: size=1088 B (+0 B), count=8 (+0), average=136 B

It doesn't seem to produce a leak.

I don't think we need to run this on the CIs - it's enough to know that there is no significant leak now. Thanks for creating that test, will be very useful to have. I think this is now ready to merge.

Do you want to keep this test but not run it? Or should I just remove it?

keep the test but just don't run it

jswhit · 2023-05-24T20:46:35Z

test/run_all.py

@@ -57,6 +57,11 @@
    test_files.remove('tst_cdl.py');
    sys.stdout.write('not running tst_cdl.py ...\n')

+# Don't run computationally intensive test
+if os.getenv('MEMORY_LEAK_TEST'):


I think we want if not os.getenv('MEMORY_LEAK_TEST'): so the test doesn't run unless the env var is set? Looks like the test did run (and pass) in the latest github actions.

ocefpaf mentioned this pull request May 18, 2023

remove MemoizedDataset ioos/compliance-checker#1029

Merged

ocefpaf added 2 commits May 22, 2023 10:46

implement lru_cache for get_variables_by_attributes

11eeb23

clear cache

b739ccc

ocefpaf force-pushed the lru_cache_4_get_variables_by_attributes branch from ad8df0a to b739ccc Compare May 22, 2023 13:46

ocefpaf commented May 22, 2023

View reviewed changes

decorate the class method

91af6c5

ocefpaf force-pushed the lru_cache_4_get_variables_by_attributes branch from 1618ea5 to 91af6c5 Compare May 22, 2023 17:30

ocefpaf marked this pull request as ready for review May 22, 2023 17:38

opening, calling gba, and closing, multiple times

412fbb3

ocefpaf force-pushed the lru_cache_4_get_variables_by_attributes branch from 2512ed3 to 412fbb3 Compare May 24, 2023 13:43

ocefpaf marked this pull request as draft May 24, 2023 14:29

ocefpaf commented May 24, 2023

View reviewed changes

move inside the function

e83c208

ocefpaf marked this pull request as ready for review May 24, 2023 15:39

don't run memory leak test

e08f7bd

jswhit reviewed May 24, 2023

View reviewed changes

fix copy-n-pasta

60cd3a0

jswhit merged commit a910646 into Unidata:master May 25, 2023

ocefpaf deleted the lru_cache_4_get_variables_by_attributes branch May 25, 2023 14:03

ocefpaf mentioned this pull request May 21, 2024

Should fix more but now all "Error in sys.excepthook:" ioos/compliance-checker#1069

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement lru_cache for get_variables_by_attributes #1253

Implement lru_cache for get_variables_by_attributes #1253

ocefpaf commented May 18, 2023

ocefpaf commented May 18, 2023

jswhit commented May 20, 2023

ocefpaf May 22, 2023 •

edited

Loading

ocefpaf May 22, 2023

jswhit May 22, 2023

ocefpaf May 24, 2023 •

edited

Loading

ocefpaf May 24, 2023

jswhit May 24, 2023

ocefpaf May 24, 2023

jswhit May 24, 2023

jswhit May 24, 2023 •

edited

Loading

Implement lru_cache for get_variables_by_attributes #1253

Implement lru_cache for get_variables_by_attributes #1253

Conversation

ocefpaf commented May 18, 2023

ocefpaf commented May 18, 2023

jswhit commented May 20, 2023

ocefpaf May 22, 2023 • edited Loading

Choose a reason for hiding this comment

ocefpaf May 22, 2023

Choose a reason for hiding this comment

jswhit May 22, 2023

Choose a reason for hiding this comment

ocefpaf May 24, 2023 • edited Loading

Choose a reason for hiding this comment

ocefpaf May 24, 2023

Choose a reason for hiding this comment

jswhit May 24, 2023

Choose a reason for hiding this comment

ocefpaf May 24, 2023

Choose a reason for hiding this comment

jswhit May 24, 2023

Choose a reason for hiding this comment

jswhit May 24, 2023 • edited Loading

Choose a reason for hiding this comment

ocefpaf May 22, 2023 •

edited

Loading

ocefpaf May 24, 2023 •

edited

Loading

jswhit May 24, 2023 •

edited

Loading