high memory usage when appending uds to `partitions` list #287
What is very interesting is that using a

import os
import glob
import xugrid as xu
import xarray as xr
import datetime as dt
from time import sleep

def open_part_ds(file_nc_list, withwith):
    print(f'>> xu.open_dataset() with {len(file_nc_list)} partition(s): ',end='')
    dtstart = dt.datetime.now()
    partitions = []
    for iF, file_nc_one in enumerate(file_nc_list):
        print(iF+1,end=' ')
        if withwith:
            # the context manager closes the original xarray dataset after wrapping
            with xr.open_mfdataset(file_nc_one, chunks="auto") as ds_one:
                uds_one = xu.core.wrap.UgridDataset(ds_one)
            partitions.append(uds_one)
        else:
            # the original xarray dataset stays open
            ds_one = xr.open_mfdataset(file_nc_one, chunks="auto")
            uds_one = xu.core.wrap.UgridDataset(ds_one)
            # ds_one.close()
            # uds_one.close()
            partitions.append(uds_one)
    print(': ',end='')
    print(f'{(dt.datetime.now()-dtstart).total_seconds():.2f} sec')

    print('>> xu.merge_partitions(): ',end='')
    dtstart = dt.datetime.now()
    uds = xu.merge_partitions(partitions)
    print(f'{(dt.datetime.now()-dtstart).total_seconds():.2f} sec')
    return uds

dir_model = r"p:\11210284-011-nose-c-cycling\runs_fine_grid\B05_waq_2012_PCO2_ChlC_NPCratios_DenWat_stats_2023.01\B05_waq_2012_PCO2_ChlC_NPCratios_DenWat_stats_2023.01\DFM_OUTPUT_DCSM-FM_0_5nm_waq"
file_nc_pat = os.path.join(dir_model, "DCSM-FM_0_5nm_waq_0*_map.nc")
file_nc_list_all = glob.glob(file_nc_pat)
file_nc_list = file_nc_list_all[:5]

uds = open_part_ds(file_nc_list, withwith=False)
sleep(2)

From this it can be concluded that it is wise to close the original xarray dataset once it is no longer needed. The time and memory consumed by merging are unaffected by this. I will at least pick this up in Deltares/dfm_tools#968, but it might also be good to add it to the xugrid documentation. Adding it to

If the user also performs another action on the merged dataset (like plotting a single timestep), the memory usage increases again to the level that we saw without

Since this is not an issue with xugrid, this issue can be closed.
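The advice above to close the original xarray dataset could be sketched roughly as follows. This is only an illustration, not the dfm_tools or xugrid implementation: `open_partitions`, `open_one`, `wrap` and `FakeDataset` are hypothetical names introduced here, and the opener and wrapper are passed in so that the pattern can be dry-run without any model files.

```python
def open_partitions(file_nc_list, open_one, wrap):
    """Open each partition, wrap it, and close the original dataset again.

    open_one and wrap are hypothetical injection points. With the calls used
    in this thread they would be:
        open_one = lambda path: xr.open_mfdataset(path, chunks="auto")
        wrap = xu.core.wrap.UgridDataset
    """
    partitions = []
    for file_nc_one in file_nc_list:
        # leaving the with-block closes the original dataset
        with open_one(file_nc_one) as ds_one:
            partitions.append(wrap(ds_one))
    return partitions


class FakeDataset:
    """Hypothetical stand-in for an xarray dataset, only for a dry run."""

    def __init__(self, path):
        self.path = path
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.closed = True
        return False


# dry run: every "dataset" is closed once its wrapped version is in the list
demo = open_partitions(["a.nc", "b.nc"], open_one=FakeDataset, wrap=lambda ds: ds)
print([(ds.path, ds.closed) for ds in demo])
```

With the real opener and wrapper, the resulting list would be passed to `xu.merge_partitions()` as in the script above. The thread reports that merging is unaffected by the close; that is an observation from this issue, not something this sketch verifies.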
Running the following script called `memory_usage.py` with memory_profiler via `mprof run python memory_usage.py` and `mprof plot`:

Results in this memory usage:

However, when commenting out `partitions.append(uds_one)`, we get way less memory usage and we see garbage collection in action:

The accumulating memory consumption upon appending is inconvenient, since we want to make a list of partitions for `xu.merge_partitions()`. Calling `gc.collect()` after `xr.open_dataset()` (or elsewhere) does not make a difference.

Might be related to:
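On the `gc.collect()` observation: in general Python terms, objects that are still referenced from a list are reachable, so the garbage collector will not free them no matter where it is called. The sketch below shows this with a hypothetical stand-in class (`FakePartition` and `count_alive` are not part of xugrid or xarray). It does not say anything about what xarray or xugrid hold on to internally.

```python
import gc
import weakref


class FakePartition:
    """Hypothetical stand-in for one opened partition (not a xugrid class)."""

    def __init__(self, n_values):
        self.values = [0.0] * n_values


def count_alive(keep_in_list, n_partitions=5):
    """Create partitions in a loop and count how many survive gc.collect()."""
    partitions = []
    refs = []
    for _ in range(n_partitions):
        part = FakePartition(10_000)
        refs.append(weakref.ref(part))  # weak references do not keep objects alive
        if keep_in_list:
            partitions.append(part)  # strong reference: the object stays reachable
        part = None  # drop the loop variable's own reference
    gc.collect()  # cannot free anything that is still in the list
    return sum(ref() is not None for ref in refs)


print("appended to list:", count_alive(keep_in_list=True))
print("not appended:    ", count_alive(keep_in_list=False))
```

This matches the pattern in the plots described above: memory accumulates while the list holds the partitions, and is released per iteration when the append is commented out.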