This repository has been archived by the owner on Aug 29, 2023. It is now read-only.

Add new instance id #2

Closed
wants to merge 8 commits into from

Conversation

jbusecke
Collaborator

No description provided.

@pangeo-forge-bot
Collaborator

🎉 New recipe runs created for the following recipes at sha 0bc9bd48b61ab0e46b91a61dc2c8d01e50114dfe:

@pangeo-forge-bot
Collaborator

🎉 New recipe runs created for the following recipes at sha d7716b19dacaa5b70836bcdc6ec951713dd39bad:

@cisaacstern
Member

/run recipe-test recipe_run_id=63

@pangeo-forge-bot
Collaborator

✨ A test of your recipe CMIP6.CMIP.MOHC.UKESM1-0-LL.historical.r1i1p1f2.SImon.siitdconc.gn.v20200309 is now running on Pangeo Forge Cloud!

I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)

In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/63

@pangeo-forge-bot
Collaborator

Pangeo Cloud told me that our test of your recipe CMIP6.CMIP.MOHC.UKESM1-0-LL.historical.r1i1p1f2.SImon.siitdconc.gn.v20200309 failed. But don't worry, I'm sure we can fix this!

To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/63

If you haven't yet tried pruning and running your recipe locally, I suggest trying that now.

Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps!

@pangeo-forge-bot
Collaborator

🎉 New recipe runs created for the following recipes at sha 676441f956558a9299f86428079d1560ad40dbf3:

@jbusecke
Collaborator Author

@cisaacstern can we naively rerun 94? I set the subset value to quite a high number and just want to check whether that is indeed the issue.

@cisaacstern
Member

Sure thing. As a side note, I'm not sure how the recipe run IDs jumped from the low sixties to the low nineties over the weekend; I don't see where those extra increments came from. But that's a separate issue. Running 94 now!

@cisaacstern
Member

/run recipe-test recipe_run_id=94

@pangeo-forge-bot
Collaborator

✨ A test of your recipe CMIP6.CMIP.MOHC.UKESM1-0-LL.historical.r1i1p1f2.SImon.siitdconc.gn.v20200309 is now running on Pangeo Forge Cloud!

I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)

In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/94

@pangeo-forge-bot
Collaborator

Pangeo Cloud told me that our test of your recipe CMIP6.CMIP.MOHC.UKESM1-0-LL.historical.r1i1p1f2.SImon.siitdconc.gn.v20200309 failed. But don't worry, I'm sure we can fix this!

To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/94

If you haven't yet tried pruning and running your recipe locally, I suggest trying that now.

Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps!

@cisaacstern
Member

Another killed worker. @jbusecke have you tried running this in a local notebook? That's probably the best way to tweak the subset_inputs kwarg at this point.
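
As a rough sizing aid for that tweaking, here is a minimal sketch. It is not anything pangeo-forge-recipes does internally. It assumes that subsetting splits one dimension into roughly equal pieces and that the largest variable dominates memory. The shape, the 4-byte float32 item size, and the 15 time subsets come from the GFDL-CM4 sithick log further down this thread; `chunk_nbytes`, `min_subsets`, and the 500 MB budget are made-up names and numbers for illustration only.

```python
import math


def chunk_nbytes(shape, itemsize, axis, n_subsets):
    """Approximate bytes of one subset when `axis` is split into n_subsets pieces."""
    dims = list(shape)
    # The largest piece has ceil(len / n_subsets) elements along the split axis.
    dims[axis] = math.ceil(dims[axis] / n_subsets)
    return math.prod(dims) * itemsize


def min_subsets(shape, itemsize, axis, budget_bytes):
    """Smallest number of subsets whose largest piece fits in budget_bytes."""
    for n in range(1, shape[axis] + 1):
        if chunk_nbytes(shape, itemsize, axis, n) <= budget_bytes:
            return n
    raise ValueError("even a single step along the axis exceeds the budget")


# (time, y, x) of sithick in one input file, float32 = 4 bytes per value.
sithick_shape = (1200, 1080, 1440)

print(chunk_nbytes(sithick_shape, 4, axis=0, n_subsets=15))  # 497664000
print(min_subsets(sithick_shape, 4, axis=0, budget_bytes=500_000_000))  # 15
```

The first number matches the "Converting variable sithick of 497664000 bytes" line in the log. If the estimate holds, the matching kwarg would presumably be `subset_inputs={"time": 15}`; treat that as an assumption to check against the recipe documentation.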

@pangeo-forge-bot
Collaborator

🎉 New recipe runs created for the following recipes at sha b3a7639a06ffbb53697f17c78bb0dbe4b77d9a69:

@cisaacstern
Member

/run recipe-test recipe_run_id=98

@pangeo-forge-bot
Collaborator

✨ A test of your recipe CMIP6.CMIP.MOHC.UKESM1-0-LL.historical.r1i1p1f2.SImon.siitdconc.gn.v20200309 is now running on Pangeo Forge Cloud!

I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)

In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/98

@pangeo-forge-bot
Collaborator

Pangeo Cloud told me that our test of your recipe CMIP6.CMIP.MOHC.UKESM1-0-LL.historical.r1i1p1f2.SImon.siitdconc.gn.v20200309 failed. But don't worry, I'm sure we can fix this!

To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/98

If you haven't yet tried pruning and running your recipe locally, I suggest trying that now.

Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps!

@pangeo-forge-bot
Collaborator

🎉 New recipe runs created for the following recipes at sha 988b416d25f9673db023231ae7f8b0d40c77cb04:

@pangeo-forge-bot
Collaborator

🎉 New recipe runs created for the following recipes at sha cd914dfa4746ef109443bbffe99720f82c3702d1:

@cisaacstern
Member

/run recipe-test recipe_run_id=104

@pangeo-forge-bot
Collaborator

✨ A test of your recipe CMIP6.CMIP.MOHC.UKESM1-0-LL.historical.r1i1p1f2.SImon.siitdconc.gn.v20200309 is now running on Pangeo Forge Cloud!

I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)

In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/104

@pangeo-forge-bot
Collaborator

🎉 New recipe runs created for the following recipes at sha 5ae5ec1f9f1850f0e6924be362f42ac2076d0c36:

@cisaacstern
Member

/run recipe-test recipe_run_id=108

@pangeo-forge-bot
Collaborator

✨ A test of your recipe CMIP6.CMIP.NOAA-GFDL.GFDL-CM4.historical.r1i1p1f1.SImon.sithick.gn.v20180701 is now running on Pangeo Forge Cloud!

I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)

In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/108

@cisaacstern
Member

/run recipe-test recipe_run_id=109

@pangeo-forge-bot
Collaborator

✨ A test of your recipe CMIP6.CMIP.NOAA-GFDL.GFDL-CM4.historical.r1i1p1f1.SImon.siconc.gn.v20180701 is now running on Pangeo Forge Cloud!

I'll notify you with a comment on this thread when this test is complete. (This could be a little while...)

In the meantime, you can follow the logs for this recipe run at https://pangeo-forge.org/dashboard/recipe-run/109

@pangeo-forge-bot
Collaborator

Pangeo Cloud told me that our test of your recipe CMIP6.CMIP.NOAA-GFDL.GFDL-CM4.historical.r1i1p1f1.SImon.siconc.gn.v20180701 failed. But don't worry, I'm sure we can fix this!

To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/109

If you haven't yet tried pruning and running your recipe locally, I suggest trying that now.

Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps!

@pangeo-forge-bot
Collaborator

Pangeo Cloud told me that our test of your recipe CMIP6.CMIP.NOAA-GFDL.GFDL-CM4.historical.r1i1p1f1.SImon.sithick.gn.v20180701 failed. But don't worry, I'm sure we can fix this!

To see what error caused the failure, please review the logs at https://pangeo-forge.org/dashboard/recipe-run/108

If you haven't yet tried pruning and running your recipe locally, I suggest trying that now.

Please report back on the results of your local testing in a new comment below, and a Pangeo Forge maintainer will help you with next steps!

@cisaacstern
Member

@rabernat, Julius and I are seeing the following error when running a test of recipe_4 from this PR on Pangeo Forge Cloud:

AttributeError: 'NoneType' object has no attribute 'lock_release'


2022-04-29T19:56:16.625736338Z stderr F distributed.nanny - INFO -         Start Nanny at: 'tcp://10.60.5.4:40227'
--
  |   | 2022-04-29T19:56:20.833508624Z stderr F distributed.worker - INFO -       Start worker at:      tcp://10.60.5.4:32987
  |   | 2022-04-29T19:56:20.833573941Z stderr F distributed.worker - INFO -          Listening to:      tcp://10.60.5.4:32987
  |   | 2022-04-29T19:56:20.833825584Z stderr F distributed.worker - INFO -          dashboard at:            10.60.5.4:44343
  |   | 2022-04-29T19:56:20.834045602Z stderr F distributed.worker - INFO - Waiting to connect to: tcp://dask-jovyan-d39353ef-8.pangeo-forge-columbia-staging-bakery:8786
  |   | 2022-04-29T19:56:20.834167535Z stderr F distributed.worker - INFO - -------------------------------------------------
  |   | 2022-04-29T19:56:20.885289315Z stderr F distributed.worker - INFO -               Threads:                          1
  |   | 2022-04-29T19:56:20.885424473Z stderr F distributed.worker - INFO -                Memory:                   3.90 GiB
  |   | 2022-04-29T19:56:20.885434672Z stderr F distributed.worker - INFO -       Local Directory: /home/jovyan/dask-worker-space/worker-i_k7r44c
  |   | 2022-04-29T19:56:20.885554599Z stderr F distributed.worker - INFO - -------------------------------------------------
  |   | 2022-04-29T19:56:20.904091335Z stderr F distributed.worker - INFO -         Registered to: tcp://dask-jovyan-d39353ef-8.pangeo-forge-columbia-staging-bakery:8786
  |   | 2022-04-29T19:56:20.904171627Z stderr F distributed.worker - INFO - -------------------------------------------------
  |   | 2022-04-29T19:56:20.905137091Z stderr F distributed.core - INFO - Starting established connection
  |   | 2022-04-29T19:56:36.085992583Z stderr F distributed.core - INFO - Event loop was unresponsive in Worker for 15.04s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
  |   | 2022-04-29T19:56:36.341685239Z stdout F [2022-04-29 19:56:36+0000] INFO - prefect.CloudTaskRunner \| Task 'store_chunk[5]': Starting task run...
  |   | 2022-04-29T19:56:36.460479844Z stderr F INFO:pangeo_forge_recipes.recipes.xarray_zarr:Opening inputs for chunk Index({DimIndex(name='time', index=5, sequence_len=15, operation=<CombineOp.SUBSET: 3>), DimIndex(name='time', index=0, sequence_len=2, operation=<CombineOp.CONCAT: 2>)})
  |   | 2022-04-29T19:56:36.460575417Z stderr F INFO:pangeo_forge_recipes.recipes.xarray_zarr:Opening input with Xarray Index({DimIndex(name='time', index=0, sequence_len=2, operation=<CombineOp.CONCAT: 2>)}): 'http://aims3.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/historical/r1i1p1f1/SImon/sithick/gn/v20180701/sithick_SImon_GFDL-CM4_historical_r1i1p1f1_gn_185001-194912.nc'
  |   | 2022-04-29T19:56:36.460611843Z stderr F INFO:pangeo_forge_recipes.storage:Opening 'http://aims3.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/historical/r1i1p1f1/SImon/sithick/gn/v20180701/sithick_SImon_GFDL-CM4_historical_r1i1p1f1_gn_185001-194912.nc' from cache
  |   | 2022-04-29T19:56:36.460719823Z stderr F DEBUG:pangeo_forge_recipes.storage:file_opener entering first context for <contextlib._GeneratorContextManager object at 0x7ff83a255ac0>
  |   | 2022-04-29T19:56:36.461193991Z stderr F DEBUG:pangeo_forge_recipes.storage:entering fs.open context manager for pfcsb-bucket/cache/d3dad268f59c0449814856790ecb884d-http_aims3.llnl.gov_thredds_fileserver_css03_data_cmip6_cmip_noaa-gfdl_gfdl-cm4_historical_r1i1p1f1_simon_sithick_gn_v20180701_sithick_simon_gfdl-cm4_historical_r1i1p1f1_gn_185001-194912.nc
  |   | 2022-04-29T19:56:36.543465161Z stderr F DEBUG:pangeo_forge_recipes.storage:FSSpecTarget.open yielding <File-like object GCSFileSystem, pfcsb-bucket/cache/d3dad268f59c0449814856790ecb884d-http_aims3.llnl.gov_thredds_fileserver_css03_data_cmip6_cmip_noaa-gfdl_gfdl-cm4_historical_r1i1p1f1_simon_sithick_gn_v20180701_sithick_simon_gfdl-cm4_historical_r1i1p1f1_gn_185001-194912.nc>
  |   | 2022-04-29T19:56:36.543535256Z stderr F DEBUG:pangeo_forge_recipes.storage:file_opener entering second context for <File-like object GCSFileSystem, pfcsb-bucket/cache/d3dad268f59c0449814856790ecb884d-http_aims3.llnl.gov_thredds_fileserver_css03_data_cmip6_cmip_noaa-gfdl_gfdl-cm4_historical_r1i1p1f1_simon_sithick_gn_v20180701_sithick_simon_gfdl-cm4_historical_r1i1p1f1_gn_185001-194912.nc>
  |   | 2022-04-29T19:56:36.543614333Z stderr F DEBUG:pangeo_forge_recipes.recipes.xarray_zarr:about to enter xr.open_dataset context on <File-like object GCSFileSystem, pfcsb-bucket/cache/d3dad268f59c0449814856790ecb884d-http_aims3.llnl.gov_thredds_fileserver_css03_data_cmip6_cmip_noaa-gfdl_gfdl-cm4_historical_r1i1p1f1_simon_sithick_gn_v20180701_sithick_simon_gfdl-cm4_historical_r1i1p1f1_gn_185001-194912.nc>
  |   | 2022-04-29T19:56:42.227516552Z stderr F DEBUG:pangeo_forge_recipes.recipes.xarray_zarr:successfully opened dataset
  |   | 2022-04-29T19:56:42.240394762Z stderr F DEBUG:pangeo_forge_recipes.recipes.xarray_zarr:<xarray.Dataset>
  |   | 2022-04-29T19:56:42.240458564Z stderr F Dimensions:    (bnds: 2, time: 1200, y: 1080, x: 1440, xTe: 1441, yTe: 1081, vertex: 4)
  |   | 2022-04-29T19:56:42.240473445Z stderr F Coordinates:
  |   | 2022-04-29T19:56:42.240497895Z stderr F   * bnds       (bnds) float64 1.0 2.0
  |   | 2022-04-29T19:56:42.240508252Z stderr F   * time       (time) object 1850-01-16 12:00:00 ... 1949-12-16 12:00:00
  |   | 2022-04-29T19:56:42.24051703Z stderr F   * x          (x) float64 -299.7 -299.5 -299.2 -299.0 ... 59.53 59.78 60.03
  |   | 2022-04-29T19:56:42.240540578Z stderr F   * xTe        (xTe) float64 -299.8 -299.6 -299.3 -299.1 ... 59.66 59.91 60.16
  |   | 2022-04-29T19:56:42.24055001Z stderr F   * y          (y) float64 -80.39 -80.31 -80.23 -80.15 ... 89.73 89.84 89.95
  |   | 2022-04-29T19:56:42.24055787Z stderr F   * yTe        (yTe) float64 -80.43 -80.35 -80.27 -80.19 ... 89.78 89.89 90.0
  |   | 2022-04-29T19:56:42.240565975Z stderr F     lon        (y, x) float32 ...
  |   | 2022-04-29T19:56:42.24057309Z stderr F     lat        (y, x) float32 ...
  |   | 2022-04-29T19:56:42.240580755Z stderr F Dimensions without coordinates: vertex
  |   | 2022-04-29T19:56:42.240587148Z stderr F Data variables:
  |   | 2022-04-29T19:56:42.240593795Z stderr F     sithick    (time, y, x) float32 ...
  |   | 2022-04-29T19:56:42.240600735Z stderr F     time_bnds  (time, bnds) object ...
  |   | 2022-04-29T19:56:42.240606969Z stderr F     lat_bnds   (y, x, vertex) float32 ...
  |   | 2022-04-29T19:56:42.240614762Z stderr F     lon_bnds   (y, x, vertex) float32 ...
  |   | 2022-04-29T19:56:42.240621578Z stderr F Attributes: (12/46)
  |   | 2022-04-29T19:56:42.240628795Z stderr F     history:                File was processed by fremetar (GFDL analog of CM...
  |   | 2022-04-29T19:56:42.240635632Z stderr F     table_id:               SImon
  |   | 2022-04-29T19:56:42.24064225Z stderr F     activity_id:            CMIP
  |   | 2022-04-29T19:56:42.240648831Z stderr F     branch_method:          standard
  |   | 2022-04-29T19:56:42.240655586Z stderr F     branch_time_in_child:   0.0
  |   | 2022-04-29T19:56:42.240662247Z stderr F     comment:                <null ref>
  |   | 2022-04-29T19:56:42.240669499Z stderr F     ...                     ...
  |   | 2022-04-29T19:56:42.240676278Z stderr F     variable_id:            sithick
  |   | 2022-04-29T19:56:42.240683107Z stderr F     variant_info:           N/A
  |   | 2022-04-29T19:56:42.240691358Z stderr F     references:             see further_info_url attribute
  |   | 2022-04-29T19:56:42.240698917Z stderr F     variant_label:          r1i1p1f1
  |   | 2022-04-29T19:56:42.240705843Z stderr F     branch_time_in_parent:  36500.0
  |   | 2022-04-29T19:56:42.240713216Z stderr F     parent_time_units:      days since 0001-1-1
  |   | 2022-04-29T19:56:42.24103179Z stderr F INFO:pangeo_forge_recipes.recipes.xarray_zarr:Subsetting input according to time-5
  |   | 2022-04-29T19:56:42.241106002Z stderr F DEBUG:pangeo_forge_recipes.recipes.xarray_zarr:Subsetting dataset with indexer {'time': slice(400, 480, None)}
  |   | 2022-04-29T19:56:42.247336395Z stderr F INFO:pangeo_forge_recipes.recipes.xarray_zarr:Combining inputs for chunk 'Index({DimIndex(name='time', index=5, sequence_len=15, operation=<CombineOp.SUBSET: 3>), DimIndex(name='time', index=0, sequence_len=2, operation=<CombineOp.CONCAT: 2>)})'
  |   | 2022-04-29T19:56:42.287602094Z stderr F DEBUG:pangeo_forge_recipes.recipes.xarray_zarr:<xarray.Dataset>
  |   | 2022-04-29T19:56:42.287656856Z stderr F Dimensions:    (bnds: 2, time: 80, y: 1080, x: 1440, xTe: 1441, yTe: 1081, vertex: 4)
  |   | 2022-04-29T19:56:42.287686494Z stderr F Coordinates:
  |   | 2022-04-29T19:56:42.287695499Z stderr F   * bnds       (bnds) float64 1.0 2.0
  |   | 2022-04-29T19:56:42.287704274Z stderr F   * time       (time) object 1883-05-16 12:00:00 ... 1889-12-16 12:00:00
  |   | 2022-04-29T19:56:42.287712385Z stderr F   * x          (x) float64 -299.7 -299.5 -299.2 -299.0 ... 59.53 59.78 60.03
  |   | 2022-04-29T19:56:42.287720188Z stderr F   * xTe        (xTe) float64 -299.8 -299.6 -299.3 -299.1 ... 59.66 59.91 60.16
  |   | 2022-04-29T19:56:42.287728Z stderr F   * y          (y) float64 -80.39 -80.31 -80.23 -80.15 ... 89.73 89.84 89.95
  |   | 2022-04-29T19:56:42.287734388Z stderr F   * yTe        (yTe) float64 -80.43 -80.35 -80.27 -80.19 ... 89.78 89.89 90.0
  |   | 2022-04-29T19:56:42.287740812Z stderr F     lon        (y, x) float32 dask.array<chunksize=(1080, 1440), meta=np.ndarray>
  |   | 2022-04-29T19:56:42.287747939Z stderr F     lat        (y, x) float32 dask.array<chunksize=(1080, 1440), meta=np.ndarray>
  |   | 2022-04-29T19:56:42.287755222Z stderr F Dimensions without coordinates: vertex
  |   | 2022-04-29T19:56:42.287761484Z stderr F Data variables:
  |   | 2022-04-29T19:56:42.287769416Z stderr F     sithick    (time, y, x) float32 dask.array<chunksize=(80, 1080, 1440), meta=np.ndarray>
  |   | 2022-04-29T19:56:42.287786406Z stderr F     time_bnds  (time, bnds) object dask.array<chunksize=(80, 2), meta=np.ndarray>
  |   | 2022-04-29T19:56:42.287794244Z stderr F     lat_bnds   (y, x, vertex) float32 dask.array<chunksize=(1080, 1440, 4), meta=np.ndarray>
  |   | 2022-04-29T19:56:42.287800505Z stderr F     lon_bnds   (y, x, vertex) float32 dask.array<chunksize=(1080, 1440, 4), meta=np.ndarray>
  |   | 2022-04-29T19:56:42.287807212Z stderr F Attributes: (12/46)
  |   | 2022-04-29T19:56:42.287814043Z stderr F     history:                File was processed by fremetar (GFDL analog of CM...
  |   | 2022-04-29T19:56:42.287820676Z stderr F     table_id:               SImon
  |   | 2022-04-29T19:56:42.287827816Z stderr F     activity_id:            CMIP
  |   | 2022-04-29T19:56:42.28783502Z stderr F     branch_method:          standard
  |   | 2022-04-29T19:56:42.287842076Z stderr F     branch_time_in_child:   0.0
  |   | 2022-04-29T19:56:42.287849585Z stderr F     comment:                <null ref>
  |   | 2022-04-29T19:56:42.28785678Z stderr F     ...                     ...
  |   | 2022-04-29T19:56:42.287863685Z stderr F     variable_id:            sithick
  |   | 2022-04-29T19:56:42.287870506Z stderr F     variant_info:           N/A
  |   | 2022-04-29T19:56:42.287878045Z stderr F     references:             see further_info_url attribute
  |   | 2022-04-29T19:56:42.287884675Z stderr F     variant_label:          r1i1p1f1
  |   | 2022-04-29T19:56:42.287891102Z stderr F     branch_time_in_parent:  36500.0
  |   | 2022-04-29T19:56:42.287897495Z stderr F     parent_time_units:      days since 0001-1-1
  |   | 2022-04-29T19:56:42.64369656Z stderr F DEBUG:pangeo_forge_recipes.recipes.xarray_zarr:Converting variable sithick of 497664000 bytes to `numpy.ndarray`
  |   | 2022-04-29T19:56:51.028790066Z stderr F DEBUG:pangeo_forge_recipes.recipes.xarray_zarr:Acquiring locks ['sithick-time-33']
  |   | 2022-04-29T19:56:51.092226537Z stderr F DEBUG:pangeo_forge_recipes.utils:Acquiring lock pangeo-forge-sithick-time-33...
  |   | 2022-04-29T19:56:51.093594791Z stderr F DEBUG:pangeo_forge_recipes.utils:Acquired lock pangeo-forge-sithick-time-33
  |   | 2022-04-29T19:56:51.093881024Z stderr F INFO:pangeo_forge_recipes.recipes.xarray_zarr:Storing variable sithick chunk Index({DimIndex(name='time', index=5, sequence_len=15, operation=<CombineOp.SUBSET: 3>), DimIndex(name='time', index=0, sequence_len=2, operation=<CombineOp.CONCAT: 2>)}) to Zarr region (slice(400, 480, None), slice(None, None, None), slice(None, None, None))
  |   | 2022-04-29T19:56:57.438654101Z stderr F DEBUG:pangeo_forge_recipes.utils:Released lock pangeo-forge-sithick-time-33
  |   | 2022-04-29T19:56:57.520012731Z stderr F DEBUG:pangeo_forge_recipes.recipes.xarray_zarr:Converting variable time of 640 bytes to `numpy.ndarray`
  |   | 2022-04-29T19:56:57.522017763Z stderr F DEBUG:pangeo_forge_recipes.recipes.xarray_zarr:Acquiring locks ['time-time-33']
  |   | 2022-04-29T19:56:57.522829331Z stderr F DEBUG:pangeo_forge_recipes.utils:Acquiring lock pangeo-forge-time-time-33...
  |   | 2022-04-29T19:56:57.524663995Z stderr F DEBUG:pangeo_forge_recipes.utils:Acquired lock pangeo-forge-time-time-33
  |   | 2022-04-29T19:56:57.524769957Z stderr F INFO:pangeo_forge_recipes.recipes.xarray_zarr:Storing variable time chunk Index({DimIndex(name='time', index=5, sequence_len=15, operation=<CombineOp.SUBSET: 3>), DimIndex(name='time', index=0, sequence_len=2, operation=<CombineOp.CONCAT: 2>)}) to Zarr region (slice(400, 480, None),)
  |   | 2022-04-29T19:56:57.749296599Z stderr F distributed.worker - INFO - Stopping worker at tcp://10.60.5.4:32987
  |   | 2022-04-29T19:56:57.77712317Z stdout F [2022-04-29 19:56:57+0000] ERROR - prefect.CloudTaskRunner \| Task 'store_chunk[5]': Exception encountered during task execution!
  |   | 2022-04-29T19:56:57.777637929Z stdout F Traceback (most recent call last):
  |   | 2022-04-29T19:56:57.777662797Z stdout F   File "/srv/conda/envs/notebook/lib/python3.9/site-packages/prefect/engine/task_runner.py", line 861, in get_task_run_state
  |   | 2022-04-29T19:56:57.777672349Z stdout F     value = prefect.utilities.executors.run_task_with_timeout(
  |   | 2022-04-29T19:56:57.777584345Z stderr F ERROR:prefect.CloudTaskRunner:Task 'store_chunk[5]': Exception encountered during task execution!
  |   | 2022-04-29T19:56:57.778682249Z stdout F   File "/srv/conda/envs/notebook/lib/python3.9/site-packages/prefect/utilities/executors.py", line 323, in run_task_with_timeout
  |   | 2022-04-29T19:56:57.778715778Z stdout F     return task.run(*args, **kwargs)  # type: ignore
  |   | 2022-04-29T19:56:57.77874032Z stdout F   File "/usr/local/lib/python3.9/site-packages/registrar/flow.py", line 113, in wrapper
  |   | 2022-04-29T19:56:57.778749981Z stdout F   File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 573, in store_chunk
  |   | 2022-04-29T19:56:57.778756568Z stdout F     zarr_array[zarr_region] = data
  |   | 2022-04-29T19:56:57.778763299Z stdout F   File "/srv/conda/envs/notebook/lib/python3.9/contextlib.py", line 126, in __exit__
  |   | 2022-04-29T19:56:57.778769801Z stdout F     next(self.gen)
  |   | 2022-04-29T19:56:57.778776439Z stdout F   File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/utils.py", line 118, in lock_for_conflicts
  |   | 2022-04-29T19:56:57.778783258Z stdout F     lock.release()
  |   | 2022-04-29T19:56:57.778803439Z stdout F   File "/srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/lock.py", line 151, in release
  |   | 2022-04-29T19:56:57.778810718Z stdout F     self.client.scheduler.lock_release, name=self.name, id=self.id
  |   | 2022-04-29T19:56:57.778817494Z stdout F AttributeError: 'NoneType' object has no attribute 'lock_release'
  |   | 2022-04-29T19:56:57.778705956Z stderr F Traceback (most recent call last):
  |   | 2022-04-29T19:56:57.778831704Z stderr F   File "/srv/conda/envs/notebook/lib/python3.9/site-packages/prefect/engine/task_runner.py", line 861, in get_task_run_state
  |   | 2022-04-29T19:56:57.778838359Z stderr F     value = prefect.utilities.executors.run_task_with_timeout(
  |   | 2022-04-29T19:56:57.778845231Z stderr F   File "/srv/conda/envs/notebook/lib/python3.9/site-packages/prefect/utilities/executors.py", line 323, in run_task_with_timeout
  |   | 2022-04-29T19:56:57.778852071Z stderr F     return task.run(*args, **kwargs)  # type: ignore
  |   | 2022-04-29T19:56:57.778858386Z stderr F   File "/usr/local/lib/python3.9/site-packages/registrar/flow.py", line 113, in wrapper
  |   | 2022-04-29T19:56:57.778865144Z stderr F   File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 573, in store_chunk
  |   | 2022-04-29T19:56:57.778871675Z stderr F     zarr_array[zarr_region] = data
  |   | 2022-04-29T19:56:57.778878222Z stderr F   File "/srv/conda/envs/notebook/lib/python3.9/contextlib.py", line 126, in __exit__
  |   | 2022-04-29T19:56:57.778884679Z stderr F     next(self.gen)
  |   | 2022-04-29T19:56:57.778893125Z stderr F   File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/utils.py", line 118, in lock_for_conflicts
  |   | 2022-04-29T19:56:57.778900143Z stderr F     lock.release()
  |   | 2022-04-29T19:56:57.778906574Z stderr F   File "/srv/conda/envs/notebook/lib/python3.9/site-packages/distributed/lock.py", line 151, in release
  |   | 2022-04-29T19:56:57.778935059Z stderr F     self.client.scheduler.lock_release, name=self.name, id=self.id
  |   | 2022-04-29T19:56:57.778942054Z stderr F AttributeError: 'NoneType' object has no attribute 'lock_release'
  |   | 2022-04-29T19:56:57.780001303Z stderr F distributed.dask_worker - INFO - Exiting on signal 15
  |   | 2022-04-29T19:56:57.780026469Z stderr F distributed.nanny - INFO - Closing Nanny at 'tcp://10.60.5.4:40227'
  |   | 2022-04-29T19:56:57.948725733Z stdout F [2022-04-29 19:56:57+0000] INFO - prefect.CloudTaskRunner \| Task 'store_chunk[5]': Finished task run for task with final state: 'Failed'
  |   | 2022-04-29T19:56:57.948866006Z stderr F INFO:prefect.CloudTaskRunner:Task 'store_chunk[5]': Finished task run for task with final state: 'Failed'
  |   | 2022-04-29T19:56:57.950702711Z stderr F distributed.nanny - INFO - Worker closed
  |   | 2022-04-29T19:56:59.385116591Z stderr F distributed.nanny - WARNING - Worker process still alive after 1 seconds, killing
  |   | 2022-04-29T19:56:59.386855884Z stderr F distributed.dask_worker - INFO - End worker
  |   | 2022-04-29T19:56:59.387340386Z stderr F distributed.process - INFO - reaping stray process <SpawnProcess name='Dask Worker process (from Nanny)' pid=315 parent=1 started daemon>
  |   | 2022-04-29T19:56:59.405694188Z stderr F Exception in thread AsyncProcess Dask Worker process (from Nanny) watch process join:
  |   | 2022-04-29T19:56:59.40574961Z stderr F Traceback (most recent call last):
  |   | 2022-04-29T19:56:59.405761984Z stderr F   File "/srv/conda/envs/notebook/lib/python3.9/threading.py", line 973, in _bootstrap_inner

Could this be related to pangeo-forge/noaa-coastwatch-geopolar-sst-feedstock#2 (comment)?

We've just tried to reproduce this in a notebook with

import prefect

flow = recipe_4.copy_pruned().to_prefect()
flow.executor = prefect.executors.DaskExecutor()

but were unsuccessful, perhaps because there were not enough workers in the LocalCluster to induce the locking issue?
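
One way to probe that hunch without Prefect in the loop is a minimal sketch like the one below. It assumes only that `dask.distributed` is installed. It starts a LocalCluster with several single-threaded workers and makes every task contend for the same named `distributed.Lock`. `locked_write`, `run_locked_writes`, and the lock name are hypothetical stand-ins, not code from the recipe. This does not reproduce the AttributeError by itself; it only sets up the multi-worker lock contention that the notebook run may have lacked.

```python
from distributed import Client, LocalCluster, Lock


def locked_write(i, lock_name="demo-sithick-time-33"):
    # Hypothetical stand-in for a store_chunk task: every task must take
    # the same named lock before doing its (pretend) write.
    with Lock(lock_name):
        return i


def run_locked_writes(n_workers=4, n_tasks=8):
    # processes=False keeps the workers as threads in this process, which
    # avoids needing an `if __name__ == "__main__":` guard on spawn platforms.
    with LocalCluster(
        n_workers=n_workers,
        threads_per_worker=1,
        processes=False,
        dashboard_address=None,  # no dashboard needed for this check
    ) as cluster:
        with Client(cluster) as client:
            futures = client.map(locked_write, range(n_tasks))
            # gather returns results in the same order as the futures.
            return client.gather(futures)


results = run_locked_writes()
print(results)
```

With Prefect 1.x the same worker count can, as far as I know, be requested through `DaskExecutor(cluster_kwargs={"n_workers": 4, "threads_per_worker": 1})`, so the original two-line reproduction could be retried with more workers that way.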

@cisaacstern
Member

I've opened #6 as a replacement for this PR, so going to close this now.

@cisaacstern cisaacstern closed this May 6, 2022

3 participants