Add Viewer support #9

forsyth2 · 2024-11-26T20:30:22Z

Issue resolution

Closes [Feature]: Create HTML page for global time series zppy#601. Follow-up to Create global time series Viewers zppy#616.
Corresponding zppy PR: Add Viewer support zppy#648

Select one: This pull request is...

a bug fix: increment the patch version
a new feature: increment the minor version
an incompatible (non-backwards compatible) API change: increment the major version

1. Does this do what we want it to do?

Objectives:

Add Viewers for global_time_series, similar to those of E3SM Diags
Add support for Land global_time_series plots

Required:

Product Management: I have confirmed with the stakeholders that the objectives above are correct and complete.
Testing: I have considered likely and/or severe edge cases and have included them in testing.

2. Are the implementation details accurate & efficient?

Required:

Logic: I have visually inspected the entire pull request myself.
Logic: I have left GitHub comments highlighting important pieces of code logic. I have had these code blocks reviewed by at least one other team member.

If applicable:

Dependencies: This pull request introduces a new dependency. I have discussed this requirement with at least one other team member. The dependency is noted in zppy-interfaces/conda, not just an import statement.
- beautifulsoup4, lxml, output_viewer

3. Is this well documented?

Required:

Documentation: by looking at the docs, a new user could easily understand the functionality introduced by this pull request.
zppy-interfaces doesn't have docs yet

4. Is this code clean?

Required:

Readability: The code is as simple as possible and well-commented, such that a new team member could understand what's happening.
Pre-commit checks: All the pre-commit checks have passed.

If applicable:

Software architecture: I have discussed relevant trade-offs in design decisions with at least one other team member. It is unlikely that this pull request will increase tech debt.

forsyth2

@xylar I'm trying to port over E3SM-Project/zppy#616 to post-refactored code. It's actually an excellent example of why it was a good idea to pull the code out into a separate package in the first place: there are three packages that my editor is showing as unfound, meaning pre-refactor I must have just been picking them up from Unified.

Is every Unified dependency included in https://github.com/E3SM-Project/e3sm-unified/blob/main/recipes/e3sm-unified/meta.yaml? I'm wondering if some of these packages are inside others. (I see output_viewer but not bs4 or distutils).

forsyth2 · 2024-11-26T20:48:14Z

zppy_interfaces/global_time_series/coupled_global.py

@@ -1,7 +1,10 @@
 # Script to plot some global atmosphere and ocean time series
+import csv
+import distutils.dir_util


Could not be resolved

You should not be using distutils anymore: it was removed from the standard library in Python 3.12. Try to find a Python 3.13-compatible alternative to this function.

https://docs.python.org/3.10/library/distutils.html

I've asked @altheaden if she would be willing to look for the preferred replacement for this function. Hopefully, she can come up with something.
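The diff only shows that distutils.dir_util is imported, not which function is called. Assuming it is copy_tree, the usual standard-library replacement is shutil.copytree with dirs_exist_ok=True (available since Python 3.8), which, like copy_tree, copies into a destination directory that may already exist. A minimal sketch; copy_directory is a hypothetical name, not code from this PR:

```python
import shutil


def copy_directory(src: str, dst: str) -> str:
    """Hypothetical helper: copy the contents of src into dst.

    Intended as a stand-in for distutils.dir_util.copy_tree(src, dst).
    dirs_exist_ok=True lets dst (and its subdirectories) already exist;
    files in dst with the same relative paths are overwritten.
    """
    # shutil.copytree returns the destination directory.
    return shutil.copytree(src, dst, dirs_exist_ok=True)
```

Unlike copy_tree, shutil.copytree returns the destination path rather than a list of copied files, so any caller that relied on that list would need adjusting.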


forsyth2 · 2024-11-26T20:48:57Z

zppy_interfaces/global_time_series/coupled_global.py

+from output_viewer.build import build_page, build_viewer
+from output_viewer.index import (
+    OutputFile,
+    OutputGroup,
+    OutputIndex,
+    OutputPage,
+    OutputRow,
+)
+from output_viewer.utils import rechmod


output_viewer could not be resolved.

xylar · 2024-11-26T20:58:12Z

Is every Unified dependency included in https://github.com/E3SM-Project/e3sm-unified/blob/main/recipes/e3sm-unified/meta.yaml?

No, that only shows direct dependencies. There are also dependencies-of-dependencies and so on. There isn't a great way to know what those are or what their constraints are.

xylar · 2024-11-26T20:59:27Z

Is bs4 a different implementation of beautiful_soup? If so, what a mess!

forsyth2 · 2024-11-26T21:03:32Z

bs4 is how e3sm_diags imports it: https://github.com/E3SM-Project/e3sm_diags/blob/ca41b0e5d913610c88410928951f1ed11c75663f/e3sm_diags/viewer/main.py#L4

xylar · 2024-11-26T21:45:34Z

It's actually an excellent example of why it was a good idea to pull the code out into a separate package in the first place: there are three packages that my editor is showing as unfound, meaning pre-refactor I must have just been picking them up from Unified.

Glad that this work is proving its value!

forsyth2

I now have a viewer index pointing to both the atm and lnd viewers! (Images below).

However, I have 5 remaining issues I need some help on (described in this review's comments).

@chengzhuzhang -- Issues 2,3
@tomvothecoder -- Issues 1,2,4,5
@xylar -- Issue 4 (and maybe some of the others too)

Thanks!

forsyth2 · 2024-11-27T20:12:23Z

.github/workflows/build_workflow.yml

@@ -47,6 +47,7 @@ jobs:
      # since the action is run on a branch in detached head state.
      # This is the equivalent of running "pre-commit run --all-files" locally.
      # If you commit with the `--no-verify` flag, this check may fail.
+      # TODO: this doesn't seem to run when I run `git commit` locally. I always have to run `pre-commit run --all-files` manually.


Issue 1: When I run git commit, it doesn't actually run the pre-commit checks; I don't get the "Passed" messages. If I run pre-commit run --all-files afterward, there are in fact changes made, which I then need to amend into the commit if I didn't think to run pre-commit run --all-files before committing. I'm not sure why it's not working.

@tomvothecoder Not sure if you've had a chance to look at this yet, but I'm not quite sure how to fix this.

Did you run pre-commit install beforehand?

Ah, that fixes this, thanks!

forsyth2 · 2024-11-27T20:16:00Z

zppy_interfaces/global_time_series/coupled_global.py

+    header = True
+    # TODO: how do we make sure the csv is actually accessible????
+    # The current directory is where we ran the code from, which is not necessarily where the csv is.
+    csv_path = INCLUSIONS_DIR


Issue 4a: See Issue 4 comment.

forsyth2 · 2024-11-27T20:17:56Z

zppy_interfaces/global_time_series/coupled_global.py

+            annual_average_dataset_for_var: xarray.core.dataset.Dataset
            if metric == Metric.AVERAGE:
-                annual_average_dataset_for_var: xarray.core.dataset.Dataset = (
-                    self.f.temporal.group_average(var, "year")
+                annual_average_dataset_for_var = self.f.temporal.group_average(
+                    var, "year"
                )
                data_array = annual_average_dataset_for_var.data_vars[var]
+            elif metric == Metric.TOTAL:
+                annual_average_dataset_for_var = self.f.temporal.group_average(
+                    var, "year"
+                )
+                data_array = annual_average_dataset_for_var.data_vars[var]
+                # import pprint
+                # pprint.pprint(
+                #     f"annual_average_dataset_for_var attributes={annual_average_dataset_for_var.attrs}"
+                # )
+                # pprint.pprint(f"data_array attributes={data_array.attrs}")
+                # data_array *= area*landfrac
+                # TODO: Determine how to get area and landfrac


Issue 2: I know we're aiming for data_array *= area*landfrac for the Metric.TOTAL calculation, but I'm still a little confused about how to extract those. Something like the following?

self.f.temporal.group_average(var, "area")
self.f.temporal.group_average(var, "landfrac")

But those are known scalars, not computed averages, right?

Use xarray to extract area or landfrac: specify the variable name with [] indexing. In annual_average_dataset_for_var.data_vars[var], use "area" or "landfrac" instead of var.

forsyth2 · 2024-11-27T20:19:42Z

zppy_interfaces/global_time_series/coupled_global.py

 def coupled_global(parameters: Parameters) -> None:
    requested_variables = RequestedVariables(parameters)
    for rgn in parameters.regions:
        run(parameters, requested_variables, rgn)
+    plots_per_page = parameters.nrows * parameters.ncols
+    # TODO: Is this how we want to determine when to make a viewer or should we have a `make_viewer` parameter in the cfg?


Issue 3: this is a design decision question. I guess it would be best to have a specific make_viewer parameter rather than relying on a user specifying they want 1 row and 1 column in the parameters. It's possible a user would want single plot images, but not a Viewer. (Although, the inverse doesn't work; we need 1x1 images to use in the Viewer). Once #3 is implemented too, that will make parameter specification less clunky.

Upon further thought, I think I should just go ahead and add the make_viewer parameter. It's clearer to understand, and allows for the case of wanting single plot images but not a Viewer.
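One possible shape for that check, sketched under assumptions: the Parameters dataclass below is a hypothetical stand-in holding only nrows, ncols, and the proposed make_viewer flag, and validate_viewer_settings is a hypothetical helper, not zppy-interfaces code. It encodes the asymmetry described above: 1x1 images without a Viewer are fine, but a Viewer requires 1x1 images.

```python
from dataclasses import dataclass


@dataclass
class Parameters:
    # Hypothetical subset of the real Parameters class.
    nrows: int
    ncols: int
    make_viewer: bool = False


def validate_viewer_settings(parameters: Parameters) -> None:
    """Hypothetical check: a Viewer needs exactly one plot per page."""
    plots_per_page = parameters.nrows * parameters.ncols
    if parameters.make_viewer and plots_per_page != 1:
        raise ValueError(
            "make_viewer requires nrows = ncols = 1 "
            f"(got {parameters.nrows}x{parameters.ncols})"
        )
```

An alternative design would be to generate the 1x1 images automatically whenever make_viewer is set, instead of rejecting the configuration.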

forsyth2 · 2024-11-27T20:21:08Z

zppy_interfaces/global_time_series/coupled_global_utils.py

@@ -0,0 +1,42 @@
+from enum import Enum
+
+# TODO: how to determine this automatically?


Issue 4: There are two cases where we need to access non-python files. (See the Issue 4a and 4b comments). The code can't seem to access these files without me hard-coding a path.

@forsyth2, I think you need a MANIFEST.in.

https://github.com/E3SM-Project/polaris/blob/a44d4b8b308067287c007d2b17176d827fa39c9d/MANIFEST.in

@altheaden and I worked hard to find an alternative without luck.

Interesting, good to know, thanks!

forsyth2 · 2024-11-27T20:23:39Z

zppy_interfaces/global_time_series/coupled_global_viewer.py

+    # import sys
+    # logger.debug(f"sys.prefix: {sys.prefix}, ls sys.prefix: {os.listdir(sys.prefix)}")
+    # TODO: figure out install_path
+    install_path: str = INCLUSIONS_DIR
+    path: str = os.path.join(install_path, "index_template.html")


Issue 4b: See Issue 4 comment.

E3SM Diags uses sys.prefix (INSTALL_PATH = os.path.join(sys.prefix, "share/e3sm_diags/") in e3sm_diags/__init__.py), but there's no zppy-interfaces in my {sys.prefix}/share dir.


forsyth2 · 2024-11-27T20:38:20Z

zppy_interfaces/multi_utils/viewer.py

@@ -0,0 +1,126 @@
+import os


@zhangshixuan1987 For E3SM-Project/zppy#647, this Viewer file may be useful for generating Viewers for PCMDI Diags. (This will be in the main branch of zppy-interfaces after this PR merges).

forsyth2

@xylar I tried adding MANIFEST.in in the latest commit (b9d16ca), but I'm still getting FileNotFoundError: [Errno 2] No such file or directory.

I just need to do the pip install . step, right? Or does this need to be loaded into conda? https://github.com/E3SM-Project/polaris/pull/64/files appears to literally just add MANIFEST.in and nothing else.

forsyth2 · 2024-11-27T21:56:06Z

MANIFEST.in

+recursive-include zppy-interfaces zppy_interfaces/global_time_series/zppy_land_fields.csv
+recursive-include zppy-interfaces zppy_interfaces/global_time_series/index_template.html


I tried zppy-interfaces, zppy_interfaces, and zi-global-time-series here.

xylar · 2024-11-27T22:32:10Z

MANIFEST.in

+recursive-include zppy-interfaces zppy_interfaces/global_time_series/zppy_land_fields.csv
+recursive-include zppy-interfaces zppy_interfaces/global_time_series/index_template.html


This command is for recursively including all files of a given type in the package:

Suggested change, replacing:

recursive-include zppy-interfaces zppy_interfaces/global_time_series/zppy_land_fields.csv
recursive-include zppy-interfaces zppy_interfaces/global_time_series/index_template.html

with:

recursive-include zppy_interfaces *.csv
recursive-include zppy_interfaces *.html

This is not the syntax to use for including individual files, which I would not recommend. You should not have csv or html files in the Python package that you don't want to include in the conda package.

@xylar Hmm I'm still getting the same error even with this change.

@forsyth2, I see. I think the problem is 2-fold and I hadn't realized that. I think we have solved the first problem. Can you verify that, after calling pip install ., you can find the place where zppy-interfaces is installed and that both the csv and html file are included there along with the python files? I think that will be the case with the MANIFEST.in as we have it now.
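That check can also be done from Python without locating site-packages by hand: importlib.resources.files() resolves an installed package to its directory, whose contents can then be listed. A minimal sketch; list_data_files is a hypothetical helper:

```python
import importlib.resources as imp_res


def list_data_files(package: str) -> list[str]:
    """Hypothetical helper: sorted names of the non-Python files directly
    inside an installed package, e.g. to confirm that .csv/.html package
    data was actually installed alongside the modules."""
    return sorted(
        entry.name
        for entry in imp_res.files(package).iterdir()
        if entry.is_file() and not entry.name.endswith((".py", ".pyc"))
    )
```

Run in the environment where pip install . was done, list_data_files("zppy_interfaces.global_time_series") should include index_template.html and zppy_land_fields.csv if the package data was installed.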

The next problem is that you are looking for files in a location that e3sm_diags would move them to, but this is not how zppy-interfaces works. You need to use importlib-resources to find the files within the zppy-interfaces package. I will try to make the relevant suggested changes.
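A minimal sketch of that approach using the standard-library importlib.resources (Python 3.9+; importlib-resources is the PyPI backport exposing the same files() API). The file is located relative to the installed package instead of the current working directory or sys.prefix, which is why INCLUSIONS_DIR becomes unnecessary. read_package_csv is a hypothetical helper:

```python
import csv
import importlib.resources as imp_res


def read_package_csv(package: str, filename: str) -> list[list[str]]:
    """Hypothetical helper: read a CSV file shipped inside an installed package.

    imp_res.files(package) resolves to the package's install location, so the
    lookup does not depend on the directory the code is run from.
    """
    resource = imp_res.files(package) / filename
    with resource.open("r", newline="") as csv_file:
        return list(csv.reader(csv_file))
```

For this PR, the call would be read_package_csv("zppy_interfaces.global_time_series", "zppy_land_fields.csv").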

xylar

Here is how I would handle finding the csv and html files in the package. INCLUSIONS_DIR should not be needed.


xylar · 2024-12-03T07:59:27Z

zppy_interfaces/global_time_series/coupled_global_viewer.py

+from bs4 import BeautifulSoup
+
+from zppy_interfaces.global_time_series.coupled_global_utils import (
+    INCLUSIONS_DIR,


Suggested change (delete this line):

INCLUSIONS_DIR,

xylar · 2024-12-03T07:59:37Z

zppy_interfaces/global_time_series/coupled_global_viewer.py

+        td.append(a)
+        row_obj.append(td)
+
+    path: str = os.path.join(INCLUSIONS_DIR, "index_template.html")


Suggested change, replacing:

path: str = os.path.join(INCLUSIONS_DIR, "index_template.html")

with:

path: str = str(imp_res.files("zppy_interfaces.global_time_series") /
                "index_template.html")

xylar · 2024-12-03T08:00:00Z

zppy_interfaces/global_time_series/coupled_global_utils.py

+# Relies on MANIFEST.in to include files
+INCLUSIONS_DIR = "zppy_interfaces/global_time_series"
+


Suggested change (delete these lines):

# Relies on MANIFEST.in to include files
INCLUSIONS_DIR = "zppy_interfaces/global_time_series"

xylar · 2024-12-03T08:00:22Z

zppy_interfaces/global_time_series/coupled_global.py


+from zppy_interfaces.global_time_series.coupled_global_plotting import make_plot_pdfs
+from zppy_interfaces.global_time_series.coupled_global_utils import (
+    INCLUSIONS_DIR,


Suggested change (delete this line):

INCLUSIONS_DIR,


xylar · 2024-12-03T08:03:00Z

zppy_interfaces/global_time_series/coupled_global.py

+def construct_land_variables(requested_vars: List[str]) -> List[Variable]:
+    var_list: List[Variable] = []
+    header = True
+    with open(f"{INCLUSIONS_DIR}/zppy_land_fields.csv", newline="") as csv_file:


Suggested change, replacing:

with open(f"{INCLUSIONS_DIR}/zppy_land_fields.csv", newline="") as csv_file:

with:

csv_filename: str = str(imp_res.files("zppy_interfaces.global_time_series") /
                        "zppy_land_fields.csv")
with open(csv_filename, newline="") as csv_file:

forsyth2 · 2024-12-03T21:29:08Z

@xylar Thanks, those changes work!

xylar · 2024-12-03T21:57:48Z

Wonderful! Glad I could help.

forsyth2 · 2024-12-03T22:50:28Z

zppy_interfaces/global_time_series/coupled_global_dataset_wrapper.py

+                # ['AR', 'time_bounds', 'CWDC', 'FSH', 'GPP', 'H2OSNO', 'HR', 'LAISHA', 'LAISUN', 'NBP', 'QINTR', 'QOVER', 'QRUNOFF', 'QSOIL', 'QVEGE', 'QVEGT', 'RH2M', 'SOIL1C', 'SOIL2C', 'SOIL3C', 'SOIL4C', 'SOILWATER_10CM', 'TOTLITC', 'TOTVEGC', 'TSA', 'WOOD_HARVESTC', 'lon_bnds', 'lat_bnds']
+                # TODO: looks like we don't actually have area or landfrac in the dataset


It looked like we didn't have area or landfrac in our test input data (see the results of this logger line).

I tried adding the extra_vars = "area,landfrac" line below in the zppy cfg, to produce a new set of test input data, but zppy-interfaces is still showing that the key "area" doesn't exist: [ERROR]: coupled_global.py(set_var:184) >> "No variable named 'area'.

[[ lnd_monthly_glb ]]
extra_vars = "area,landfrac"
frequency = "monthly"
input_files = "elm.h0"
input_subdir = "archive/lnd/hist"
mapping_file = "glb"
vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
years = "1985:1995:5"

These variables exist in the zppy input though:

cd /lcrc/group/e3sm2/ac.wlin/E3SMv3/v3.LR.historical_0051/archive/lnd/hist
ncdump -h v3.LR.historical_0051.elm.h0.1850-01.nc | grep "float area"
# float area(lat, lon) ;
ncdump -h v3.LR.historical_0051.elm.h0.1850-01.nc | grep "float landfrac"
# float landfrac(lat, lon) ;

Maybe they need to be added to vars rather than extra_vars.

@forsyth2 do you have a copy of the global time series files that ncclimo generated? I thought both would be included there as fixed variables; if not, we should try to include both as extra variables.

do you have a copy of the global time series files that ncclimo generated

Yeah, that's the input data I'm testing on.

$ cd /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post_until_20241203/lnd/glb/ts/monthly/5yr
$ ls
AR_198501_198912.nc H2OSNO_199001_199412.nc QINTR_198501_198912.nc QVEGE_199001_199412.nc SOIL3C_198501_198912.nc TOTVEGC_199001_199412.nc
AR_199001_199412.nc HR_198501_198912.nc QINTR_199001_199412.nc QVEGT_198501_198912.nc SOIL3C_199001_199412.nc TSA_198501_198912.nc
CWDC_198501_198912.nc HR_199001_199412.nc QOVER_198501_198912.nc QVEGT_199001_199412.nc SOIL4C_198501_198912.nc TSA_199001_199412.nc
CWDC_199001_199412.nc LAISHA_198501_198912.nc QOVER_199001_199412.nc RH2M_198501_198912.nc SOIL4C_199001_199412.nc WOOD_HARVESTC_198501_198912.nc
FSH_198501_198912.nc LAISHA_199001_199412.nc QRUNOFF_198501_198912.nc RH2M_199001_199412.nc SOILWATER_10CM_198501_198912.nc WOOD_HARVESTC_199001_199412.nc
FSH_199001_199412.nc LAISUN_198501_198912.nc QRUNOFF_199001_199412.nc SOIL1C_198501_198912.nc SOILWATER_10CM_199001_199412.nc
GPP_198501_198912.nc LAISUN_199001_199412.nc QSOIL_198501_198912.nc SOIL1C_199001_199412.nc TOTLITC_198501_198912.nc
GPP_199001_199412.nc NBP_198501_198912.nc QSOIL_199001_199412.nc SOIL2C_198501_198912.nc TOTLITC_199001_199412.nc
H2OSNO_198501_198912.nc NBP_199001_199412.nc QVEGE_198501_198912.nc SOIL2C_199001_199412.nc TOTVEGC_198501_198912.nc
$ cd /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post_20241203_area_landfrac_as_extra_vars/lnd/glb/ts/monthly/5yr
$ ls
AR_198501_198912.nc H2OSNO_199001_199412.nc QINTR_198501_198912.nc QVEGE_199001_199412.nc SOIL3C_198501_198912.nc TOTVEGC_199001_199412.nc
AR_199001_199412.nc HR_198501_198912.nc QINTR_199001_199412.nc QVEGT_198501_198912.nc SOIL3C_199001_199412.nc TSA_198501_198912.nc
CWDC_198501_198912.nc HR_199001_199412.nc QOVER_198501_198912.nc QVEGT_199001_199412.nc SOIL4C_198501_198912.nc TSA_199001_199412.nc
CWDC_199001_199412.nc LAISHA_198501_198912.nc QOVER_199001_199412.nc RH2M_198501_198912.nc SOIL4C_199001_199412.nc WOOD_HARVESTC_198501_198912.nc
FSH_198501_198912.nc LAISHA_199001_199412.nc QRUNOFF_198501_198912.nc RH2M_199001_199412.nc SOILWATER_10CM_198501_198912.nc WOOD_HARVESTC_199001_199412.nc
FSH_199001_199412.nc LAISUN_198501_198912.nc QRUNOFF_199001_199412.nc SOIL1C_198501_198912.nc SOILWATER_10CM_199001_199412.nc
GPP_198501_198912.nc LAISUN_199001_199412.nc QSOIL_198501_198912.nc SOIL1C_199001_199412.nc TOTLITC_198501_198912.nc
GPP_199001_199412.nc NBP_198501_198912.nc QSOIL_199001_199412.nc SOIL2C_198501_198912.nc TOTLITC_199001_199412.nc
H2OSNO_198501_198912.nc NBP_199001_199412.nc QVEGE_198501_198912.nc SOIL2C_199001_199412.nc TOTVEGC_198501_198912.nc
$ cd /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post_20241203_area_landfrac_as_vars
# no lnd directory
$ cd scripts
$ grep -in "error" *.o*
#ts_lnd_monthly_glb_1985-1989-0005.o639437#:15:ncrcat: ERROR no variables fit criteria for processing
#ts_lnd_monthly_glb_1985-1989-0005.o639437#:17:ncrcat: ERROR no variables fit criteria for processing
#ts_lnd_monthly_glb_1985-1989-0005.o639437#:19:ncclimo: ERROR Failed to split. cmd_sbs[0] failed. Debug this:
ts_lnd_monthly_glb_1985-1989-0005.o639437:15:ncrcat: ERROR no variables fit criteria for processing
ts_lnd_monthly_glb_1985-1989-0005.o639437:17:ncrcat: ERROR no variables fit criteria for processing
ts_lnd_monthly_glb_1985-1989-0005.o639437:19:ncclimo: ERROR Failed to split. cmd_sbs[0] failed. Debug this:
ts_lnd_monthly_glb_1990-1994-0005.o639438:15:ncrcat: ERROR no variables fit criteria for processing
ts_lnd_monthly_glb_1990-1994-0005.o639438:17:ncrcat: ERROR no variables fit criteria for processing
ts_lnd_monthly_glb_1990-1994-0005.o639438:19:ncclimo: ERROR Failed to split. cmd_sbs[0] failed. Debug this:

Yep, you are right: neither area nor landfrac is in the global-mean datasets. I think it makes sense to try adding both to vars.

But adding to vars doesn't work either -- that's that third ls block above.

From ts_lnd_monthly_glb_1985-1989-0005.o639437:

ncrcat: ERROR no variables fit criteria for processing
ncrcat: HINT Extraction list must contain a record variable to concatenate. A record variable is a variable defined with a record dimension. Often the record dimension, aka unlimited dimension, refers to time. To change an existing dimension from a fixed to a record dimensions see http://nco.sf.net/nco.html#mk_rec_dmn or to add a new record dimension to all variables see http://nco.sf.net/nco.html#ncecat_rnm
ncrcat: ERROR no variables fit criteria for processing
ncrcat: HINT Extraction list must contain a record variable to concatenate. A record variable is a variable defined with a record dimension. Often the record dimension, aka unlimited dimension, refers to time. To change an existing dimension from a fixed to a record dimensions see http://nco.sf.net/nco.html#mk_rec_dmn or to add a new record dimension to all variables see http://nco.sf.net/nco.html#ncecat_rnm
ncclimo: ERROR Failed to split. cmd_sbs[0] failed. Debug this:

That is, area and landfrac are clearly included in the original nc files, but I can't seem to extract them through either extra_vars or vars.
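The ncrcat hint above points at the likely cause: ncrcat concatenates along the record (usually time) dimension, and the ncdump output shows area(lat, lon) and landfrac(lat, lon) have no such dimension, so when ncclimo splits one of them out on its own, ncrcat presumably has no record variable to concatenate. A minimal sketch for spotting such time-invariant variables in ncdump -h output, assuming the record dimension is named time; variables_without_dim is hypothetical and only handles simple one-line declarations like the ones above:

```python
import re

# Matches simple CDL declarations such as "float area(lat, lon) ;"
DECLARATION = re.compile(r"^\s*\w+\s+(\w+)\(([^)]*)\)\s*;")


def variables_without_dim(cdl_header: str, record_dim: str = "time") -> list[str]:
    """Hypothetical helper: names of variables whose declared dimensions do
    not include record_dim, i.e. variables ncrcat cannot concatenate."""
    names = []
    for line in cdl_header.splitlines():
        match = DECLARATION.match(line)
        if match:
            dims = [dim.strip() for dim in match.group(2).split(",")]
            if record_dim not in dims:
                names.append(match.group(1))
    return names
```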

@forsyth2 thanks for testing both. It looks like extra_vars does work with generic variable splitting (e.g., the land_monthly task) but not when the --glb flag is on (e.g., land_monthly_global). I think there are a few possibilities for moving forward:

include a [land_monthly] task to get both area and landfrac, and use both in downstream.

based on @czender's comment, there is a new version of ncclimo that supports global integral scaling. To use it, we need additional logic to filter through variables in the time-series task.

we can ask @czender if it is possible to add extra-vars support in --glb mode, to keep area and landfrac in the global time-series files. In the meantime, I think you could tentatively use the first approach to get the values of the two variables.

forsyth2 · 2024-12-04T02:53:04Z

zppy_interfaces/global_time_series/coupled_global_dataset_wrapper.py

+                    logger.error(f"area.shape={area.shape}")  # area.shape=(180, 360)
+                    logger.error(
+                        f"landfrac.shape={landfrac.shape}"
+                    )  # landfrac.shape=(180, 360)
+                    logger.error(
+                        f"data_array.shape={data_array.shape}"
+                    )  # data_array.shape=(10, 3)
+                    # e: dimensions cannot change for in-place operations


@chengzhuzhang Ok, using the separate land_monthly non-global ts task (option 1 above), I can get area and landfrac, but now I'm having issues with the dimensions. I'm not sure how to take (180,360) to (10,3).

Ah, ok I think I have a reasonable dimensionality reduction:

if metric == Metric.TOTAL:
    logger.debug(
        f"self.extra_var_dataset.keys()={list(self.extra_var_dataset.keys())}"
    )
    area: xarray.core.dataarray.DataArray = self.extra_var_dataset["area"]
    landfrac: xarray.core.dataarray.DataArray = self.extra_var_dataset["landfrac"]
    # area.shape() = (180, 360)
    total_area = area.sum()  # Sum over all dimensions
    # landfrac.shape=(180, 360)
    average_landfrac = landfrac.mean()  # Mean over all dimensions
    total_land_area = total_area * average_landfrac
    # data_array.shape = (number of years, number of regions)
    data_array *= total_land_area
    logger.info(
        "for Metric.TOTAL, data_array has been scaled by total land area"
    )

Hello @forsyth2 and @chengzhuzhang, I'm trying to understand the change(s) you would like in ncclimo. As you know, area and landfrac can be output in every default (non-spatially averaged) timeseries by requesting them with the --var_xtr=area,landfrac. However, as @forsyth2 has discovered, this behavior changes when requesting global/regional-average timeseries. Currently the spatial-mean timeseries output includes only the geophysical field of interest, no matter what was requested with --var_xtr.

I would be happy to change that behavior so that in spatial-averaged timeseries files ...

Always contain the full (i.e., non-spatially averaged) variables requested with --var_xtr OR

Always contain the full area and landfrac variables OR

Behave as currently by default, and contain the full area and landfrac variables when an additional switch is provided

Behave as currently by default, and contain the full variables requested with --var_xtr when an additional switch is provided.

These changes would be relatively straightforward to implement. Keep in mind that the current behavior was implemented so that spatial-mean timeseries files are orders of magnitude smaller than their non-spatial-average counterparts. Adding full fields to spatial average timeseries will significantly inflate their size. However, area and landfrac are time-constant variables so the resulting filesizes would still be much smaller than full timeseries files if only these variables are added.

Also happy to discuss other options, like providing a sane way for zppy to provide the desired scale factors to ncclimo which would then apply them to the fields during spatial-average timeseries generation. Feedback welcome.

@forsyth2 The "dimensionality reduction" you mention above triggers some red flags for me. In particular, this line
total_land_area = total_area * average_landfrac is questionable because area and landfrac can vary independently, so I would expect the total land area to be defined by total_land_area = (area*landfrac).sum(). If I am right, the former method would produce small errors that might not be noticed (e.g., 1-2%) until/unless carefully examined. Or I could be totally off base because I'm not sure how you're using the variables in the code :) In any case, you might try both methods and then compare against a known-to-be-correct answer.
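czender's concern can be checked with a toy example. The shortcut total_area * mean(landfrac) weights every cell's land fraction equally, while sum(area * landfrac) weights it by that cell's own area; the two agree exactly only when area and landfrac are uncorrelated across cells (for example, when either is uniform). The three cells below use made-up, deliberately exaggerated numbers, and both helper names are hypothetical:

```python
def land_area_weighted(area: list[float], landfrac: list[float]) -> float:
    """Land area as sum(area * landfrac): each cell's land fraction is
    weighted by that cell's own area."""
    return sum(a * f for a, f in zip(area, landfrac))


def land_area_unweighted(area: list[float], landfrac: list[float]) -> float:
    """The questioned shortcut: total area times the unweighted mean
    land fraction."""
    return sum(area) * (sum(landfrac) / len(landfrac))


# Made-up cells: one large all-ocean cell and two small, mostly-land cells.
area = [4.0, 1.0, 1.0]
landfrac = [0.0, 0.9, 0.9]
```

Here the weighted form gives 1.8 while the shortcut gives 3.6; on a real grid the gap depends on how land fraction correlates with cell area.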

I personally would say (1) or (4) since it allows users to set whatever extra variable they want rather than hard-coding in area or landfrac.

I appear to have a workaround here working now without these changes in ncclimo, but it is a bit clunky.

@czender I was thinking the same: having the product area_times_landfrac would be easiest for zppy in this use case.

And perhaps making the new addition the default would be fine, because the new variable is small.

OK, I will add a field called something like area_times_landfrac to the output of all spatially-averaged timeseries. I'm headed to AGU on Friday and will return 12/16. Hopefully I'll finish and release this in NCO 5.3.0 sometime that week.

Thanks for your help, Charlie! Have a nice trip to DC.

Yes, thank you @czender!

forsyth2

@chengzhuzhang I think this is getting pretty close to done. I just have a few comments, and then I'll ask the Land team to review further. I also want to update the unit tests.

Results are visible at https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zi-test-webdir/global_time_series_1985-1995_results_viewers/ (note: that's not a stable link; it will update every time I run updated code). How does that look?

Corresponding changes on the zppy side are at https://github.com/E3SM-Project/zppy/pull/648/files

forsyth2 · 2024-12-05T00:27:22Z

.github/workflows/build_workflow.yml

@@ -47,6 +47,7 @@ jobs:
      # since the action is run on a branch in detached head state.
      # This is the equivalent of running "pre-commit run --all-files" locally.
      # If you commit with the `--no-verify` flag, this check may fail.
+      # TODO: this doesn't seem to run when I run `git commit` locally. I always have to run `pre-commit run --all-files` manually.


@tomvothecoder Not sure if you've had a chance to look at this yet, but I'm not quite sure how to fix this.

forsyth2 · 2024-12-05T00:29:16Z

zppy_interfaces/global_time_series/coupled_global_dataset_wrapper.py

+                if "area_times_landfrac" in keys:
+                    total_land_area = self.extra_var_dataset["area_times_landfrac"]
+                else:
+                    area: xarray.core.dataarray.DataArray = self.extra_var_dataset[
+                        "area"
+                    ]
+                    landfrac: xarray.core.dataarray.DataArray = self.extra_var_dataset[
+                        "landfrac"
+                    ]
+                    # area.shape() = (180, 360)
+                    # landfrac.shape() =(180, 360)
+                    total_land_area = (area * landfrac).sum()  # Sum over all dimensions
+                # data_array.shape = (number of years, number of regions)
+                data_array *= total_land_area


@chengzhuzhang This is how I'm handling the total_land_area scaling now.

@forsyth2 it is okay to use this logic for now for demonstration. The scaling is only valid for the global region, not for the N/S hemispheres; we can update that later, after ncclimo includes the scaling factor.

I think it is possible to get N_land_area and S_land_area by sub-selecting lat >= 0 or lat <= 0 for both area and landfrac.

I tried updating to the following, but I'm still getting some blank plots (strangely, whether the empty plot is the glb, n, or s one is inconsistent).

if metric == Metric.TOTAL:
    keys = list(self.extra_var_dataset.keys())
    logger.debug(f"self.extra_var_dataset.keys()={keys}")
    if "area_times_landfrac" in keys:
        total_land_area = self.extra_var_dataset["area_times_landfrac"]
    else:
        area: xarray.core.dataarray.DataArray = self.extra_var_dataset[
            "area"
        ]
        landfrac: xarray.core.dataarray.DataArray = self.extra_var_dataset[
            "landfrac"
        ]
        # area.shape() = (180, 360)
        # landfrac.shape() = (180, 360)
        total_land_area = (area * landfrac).sum()  # Sum over all dimensions
        # Account for hemispheric plots:
        north_area = area.where(area.lat >= 0)
        south_area = area.where(area.lat <= 0)
        north_landfrac = landfrac.where(landfrac.lat >= 0)
        south_landfrac = landfrac.where(landfrac.lat <= 0)
        north_land_area = (north_area * north_landfrac).sum()
        south_land_area = (south_area * south_landfrac).sum()
    # data_array.shape = (number of years, number of regions)
    # We want to keep those dimensions, but with these values:
    # (glb*total_land_area, n*north_land_area, s*south_land_area)
    data_array[:, 0] *= total_land_area
    data_array[:, 1] *= north_land_area
    data_array[:, 2] *= south_land_area
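The hemispheric split can be sanity-checked outside xarray. This pure-Python sketch (hypothetical helper and toy grid) computes sum(area * landfrac) for the globe and each hemisphere from cell-center latitudes, in the (glb, n, s) order used above. It deliberately assigns lat == 0 to the north only, so n + s always equals glb; in the snippet above both masks include lat == 0, which would double-count an equatorial row on grids that have one (a grid whose cell centers are offset from the equator has no such row).

```python
def regional_land_areas(
    lats: list[float], area: list[list[float]], landfrac: list[list[float]]
) -> dict[str, float]:
    """Hypothetical helper: land area (sum of area * landfrac) for the globe
    and each hemisphere. area and landfrac are indexed [lat][lon]; lats holds
    the cell-center latitude of each row. Rows with lat >= 0 count as north,
    rows with lat < 0 as south, so n + s == glb."""
    totals = {"glb": 0.0, "n": 0.0, "s": 0.0}
    for lat, area_row, frac_row in zip(lats, area, landfrac):
        row_land = sum(a * f for a, f in zip(area_row, frac_row))
        totals["glb"] += row_land
        if lat >= 0:
            totals["n"] += row_land
        else:
            totals["s"] += row_land
    return totals
```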

forsyth2 · 2024-12-05T00:30:21Z

zppy_interfaces/global_time_series/coupled_global_dataset_wrapper.py

+        if extra_vars_directory:
+            # If an extra_vars_directory is provided, we'll use that to open a new dataset
+            self.extra_var_dataset = xcdat.open_mfdataset(
+                f"{extra_vars_directory}*.nc",
+                center_times=True,
+            )
+        else:
+            # Otherwise, we'll use the same dataset.
+            self.extra_var_dataset = self.dataset


@chengzhuzhang Until ncclimo is updated to allow extra vars in glb runs, we can just specify an alternative path (i.e., to the [land_monthly] output).
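One caveat with the f"{extra_vars_directory}*.nc" pattern: it only matches files inside the directory when extra_vars_directory ends with a path separator; without one it matches names that start with the directory's name in its parent directory. A minimal sketch of a more forgiving pattern builder that also fails early when nothing matches, rather than relying on xcdat.open_mfdataset to report it; extra_vars_pattern is a hypothetical helper:

```python
import glob
import os


def extra_vars_pattern(extra_vars_directory: str) -> str:
    """Hypothetical helper: glob pattern for the .nc files inside
    extra_vars_directory, with or without a trailing slash.

    Raises FileNotFoundError if no files match.
    """
    pattern = os.path.join(extra_vars_directory, "*.nc")
    if not glob.glob(pattern):
        raise FileNotFoundError(f"No .nc files match {pattern}")
    return pattern
```

The returned pattern could then be passed to xcdat.open_mfdataset in place of the f-string.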


chengzhuzhang · 2024-12-05T17:58:49Z

@chengzhuzhang I think this is getting pretty close to done. I just have a few comments, and then I'll ask the Land team to review further. I also want to update the unit tests.

Results are visible at https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zi-test-webdir/global_time_series_1985-1995_results_viewers/ (note: that's not a stable link; it will update every time I run updated code). How does that look?

Corresponding changes on the zppy side are at https://github.com/E3SM-Project/zppy/pull/648/files

This looks good! I would suggest updating "Results" to something more descriptive, for instance, "Zppy global time-series plot: v3.LR.historical_0051 (1985-1995)", so that it is clear to users when comparing with viewers generated from other runs. Feel free to share with the land team to get feedback. Thank you!

zppy_interfaces/global_time_series/coupled_global_dataset_wrapper.py

forsyth2

@thorntonpe @BunnyVon (and maybe also @dmricciuto @wlin7) This is ready for review.

For background context, we recently decided to split zppy into two packages: zppy itself will keep its original purpose of orchestrating workflows, while this new package, zppy-interfaces, will handle the "last mile" stretches of glue code that plot or otherwise post-process data. (That is, zppy should not do any post-processing itself, only coordinate other post-processing tools.)

I have test results at https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zi-test-webdir/global_time_series_1985-1995_results_viewers/.

How to run your own tests

I think you also wanted to run some of your own test cases before we merge this PR. To do that:

# Set up zppy-interfaces
cd ~
git clone git@github.com:E3SM-Project/zppy-interfaces.git
cd zppy-interfaces
git fetch origin issue-601-viewers
git checkout -b test_land_viewers origin/issue-601-viewers
conda clean --all --yes
conda env create -f conda/dev.yml -n my_zi_env
conda activate my_zi_env
pip install .

# Set up zppy
cd ~
git clone git@github.com:E3SM-Project/zppy.git
cd zppy
git fetch origin issue-601-viewers
git checkout -b test_land_viewers origin/issue-601-viewers
conda clean --all --yes
conda env create -f conda/dev.yml -n my_zppy_env
conda activate my_zppy_env
pip install .

# Create a cfg
cd ~
emacs test_land_viewers.cfg
zppy -c test_land_viewers.cfg

Sample cfg

Below is an example cfg that you could modify with your own input data and settings. (In particular, these parameters will likely need changing: case, input, output, www, experiment_name, figstr)

[default]
case = "v3.LR.historical_0051"
constraint = ""
dry_run = "False"
environment_commands = ""
fail_on_dependency_skip = True
guess_path_parameters = False
guess_section_parameters = False
input = /lcrc/group/e3sm2/ac.wlin/E3SMv3/v3.LR.historical_0051
input_subdir = archive/atm/hist
mapping_file = "map_ne30pg2_to_cmip6_180x360_aave.20200201.nc"
output = "/lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_output/unique_id/v3.LR.historical_0051"
partition = "debug"
qos = "regular"
www = "/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_www/unique_id"
years = "1985:1989:2",

[ts]
active = True
e3sm_to_cmip_environment_commands = ""
walltime = "00:30:00"

  [[ land_monthly ]]
  extra_vars = "area,landfrac"
  frequency = "monthly"
  input_files = "elm.h0"
  input_subdir = "archive/lnd/hist"
  mapping_file = "map_r05_to_cmip6_180x360_aave.20231110.nc"
  years = "1985:1995:5",
  vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"

  [[ atm_monthly_glb ]]
  # Note global average won't work for 3D variables.
  frequency = "monthly"
  input_files = "eam.h0"
  input_subdir = "archive/atm/hist"
  mapping_file = "glb"
  years = "1985:1995:5",

  [[ lnd_monthly_glb ]]
  frequency = "monthly"
  input_files = "elm.h0"
  input_subdir = "archive/lnd/hist"
  mapping_file = "glb"
  vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
  years = "1985:1995:5",

[global_time_series]
active = True
climo_years = "1985-1989", "1990-1995",
environment_commands = "source <INSERT PATH TO CONDA>/conda.sh; conda activate my_zi_env"
experiment_name = "v3.LR.historical_0051"
figstr = "v3.LR.historical_0051"
make_viewer = True
num_cols = 1
num_rows = 1
plots_lnd = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
ts_num_years = 5
ts_years = "1985-1989", "1985-1995",
walltime = "00:30:00"
years = "1985-1995",

Reviewing code changes

I'm not sure how relevant the code changes themselves are to you, but I've marked some important areas as part of this review. Notably, the bulk of the changes are in this PR, but there are also a few changes on the zppy side: https://github.com/E3SM-Project/zppy/pull/648/files

forsyth2 · 2024-12-09T21:44:44Z

zppy_interfaces/global_time_series/coupled_global_dataset_wrapper.py

+            # Non-derived variables
+            annual_average_dataset_for_var: xarray.core.dataset.Dataset = (
+                self.dataset.temporal.group_average(var, "year")
+            )
+            data_array = annual_average_dataset_for_var.data_vars[var]
+            if metric == Metric.TOTAL:
+                keys = list(self.extra_var_dataset.keys())
+                logger.debug(f"self.extra_var_dataset.keys()={keys}")
+                if "area_times_landfrac" in keys:
+                    total_land_area = self.extra_var_dataset["area_times_landfrac"]
+                else:
+                    area: xarray.core.dataarray.DataArray = self.extra_var_dataset[
+                        "area"
+                    ]
+                    landfrac: xarray.core.dataarray.DataArray = self.extra_var_dataset[
+                        "landfrac"
+                    ]
+                    # area.shape() = (180, 360)
+                    # landfrac.shape() = (180, 360)
+                    total_land_area = (area * landfrac).sum()  # Sum over all dimensions
+
+                    # Account for hemispheric plots:
+                    north_area = area.where(area.lat >= 0)
+                    south_area = area.where(area.lat <= 0)
+                    north_landfrac = landfrac.where(landfrac.lat >= 0)
+                    south_landfrac = landfrac.where(landfrac.lat <= 0)
+                    north_land_area = (north_area * north_landfrac).sum()
+                    south_land_area = (south_area * south_landfrac).sum()
+
+                # data_array.shape = (number of years, number of regions)
+                # We want to keep those dimensions, but with these values:
+                # (glb*total_land_area, n*north_land_area, s*south_land_area)
+                data_array[:, 0] *= total_land_area
+                data_array[:, 1] *= north_land_area
+                data_array[:, 2] *= south_land_area
+            units = data_array.units
+            # `units` will be "1" if it's a dimensionless quantity
+            if (units != "1") and (original_units != "") and original_units != units:
+                raise ValueError(
+                    f"Units don't match up: Have {units} but expected {original_units}. This renders the supplied scale_factor ({scale_factor}) unusable."
+                )
+            if (scale_factor != 1) and (final_units != ""):
+                data_array *= scale_factor
+                units = final_units
+        return data_array, units


This is the code block where we do average or total calculations.
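For readers less familiar with the Metric.TOTAL branch: it turns per-region annual means into totals by multiplying each region column by that region's land area. A minimal pure-Python sketch of that step follows. It assumes, as the diff's comment states, that the columns are ordered (glb, n, s); it uses plain lists instead of xarray so it runs on its own, and the function name `means_to_totals` is hypothetical.

```python
from typing import List, Sequence


def means_to_totals(
    annual_means: Sequence[Sequence[float]],
    region_areas: Sequence[float],
) -> List[List[float]]:
    """Scale area-weighted means to totals, column by column.

    annual_means has shape (number of years, number of regions), with
    regions assumed to be ordered (glb, n, s) as in the diff above.
    region_areas holds the matching land areas, e.g.
    (total_land_area, north_land_area, south_land_area).
    """
    if any(len(row) != len(region_areas) for row in annual_means):
        raise ValueError("each year must have one value per region")
    return [
        [mean * area for mean, area in zip(row, region_areas)]
        for row in annual_means
    ]


# One year of made-up data: the global mean (1.8) is the area-weighted
# average of the north (3.0) and south (1.0) means over areas 4 and 6.
totals = means_to_totals([[1.8, 3.0, 1.0]], (10.0, 4.0, 6.0))
print(totals)
```

When the regional means and areas are consistent, the first column should equal the sum of the other two, which makes a cheap sanity check on the hemispheric areas.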

forsyth2 · 2024-12-09T21:47:31Z

zppy_interfaces/global_time_series/coupled_global.py

+            valid_vars,
+            invalid_vars,
+            rgn,
+            extra_vars_dict=exp["land"].replace("glb", "180x360_aave"),


Right now, ncclimo can't generate area and landfrac for globally averaged datasets. So, we're currently relying on having a separate land_monthly task defined in the cfg to set up these variables.
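One detail worth keeping in mind with this workaround: `str.replace("glb", "180x360_aave")` rewrites every occurrence of the substring "glb", so a case name or output directory that happens to contain "glb" would also change. A minimal sketch of a stricter alternative follows, assuming `exp["land"]` holds the glb land time-series directory and that this directory has a path component named exactly `glb`. The helper name and the example layout are hypothetical, not part of zppy-interfaces.

```python
from pathlib import PurePosixPath


def extra_vars_dir_from_glb(glb_dir: str, regrid_tag: str = "180x360_aave") -> str:
    """Replace only a path component that is exactly "glb" with regrid_tag.

    Unlike str.replace, this leaves substrings such as "v3.glb_test"
    untouched. A trailing slash is preserved because callers may append
    a glob like "*.nc" directly to the result.
    """
    parts = PurePosixPath(glb_dir).parts
    swapped = [regrid_tag if part == "glb" else part for part in parts]
    result = str(PurePosixPath(*swapped))
    if glb_dir.endswith("/") and not result.endswith("/"):
        result += "/"
    return result


# Hypothetical layout, for illustration only.
glb_dir = "/output/v3.glb_test/post/lnd/glb/ts/monthly/5yr/"
print(glb_dir.replace("glb", "180x360_aave"))  # also rewrites the case name
print(extra_vars_dir_from_glb(glb_dir))  # rewrites only the "glb" component
```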

czender · 2024-12-17T21:09:40Z

@forsyth2 and @chengzhuzhang,

While area_times_landfrac is a descriptive name, it is ELM-specific.
Other models have different names for the valid fraction of gridcells.
I'm mainly thinking of MPAS-SI and CICE, though there may be more.
The zppy feature of scaling sums should therefore work for all models.

Hence I suggest that ncclimo name the new field valid_area_per_gridcell where:
valid_area_per_gridcell=area*landfrac for ELM
valid_area_per_gridcell=area*1 for EAM
valid_area_per_gridcell=area*sgs_frc when invoked with --sgs_frc option.

The latest ncclimo snapshot contains this modification.
Let me know if you disagree and prefer a different name.

It is unclear whether you want ncclimo to output variables specified with --var_xtr in global timeseries mode.
That would be easy to enable. Currently that is disabled just to keep the global timeseries files small and tidy.
However, this behavior is not as flexible, and prevents users from requesting, e.g., --var_xtr=area,landfrac for scaling the global timeseries themselves as they see fit.

Feedback welcome.

Charlie

chengzhuzhang · 2024-12-17T23:59:20Z

Thank you for the update, Charlie @czender
I think the current implementation looks good and can help simplify Ryan's workflow for generating global time series plots. @forsyth2, could you test Charlie's update? Charlie can correct me, but I believe testing the new snapshot only requires replacing the ncclimo call with /global/homes/z/zender/bin_perlmutter/ncclimo --npo, as is done here

forsyth2 · 2024-12-18T23:25:34Z

@czender Thanks, I like the valid_area_per_gridcell name.

this behavior is not as flexible, and prevents users from requesting, e.g., --var_xtr=area,landfrac for scaling the global timeseries themselves as they see fit.

My personal preference would be to enable that functionality. It's nice to have the flexibility.

replace the ncclimo call with /global/homes/z/zender/bin_perlmutter/ncclimo --npo

I'm working on Chrysalis, so I'll try /home/ac.zender/bin_chrysalis/ncclimo --npo

forsyth2 · 2024-12-19T00:40:13Z

@czender I get the following regardless of whether I set vars = "valid_area_per_gridcell,..." or extra_vars = "valid_area_per_gridcell":

The following have been reloaded with a version change:
  1) openmpi/4.1.3-sxfyy4k => openmpi/4.0.4-hpcx-hghvhj5

ncclimo: ERROR /home/ac.zender/bin_chrysalis/ncks dies with error message on next line:
ncks: error while loading shared libraries: libgsl.so.25: cannot open shared object file: No such file or directory

czender · 2024-12-19T15:50:59Z

@forsyth2 That's because I haven't built NCO on Chrysalis (as opposed to Perlmutter) in a long time. I will try to do so today and ping you then.

czender · 2024-12-19T23:55:54Z

@forsyth2 Please try again on Chrysalis (and/or Perlmutter). I have rebuilt the latest snapshot there. BTW, the latest snapshot always includes area and landfrac variables in global timeseries output. No need to request them specially with --xtr_var. Is that OK with you or do you want to have to request them with --xtr_var?

forsyth2 · 2024-12-20T20:21:36Z

@czender Hmm now I'm getting

ncrcat: ERROR nco_xtr_mk() reports user-supplied variable name (or regular expression) 'valid_area_per_gridcell' is not in (or rx does not match any) contents of input file

That's with this cfg snippet:

[ts]

  [[ land_monthly ]]
  vars = "valid_area_per_gridcell,..."

Alternatively the error in #9 (comment) appears when using:

[ts]

  [[ land_monthly ]]
  extra_vars = "valid_area_per_gridcell"

What's the proper usage?

| `valid_area_per_gridcell` | ...in `land_monthly` subtask | ...in `lnd_monthly_glb` subtask |
| --- | --- | --- |
| include in `var_xtr`... | shared library error | shared library error |
| include in `vars`... | not in contents of input file | shared library error |
| don't include at all... | shared library error | shared library error |

The one change in ts.bash I have is:

-cat input.txt | ncclimo \
+cat input.txt | /home/ac.zender/bin_chrysalis/ncclimo --npo \

always includes area and landfrac variables in global timeseries output

That seems reasonable to me.

czender · 2024-12-20T20:31:27Z

@forsyth2 Sorry for the problems. The intended proper usage is your last row: don't add any switches at all, and everything you might want will be in the output files in global timeseries mode. As for why you get shared library errors when invoking ncclimo --npo, I'm unsure. Let me investigate... but please paste samples of the errors below, since it's hard for me to pretend to be a different user of my own binaries.

forsyth2 · 2024-12-20T20:44:29Z

Thanks @czender!

To put everything in one place, I just ran zppy -c tests/integration/generated/test_min_case_global_time_series_comprehensive_v3_setup_only_chrysalis.cfg. That produces output here:

cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_output/zppy_601_viewers_updated_nco_no_extra_vars/v3.LR.historical_0051/post/scripts

grep -v "OK" *status
# ts_lnd_monthly_glb_1985-1989-0005.status:ERROR (2)
# ts_lnd_monthly_glb_1990-1994-0005.status:ERROR (2)

cat ts_lnd_monthly_glb_19*.o*

That gives:

ts_only

The following have been reloaded with a version change:
  1) openmpi/4.1.3-sxfyy4k => openmpi/4.0.4-hpcx-hghvhj5

ncclimo: ERROR /home/ac.zender/bin_chrysalis/ncks dies with error message on next line:
ncks: error while loading shared libraries: libgsl.so.25: cannot open shared object file: No such file or directory
ts_only

The following have been reloaded with a version change:
  1) openmpi/4.1.3-sxfyy4k => openmpi/4.0.4-hpcx-hghvhj5

ncclimo: ERROR /home/ac.zender/bin_chrysalis/ncks dies with error message on next line:
ncks: error while loading shared libraries: libgsl.so.25: cannot open shared object file: No such file or directory

(For reference, E3SM-Project/zppy@ee60560 includes the changes for this)

forsyth2 · 2024-12-20T20:45:59Z

And the cfg is tests/integration/generated/test_min_case_global_time_series_comprehensive_v3_setup_only_chrysalis.cfg:

[default]
case = "v3.LR.historical_0051"
constraint = ""
dry_run = "False"
environment_commands = ""
fail_on_dependency_skip = True
guess_path_parameters = False
guess_section_parameters = False
input = /lcrc/group/e3sm2/ac.wlin/E3SMv3/v3.LR.historical_0051
input_subdir = archive/atm/hist
mapping_file = "map_ne30pg2_to_cmip6_180x360_aave.20200201.nc"
output = "/lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_output/zppy_601_viewers_updated_nco_no_extra_vars/v3.LR.historical_0051"
partition = "debug"
qos = "regular"
www = "/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_www/zppy_601_viewers_updated_nco_no_extra_vars"
years = "1985:1989:2",

[ts]
active = True
e3sm_to_cmip_environment_commands = ""
walltime = "00:30:00"

  [[ atm_monthly_glb ]]
  active = False
  # Note global average won't work for 3D variables.
  frequency = "monthly"
  input_files = "eam.h0"
  input_subdir = "archive/atm/hist"
  mapping_file = "glb"
  years = "1985:1995:5",

  [[ lnd_monthly_glb ]]
  frequency = "monthly"
  input_files = "elm.h0"
  input_subdir = "archive/lnd/hist"
  mapping_file = "glb"
  vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
  years = "1985:1995:5",

[mpas_analysis]
active = False
anomalyRefYear = 1985
climo_years = "1985-1989", "1990-1995",
enso_years = "1985-1989", "1990-1995",
mesh = "IcoswISC30E3r5"
parallelTaskCount = 6
partition = "compute"
qos = "regular"
shortTermArchive = True
ts_years = "1985-1989", "1985-1995",
walltime = "00:30:00"

# (This cfg is the setup portion only)
# [global_time_series]
# active = True
# climo_years = "1985-1989", "1990-1995",
# environment_commands = "source <INSERT PATH TO CONDA>/conda.sh; conda activate <INSERT ENV NAME>"
# experiment_name = "v3.LR.historical_0051"
# figstr = "v3.LR.historical_0051"
# moc_file=mocTimeSeries_1985-1995.nc
# plots_lnd = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
# ts_num_years = 5
# ts_years = "1985-1989", "1985-1995",
# walltime = "00:30:00"
# years = "1985-1995",

czender · 2024-12-20T21:20:47Z

You have given me all the information I need, I think. I altered the --npo paths, which might fix the problem you encountered. Please try again on Chrysalis now...

czender · 2024-12-20T22:39:32Z

@forsyth2 Did those latest --npo changes work for you and zppy? Would like to release NCO 5.3.0 now if they did...

forsyth2 · 2024-12-20T23:14:55Z

@czender Thanks, I've gotten a lot further, but I'm still seeing blank plots on the zppy-interfaces side. That could be an implementation problem on my end*; I'm still debugging. valid_area_per_gridcell does seem to be available to use, though, so if that's what's relevant for releasing NCO 5.3.0, then I suppose you can go ahead.

*For reference, the latest relevant logic is:

                if "valid_area_per_gridcell" in keys:
                    logger.debug("Metric.TOTAL -- Using valid_area_per_gridcell")
                    land_area_per_gridcell = self.dataset["valid_area_per_gridcell"]
                    total_land_area = land_area_per_gridcell.sum()  # Sum over all dimensions
                    north_land_area = land_area_per_gridcell.where(land_area_per_gridcell.lat >= 0)
                    south_land_area = land_area_per_gridcell.where(land_area_per_gridcell.lat <= 0)

czender · 2024-12-20T23:27:11Z

@forsyth2 Your logic looks good. Not sure why plots are blank. I will hold-off releasing until this works in zppy.

However, these lines will double-count gridcells centered on the equator so that north+south != total:

                    total_land_area = land_area_per_gridcell.sum()  # Sum over all dimensions
                    north_land_area = land_area_per_gridcell.where(land_area_per_gridcell.lat >= 0)
                    south_land_area = land_area_per_gridcell.where(land_area_per_gridcell.lat <= 0)

To avoid this, NCO defines south as latitude < 0.0 and north as latitude >= 0.0. I suggest zppy do likewise.
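The convention Charlie describes can be sketched in plain Python. The PR itself works with xarray; the lists here only keep the example self-contained, and the function name and the three-cell toy grid are hypothetical. Each cell is assigned to exactly one hemisphere (south for latitude < 0.0, north for latitude >= 0.0), so north + south equals the total. Note also that the snippet above appears to omit a `.sum()` after each hemispheric `where(...)`, so those two names would hold masked arrays rather than scalar areas.

```python
from typing import Sequence, Tuple


def hemispheric_land_areas(
    lats: Sequence[float], valid_areas: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (total, north, south) sums of valid_area_per_gridcell.

    South is lat < 0.0 and north is lat >= 0.0, so cells centered on the
    equator are counted once (in the north) and north + south == total.
    """
    total = north = south = 0.0
    for lat, area in zip(lats, valid_areas):
        total += area
        if lat >= 0.0:
            north += area
        else:
            south += area
    return total, north, south


# Toy grid with one cell centered exactly on the equator.
lats = [-1.0, 0.0, 1.0]
areas = [1.0, 2.0, 3.0]
total, north, south = hemispheric_land_areas(lats, areas)

# Overlapping masks (lat >= 0 and lat <= 0) count the equator cell twice.
overlap_south = sum(a for lat, a in zip(lats, areas) if lat <= 0.0)
print(total, north, south, north + overlap_south)
```

In xarray terms, the same split corresponds to `area.where(area.lat >= 0).sum()` for the north and `area.where(area.lat < 0).sum()` for the south.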

forsyth2 · 2024-12-21T00:28:59Z

@czender I'm still running into issues with properly scaling the data by the valid area. I do feel like the problem is on the zppy-interfaces side, but obviously can't fully confirm that yet. I'm going to have to look into this more next week.

As far as the release schedule, mainly we just need the new NCO by the Jan. 15 release candidate deadline for E3SM Unified, for testing.

forsyth2 · 2024-12-21T00:50:35Z

Mostly for my reference:

zppy steps

Relevant ts.bash change:

- cat input.txt | ncclimo \
+ cat input.txt | /home/ac.zender/bin_chrysalis/ncclimo --npo \

Ran pip install .
Set UNIQUE_ID = "zppy_601_viewers_updated_nco_no_extra_vars_v3" in tests/integration/utils.py
Ran python tests/integration/utils.py && zppy -c tests/integration/generated/test_min_case_global_time_series_comprehensive_v3_setup_only_chrysalis.cfg

zppy-interfaces steps

Update the test input data with the output of the zppy run using the latest NCO:

mv /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post/ /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post_until_20241220
mv /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_output/zppy_601_viewers_updated_nco_no_extra_vars_v3/v3.LR.historical_0051/post/ /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post/

Changes in 944966f
Run:

cd tests/integration/global_time_series
pip install ~/ez/zppy-interfaces && python cases_global_time_series.py

Plots are always showing up blank, even after deleting the cache.

forsyth2 · 2024-12-23T16:38:10Z

Turns out it may have been a cache problem after all. I'm seeing plots now on https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zi-test-webdir/global_time_series_1985-1995_results_viewers/table_lnd/index.html. Maybe I needed to delete more cached data, or use a different browser?

forsyth2 · 2024-12-23T16:48:43Z

@czender As far as I can tell, the NCO fix is in fact working. The cached empty plot issue was very unfortunate; it turns out the NCO fixes were working on Friday after all.

czender · 2024-12-23T20:02:06Z

FYI NCO 5.3.0 has been released and is in my personal directories on Perlmutter, Chrysalis, and Acme1.

forsyth2 · 2025-01-14T19:32:48Z

An example of our new commit paradigm, discussed on E3SM-Project/zstash#355:

Rebase steps

git checkout issue-601-viewers

# This part is unnecessary, but I like to create a backup branch in case I mess something up
git checkout -b issue-601-viewers-until-20250114
git checkout issue-601-viewers

git log
# We have 19 commits here

# We can probably group them as follows:

# 1. Add Viewer support
# 2. Working land viewer

# 3. Atm viewer exists but is not linked to

# 4. Index points to atm and lnd viewers

# 5. Misc updates
# 6. Refactored coupled_global
# 7. Misc updates
# 8. MANIFEST.in not working
# 9. Updates

# 10. Attempt adding extra_vars
# 11. Working Total plots
# 12. Improved Total plotting

# 13. Add make_viewer parameter

# 14. Only show non empty viewers

# 15. Address comments
# 16. Fix unit tests
# 17. Clean up code

# 18. Attempt using new NCO
# 19. TOTAL plots working

# So, now we can rebase accordingly
git rebase -i 87c9e54c13afa470b879ae4672e5ecfb06cbb514

That gives:

pick 712ed30 Add Viewer support
pick 4a739a2 Working land viewer
pick 4f29d31 Atm viewer exists but is not linked to
pick 8092658 Index points to atm and lnd viewers
pick 22a7a7a Misc updates
pick e8e82be Refactored coupled_global
pick 35042d0 Misc updates
pick b9d16ca MANIFEST.in not working
pick f47f8a9 Updates
pick 9b7430f Attempt adding extra_vars
pick be48809 Working Total plots
pick b2f1dd9 Improved Total plotting
pick e79aaa1 Add make_viewer parameter
pick 77658c1 Only show non empty viewers
pick bff08ed Address comments
pick a9d2bb1 Fix unit tests
pick 13c6245 Clean up code
pick 944966f Attempt using new NCO
pick d09c1a1 TOTAL plots working

Which we'll change to:

pick 712ed30 Add Viewer support
f 4a739a2 Working land viewer
pick 4f29d31 Atm viewer exists but is not linked to
pick 8092658 Index points to atm and lnd viewers
pick 22a7a7a Misc updates
f e8e82be Refactored coupled_global
f 35042d0 Misc updates
f b9d16ca MANIFEST.in not working
f f47f8a9 Updates
pick 9b7430f Attempt adding extra_vars
f be48809 Working Total plots
f b2f1dd9 Improved Total plotting
pick e79aaa1 Add make_viewer parameter
pick 77658c1 Only show non empty viewers
pick bff08ed Address comments
f a9d2bb1 Fix unit tests
f 13c6245 Clean up code
pick 944966f Attempt using new NCO
f d09c1a1 TOTAL plots working

git log
# Now, we're down to 9 commits

# Let's give these commits better names
git rebase -i 87c9e54c13afa470b879ae4672e5ecfb06cbb514

We have:

pick c1031e7 Add Viewer support
pick 75f6a6e Atm viewer exists but is not linked to
pick afae0ef Index points to atm and lnd viewers
pick 74f3f82 Misc updates
pick 51a0285 Attempt adding extra_vars
pick 371cc72 Add make_viewer parameter
pick 52c5aca Only show non empty viewers
pick a8db102 Address comments
pick 113e524 Attempt using new NCO

Let's do:

r c1031e7 Add Viewer support
pick 75f6a6e Atm viewer exists but is not linked to
pick afae0ef Index points to atm and lnd viewers
r 74f3f82 Misc updates
r 51a0285 Attempt adding extra_vars
pick 371cc72 Add make_viewer parameter
pick 52c5aca Only show non empty viewers
pick a8db102 Address comments
r 113e524 Attempt using new NCO

Rename those 4 to:

Working land viewer
Refactored coupled_global
Working Total plots
Total plots working with new NCO

GitHub tells us we're expecting a merge conflict: tests/integration/global_time_series/cases_global_time_series.py

git fetch upstream
git rebase upstream/main

We have:

<<<<<<< HEAD
        "plots_atm": "TREFHT,AODDUST",
=======
        "atmosphere_only": "False",
        "plots_atm": "TREFHT",
>>>>>>> 3a7eee7 (Working land viewer)

Change to:

        "plots_atm": "TREFHT",

git grep -n "<<<" tests
# No conflict markers remain
git add tests/integration/global_time_series/cases_global_time_series.py
git rebase --continue
git push -f upstream issue-601-viewers

forsyth2 · 2025-01-14T19:40:36Z

E3SM-Project/zppy#654 removed scratch, atmosphere_only, and plot_names. None of these occur on https://github.com/E3SM-Project/zppy-interfaces/pull/9/files, except for:

"atmosphere_only": "False",

in tests/integration/global_time_series/cases_global_time_series.py

forsyth2 self-assigned this Nov 26, 2024

This was referenced Nov 26, 2024

Add Viewer support E3SM-Project/zppy#648

Merged

Create global time series Viewers E3SM-Project/zppy#616

Closed

forsyth2 commented Nov 26, 2024

forsyth2 mentioned this pull request Nov 26, 2024

Add PCMDI Diags to zppy E3SM-Project/zppy#647

Draft

15 tasks

forsyth2 commented Nov 27, 2024

xylar reviewed Nov 27, 2024

xylar reviewed Dec 3, 2024

forsyth2 commented Dec 3, 2024

forsyth2 commented Dec 4, 2024

forsyth2 commented Dec 5, 2024

forsyth2 commented Dec 6, 2024

zppy_interfaces/global_time_series/coupled_global_dataset_wrapper.py (outdated, resolved)

forsyth2 commented Dec 9, 2024

forsyth2 marked this pull request as ready for review December 9, 2024 21:58

forsyth2 force-pushed the issue-601-viewers branch from d09c1a1 to dbcd011 January 14, 2025 19:22

forsyth2 added 9 commits January 14, 2025 13:27

Working land viewer

520b3f4

Atm viewer exists but is not linked to

956a7eb

Index points to atm and lnd viewers

c7352c7

Refactored coupled_global

d9273b3

Working Total plots

57f14c3

Add make_viewer parameter

cfe5c2e

Only show non empty viewers

d2d4232

Address comments

dd6271c

Total plots working with new NCO

f26e7ab

forsyth2 force-pushed the issue-601-viewers branch from dbcd011 to f26e7ab January 14, 2025 19:28

forsyth2 added 2 commits January 14, 2025 13:41

Remove deprecated parameter

2e50c84

Clean up code

c747ba4

forsyth2 merged commit 02e11c9 into main Jan 14, 2025
4 checks passed

forsyth2 deleted the issue-601-viewers branch January 14, 2025 19:59

		@@ -0,0 +1,42 @@
		from enum import Enum

		# TODO: how to determine this automatically?

		recursive-include zppy-interfaces zppy_interfaces/global_time_series/zppy_land_fields.csv
		recursive-include zppy-interfaces zppy_interfaces/global_time_series/index_template.html

	path: str = os.path.join(INCLUSIONS_DIR, "index_template.html")
	path: str = str(imp_res.files("zppy_interfaces.global_time_series") /
	"index_template.html")

		# Relies on MANIFEST.in to include files
		INCLUSIONS_DIR = "zppy_interfaces/global_time_series"

-    with open(f"{INCLUSIONS_DIR}/zppy_land_fields.csv", newline="") as csv_file:
+    csv_filename: str = str(imp_res.files("zppy_interfaces.global_time_series") /
+                            "zppy_land_fields.csv")
+    with open(csv_filename, newline="") as csv_file:
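The move from a path built on `INCLUSIONS_DIR` to `importlib.resources.files(...)` matters because a relative path like `zppy_interfaces/global_time_series/...` resolves against the current working directory, not the installed package. A minimal sketch of the resource-based approach follows, assuming Python 3.9+ and that the CSV is actually shipped with the installed package (for example via `package_data`, or `MANIFEST.in` together with `include_package_data`). `read_packaged_csv` is a hypothetical helper, and the columns used in any demo are illustrative, not the real layout of `zppy_land_fields.csv`.

```python
import csv
import importlib.resources as imp_res
from typing import Dict, List


def read_packaged_csv(package: str, filename: str) -> List[Dict[str, str]]:
    """Read a CSV file that ships inside an installed Python package.

    imp_res.files() locates the package's data regardless of the current
    working directory, unlike a hard-coded relative path.
    """
    resource = imp_res.files(package) / filename
    # The csv module expects file objects opened with newline="".
    with resource.open("r", newline="", encoding="utf-8") as csv_file:
        return list(csv.DictReader(csv_file))


# Usage (requires the package and the data file to be installed):
# rows = read_packaged_csv("zppy_interfaces.global_time_series", "zppy_land_fields.csv")
```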

		# ['AR', 'time_bounds', 'CWDC', 'FSH', 'GPP', 'H2OSNO', 'HR', 'LAISHA', 'LAISUN', 'NBP', 'QINTR', 'QOVER', 'QRUNOFF', 'QSOIL', 'QVEGE', 'QVEGT', 'RH2M', 'SOIL1C', 'SOIL2C', 'SOIL3C', 'SOIL4C', 'SOILWATER_10CM', 'TOTLITC', 'TOTVEGC', 'TSA', 'WOOD_HARVESTC', 'lon_bnds', 'lat_bnds']
		# TODO: looks like we don't actually have area or landfrac in the dataset

Add Viewer support #9

forsyth2 commented Nov 26, 2024 (edited)

xylar commented Nov 26, 2024

xylar commented Nov 26, 2024

forsyth2 commented Nov 26, 2024

xylar commented Nov 26, 2024

forsyth2 Nov 27, 2024 (edited)

forsyth2 commented Dec 3, 2024

xylar commented Dec 3, 2024

chengzhuzhang Dec 4, 2024 (edited)

forsyth2 Dec 6, 2024 (edited)

chengzhuzhang commented Dec 5, 2024 (edited)

chengzhuzhang commented Dec 17, 2024 (edited)