Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Viewer support #9

Merged
merged 11 commits into from
Jan 14, 2025
Merged

Add Viewer support #9

merged 11 commits into from
Jan 14, 2025

Conversation

forsyth2
Copy link
Collaborator

@forsyth2 forsyth2 commented Nov 26, 2024

Issue resolution

Select one: This pull request is...

  • a bug fix: increment the patch version
  • a new feature: increment the minor version
  • an incompatible (non-backwards compatible) API change: increment the major version

1. Does this do what we want it to do?

Objectives:

  • Add Viewers for global_time_series, similar to those of E3SM Diags
  • Add support for Land global_time_series plots

Required:

  • Product Management: I have confirmed with the stakeholders that the objectives above are correct and complete.
  • Testing: I have considered likely and/or severe edge cases and have included them in testing.

2. Are the implementation details accurate & efficient?

Required:

  • Logic: I have visually inspected the entire pull request myself.
  • Logic: I have left GitHub comments highlighting important pieces of code logic. I have had these code blocks reviewed by at least one other team member.

If applicable:

  • Dependencies: This pull request introduces a new dependency. I have discussed this requirement with at least one other team member. The dependency is noted in zppy-interfaces/conda, not just an import statement.
    • beautifulsoup4, lxml, output_viewer

3. Is this well documented?

Required:

  • Documentation: by looking at the docs, a new user could easily understand the functionality introduced by this pull request.
  • zppy-interfaces doesn't have docs yet

4. Is this code clean?

Required:

  • Readability: The code is as simple as possible and well-commented, such that a new team member could understand what's happening.
  • Pre-commit checks: All the pre-commits checks have passed.

If applicable:

  • Software architecture: I have discussed relevant trade-offs in design decisions with at least one other team member. It is unlikely that this pull request will increase tech debt.

Copy link
Collaborator Author

@forsyth2 forsyth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar I'm trying to port over E3SM-Project/zppy#616 to post-refactored code. It's actually an excellent example of why it was a good idea to pull the code out into a separate package in the first place: there are three packages that my editor is showing as unfound, meaning pre-refactor I must have just been picking them up from Unified.

Is every Unified dependency included in https://github.com/E3SM-Project/e3sm-unified/blob/main/recipes/e3sm-unified/meta.yaml? I'm wondering if some of these packages are inside others. (I see output_viewer but not bs4 or distutils).

@@ -1,7 +1,10 @@
# Script to plot some global atmosphere and ocean time series
import csv
import distutils.dir_util
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could not be resolved

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not be using distutils anymore. Try to find a python 3.13 compatible alternative to this function.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've asked @altheaden if she would be willing to look for the preferred replacement for this function. Hopefully, she can come up with something.

zppy_interfaces/global_time_series/coupled_global.py Outdated Show resolved Hide resolved
Comment on lines 21 to 29
from output_viewer.build import build_page, build_viewer
from output_viewer.index import (
OutputFile,
OutputGroup,
OutputIndex,
OutputPage,
OutputRow,
)
from output_viewer.utils import rechmod
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output_viewer could not be resolved.

@xylar
Copy link

xylar commented Nov 26, 2024

Is every Unified dependency included in https://github.com/E3SM-Project/e3sm-unified/blob/main/recipes/e3sm-unified/meta.yaml?

No, that only shows direct dependencies. There are also dependencies-of-dependencies and so on. There isn't a great way to know what those are or what their constraints are.

@xylar
Copy link

xylar commented Nov 26, 2024

Is bs4 a different implementation of beautiful_soup? If so, what a mess!

@forsyth2
Copy link
Collaborator Author

@xylar
Copy link

xylar commented Nov 26, 2024

It's actually an excellent example of why it was a good idea to pull the code out into a separate package in the first place: there are three packages that my editor is showing as unfound, meaning pre-refactor I must have just been picking them up from Unified.

Glad that this work is proving its value!

Copy link
Collaborator Author

@forsyth2 forsyth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now have a viewer index pointing to both the atm and lnd viewers! (Images below).

However, I have 5 remaining issues I need some help on (described in this review's comments).

Thanks!

Screenshot 2024-11-27 at 12 13 06 PM
Screenshot 2024-11-27 at 12 13 18 PM
Screenshot 2024-11-27 at 12 13 29 PM

@@ -47,6 +47,7 @@ jobs:
# since the action is run on a branch in detached head state.
# This is the equivalent of running "pre-commit run --all-files" locally.
# If you commit with the `--no-verify` flag, this check may fail.
# TODO: this doesn't seem to run when I run `git commit` locally. I always have to run `pre-commit run --all-files` manually.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue 1: When I run git commit it doesn't actually do the pre-commit checks. I don't get the passed messages. If I run pre-commit run --all-files afterward, there are in fact changes made, which I then need to amend in to the commit if I didn't think to run pre-commit run --all-files before committing. I'm not sure why it's not working.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomvothecoder Not sure if you've had a chance to look at this yet, but I'm not quite sure how to fix this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you run pre-commit install beforehand?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that fixes this, thanks!

header = True
# TODO: how do we make sure the csv is actually accessible????
# The current directory is where we ran the code from, which is not necessarily where the csv is.
csv_path = INCLUSIONS_DIR
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue 4a: See Issue 4 comment.

Comment on lines 213 to 230
annual_average_dataset_for_var: xarray.core.dataset.Dataset
if metric == Metric.AVERAGE:
annual_average_dataset_for_var: xarray.core.dataset.Dataset = (
self.f.temporal.group_average(var, "year")
annual_average_dataset_for_var = self.f.temporal.group_average(
var, "year"
)
data_array = annual_average_dataset_for_var.data_vars[var]
elif metric == Metric.TOTAL:
annual_average_dataset_for_var = self.f.temporal.group_average(
var, "year"
)
data_array = annual_average_dataset_for_var.data_vars[var]
# import pprint
# pprint.pprint(
# f"annual_average_dataset_for_var attributes={annual_average_dataset_for_var.attrs}"
# )
# pprint.pprint(f"data_array attributes={data_array.attrs}")
# data_array *= area*landfrac
# TODO: Determine how to get area and landfrac
Copy link
Collaborator Author

@forsyth2 forsyth2 Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue 2: I know we're aiming for data_array *= area*landfrac for the Metric.TOTAL calculation, but I'm still a little confused about how to extract those. Something like the following?

self.f.temporal.group_average(var, "area")
self.f.temporal.group_average(var, "landfrac")

But those are known scalars, not computed averages, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use xarray to extract area or landfrac -- specify variable with [] for indexing. In annual_average_dataset_for_var.data_vars[var], instead of var use "area" or "landfrac"

def coupled_global(parameters: Parameters) -> None:
requested_variables = RequestedVariables(parameters)
for rgn in parameters.regions:
run(parameters, requested_variables, rgn)
plots_per_page = parameters.nrows * parameters.ncols
# TODO: Is this how we want to determine when to make a viewer or should we have a `make_viewer` parameter in the cfg?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue 3: this is a design decision question. I guess it would be best to have a specific make_viewer parameter rather than relying on a user specifying they want 1 row and 1 column in the parameters. It's possible a user would want single plot images, but not a Viewer. (Although, the inverse doesn't work; we need 1x1 images to use in the Viewer). Once #3 is implemented too, that will make parameter specification less clunky.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon further thought, I think I should just go ahead and add the make_viewer parameter. It's clearer to understand, and allows for the case of wanting single plot images but not a Viewer.

@@ -0,0 +1,42 @@
from enum import Enum

# TODO: how to determine this automatically?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue 4: There are two cases where we need to access non-python files. (See the Issue 4a and 4b comments). The code can't seem to access these files without me hard-coding a path.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@forsyth2, I think you need a manifest.in.

https://github.com/E3SM-Project/polaris/blob/a44d4b8b308067287c007d2b17176d827fa39c9d/MANIFEST.in

@altheaden and I worked hard to find an alternative with luck.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, good to know, thanks!

Comment on lines 98 to 102
# import sys
# logger.debug(f"sys.prefix: {sys.prefix}, ls sys.prefix: {os.listdir(sys.prefix)}")
# TODO: figure out install_path
install_path: str = INCLUSIONS_DIR
path: str = os.path.join(install_path, "index_template.html")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue 4b: See Issue 4 comment.

E3SM Diags uses sys.prefix (INSTALL_PATH = os.path.join(sys.prefix, "share/e3sm_diags/") in e3sm_diags/__init__.py, but there's no zppy-interfaces in my {sys.prefix}/share dir)

zppy_interfaces/multi_utils/logger.py Outdated Show resolved Hide resolved
@@ -0,0 +1,126 @@
import os
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhangshixuan1987 For E3SM-Project/zppy#647, this Viewer file may be useful for generating Viewers for PCMDI Diags. (This will be in the main branch of zppy-interfaces after this PR merges).

Copy link
Collaborator Author

@forsyth2 forsyth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar I tried adding MANIFEST.in in the latest commit (b9d16ca), but I'm still getting FileNotFoundError: [Errno 2] No such file or directory.

I just need to do the pip install . step, right? Or does this need to be loaded into conda? https://github.com/E3SM-Project/polaris/pull/64/files appears to literally just add MANIFEST.in and nothing else.

MANIFEST.in Outdated
Comment on lines 1 to 2
recursive-include zppy-interfaces zppy_interfaces/global_time_series/zppy_land_fields.csv
recursive-include zppy-interfaces zppy_interfaces/global_time_series/index_template.html
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried zppy-interfaces, zppy_interfaces, and zi-global-time-series here.

MANIFEST.in Outdated
Comment on lines 1 to 2
recursive-include zppy-interfaces zppy_interfaces/global_time_series/zppy_land_fields.csv
recursive-include zppy-interfaces zppy_interfaces/global_time_series/index_template.html
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This command is for recursively including all files of a given type in the package:

Suggested change
recursive-include zppy-interfaces zppy_interfaces/global_time_series/zppy_land_fields.csv
recursive-include zppy-interfaces zppy_interfaces/global_time_series/index_template.html
recursive-include zppy_interfaces *.csv
recursive-include zppy_interfaces *.html

This is not the syntax to use for including individual files, which I would not recommend. You should not have cvs or html files in the python package that you don't want to include in the conda package.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar Hmm I'm still getting the same error even with this change.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@forsyth2, I see. I think the problem is 2-fold and I hadn't realized that. I think we have solved the first problem. Can you verify that, after calling pip install ., you can find the place where zppy-interfaces is installed and that both the csv and html file are included there along with the python files? I think that will be the case with the MANIFEST.in as we have it now.

The next problem is that you are looking for files in a location that e3sm_diags would move them to, but this is not how zppy-interfaces works. You need to use importlib-resources to find the files within the zppy-interfaces package. I will try to make the relevant suggested changes.

Copy link

@xylar xylar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is how I would handle finding the csv and html files in the package. INCLUSIONS_DIR should not be needed.

from bs4 import BeautifulSoup

from zppy_interfaces.global_time_series.coupled_global_utils import (
INCLUSIONS_DIR,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
INCLUSIONS_DIR,

td.append(a)
row_obj.append(td)

path: str = os.path.join(INCLUSIONS_DIR, "index_template.html")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
path: str = os.path.join(INCLUSIONS_DIR, "index_template.html")
path: str = str(imp_res.files("zppy_interfaces.global_time_series") /
"index_template.html")

Comment on lines 3 to 5
# Relies on MANIFEST.in to include files
INCLUSIONS_DIR = "zppy_interfaces/global_time_series"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Relies on MANIFEST.in to include files
INCLUSIONS_DIR = "zppy_interfaces/global_time_series"


from zppy_interfaces.global_time_series.coupled_global_plotting import make_plot_pdfs
from zppy_interfaces.global_time_series.coupled_global_utils import (
INCLUSIONS_DIR,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
INCLUSIONS_DIR,

def construct_land_variables(requested_vars: List[str]) -> List[Variable]:
var_list: List[Variable] = []
header = True
with open(f"{INCLUSIONS_DIR}/zppy_land_fields.csv", newline="") as csv_file:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with open(f"{INCLUSIONS_DIR}/zppy_land_fields.csv", newline="") as csv_file:
path: csv_filename = str(imp_res.files("zppy_interfaces.global_time_series") /
"zppy_land_fields.html")
with open(csv_filename, newline="") as csv_file:

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Dec 3, 2024

@xylar Thanks, those changes work!

@xylar
Copy link

xylar commented Dec 3, 2024

Wonderful! Glad I could help.

Comment on lines 108 to 109
# ['AR', 'time_bounds', 'CWDC', 'FSH', 'GPP', 'H2OSNO', 'HR', 'LAISHA', 'LAISUN', 'NBP', 'QINTR', 'QOVER', 'QRUNOFF', 'QSOIL', 'QVEGE', 'QVEGT', 'RH2M', 'SOIL1C', 'SOIL2C', 'SOIL3C', 'SOIL4C', 'SOILWATER_10CM', 'TOTLITC', 'TOTVEGC', 'TSA', 'WOOD_HARVESTC', 'lon_bnds', 'lat_bnds']
# TODO: looks like we don't actually have area or landfrac in the dataset
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looked like we didn't have area or landfrac in our test input data (see the results of this logger line).

I tried adding the extra_vars = "area,landfrac" line below in the zppy cfg, to produce a new set of test input data, but zppy-interfaces is still showing that the key "area" doesn't exist: [ERROR]: coupled_global.py(set_var:184) >> "No variable named 'area'.

  [[ lnd_monthly_glb ]]
  extra_vars = "area,landfrac"
  frequency = "monthly"
  input_files = "elm.h0"
  input_subdir = "archive/lnd/hist"
  mapping_file = "glb"
  vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
  years = "1985:1995:5",

These variables exist in the zppy input though:

cd /lcrc/group/e3sm2/ac.wlin/E3SMv3/v3.LR.historical_0051/archive/lnd/hist
ncdump -h v3.LR.historical_0051.elm.h0.1850-01.nc | grep "float area"
#	float area(lat, lon) ;
ncdump -h v3.LR.historical_0051.elm.h0.1850-01.nc | grep "float landfrac"
#	float landfrac(lat, lon) ;

Maybe they need to be added to vars rather than extra_vars.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@forsyth2 do you have a copy of the global time series files that ncclimo generated? I thought both should be fixed variables included there, if not we should try include both as extra variables.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have a copy of the global time series files that ncclimo generated

Yeah, that's the input data I'm testing on.

$ cd /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post_until_20241203/lnd/glb/ts/monthly/5yr
$ ls
AR_198501_198912.nc      H2OSNO_199001_199412.nc  QINTR_198501_198912.nc    QVEGE_199001_199412.nc   SOIL3C_198501_198912.nc          TOTVEGC_199001_199412.nc
AR_199001_199412.nc      HR_198501_198912.nc      QINTR_199001_199412.nc    QVEGT_198501_198912.nc   SOIL3C_199001_199412.nc          TSA_198501_198912.nc
CWDC_198501_198912.nc    HR_199001_199412.nc      QOVER_198501_198912.nc    QVEGT_199001_199412.nc   SOIL4C_198501_198912.nc          TSA_199001_199412.nc
CWDC_199001_199412.nc    LAISHA_198501_198912.nc  QOVER_199001_199412.nc    RH2M_198501_198912.nc    SOIL4C_199001_199412.nc          WOOD_HARVESTC_198501_198912.nc
FSH_198501_198912.nc     LAISHA_199001_199412.nc  QRUNOFF_198501_198912.nc  RH2M_199001_199412.nc    SOILWATER_10CM_198501_198912.nc  WOOD_HARVESTC_199001_199412.nc
FSH_199001_199412.nc     LAISUN_198501_198912.nc  QRUNOFF_199001_199412.nc  SOIL1C_198501_198912.nc  SOILWATER_10CM_199001_199412.nc
GPP_198501_198912.nc     LAISUN_199001_199412.nc  QSOIL_198501_198912.nc    SOIL1C_199001_199412.nc  TOTLITC_198501_198912.nc
GPP_199001_199412.nc     NBP_198501_198912.nc     QSOIL_199001_199412.nc    SOIL2C_198501_198912.nc  TOTLITC_199001_199412.nc
H2OSNO_198501_198912.nc  NBP_199001_199412.nc     QVEGE_198501_198912.nc    SOIL2C_199001_199412.nc  TOTVEGC_198501_198912.nc

$ cd /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post_20241203_area_landfrac_as_extra_vars/lnd/glb/ts/monthly/5yr
$ ls
AR_198501_198912.nc      H2OSNO_199001_199412.nc  QINTR_198501_198912.nc    QVEGE_199001_199412.nc   SOIL3C_198501_198912.nc          TOTVEGC_199001_199412.nc
AR_199001_199412.nc      HR_198501_198912.nc      QINTR_199001_199412.nc    QVEGT_198501_198912.nc   SOIL3C_199001_199412.nc          TSA_198501_198912.nc
CWDC_198501_198912.nc    HR_199001_199412.nc      QOVER_198501_198912.nc    QVEGT_199001_199412.nc   SOIL4C_198501_198912.nc          TSA_199001_199412.nc
CWDC_199001_199412.nc    LAISHA_198501_198912.nc  QOVER_199001_199412.nc    RH2M_198501_198912.nc    SOIL4C_199001_199412.nc          WOOD_HARVESTC_198501_198912.nc
FSH_198501_198912.nc     LAISHA_199001_199412.nc  QRUNOFF_198501_198912.nc  RH2M_199001_199412.nc    SOILWATER_10CM_198501_198912.nc  WOOD_HARVESTC_199001_199412.nc
FSH_199001_199412.nc     LAISUN_198501_198912.nc  QRUNOFF_199001_199412.nc  SOIL1C_198501_198912.nc  SOILWATER_10CM_199001_199412.nc
GPP_198501_198912.nc     LAISUN_199001_199412.nc  QSOIL_198501_198912.nc    SOIL1C_199001_199412.nc  TOTLITC_198501_198912.nc
GPP_199001_199412.nc     NBP_198501_198912.nc     QSOIL_199001_199412.nc    SOIL2C_198501_198912.nc  TOTLITC_199001_199412.nc
H2OSNO_198501_198912.nc  NBP_199001_199412.nc     QVEGE_198501_198912.nc    SOIL2C_199001_199412.nc  TOTVEGC_198501_198912.nc

$ cd /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post_20241203_area_landfrac_as_vars
# no lnd directory
$ cd scripts
$ grep -in "error" *.o*
#ts_lnd_monthly_glb_1985-1989-0005.o639437#:15:ncrcat: ERROR no variables fit criteria for processing
#ts_lnd_monthly_glb_1985-1989-0005.o639437#:17:ncrcat: ERROR no variables fit criteria for processing
#ts_lnd_monthly_glb_1985-1989-0005.o639437#:19:ncclimo: ERROR Failed to split. cmd_sbs[0] failed. Debug this:
ts_lnd_monthly_glb_1985-1989-0005.o639437:15:ncrcat: ERROR no variables fit criteria for processing
ts_lnd_monthly_glb_1985-1989-0005.o639437:17:ncrcat: ERROR no variables fit criteria for processing
ts_lnd_monthly_glb_1985-1989-0005.o639437:19:ncclimo: ERROR Failed to split. cmd_sbs[0] failed. Debug this:
ts_lnd_monthly_glb_1990-1994-0005.o639438:15:ncrcat: ERROR no variables fit criteria for processing
ts_lnd_monthly_glb_1990-1994-0005.o639438:17:ncrcat: ERROR no variables fit criteria for processing
ts_lnd_monthly_glb_1990-1994-0005.o639438:19:ncclimo: ERROR Failed to split. cmd_sbs[0] failed. Debug this:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. You are right, neither area and landfrac in global mean datasets. I think it makes sense to try if both can be added to vars.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But adding to vars doesn't work either -- that's that third ls block above.

From ts_lnd_monthly_glb_1985-1989-0005.o639437:

ncrcat: ERROR no variables fit criteria for processing
ncrcat: HINT Extraction list must contain a record variable to concatenate. A record variable is a variable defined with a record dimension. Often the record dimension\
, aka unlimited dimension, refers to time. To change an existing dimension from a fixed to a record dimensions see http://nco.sf.net/nco.html#mk_rec_dmn or to add a ne\
w record dimension to all variables see http://nco.sf.net/nco.html#ncecat_rnm
ncrcat: ERROR no variables fit criteria for processing
ncrcat: HINT Extraction list must contain a record variable to concatenate. A record variable is a variable defined with a record dimension. Often the record dimension\
, aka unlimited dimension, refers to time. To change an existing dimension from a fixed to a record dimensions see http://nco.sf.net/nco.html#mk_rec_dmn or to add a ne\
w record dimension to all variables see http://nco.sf.net/nco.html#ncecat_rnm
ncclimo: ERROR Failed to split. cmd_sbs[0] failed. Debug this:

That is, area, and landfrac clearly are included in the original nc files but I just can't seem to extract them either through extra_vars or vars.

Copy link

@chengzhuzhang chengzhuzhang Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@forsyth2 thanks for testing both. It looks like extra_vars does work with generic variable splitting (e.x. land_monthly task) but not when --glb flag is on (e.x. land_monthly_global). I think there are a few possibilities for moving forward:

  1. include a [land_monthly] task to get both area and landfrac, and use both in downstream.
  2. based on @czender 's comment , there is a new version of ncclimo that supports global integral scaling. To do so, we need additional logic to filter through variables in the time-series task.
  3. we can ask @czender if it is possible to add extra-vars support in --glb mode, to keep area and landfrac in the global time-series files. In the mean time, I think you could tentatively use the first approach to get values of two variables..

Comment on lines 126 to 133
logger.error(f"area.shape={area.shape}") # area.shape=(180, 360)
logger.error(
f"landfrac.shape={landfrac.shape}"
) # landfrac.shape=(180, 360)
logger.error(
f"data_array.shape={data_array.shape}"
) # data_array.shape=(10, 3)
# e: dimensions cannot change for in-place operations
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chengzhuzhang Ok, using the separate land_monthly non-global ts task (option 1 above), I can get area and landfrac, but now I'm having issues with the dimensions. I'm not sure how to take (180,360) to (10,3).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok I think I have a reasonable dimensionality reduction:

            if metric == Metric.TOTAL:
                logger.debug(
                    f"self.extra_var_dataset.keys()={list(self.extra_var_dataset.keys())}"
                )
                area: xarray.core.dataarray.DataArray = self.extra_var_dataset["area"]
                landfrac: xarray.core.dataarray.DataArray = self.extra_var_dataset["landfrac"]
                # area.shape() = (180, 360)
                total_area = area.sum() # Sum over all dimensions
                # landfrac.shape=(180, 360)
                average_landfrac = landfrac.mean() # Mean over all dimensions
                total_land_area = total_area * average_landfrac
                # data_array.shape = (number of years, number of regions)
                data_array *= total_land_area
                logger.info(
                    "for Metric.TOTAL, data_array has been scaled by total land area"
                )

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @forsyth2 and @chengzhuzhang, I'm trying to understand the change(s) you would like in ncclimo. As you know, area and landfrac can be output in every default (non-spatially averaged) timeseries by requesting them with the --var_xtr=area,landfrac. However, as @forsyth2 has discovered, this behavior changes when requesting global/regional-average timeseries. Currently the spatial-mean timeseries output includes only the geophysical field of interest, no matter what was requested with --var_xtr.

I would be happy to change that behavior so that in spatial-averaged timeseries files ...

  1. Always contain the full (i.e., non-spatially averaged) variables requested with --var_xtr OR
  2. Always contain the full area and landfrac variables OR
  3. Behave as currently by default, and contain the full area and landfrac variables when an additional switch is provided
  4. Behave as currently by default, and contain the full variables requested with --var_xtr when an additional switch is provided.

These changes would be relatively straightforward to implement. Keep in mind that the current behavior was implemented so that spatial-mean timeseries files are orders of magnitude smaller than their non-spatial-average counterparts. Adding full fields to spatial average timeseries will significantly inflate their size. However, area and landfrac are time-constant variables so the resulting filesizes would still be much smaller than full timeseries files if only these variables are added.

Also happy to discuss other options, like providing a sane way for zppy to provide the desired scale factors to ncclimo which would then apply them to the fields during spatial-average timeseries generation. Feedback welcome.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@forsyth2 The "dimensionality reduction" you mention above triggers some red flags with me. In particular this line
total_land_area = total_area * average_landfrac is questionable because area and landfrac can vary indepently so I would expect the total land area to be defined by total_land_area = (area*landfrac).sum. If I am right, the former method would produce small errors that might not be noticed (e.g., 1-2%) until/unless carefully examined. Or I could be totally off base because I'm not sure how you're using the variables in the code :) In any case, you might try both methods and then compare against a known-to-be-correct answer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally would say (1) or (4) since it allows users to set whatever extra variable they want rather than hard-coding in area or landfrac.

I appear to have a workaround here working now without these changes in ncclimo, but it is a bit clunky.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@czender I was thinking the same, to have the product area_times_landfrac would be easiest for zppy in this use case.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and perhaps making the new additon default would be fine, cause the new variables is small.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I will add a field called something likearea_times_landfrac to the output of all spatially-averaged timeseries. I'm headed to AGU on Friday and will return 12/16. Hopefully finish and release this in NCO 5.3.0 sometime that week.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your help, Charlie! Have a nice trip to DC.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you @czender!

Copy link
Collaborator Author

@forsyth2 forsyth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chengzhuzhang I think this is getting pretty close to done. I just have a few comments, and then I'll ask the Land team to review further. I also want to update the unit tests too.

Results are visible at https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zi-test-webdir/global_time_series_1985-1995_results_viewers/ (note: that's not a stable link, meaning it will update with every time I run updated code). How does that look?

Corresponding changes on the zppy side are at https://github.com/E3SM-Project/zppy/pull/648/files

@@ -47,6 +47,7 @@ jobs:
# since the action is run on a branch in detached head state.
# This is the equivalent of running "pre-commit run --all-files" locally.
# If you commit with the `--no-verify` flag, this check may fail.
# TODO: this doesn't seem to run when I run `git commit` locally. I always have to run `pre-commit run --all-files` manually.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomvothecoder Not sure if you've had a chance to look at this yet, but I'm not quite sure how to fix this.

Comment on lines 121 to 134
if "area_times_landfrac" in keys:
total_land_area = self.extra_var_dataset["area_times_landfrac"]
else:
area: xarray.core.dataarray.DataArray = self.extra_var_dataset[
"area"
]
landfrac: xarray.core.dataarray.DataArray = self.extra_var_dataset[
"landfrac"
]
# area.shape() = (180, 360)
# landfrac.shape() =(180, 360)
total_land_area = (area * landfrac).sum() # Sum over all dimensions
# data_array.shape = (number of years, number of regions)
data_array *= total_land_area
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chengzhuzhang This is how I'm handling the total_land_area scaling now.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@forsyth2 it is okay to use this logic for now for demonstration. The scaling is only good for global area, but not for N./S. hemisphere, which we can update later after ncclimo includes the scaling factor.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is possible to get N_land_area and S_land_area, but sub-select with lat >= 0 or <= 0 for both area and landfrac.

Copy link
Collaborator Author

@forsyth2 forsyth2 Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried updating to the following but I'm still getting some blank plots (strangely it's inconsistent if it's the glb or n or s plot that's empty).

            if metric == Metric.TOTAL:
                keys = list(self.extra_var_dataset.keys())
                logger.debug(f"self.extra_var_dataset.keys()={keys}")
                if "area_times_landfrac" in keys:
                    total_land_area = self.extra_var_dataset["area_times_landfrac"]
                else:
                    area: xarray.core.dataarray.DataArray = self.extra_var_dataset[
                        "area"
                    ]
                    landfrac: xarray.core.dataarray.DataArray = self.extra_var_dataset[
                        "landfrac"
                    ]
                    # area.shape() = (180, 360)
                    # landfrac.shape() = (180, 360)
                    total_land_area = (area * landfrac).sum()  # Sum over all dimensions

                    # Account for hemispheric plots:
                    north_area = area.where(area.lat >= 0)
                    south_area = area.where(area.lat <= 0)
                    north_landfrac = landfrac.where(landfrac.lat >= 0)
                    south_landfrac = landfrac.where(landfrac.lat <= 0)
                    north_land_area = (north_area * north_landfrac).sum()
                    south_land_area = (south_area * south_landfrac).sum()

                # data_array.shape = (number of years, number of regions)
                # We want to keep those dimensions, but with these values:
                # (glb*total_land_area, n*north_land_area, s*south_land_area)
                data_array[:, 0] *= total_land_area
                data_array[:, 1] *= north_land_area
                data_array[:, 2] *= south_land_area

Comment on lines 23 to 31
if extra_vars_directory:
# If an extra_vars_directory is provided, we'll use that to open a new dataset
self.extra_var_dataset = xcdat.open_mfdataset(
f"{extra_vars_directory}*.nc",
center_times=True,
)
else:
# Otherwise, we'll use the same dataset.
self.extra_var_dataset = self.dataset
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chengzhuzhang Until ncclimo is updated to allow extra vars in glb runs, we can just specify an alternative path (i.e., to the [land_monthly] output).

zppy_interfaces/multi_utils/logger.py Outdated Show resolved Hide resolved
@chengzhuzhang
Copy link

chengzhuzhang commented Dec 5, 2024

@chengzhuzhang I think this is getting pretty close to done. I just have a few comments, and then I'll ask the Land team to review further. I also want to update the unit tests too.

Results are visible at https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zi-test-webdir/global_time_series_1985-1995_results_viewers/ (note: that's not a stable link, meaning it will update with every time I run updated code). How does that look?

Corresponding changes on the zppy side are at https://github.com/E3SM-Project/zppy/pull/648/files

This looks good! I would suggest to update Results to something more descriptive, for instance, "Zppy global time-series plot: v3.LR.historical_0051 (1985-1995)", so that it is clear to users when comparing with other viewers generated from other runs. Free free to share with land team to get feedback. Thank you!

Copy link
Collaborator Author

@forsyth2 forsyth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thorntonpe @BunnyVon (and maybe also @dmricciuto @wlin7) This is ready for review.

For background context, we recently decided to split zppy into two packages: zppy itself will do its original purpose of orchestrating workflows while this new package zppy-interfaces will handle the "last mile" stretches of glue code to plot or otherwise post-process data. (I.e. zppy should not be doing any post-processing itself, only coordinating other post-processing tools.)

I have test results at https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zi-test-webdir/global_time_series_1985-1995_results_viewers/.

How to run your own tests

I think you also wanted to run some of your own test cases before we merge this PR. To do that:

# Set up zppy-interfaces
cd ~
git clone git@github.com:E3SM-Project/zppy-interfaces.git
cd zppy-interfaces
git fetch origin issue-601-viewers
git checkout -b test_land_viewers origin/issue-601-viewers
conda clean --all --y
conda env create -f conda/dev.yml -n my_zi_env
conda activate my_zi_env
pip install .

# Set up zppy
cd ~
git clone git@github.com:E3SM-Project/zppy.git
cd zppy
git fetch origin issue-601-viewers
git checkout -b test_land_viewers origin/issue-601-viewers
conda clean --all --y
conda env create -f conda/dev.yml -n my_zppy_env
conda activate my_zppy_env
pip install .

# Create a cfg
cd ~
emacs test_land_viewers.cfg
zppy -c test_land_viewers.cfg
Sample cfg

Below is an example cfg that you could modify with your own input data and settings. (In particular, these parameters will likely need changing: case, input, output, www, experiment_name, figstr)

[default]
case = "v3.LR.historical_0051"
constraint = ""
dry_run = "False"
environment_commands = ""
fail_on_dependency_skip = True
guess_path_parameters = False
guess_section_parameters = False
input = /lcrc/group/e3sm2/ac.wlin/E3SMv3/v3.LR.historical_0051
input_subdir = archive/atm/hist
mapping_file = "map_ne30pg2_to_cmip6_180x360_aave.20200201.nc"
output = "/lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_output/unique_id/v3.LR.historical_0051"
partition = "debug"
qos = "regular"
www = "/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_www/unique_id"
years = "1985:1989:2",

[ts]
active = True
e3sm_to_cmip_environment_commands = ""
walltime = "00:30:00"

  [[ land_monthly ]]
  extra_vars = "area,landfrac"
  frequency = "monthly"
  input_files = "elm.h0"
  input_subdir = "archive/lnd/hist"
  mapping_file = "map_r05_to_cmip6_180x360_aave.20231110.nc"
  years = "1985:1995:5",
  vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"

  [[ atm_monthly_glb ]]
  # Note global average won't work for 3D variables.
  frequency = "monthly"
  input_files = "eam.h0"
  input_subdir = "archive/atm/hist"
  mapping_file = "glb"
  years = "1985:1995:5",

  [[ lnd_monthly_glb ]]
  frequency = "monthly"
  input_files = "elm.h0"
  input_subdir = "archive/lnd/hist"
  mapping_file = "glb"
  vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
  years = "1985:1995:5",

[global_time_series]
active = True
climo_years = "1985-1989", "1990-1995",
environment_commands = "source <INSERT PATH TO CONDA>/conda.sh; conda activate my_zi_env"
experiment_name = "v3.LR.historical_0051"
figstr = "v3.LR.historical_0051"
make_viewer = True
num_cols = 1
num_rows = 1
plots_lnd = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
ts_num_years = 5
ts_years = "1985-1989", "1985-1995",
walltime = "00:30:00"
years = "1985-1995",

Reviewing code changes

I'm not sure how relevant the code changes themselves are to you, but I've marked some important areas as part of this review. Notably, the bulk of the changes are in this PR, but there are also a few changes on the zppy side: https://github.com/E3SM-Project/zppy/pull/648/files

Comment on lines 113 to 181
# Non-derived variables
annual_average_dataset_for_var: xarray.core.dataset.Dataset = (
self.dataset.temporal.group_average(var, "year")
)
data_array = annual_average_dataset_for_var.data_vars[var]
if metric == Metric.TOTAL:
keys = list(self.extra_var_dataset.keys())
logger.debug(f"self.extra_var_dataset.keys()={keys}")
if "area_times_landfrac" in keys:
total_land_area = self.extra_var_dataset["area_times_landfrac"]
else:
area: xarray.core.dataarray.DataArray = self.extra_var_dataset[
"area"
]
landfrac: xarray.core.dataarray.DataArray = self.extra_var_dataset[
"landfrac"
]
# area.shape() = (180, 360)
# landfrac.shape() = (180, 360)
total_land_area = (area * landfrac).sum() # Sum over all dimensions

# Account for hemispheric plots:
north_area = area.where(area.lat >= 0)
south_area = area.where(area.lat <= 0)
north_landfrac = landfrac.where(landfrac.lat >= 0)
south_landfrac = landfrac.where(landfrac.lat <= 0)
north_land_area = (north_area * north_landfrac).sum()
south_land_area = (south_area * south_landfrac).sum()

# data_array.shape = (number of years, number of regions)
# We want to keep those dimensions, but with these values:
# (glb*total_land_area, n*north_land_area, s*south_land_area)
data_array[:, 0] *= total_land_area
data_array[:, 1] *= north_land_area
data_array[:, 2] *= south_land_area
units = data_array.units
# `units` will be "1" if it's a dimensionless quantity
if (units != "1") and (original_units != "") and original_units != units:
raise ValueError(
f"Units don't match up: Have {units} but expected {original_units}. This renders the supplied scale_factor ({scale_factor}) unusable."
)
if (scale_factor != 1) and (final_units != ""):
data_array *= scale_factor
units = final_units
return data_array, units
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the code block where we do average or total calculations.

valid_vars,
invalid_vars,
rgn,
extra_vars_dict=exp["land"].replace("glb", "180x360_aave"),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, ncclimo can't generate area and landfrac for globally averaged datasets. So, we're currently relying on having a separate land_monthly task defined in the cfg to set up these variables.

@forsyth2 forsyth2 marked this pull request as ready for review December 9, 2024 21:58
@czender
Copy link

czender commented Dec 17, 2024

@forsyth2 and @chengzhuzhang,

While area_times_landfrac is a descriptive name, is is ELM-specific.
Other models have different names for the valid fraction of gridcells.
I'm mainly thinking of MPAS-SI and CICE, though there may be more.
And the zppy feature of scaling sums should therefore work for all models.

Hence I suggest that ncclimo name the new field valid_area_per_gridcell where:
valid_area_per_gridcell=area*landfrac for ELM
valid_area_per_gridcell=area*1 for EAM
valid_area_per_gridcell=area*sgs_frc when invoked with --sgs_frc option.

The latest ncclimo snapshot contains this modification.
Let me know if you disagree and prefer a different name.

It is unclear whether you want ncclimo to output variables specified with --var_xtr in global timeseries mode.
That would be easy to enable. Currently that is disabled just to keep the global timeseries files small and tidy.
However, this behavior is not as flexible, and prevents users from requesting, e.g., --var_xtr=area,landfrac for scaling the global timeseries themselves as they see fit.

Feedback welcome.

Charlie

@chengzhuzhang
Copy link

chengzhuzhang commented Dec 17, 2024

Thank you for the update, Charlie @czender
I think the current implementation looks good, and can help simply Ryan 's workflow to generate global time series plots. @forsyth2 could you test Charlie's update. Charlie can correct me but I believe for testing the new snapshot, what it takes is to replace the ncclimo call with /global/homes/z/zender/bin_perlmutter/ncclimo --npo as what is used in here

@forsyth2
Copy link
Collaborator Author

@czender Thanks, I like the valid_area_per_gridcell name.

this behavior is not as flexible, and prevents users from requesting, e.g.,--var_xtr=area,landfrac for scaling the global timeseries themselves as they see fit.

My personal preference would be to enable that functionality. It's nice to have the flexibility.

replace the ncclimo call with /global/homes/z/zender/bin_perlmutter/ncclimo --npo

I'm working on Chrysalis, so I'll try /home/ac.zender/bin_chrysalis/ncclimo --npo

@forsyth2
Copy link
Collaborator Author

@czender I get the following regardless if I set vars = "valid_area_per_gridcell,... or extra_vars = "valid_area_per_gridcell"

The following have been reloaded with a version change:
  1) openmpi/4.1.3-sxfyy4k => openmpi/4.0.4-hpcx-hghvhj5

ncclimo: ERROR /home/ac.zender/bin_chrysalis/ncks dies with error message on next line:
ncks: error while loading shared libraries: libgsl.so.25: cannot open shared object file: No such file or directory

@czender
Copy link

czender commented Dec 19, 2024

@forsyth2 That's because I haven't built NCO on Chrysalis (as opposed to Perlmutter) in a long time. I will try to do so today and ping you then.

@czender
Copy link

czender commented Dec 19, 2024

@forsyth2 Please try again on Chrysalis (and/or Perlmutter). I have rebuilt the latest snapshot there. BTW, the latest snapshot always includes area and landfrac variables in global timeseries output. No need to request them specially with --xtr_var. Is that OK with you or do you want to have to request them with --xtr_var?

@forsyth2
Copy link
Collaborator Author

@czender Hmm now I'm getting

ncrcat: ERROR nco_xtr_mk() reports user-supplied variable name (or regular expression) 'valid_area_per_gridcell' is not in (or rx does not match any) contents of input file

That's with this cfg snippet:

[ts]

  [[ land_monthly ]]
  vars = "valid_area_per_gridcell,..."

Alternatively the error in #9 (comment) appears when using:

[ts]

  [[ land_monthly ]]
  vars_xtra = "valid_area_per_gridcell"

What's the proper usage?

valid_area_per_gridcell ...in land_monthly subtask ...in lnd_monthly_glb subtask
include in var_xtr... shared library error shared library error
include in vars... not in contents of input file shared library error
don't include at all... shared library error shared library error

The one change in ts.bash I have is:

-cat input.txt | ncclimo \
+cat input.txt | /home/ac.zender/bin_chrysalis/ncclimo --npo \

always includes area and landfrac variables in global timeseries output

That seems reasonable to me.

@czender
Copy link

czender commented Dec 20, 2024

@forsyth2 Sorry for the problems. The intended proper usage is your last row, don't add any switches at all, and everything you might want will be in the output files in global timeseries mode. As for why you get shared library errors when invoking ncclimo --npo...I'm unsure. Let me see investigate...but please paste samples of the errors below since it's hard for me to pretend to be a different user of my own binaries.

@forsyth2
Copy link
Collaborator Author

Thanks @czender!

To put everything in one place, I just ran zppy -c tests/integration/generated/test_min_case_global_time_series_comprehensive_v3_setup_only_chrysalis.cfg. That produces output here:

cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_output/zppy_601_viewers_updated_nco_no_extra_vars/v3.LR.historical_0051/post/scripts

grep -v "OK" *status
# ts_lnd_monthly_glb_1985-1989-0005.status:ERROR (2)
# ts_lnd_monthly_glb_1990-1994-0005.status:ERROR (2)

cat ts_lnd_monthly_glb_19*.o*

That gives:

ts_only

The following have been reloaded with a version change:
  1) openmpi/4.1.3-sxfyy4k => openmpi/4.0.4-hpcx-hghvhj5

ncclimo: ERROR /home/ac.zender/bin_chrysalis/ncks dies with error message on next line:
ncks: error while loading shared libraries: libgsl.so.25: cannot open shared object file: No such file or directory
ts_only

The following have been reloaded with a version change:
  1) openmpi/4.1.3-sxfyy4k => openmpi/4.0.4-hpcx-hghvhj5

ncclimo: ERROR /home/ac.zender/bin_chrysalis/ncks dies with error message on next line:
ncks: error while loading shared libraries: libgsl.so.25: cannot open shared object file: No such file or directory

(For reference, E3SM-Project/zppy@ee60560 includes the changes for this)

@forsyth2
Copy link
Collaborator Author

And the cfg is tests/integration/generated/test_min_case_global_time_series_comprehensive_v3_setup_only_chrysalis.cfg:

[default]
case = "v3.LR.historical_0051"
constraint = ""
dry_run = "False"
environment_commands = ""
fail_on_dependency_skip = True
guess_path_parameters = False
guess_section_parameters = False
input = /lcrc/group/e3sm2/ac.wlin/E3SMv3/v3.LR.historical_0051
input_subdir = archive/atm/hist
mapping_file = "map_ne30pg2_to_cmip6_180x360_aave.20200201.nc"
output = "/lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_output/zppy_601_viewers_updated_nco_no_extra_vars/v3.LR.historical_0051"
partition = "debug"
qos = "regular"
www = "/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_www/zppy_601_viewers_updated_nco_no_extra_vars"
years = "1985:1989:2",

[ts]
active = True
e3sm_to_cmip_environment_commands = ""
walltime = "00:30:00"

  [[ atm_monthly_glb ]]
  active = False
  # Note global average won't work for 3D variables.
  frequency = "monthly"
  input_files = "eam.h0"
  input_subdir = "archive/atm/hist"
  mapping_file = "glb"
  years = "1985:1995:5",

  [[ lnd_monthly_glb ]]
  frequency = "monthly"
  input_files = "elm.h0"
  input_subdir = "archive/lnd/hist"
  mapping_file = "glb"
  vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
  years = "1985:1995:5",

[mpas_analysis]
active = False
anomalyRefYear = 1985
climo_years = "1985-1989", "1990-1995",
enso_years = "1985-1989", "1990-1995",
mesh = "IcoswISC30E3r5"
parallelTaskCount = 6
partition = "compute"
qos = "regular"
shortTermArchive = True
ts_years = "1985-1989", "1985-1995",
walltime = "00:30:00"

# (This cfg is the setup portion only)
# [global_time_series]
# active = True
# climo_years = "1985-1989", "1990-1995",
# environment_commands = "source <INSERT PATH TO CONDA>/conda.sh; conda activate <INSERT ENV NAME>"
# experiment_name = "v3.LR.historical_0051"
# figstr = "v3.LR.historical_0051"
# moc_file=mocTimeSeries_1985-1995.nc
# plots_lnd = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
# ts_num_years = 5
# ts_years = "1985-1989", "1985-1995",
# walltime = "00:30:00"
# years = "1985-1995",

@czender
Copy link

czender commented Dec 20, 2024

You have given me all the information I need. I think. I altered the --npo paths. This might fix the problem you encountered. Please try again on Chrysalis now...

@czender
Copy link

czender commented Dec 20, 2024

@forsyth2 Did those latest --npo changes work for you and zppy? Would like to release NCO 5.3.0 now if they did...

@forsyth2
Copy link
Collaborator Author

forsyth2 commented Dec 20, 2024

@czender Thanks, I've gotten a lot further, but still seeing blank plots on the zppy-interfaces side. However, that could be an implementation problem on my end*. I'm still debugging. It does seem like valid_area_per_gridcell is available to use though, so if that's what's relevant for releasing NCO 5.3.0, then I suppose you can do that.

*For reference, the latest relevant logic is:

                if "valid_area_per_gridcell" in keys:
                    logger.debug("Metric.TOTAL -- Using valid_area_per_gridcell")
                    land_area_per_gridcell = self.dataset["valid_area_per_gridcell"]
                    total_land_area = land_area_per_gridcell.sum()  # Sum over all dimensions
                    north_land_area = land_area_per_gridcell.where(land_area_per_gridcell.lat >= 0)
                    south_land_area = land_area_per_gridcell.where(land_area_per_gridcell.lat <= 0)

@czender
Copy link

czender commented Dec 20, 2024

@forsyth2 Your logic looks good. Not sure why plots are blank. I will hold-off releasing until this works in zppy.

However, these lines will double-count gridcells centered on the equator so that north+south != total:

           total_land_area = land_area_per_gridcell.sum()  # Sum over all dimensions
                   north_land_area = land_area_per_gridcell.where(land_area_per_gridcell.lat >= 0)
                   south_land_area = land_area_per_gridcell.where(land_area_per_gridcell.lat <= 0)

To avoid this NCO defines south < 0.0 and north >= 0.0 latitude. I suggest zppy do likewise.

@forsyth2
Copy link
Collaborator Author

@czender I'm still running into issues with properly scaling the data by the valid area. I do feel like the problem is on the zppy-interfaces side, but obviously can't fully confirm that yet. I'm going to have to look into this more next week.

As far as the release schedule, mainly we just need the new NCO by the Jan. 15 release candidate deadline for E3SM Unified, for testing.

@forsyth2
Copy link
Collaborator Author

Mostly for my reference:

zppy steps
  • Relevant ts.bash change:
- cat input.txt | ncclimo \
+ cat input.txt | /home/ac.zender/bin_chrysalis/ncclimo --npo \
  • Ran pip install .
  • Set UNIQUE_ID = "zppy_601_viewers_updated_nco_no_extra_vars_v3" in python tests/integration/utils.py
  • Ran python tests/integration/utils.py && zppy -c tests/integration/generated/test_min_case_global_time_series_comprehensive_v3_setup_only_chrysalis.cfg
zppy-interfaces steps
  • Update the test input data with the output of the zppy run using the latest NCO:
mv /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post/ /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post_until_20241220
mv /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_comprehensive_v3_setup_only_output/zppy_601_viewers_updated_nco_no_extra_vars_v3/v3.LR.historical_0051/post/ /lcrc/group/e3sm/ac.forsyth2/zi-test-input-data/post/
cd tests/integration/global_time_series
pip install ~/ez/zppy-interfaces && python cases_global_time_series.py

Plots are always showing up blank, even after deleting cache.

@forsyth2
Copy link
Collaborator Author

Turns out it may have been a cache problem after all. I'm seeing plots now on https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zi-test-webdir/global_time_series_1985-1995_results_viewers/table_lnd/index.html. Maybe I needed to delete more cached data?? Or use a different browser?

@forsyth2
Copy link
Collaborator Author

@czender As far as I can tell, the NCO fix is in fact working. The cached empty plot issue was very unfortunate; it turns out the NCO fixes were working on Friday after all.

@czender
Copy link

czender commented Dec 23, 2024

FYI NCO 5.3.0 has been released and is in my personal directories on Perlmutter, Chrsalis, and Acme1.

@forsyth2
Copy link
Collaborator Author

An example of our new commit paradigm, discussed on E3SM-Project/zstash#355:

Rebase steps
git checkout issue-601-viewers

# This part is unnecessary, but I like to create a backup branch in case I mess something up
git checkout -b issue-601-viewers-until-20250114
git checkout issue-601-viewers

git log
# We have 19 commits here

# We can probably group them as follows:

# 1. Add Viewer support
# 2. Working land viewer

# 3. Atm viewer exists but is not linked to

# 4. Index points to atm and lnd viewers

# 5. Misc updates
# 6. Refactored coupled_global
# 7. Misc updates
# 8. MANIFEST.in not working
# 9. Updates

# 10. Attempt adding extra_vars
# 11. Working Total plots
# 12. Improved Total plotting

# 13. Add make_viewer parameter

# 14. Only show non empty viewers

# 15. Address comments
# 16. Fix unit tests
# 17. Clean up code

# 18. Attempt using new NCO
# 19. TOTAL plots working

# So, now we can rebase accordingly
git rebase -i 87c9e54c13afa470b879ae4672e5ecfb06cbb514

That gives:

pick 712ed30 Add Viewer support
pick 4a739a2 Working land viewer
pick 4f29d31 Atm viewer exists but is not linked to
pick 8092658 Index points to atm and lnd viewers
pick 22a7a7a Misc updates
pick e8e82be Refactored coupled_global
pick 35042d0 Misc updates
pick b9d16ca MANIFEST.in not working
pick f47f8a9 Updates
pick 9b7430f Attempt adding extra_vars
pick be48809 Working Total plots
pick b2f1dd9 Improved Total plotting
pick e79aaa1 Add make_viewer parameter
pick 77658c1 Only show non empty viewers
pick bff08ed Address comments
pick a9d2bb1 Fix unit tests
pick 13c6245 Clean up code
pick 944966f Attempt using new NCO
pick d09c1a1 TOTAL plots working

Which we'll change to:

pick 712ed30 Add Viewer support
f 4a739a2 Working land viewer
pick 4f29d31 Atm viewer exists but is not linked to
pick 8092658 Index points to atm and lnd viewers
pick 22a7a7a Misc updates
f e8e82be Refactored coupled_global
f 35042d0 Misc updates
f b9d16ca MANIFEST.in not working
f f47f8a9 Updates
pick 9b7430f Attempt adding extra_vars
f be48809 Working Total plots
f b2f1dd9 Improved Total plotting
pick e79aaa1 Add make_viewer parameter
pick 77658c1 Only show non empty viewers
pick bff08ed Address comments
f a9d2bb1 Fix unit tests
f 13c6245 Clean up code
pick 944966f Attempt using new NCO
f d09c1a1 TOTAL plots working
git log
# Now, we're down to 9 commits

# Let's give these commits better names
git rebase -i 87c9e54c13afa470b879ae4672e5ecfb06cbb514

We have:

pick c1031e7 Add Viewer support
pick 75f6a6e Atm viewer exists but is not linked to
pick afae0ef Index points to atm and lnd viewers
pick 74f3f82 Misc updates
pick 51a0285 Attempt adding extra_vars
pick 371cc72 Add make_viewer parameter
pick 52c5aca Only show non empty viewers
pick a8db102 Address comments
pick 113e524 Attempt using new NCO

Let's do:

r c1031e7 Add Viewer support
pick 75f6a6e Atm viewer exists but is not linked to
pick afae0ef Index points to atm and lnd viewers
r 74f3f82 Misc updates
r 51a0285 Attempt adding extra_vars
pick 371cc72 Add make_viewer parameter
pick 52c5aca Only show non empty viewers
pick a8db102 Address comments
r 113e524 Attempt using new NCO

Rename those 4 to:

Working land viewer
Refactored coupled_global
Working Total plots
Total plots working with new NCO

GitHub tells us we're expecting a merge conflict: tests/integration/global_time_series/cases_global_time_series.py

git fetch upstream
git rebase upstream/main

We have:

<<<<<<< HEAD
        "plots_atm": "TREFHT,AODDUST",
=======
        "atmosphere_only": "False",
        "plots_atm": "TREFHT",
>>>>>>> 3a7eee7 (Working land viewer)

Change to:

        "plots_atm": "TREFHT",
git grep -n "<<<" tests
# No more diffs
git add tests/integration/global_time_series/cases_global_time_series.py
git rebase --continue
git push -f upstream issue-601-viewers

@forsyth2
Copy link
Collaborator Author

E3SM-Project/zppy#654 removed scratch, atmosphere_only, and plot_names. None of these occur on https://github.com/E3SM-Project/zppy-interfaces/pull/9/files, except for:

"atmosphere_only": "False",

in tests/integration/global_time_series/cases_global_time_series.py

@forsyth2 forsyth2 merged commit 02e11c9 into main Jan 14, 2025
4 checks passed
@forsyth2 forsyth2 deleted the issue-601-viewers branch January 14, 2025 19:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature]: Create HTML page for global time series
5 participants