
Enhance ASCII2NC to read USCRN point observations. #1019

Closed
dwfncar opened this issue Jun 28, 2018 · 5 comments · Fixed by #3049
Labels: MET: PreProcessing Tools (Point), priority: high, requestor: NCAR/RAL, type: new feature

Comments

dwfncar commented Jun 28, 2018

Users at the 2018 Unidata Users Workshop recommended enhancing ASCII2NC to read point observation data from the NOAA USCRN network:
https://www.ncei.noaa.gov/access/crn/

Also need to support the SCAN data format:
https://www.nrcs.usda.gov/resources/data-and-reports/soil-climate-analysis-network

Also need to support AmeriFlux data:
https://ameriflux.lbl.gov/

Evaluate whether SCAN and AmeriFlux can be supported through this same issue, or consider spinning them off into separate issues.


Seems like it'd be pretty straightforward. [MET-1019] created by johnhg

Can be funded by the Dynamic Drought/Veg (2785031) key.

@JohnHalleyGotway

Recommend doing this work between 12/30 and 1/7.

JohnHalleyGotway commented Jan 6, 2025

Moving this issue to "In Progress" status.

For USCRN data, support the "Quality Controlled Datasets" formats described here:
https://www.ncei.noaa.gov/access/crn/qcdatasets.html

Note that the monthly, daily, hourly, and sub-hourly datasets contain different numbers of columns and different contents. It would be nice if ascii2nc could handle all of them without the user needing to explicitly state which format to use.
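
One way to avoid a user-specified sub-format would be to key off the structure of the data itself; a minimal sketch of that idea is below (the column counts are placeholders, not the actual USCRN counts, which come from each product's readme.txt). As the later commits note, the implementation ultimately determines the specific format from the USCRN file name prefix/suffix instead.

```cpp
// Hypothetical sketch: guess the USCRN product type from the number of
// columns found on the first data line. The counts below are placeholders;
// the real counts are documented in each product's readme.txt file.
enum class USCRNFormat { Monthly, Daily, Hourly, SubHourly, Unknown };

inline USCRNFormat guess_uscrn_format(int n_cols) {
   switch (n_cols) {
      case 15: return USCRNFormat::Monthly;    // placeholder count
      case 28: return USCRNFormat::Daily;      // placeholder count
      case 38: return USCRNFormat::Hourly;     // placeholder count
      case 23: return USCRNFormat::SubHourly;  // placeholder count
      default: return USCRNFormat::Unknown;
   }
}
```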

JohnHalleyGotway added a commit that referenced this issue Jan 7, 2025
…t I still need to make it work for the variety of USCRN inputs.
JohnHalleyGotway added a commit that referenced this issue Jan 7, 2025
…input files. Need to complete support for other format types and handle the unit strings
@JohnHalleyGotway

Discussed progress on this issue during the MET development meeting on Jan 10, 2025. Rather than translating the USCRN variable names into more conventional GRIB names and units (e.g. converting Celsius temperatures to Kelvin, since that's what GRIB uses), @DanielAdriaansen recommends passing the names and units from the input directly through to the output. Presumably, users of this data will already be familiar with the USCRN naming conventions, so a straightforward pass-through is the more useful solution.

I agree that keeping it simple is wise. However, I note that ascii2nc does not currently write units to the output point observation files, even though the format supports them. As part of this issue, enhance ascii2nc to write the USCRN units to the output. This is especially important because the USCRN temperatures in Celsius do not match the GRIB convention of Kelvin. When using these observations downstream in Point-Stat or Ensemble-Stat, the user will generally need MET's convert(x) configuration option to convert from Celsius to Kelvin.
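
For example, a downstream Point-Stat configuration might apply the conversion roughly like this (a minimal sketch of the convert(x) option, not copied from the MET docs; the variable name T_HR_AVG is illustrative of a USCRN temperature column):

```
obs = {
   field = [
      {
         name       = "T_HR_AVG";   // illustrative USCRN temperature name
         convert(x) = x + 273.15;   // Celsius -> Kelvin
      }
   ];
}
```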

@JohnHalleyGotway

@anewman89, I have a question about the drought data, described here:
https://www.ncei.noaa.gov/pub/data/uscrn/products/drought01/readme.txt

Several columns named 30COUNTS and 70COUNTS contain counts of observations below the 30th percentile or above the 70th. These don't seem useful to me from a verification perspective. I'm thinking we should exclude those columns from the output of ascii2nc.

Do you have any objection to this approach? Do you see any potential use for these counts in verification?

JohnHalleyGotway added a commit that referenced this issue Jan 10, 2025
JohnHalleyGotway added a commit that referenced this issue Jan 10, 2025
…efore it's actually read so that an error in parsing the data will indicate which file caused it.
JohnHalleyGotway added a commit that referenced this issue Jan 10, 2025
…g .csv files. Get rid of the unneeded Offsets vector. Add AllowEmptyColumns option to the DataLine class so that multiple delimiters in a row will be treated as separate columns. Since the default delim is whitespace, it makes sense that you'd want to parse multiple delims in a group. But for .csv files, each comma indicates a new column.
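
For context, the distinction works roughly like this (a simplified sketch, not the actual DataLine interface): with the default whitespace delimiter, runs of delimiters collapse into one separator, while with AllowEmptyColumns enabled for .csv input, every comma starts a new (possibly empty) column.

```cpp
#include <string>
#include <vector>

// Simplified illustration of the AllowEmptyColumns idea: when allow_empty
// is false, consecutive delimiters are collapsed (typical for whitespace);
// when true, each delimiter starts a new column, so ",," yields an empty entry.
std::vector<std::string> split(const std::string &line, char delim, bool allow_empty) {
   std::vector<std::string> cols;
   std::string cur;
   for (char c : line) {
      if (c == delim) {
         if (!cur.empty() || allow_empty) cols.push_back(cur);
         cur.clear();
      }
      else {
         cur += c;
      }
   }
   if (!cur.empty() || allow_empty) cols.push_back(cur);
   return cols;
}

// split("a  b", ' ', false) -> {"a", "b"}       (2 columns)
// split("a,,b", ',', true)  -> {"a", "", "b"}   (3 columns)
```
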
JohnHalleyGotway added a commit that referenced this issue Jan 10, 2025
… including .csv files. This required updates to the DataLine and LineDataFile classes to parse the .csv data properly. Still need to enhance ascii2nc to write units
JohnHalleyGotway added a commit that referenced this issue Jan 11, 2025
…s for all the other ascii file types as well.
JohnHalleyGotway added a commit that referenced this issue Jan 13, 2025
…a list of all empty strings. This is used in ascii2nc to determine if observation units and descriptions should be written.
JohnHalleyGotway added a commit that referenced this issue Jan 13, 2025
…oint observation descriptions. Previously, if units were present then descriptions (usually empty ones) were added. Now, units and descriptions are handled independently.
JohnHalleyGotway added a commit that referenced this issue Jan 13, 2025
…o make this work. Seems like we should ADD these numbers where needed rather than subtracting them everywhere else!
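
For context, the subtraction comes from the C struct tm convention, where tm_year is years since 1900 and tm_mon runs 0-11; a minimal sketch of building a UTC valid time (timegm() is a common glibc/BSD extension rather than standard C):

```cpp
#include <ctime>

// Build a UTC time_t from human-readable date fields. struct tm expects
// years since 1900 and months in the range 0-11, hence the adjustments.
time_t make_utc_time(int year, int month, int day, int hour, int minute) {
   struct tm t = {};
   t.tm_year = year - 1900;   // e.g. 2025 -> 125
   t.tm_mon  = month - 1;     // e.g. January (1) -> 0
   t.tm_mday = day;
   t.tm_hour = hour;
   t.tm_min  = minute;
   return timegm(&t);         // interpret as UTC; mktime() would apply the local time zone
}
```
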
JohnHalleyGotway added a commit that referenced this issue Jan 13, 2025
…icated the logic for ignoring the first line from csv files.
JohnHalleyGotway added a commit that referenced this issue Jan 13, 2025
…les, just skip any lines where the station ID begins with 'WBAN'. That'll handle files being concatenated together and is simpler logic.
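
A rough illustration of that check (a sketch only, not the actual handler code):

```cpp
#include <string>

// Treat any line whose first column starts with 'WBAN' as a repeated
// header line and skip it; this also handles concatenated input files.
bool is_uscrn_header_line(const std::string &first_col) {
   return first_col.rfind("WBAN", 0) == 0;   // i.e. starts_with("WBAN")
}
```
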
JohnHalleyGotway linked a pull request (#3049) on Jan 13, 2025 that will close this issue
JohnHalleyGotway added a commit that referenced this issue Jan 13, 2025
… USCRN files are used to determine the specific format.
JohnHalleyGotway commented Jan 13, 2025

@anewman89 MET#3049 is a pull request for these changes. It's currently in "draft" because I need to determine what changes are needed for SonarQube.

This PR adds support for 7 different NCEI product types. The changes include:

  • Adding support for the -format uscrn command line option.
  • Defining/writing units for each variable.
  • Defining/writing descriptions for each variable, which is new for ascii2nc.
  • Note that variable names, units, and descriptions are extracted (almost) verbatim from readme.txt files in the product subdirectories. The only differences are in pursuit of greater consistency and clarity across the product types.
  • Some variables (like 30COUNTS and 70COUNTS) are excluded because they're likely not useful for vx.

I'll request a review from you as well as @j-opatz, since most people are busy at AMS this week.

One detail about the handling of valid times should be clarified. In general, we use UTC times for data in MET. But USCRN has a mix of UTC and Local Standard Time (LST), depending on the sub-type. Here are the details about the valid times in these 7 inputs:

  • monthly01 files specify the LOCAL STANDARD TIME of the month in YYYYMM format.
  • daily01 files specify the LOCAL STANDARD TIME of the day in YYYYMMDD format.
  • hourly02, subhourly01, drought01, heat01, and soil/anom01 (no readme.txt file provided) all provide times in UTC, and ascii2nc uses them directly.

Currently, ascii2nc takes the monthly and daily timestamps in LST and treats them as if they were UTC. The alternative would be to convert from LST to UTC based on the station's lat/lon location, but it's not clear that doing so is useful or worthwhile for monthly and daily timestamps.
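
If an LST-to-UTC conversion were ever added, a rough longitude-based offset would be one option; a sketch only (it assumes idealized 15-degree time zones and ignores political time-zone boundaries):

```cpp
#include <cmath>

// Approximate the Local Standard Time offset (in hours) from longitude:
// each idealized standard time zone spans 15 degrees of longitude.
// Example: lon = -105 (Colorado) -> offset = -7, so 12 LST -> 19 UTC.
int lst_offset_hours(double lon_deg) {
   return static_cast<int>(std::lround(lon_deg / 15.0));
}
```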

JohnHalleyGotway added a commit that referenced this issue Jan 14, 2025
… value since it conflicts with the initialization. While the GHA compiler is fine with it, the SonarQube one is not. These changes should enable the SonarQube build to complete.
JohnHalleyGotway added a commit that referenced this issue Jan 14, 2025
…place_back() which SonarQube prefers for efficiency.
JohnHalleyGotway added a commit that referenced this issue Jan 14, 2025
…he overall number of them lower than what's in the develop branch.
JohnHalleyGotway added a commit that referenced this issue Jan 22, 2025
* Per #1019, initial setup for supporting -format uscrn. It compiles but I still need to make it work for the variety of USCRN inputs.

* Per #1019, add NumArray constructor using a vector of doubles.

* Per #1019, saving progress after handling monthly, daily, and hourly input files. Need to complete support for other format types and handle the unit strings

* Per #1019, consistent spacing.

* Per #1019, tweak log messages so that the file being read is logged before it's actually read so that an error in parsing the data will indicate which file caused it.

* Per #1019, update DataLine and LineDataFile classes to support parsing .csv files. Get rid of the unneeded Offsets vector. Add AllowEmptyColumns option to the DataLine class so that multiple delimiters in a row will be treated as separate columns. Since the default delim is whitespace, it makes sense that you'd want to parse multiple delims in a group. But for .csv files, each comma indicates a new column.

* Per #1019, update USCRN handler code to support all 7 input variants, including .csv files. This required updates to the DataLine and LineDataFile classes to parse the .csv data properly. Still need to enhance ascii2nc to write units

* Per #1019, add units to the Observation class.

* Per #1019, add units string to the SummaryObs and SummaryKey classes.

* Per #1019, update USCRN format to write units. Consider defining units for all the other ascii file types as well.

* Per #1019, add StringArray::all_empty() member function to check for a list of all empty strings. This is used in ascii2nc to determine if observation units and descriptions should be written.

* Per #1019, update library code to handle the independent writing of point observation descriptions. Previously, if units were present then descriptions (usually empty ones) were added. Now, units and descriptions are handled independently.

* Per #1019, add descriptions for all USCRN observations, pulled from the USCRN website.

* Per #1019, need to subtract 1900 from the year and 1 from the month to make this work. Seems like we should ADD these numbers where needed rather than subtracting them everywhere else!

* Per #1019, get rid of USCRNHandler::_readHeaderInfo() since it complicated the logic for ignoring the first line from csv files.

* Per #1019, rather than always skipping the first line of USCRN csv files, just skip any lines where the station ID begins with 'WBAN'. That'll handle files being concatenated together and is simpler logic.

* Per #1019, add an ascii2nc unit test for USCRN point observations.

* Per #1019, doc-only change to indicate the prefix/suffix of the input USCRN files are used to determine the specific format.

* Per #1019, update USCRN code to no longer specify a default _qcOffset value since it conflicts with the initialization. While the GHA compiler is fine with it, the SonarQube one is not. These changes should enable the SonarQube build to complete.

* Per #1019, for SonarQube replace ALL instances of push_back() with emplace_back() which SonarQube prefers for efficiency.

* Per #1019, more changes to address SonarQube code smells and reduce the overall number of them below what's in the develop branch.

* Per #1019, second pass through to further reduce SonarQube findings.

* Per #1019, revert back to protected members in file_handler.h

* Per #1019, one last round of minor SonarQube code smell remediation.
github-project-automation bot moved this to 🩺 Needs Triage in METplus-6.1.0 Development on Jan 28, 2025
JohnHalleyGotway moved this from 🩺 Needs Triage to 🏁 Done in METplus-6.1.0 Development on Jan 28, 2025