
Enhance ASCII2NC to read USCRN point observations. #1019

Closed
dwfncar opened this issue Jun 28, 2018 · 5 comments · Fixed by #3049
Labels: MET: PreProcessing Tools (Point), priority: high, requestor: NCAR/RAL, type: new feature

Comments

dwfncar commented Jun 28, 2018

Users at the 2018 Unidata Users Workshop recommended enhancing ASCII2NC to read point observation data from the NOAA USCRN network:
https://www.ncei.noaa.gov/access/crn/

Also need to support the SCAN data format:
https://www.nrcs.usda.gov/resources/data-and-reports/soil-climate-analysis-network

Also need to support AmeriFlux data:
https://ameriflux.lbl.gov/

Evaluate whether SCAN and AmeriFlux can be supported through this same issue, or consider spinning them off into separate issues.


Seems like it'd be pretty straightforward. [MET-1019] created by johnhg

Can be funded by the Dynamic Drought/Veg (2785031) key.

@JohnHalleyGotway

Recommend doing this work between 12/30 and 1/7.

JohnHalleyGotway commented Jan 6, 2025

Moving this issue to "In Progress" status.

For USCRN data, support the "Quality Controlled Datasets" formats described here:
https://www.ncei.noaa.gov/access/crn/qcdatasets.html

Note that the monthly, daily, hourly, and sub-hourly datasets contain different numbers of columns and different contents. It would be nice if ascii2nc could handle all of them without the user needing to explicitly state which format to use.
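
One way to avoid a user-specified sub-format would be to key off the structure of the data itself; a minimal sketch of that idea is below (the column counts are placeholders, not the actual USCRN counts, which come from each product's readme.txt). As the later commits note, the implementation ultimately determines the specific format from the USCRN file name prefix/suffix instead.

```cpp
// Hypothetical sketch: guess the USCRN product type from the number of
// columns found on the first data line. The counts below are placeholders;
// the real counts are documented in each product's readme.txt file.
enum class USCRNFormat { Monthly, Daily, Hourly, SubHourly, Unknown };

inline USCRNFormat guess_uscrn_format(int n_cols) {
   switch (n_cols) {
      case 15: return USCRNFormat::Monthly;    // placeholder count
      case 28: return USCRNFormat::Daily;      // placeholder count
      case 38: return USCRNFormat::Hourly;     // placeholder count
      case 23: return USCRNFormat::SubHourly;  // placeholder count
      default: return USCRNFormat::Unknown;
   }
}
```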

JohnHalleyGotway added a commit that referenced this issue Jan 7, 2025
…t I still need to make it work for the variety of USCRN inputs.
JohnHalleyGotway added a commit that referenced this issue Jan 7, 2025
…input files. Need to complete support for other format types and handle the unit strings
@JohnHalleyGotway

Discussed progress on this issue during the MET development meeting on Jan 10, 2025. Rather than translating the USCRN variable names into more conventional GRIB names and units (e.g. converting Celsius temperatures to Kelvin, since that's what GRIB uses), @DanielAdriaansen recommends passing the names and units from the input directly through to the output. Presumably, users of this data will already be familiar with the USCRN naming conventions, so a straightforward pass-through is the more useful solution.

I agree that keeping it simple is wise. However, I note that ascii2nc does not currently write units to the output point observation files, even though the format supports them. As part of this issue, enhance ascii2nc to write the USCRN units to the output. This is especially important because the USCRN temperatures in Celsius do not match the GRIB convention of Kelvin. When using these observations downstream in Point-Stat or Ensemble-Stat, the user will generally need MET's convert(x) configuration option to convert from Celsius to Kelvin.
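
For example, a downstream Point-Stat configuration might apply the conversion roughly like this (a minimal sketch of the convert(x) option, not copied from the MET docs; the variable name T_HR_AVG is illustrative of a USCRN temperature column):

```
obs = {
   field = [
      {
         name       = "T_HR_AVG";   // illustrative USCRN temperature name
         convert(x) = x + 273.15;   // Celsius -> Kelvin
      }
   ];
}
```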

@JohnHalleyGotway

@anewman89, I have a question about the drought data, described here:
https://www.ncei.noaa.gov/pub/data/uscrn/products/drought01/readme.txt

Several columns named 30COUNTS and 70COUNTS contain counts of observations below the 30th percentile or above the 70th. These don't seem useful to me from a verification perspective. I'm thinking we should exclude those columns from the output of ascii2nc.

Do you have any objection to this approach? Do you see any potential use for these counts in verification?

JohnHalleyGotway added a commit that referenced this issue Jan 10, 2025
JohnHalleyGotway added a commit that referenced this issue Jan 10, 2025
…efore it's actually read so that an error in parsing the data will indicate which file caused it.
JohnHalleyGotway added a commit that referenced this issue Jan 10, 2025
…g .csv files. Get rid of the unneeded Offsets vector. Add AllowEmptyColumns option to the DataLine class so that multiple delimiters in a row will be treated as separate columns. Since the default delim is whitespace, it makes sense that you'd want to parse multiple delims in a group. But for .csv files, each comma indicates a new column.
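
For context, the distinction works roughly like this (a simplified sketch, not the actual DataLine interface): with the default whitespace delimiter, runs of delimiters collapse into one separator, while with AllowEmptyColumns enabled for .csv input, every comma starts a new (possibly empty) column.

```cpp
#include <string>
#include <vector>

// Simplified illustration of the AllowEmptyColumns idea: when allow_empty
// is false, consecutive delimiters are collapsed (typical for whitespace);
// when true, each delimiter starts a new column, so ",," yields an empty entry.
std::vector<std::string> split(const std::string &line, char delim, bool allow_empty) {
   std::vector<std::string> cols;
   std::string cur;
   for (char c : line) {
      if (c == delim) {
         if (!cur.empty() || allow_empty) cols.push_back(cur);
         cur.clear();
      }
      else {
         cur += c;
      }
   }
   if (!cur.empty() || allow_empty) cols.push_back(cur);
   return cols;
}

// split("a  b", ' ', false) -> {"a", "b"}       (2 columns)
// split("a,,b", ',', true)  -> {"a", "", "b"}   (3 columns)
```
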
JohnHalleyGotway added a commit that referenced this issue Jan 10, 2025
… including .csv files. This required updates to the DataLine and LineDataFile classes to parse the .csv data properly. Still need to enhance ascii2nc to write units
JohnHalleyGotway added a commit that referenced this issue Jan 11, 2025
…s for all the other ascii file types as well.
JohnHalleyGotway added a commit that referenced this issue Jan 13, 2025
…a list of all empty strings. This is used in ascii2nc to determine if observation units and descriptions should be written.
JohnHalleyGotway added a commit that referenced this issue Jan 13, 2025
…oint observation descriptions. Previously, if units were present then descriptions (usually empty ones) were added. Now, units and descriptions are handled independently.
JohnHalleyGotway added a commit that referenced this issue Jan 13, 2025
…o make this work. Seems like we should ADD these numbers where needed rather than subtracting them everywhere else!
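
For context, the subtraction comes from the C struct tm convention, where tm_year is years since 1900 and tm_mon runs 0-11; a minimal sketch of building a UTC valid time (timegm() is a common glibc/BSD extension rather than standard C):

```cpp
#include <ctime>

// Build a UTC time_t from human-readable date fields. struct tm expects
// years since 1900 and months in the range 0-11, hence the adjustments.
time_t make_utc_time(int year, int month, int day, int hour, int minute) {
   struct tm t = {};
   t.tm_year = year - 1900;   // e.g. 2025 -> 125
   t.tm_mon  = month - 1;     // e.g. January (1) -> 0
   t.tm_mday = day;
   t.tm_hour = hour;
   t.tm_min  = minute;
   return timegm(&t);         // interpret as UTC; mktime() would apply the local time zone
}
```
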
JohnHalleyGotway added a commit that referenced this issue Jan 13, 2025
…icated the logic for ignoring the first line from csv files.
JohnHalleyGotway added a commit that referenced this issue Jan 13, 2025
…les, just skip any lines where the station ID begins with 'WBAN'. That'll handle files being concatenated together and is simpler logic.
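
A rough illustration of that check (a sketch only, not the actual handler code):

```cpp
#include <string>

// Treat any line whose first column starts with 'WBAN' as a repeated
// header line and skip it; this also handles concatenated input files.
bool is_uscrn_header_line(const std::string &first_col) {
   return first_col.rfind("WBAN", 0) == 0;   // i.e. starts_with("WBAN")
}
```
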
JohnHalleyGotway linked a pull request (#3049) on Jan 13, 2025 that will close this issue
JohnHalleyGotway added a commit that referenced this issue Jan 13, 2025
… USCRN files are used to determine the specific format.
JohnHalleyGotway commented Jan 13, 2025

@anewman89 MET#3049 is a pull request for these changes. It's currently in "draft" because I need to determine what changes are needed for SonarQube.

This PR adds support for 7 different NCEI product types. The changes include:

  • Adding support for the -format uscrn command line option.
  • Defining/writing units for each variable.
  • Defining/writing descriptions for each variable, which is new for ascii2nc.
  • Note that variable names, units, and descriptions are extracted (almost) verbatim from readme.txt files in the product subdirectories. The only differences are in pursuit of greater consistency and clarity across the product types.
  • Some variables (like 30COUNTS and 70COUNTS) are excluded because they're likely not useful for vx.

I'll request a review from you as well as @j-opatz, since most people are busy at AMS this week.

One detail about the handling of valid times should be clarified. In general, we use UTC times for data in MET. But USCRN has a mix of UTC and Local Standard Time (LST), depending on the sub-type. Here are the details about the valid times in these 7 inputs:

  • monthly01 files specify the LOCAL STANDARD TIME of the month in YYYYMM format.
  • daily01 files specify the LOCAL STANDARD TIME of the day in YYYYMMDD format.
  • hourly02, subhourly01, drought01, heat01, and soil/anom01 (no readme.txt file provided) all provide times in UTC, and ascii2nc uses them directly.

Currently, ascii2nc takes the monthly and daily timestamps in LST and treats them as if they were UTC. The alternative would be to convert from LST to UTC based on the station's lat/lon location, but it's not clear that doing so is useful or worthwhile for monthly and daily timestamps.
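
If an LST-to-UTC conversion were ever added, a rough longitude-based offset would be one option; a sketch only (it assumes idealized 15-degree time zones and ignores political time-zone boundaries):

```cpp
#include <cmath>

// Approximate the Local Standard Time offset (in hours) from longitude:
// each idealized standard time zone spans 15 degrees of longitude.
// Example: lon = -105 (Colorado) -> offset = -7, so 12 LST -> 19 UTC.
int lst_offset_hours(double lon_deg) {
   return static_cast<int>(std::lround(lon_deg / 15.0));
}
```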

JohnHalleyGotway added a commit that referenced this issue Jan 14, 2025
… value since it conflicts with the initialization. While the GHA compiler is fine with it, the SonarQube one is not. These changes should enable the SonarQube build to complete.
JohnHalleyGotway added a commit that referenced this issue Jan 14, 2025
…place_back() which SonarQube prefers for efficiency.
JohnHalleyGotway added a commit that referenced this issue Jan 14, 2025
…he overall number of them lower than what's in the develop branch.
JohnHalleyGotway added a commit that referenced this issue Jan 22, 2025
* Per #1019, initial setup for supporting -format uscrn. It compiles but I still need to make it work for the variety of USCRN inputs.

* Per #1019, add NumArray constructor using a vector of doubles.

* Per #1019, saving progress after handling monthly, daily, and hourly input files. Need to complete support for other format types and handle the unit strings

* Per #1019, consistent spacing.

* Per #1019, tweak log messages so that the file being read is logged before it's actually read so that an error in parsing the data will indicate which file caused it.

* Per #1019, update DataLine and LineDataFile classes to support parsing .csv files. Get rid of the unneeded Offsets vector. Add AllowEmptyColumns option to the DataLine class so that multiple delimiters in a row will be treated as separate columns. Since the default delim is whitespace, it makes sense that you'd want to parse multiple delims in a group. But for .csv files, each comma indicates a new column.

* Per #1019, update USCRN handler code to support all 7 input variants, including .csv files. This required updates to the DataLine and LineDataFile classes to parse the .csv data properly. Still need to enhance ascii2nc to write units

* Per #1019, add units to the Observation class.

* Per #1019, add units string to the SummaryObs and SummaryKey classes.

* Per #1019, update USCRN format to write units. Consider defining units for all the other ascii file types as well.

* Per #1019, add StringArray::all_empty() member function to check for a list of all empty strings. This is used in ascii2nc to determine if observation units and descriptions should be written.

* Per #1019, update library code to handle the independent writing of point observation descriptions. Previously, if units were present then descriptions (usually empty ones) were added. Now, units and descriptions are handled independently.

* Per #1019, add descriptions for all USCRN observations, pulled from the USCRN website.

* Per #1019, need to subtract 1900 from the year and 1 from the month to make this work. Seems like we should ADD these numbers where needed rather than subtracting them everywhere else!

* Per #1019, get rid of USCRNHandler::_readHeaderInfo() since it complicated the logic for ignoring the first line from csv files.

* Per #1019, rather than always skipping the first line of USCRN csv files, just skip any lines where the station ID begins with 'WBAN'. That'll handle files being concatenated together and is simpler logic.

* Per #1019, add an ascii2nc unit test for USCRN point observations.

* Per #1019, doc-only change to indicate the prefix/suffix of the input USCRN files are used to determine the specific format.

* Per #1019, update USCRN code to no longer specify a default _qcOffset value since it conflicts with the initialization. While the GHA compiler is fine with it, the SonarQube one is not. These changes should enable the SonarQube build to complete.

* Per #1019, for SonarQube replace ALL instances of push_back() with emplace_back() which SonarQube prefers for efficiency.

* Per #1019, more changes to address SonarQube code smells and reduce the overall number of them below what's in the develop branch.

* Per #1019, second pass through to further reduce SonarQube findings.

* Per #1019, revert back to protected members in file_handler.h

* Per #1019, one last round of minor SonarQube code smell remediation.
github-project-automation bot moved this to 🩺 Needs Triage in METplus-6.1.0 Development on Jan 28, 2025
JohnHalleyGotway moved this from 🩺 Needs Triage to 🏁 Done in METplus-6.1.0 Development on Jan 28, 2025