Enhance ASCII2NC to read USCRN point observations. #1019
Recommend doing this work between 12/30 and 1/7.
Moving this issue to "In Progress" status. For USCRN data, support the "Quality Controlled Datasets" formats described here: Note that the monthly, daily, hourly, and sub-hourly datasets contain different numbers of columns and contents. It would be nice if ascii2nc were able to handle all of them without the user needing to explicitly state which one to use.
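As a rough illustration of that kind of auto-detection (a hypothetical sketch, not the ascii2nc implementation, and the column counts are placeholders rather than the real NCEI values), the product type could be inferred from the number of columns on a data line:

```cpp
#include <sstream>
#include <string>
#include <vector>

enum class UscrnFormat { Monthly, Daily, Hourly, SubHourly, Unknown };

// Hypothetical sketch: guess the USCRN product type from the number of
// whitespace-separated columns on a data line. The counts below are
// placeholders; the real counts come from the NCEI format documentation.
UscrnFormat guess_uscrn_format(const std::string &line) {
   std::istringstream iss(line);
   std::vector<std::string> cols;
   std::string tok;
   while (iss >> tok) cols.push_back(tok);

   switch (cols.size()) {
      case 9:  return UscrnFormat::Monthly;    // placeholder column count
      case 28: return UscrnFormat::Daily;      // placeholder column count
      case 38: return UscrnFormat::Hourly;     // placeholder column count
      case 23: return UscrnFormat::SubHourly;  // placeholder column count
      default: return UscrnFormat::Unknown;
   }
}
```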
Discussed progress on this issue during the MET development meeting on Jan 10, 2025. Rather than translating the USCRN variable names into more conventional GRIB naming conventions and units (e.g. converting Celsius temperatures to Kelvin since that's what's used for GRIB), @DanielAdriaansen recommends passing the names and units from the input directly to the output. Presumably, users of this data will already be familiar with USCRN observations and their naming conventions, so passing them through provides a more straightforward solution. I agree that keeping it simple is wise. However, I do note that
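For context, the two options weighed above might look like this side by side. This is a minimal hypothetical sketch (the Obs struct and function names are invented for illustration, and T_HR_AVG is used as an example USCRN column name), not MET's actual observation-handling code:

```cpp
#include <string>

struct Obs {
   std::string name;   // variable name written to the output
   std::string units;  // unit string written to the output
   double      value;
};

// Option A: translate to a GRIB-like convention (Celsius -> Kelvin).
Obs to_grib_convention(double t_air_celsius) {
   return { "TMP", "K", t_air_celsius + 273.15 };
}

// Option B: pass the USCRN name and unit string straight through.
Obs pass_through(double t_air_celsius) {
   return { "T_HR_AVG", "Celsius", t_air_celsius };
}
```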
@anewman89, I have a question about the drought data, described here: The several columns named Do you have any objection to this approach? Do you see any potential use for these counts in verification?
@anewman89 MET#3049 is a pull request for these changes. It's currently in "draft" because I need to determine what changes are needed for SonarQube. This PR adds support for 7 different NCEI product types including:
I'll request a review from you as well as @j-opatz, since most people are busy at AMS this week. One detail about the handling of valid times should be clarified. In general, we use UTC times for data in MET, but USCRN has a mix of UTC and Local Standard Time (LST), depending on the sub-type. Here are the details about the valid time in these 7 inputs:
Currently, ascii2nc uses the monthly and daily time stamps in LST and treats them as if they were UTC. The alternative would be converting from LST to UTC based on the lat/lon location, but it's not clear whether that is useful or worth doing for monthly and daily timestamps.
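For reference, the longitude-based alternative mentioned above could look roughly like the following. This is a hypothetical sketch that uses a whole-hour offset derived from longitude at 15 degrees per hour; it ignores political time zone boundaries and is not a proposed MET implementation:

```cpp
#include <cmath>
#include <ctime>

// Hypothetical sketch: shift a Local Standard Time value to UTC using an
// approximate whole-hour offset derived from the station longitude
// (15 degrees of longitude per hour). LST has no daylight saving
// component, so the offset is constant for a given station.
time_t lst_to_utc(time_t lst, double lon_deg) {
   // A station at -105 degrees gets an offset of -7 hours, so UTC = LST + 7h.
   int offset_hours = static_cast<int>(std::lround(lon_deg / 15.0));
   return lst - offset_hours * 3600;
}
```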
* Per #1019, initial setup for supporting -format uscrn. It compiles but I still need to make it work for the variety of USCRN inputs.
* Per #1019, add NumArray constructor using a vector of doubles.
* Per #1019, saving progress after handling monthly, daily, and hourly input files. Need to complete support for other format types and handle the unit strings.
* Per #1019, consistent spacing.
* Per #1019, tweak log messages so that the file being read is logged before it's actually read, so that an error in parsing the data will indicate which file caused it.
* Per #1019, update DataLine and LineDataFile classes to support parsing .csv files. Get rid of the unneeded Offsets vector. Add AllowEmptyColumns option to the DataLine class so that multiple delimiters in a row will be treated as separate columns. Since the default delimiter is whitespace, it makes sense to parse multiple delimiters as a group, but for .csv files each comma indicates a new column (see the parsing sketch after this list).
* Per #1019, update USCRN handler code to support all 7 input variants, including .csv files. This required updates to the DataLine and LineDataFile classes to parse the .csv data properly. Still need to enhance ascii2nc to write units.
* Per #1019, add units to the Observation class.
* Per #1019, add units string to the SummaryObs and SummaryKey classes.
* Per #1019, update USCRN format to write units. Consider defining units for all the other ascii file types as well.
* Per #1019, add StringArray::all_empty() member function to check for a list of all empty strings. This is used in ascii2nc to determine if observation units and descriptions should be written.
* Per #1019, update library code to handle the independent writing of point observation descriptions. Previously, if units were present then descriptions (usually empty ones) were added. Now, units and descriptions are handled independently.
* Per #1019, add descriptions for all USCRN observations, pulled from the USCRN website.
* Per #1019, need to subtract 1900 from the year and 1 from the month to make this work (see the struct tm note after this list). Seems like we should ADD these numbers where needed rather than subtracting them everywhere else!
* Per #1019, get rid of USCRNHandler::_readHeaderInfo() since it complicated the logic for ignoring the first line from csv files.
* Per #1019, rather than always skipping the first line of USCRN csv files, just skip any lines where the station ID begins with 'WBAN'. That'll handle files being concatenated together and is simpler logic.
* Per #1019, add an ascii2nc unit test for USCRN point observations.
* Per #1019, doc-only change to indicate that the prefix/suffix of the input USCRN files is used to determine the specific format.
* Per #1019, update USCRN code to no longer specify a default _qcOffset value since it conflicts with the initialization. While the GHA compiler is fine with it, the SonarQube one is not. These changes should enable the SonarQube build to complete.
* Per #1019, for SonarQube replace ALL instances of push_back() with emplace_back(), which SonarQube prefers for efficiency.
* Per #1019, more changes to address SonarQube code smells and reduce their overall number below what's in the develop branch.
* Per #1019, second pass through to further reduce SonarQube findings.
* Per #1019, revert back to protected members in file_handler.h.
* Per #1019, one last round of minor SonarQube code smell remediation.
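To illustrate the .csv parsing behavior described in the list above (consecutive whitespace delimiters grouped into one separator, every comma starting a new column, and lines whose station ID begins with 'WBAN' skipped), here is a small self-contained sketch. It mimics the described behavior but is not the actual DataLine/LineDataFile code:

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Whitespace mode: runs of delimiters collapse into a single separator.
std::vector<std::string> split_whitespace(const std::string &line) {
   std::istringstream iss(line);
   std::vector<std::string> cols;
   std::string tok;
   while (iss >> tok) cols.push_back(tok);
   return cols;
}

// CSV mode (AllowEmptyColumns behavior): every comma starts a new column,
// so ",," yields an empty column in between.
std::vector<std::string> split_csv(const std::string &line) {
   std::vector<std::string> cols;
   std::string cur;
   for (char c : line) {
      if (c == ',') { cols.push_back(cur); cur.clear(); }  // keep empty columns
      else          { cur += c; }
   }
   cols.push_back(cur);
   return cols;
}

int main() {
   std::string csv_line = "WBANNO,UTC_DATE,UTC_TIME,T_HR_AVG";
   auto cols = split_csv(csv_line);

   // Skip header-like lines whose first column starts with "WBAN", which
   // also handles repeated headers when files are concatenated together.
   if (!cols.empty() && cols[0].rfind("WBAN", 0) == 0) {
      std::cout << "skipping header line\n";
   }
   return 0;
}
```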
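The "subtract 1900 from the year and 1 from the month" item above refers to the C struct tm convention, where tm_year counts years since 1900 and tm_mon runs from 0 to 11. A minimal example:

```cpp
#include <ctime>
#include <iostream>

int main() {
   int year = 2025, month = 1, day = 10;   // calendar values, e.g. 2025-01-10

   // struct tm stores the year as an offset from 1900 and the month as 0-11,
   // so calendar values must be adjusted before building a timestamp.
   std::tm t = {};
   t.tm_year = year - 1900;   // years since 1900
   t.tm_mon  = month - 1;     // months are 0-based
   t.tm_mday = day;

   time_t utc = timegm(&t);   // timegm() is a common GNU/BSD extension
   std::cout << "unix time: " << utc << "\n";
   return 0;
}
```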
Users at the 2018 Unidata Users' Workshop recommended enhancing ASCII2NC to read point observation data from the NOAA USCRN network:
https://www.ncei.noaa.gov/access/crn/
Also need to support the SCAN data format:
https://www.nrcs.usda.gov/resources/data-and-reports/soil-climate-analysis-network
Also need to support AmeriFlux data:
https://ameriflux.lbl.gov/
Evaluate whether SCAN and AmeriFlux can be supported through this same issue or consider spinning them off into a separate one.
Seems like it'd be pretty straightforward. [MET-1019] created by johnhg
Can be funded by Dynamic Drought/Veg (2785031) key.