inconsistent in data rendering and in QARTOD flags creation #106

leilabbb · 2024-03-22T21:08:22Z

I run into issues when using the qartod package to QC the Glider DAC files. I have seen flags being applied inconsistently, which raises concern. Before I file an issue about that, I run into a new issue when reading the data that I am exposing here for some feedback.

Issues: (1) The salinity data in the netCDF file shows different results when read with different packages. (2) The spike_test and rate_of_change_test flags are questionably using the missing-flag and the suspect-flag on existing or non-erroneous data.

Description: The notebook is a walk-through process to help reproduce the above issues.
Files needed to run the notebook:
netCDF file
configuration file
notebook file

Debugging Info Issue (1):

When using netCDF4 to read the salinity array, all values show as nan: [this is the method I use in the Glider DAC QC process] _

with open(nc_path, 'r') as nc:
    nc  = Dataset(nc_path, 'r')
print('number of data point', len(nc.variables['salinity'][:]), '\n', nc.variables['salinity'][:])
number of data points 36 
 [-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- --]

When using the xarray to read the salinity array, not all values show as nan: [this is used in the ioos_qc.streams Class XarrayStream]

nd = xr.open_dataset(
                nc_path,
                decode_cf=True,
                decode_coords=True,
                decode_times=True,
                mask_and_scale=True
            )
print('number of data point', len(nd['salinity'].values), '\n', nd['salinity'].values)
number of data point 36 
 [32.80403       nan       nan       nan 32.771416 32.67339  32.671986
       nan       nan 32.636482 32.141727 32.114475       nan       nan
 32.103085       nan 32.122192       nan       nan 32.09773        nan
       nan 32.050667       nan 32.04136        nan       nan 32.010563
       nan 31.924274       nan 31.775118       nan       nan       nan
 31.886553]

When using ncdump to read the salinity array, not all values are masked: [using another method]

!ncdump -v salinity '/Users/leilabelabassi/Downloads/ru34_20220529T202312Z_rt.nc'
salinity = 32.80403, _, _, _, 32.77142, 32.67339, 32.67199, _, _, 32.63648, 
    32.14173, 32.11448, _, _, 32.10308, _, 32.12219, _, _, 32.09773, _, _, 
    32.05067, _, 32.04136, _, _, 32.01056, _, 31.92427, _, 31.77512, _, _, _, 
    31.88655 ;

Debugging Info Issue (2):

The flag selection for the spike_test shows a missing flag (9) applied to an existing data point.

KEYS for the right column: Missing 9 UNKNOWN 2

    salinity
2  32.804031
9        NaN
9        NaN
9        NaN
9  32.771416
1  32.673389
9  32.671986
9        NaN
9        NaN
9  32.636482
1  32.141727
9  32.114475
9        NaN
9        NaN
9  32.103085
9        NaN
9  32.122192
9        NaN
9        NaN
9  32.097729
9        NaN
9        NaN
9  32.050667
9        NaN
9  32.041359
9        NaN
9        NaN
9  32.010563
9        NaN
9  31.924274
9        NaN
9  31.775118
9        NaN
9        NaN
9        NaN
2  31.886553

The flag selection for the rate_of_change_test shows a suspect flag (3) applied to a good data point.

KEYS for the right column: Missing 9 UNKNOWN 2 SUSPECT 3

    salinity
1  32.804031
9        NaN
9        NaN
9        NaN
1  32.771416
3  32.673389
3  32.671986
9        NaN
9        NaN
1  32.636482
3  32.141727
3  32.114475
9        NaN
9        NaN
1  32.103085
9        NaN
1  32.122192
9        NaN
9        NaN
1  32.097729
9        NaN
9        NaN
1  32.050667
9        NaN
1  32.041359
9        NaN
9        NaN
1  32.010563
9        NaN
1  31.924274
9        NaN
1  31.775118
9        NaN
9        NaN
9        NaN
1  31.886553

Clarification:
The suspect flag may be acceptable, but I am not completely clear on labeling existent data with a missing flag. Also, the data read from the 3 methods may be unrelated to flagging data but any recommendations or best practices on how to work with the ioos_qc package will be appreciated.

The text was updated successfully, but these errors were encountered:

leilabbb · 2024-03-22T21:14:30Z

@ocefpaf could you look at the issues i am seeing with the QARTOD application? Thanks

ocefpaf · 2024-03-29T16:39:34Z

@ocefpaf could you look at the issues i am seeing with the QARTOD application? Thanks

@leilabbb I'm not too familiar with the code and QARTOD but I will take a look as soon as possible.

Can you send the version of the ioos_qc library that you are using?

leilabbb · 2024-04-01T16:32:28Z

@ocefpaf could you look at the issues i am seeing with the QARTOD application? Thanks

@leilabbb I'm not too familiar with the code and QARTOD but I will take a look as soon as possible.

Can you send the version of the ioos_qc library that you are using?

@ocefpaf Here is the SHA number: 0b1406b

ocefpaf · 2024-04-02T20:03:09Z

@leilabbb can you try the latest commit? @iwensu0313 made some improvements and fixed a few bugs that may impact your example. Please let us know what you find.

leilabbb · 2024-04-05T14:03:32Z

@leilabbb can you try the latest commit? @iwensu0313 made some improvements and fixed a few bugs that may impact your example. Please let us know what you find.

I rerun the code using the latest commits and I do not see any change in the results. The same flags were rendered by the ioos_qc/qartod library.

ocefpaf · 2024-04-05T14:16:03Z

I guess we need to wait for #108

iwensu0313 · 2024-05-31T18:54:40Z

#108 is merged now, @leilabbb do you want to try and see if that fixed it?

leilabbb · 2024-06-08T23:47:01Z

Not sure what is not working but the merge of 108 did not fix it. @iwensu0313 could use this netCDF file and see if you see a different response.

iwensu0313 · 2024-08-13T18:21:31Z

@leilabbb - apologies, somehow I didn't get an email notification of your response. I need to check to see if it's being filtered into my email quarantine. Thanks for sending along the file, I'll take a look at this again as soon as I can

iwensu0313 · 2024-08-13T18:22:53Z

I haven't had a chance to look at this linked PR #129 in more detail. @ocefpaf is the issue being addressed there

ocefpaf · 2024-08-13T18:23:50Z

I haven't had a chance to look at this linked PR #129 in more detail. @ocefpaf is the issue being addressed there

It is. The PR is still in its first draft. If you have some time, please give your input there.

iwensu0313 · 2024-09-03T22:59:51Z

Hi @leilabbb -

Spike Test Missing Flags
Looks like PR #129 will resolve the issue you're seeing in the spike test, where missing flags were being applied to valid data points. I ran your inconsistent_data_rendering_and_QARTOD_flags_creation.ipynb notebook (using the config_legacy_vars.yml file and the test ru34_20220529T202312Z_rt.nc file you provided above) with the changes in that PR. and this is the salinity results array I get for the spike test. Looks as expected to me:

salinity xarray values copied from your original post, for comparison

 [32.80403       nan       nan       nan 32.771416 32.67339  32.671986
       nan       nan 32.636482 32.141727 32.114475       nan       nan
 32.103085       nan 32.122192       nan       nan 32.09773        nan
       nan 32.050667       nan 32.04136        nan       nan 32.010563
       nan 31.924274       nan 31.775118       nan       nan       nan
 31.886553]

Rate of Change Suspect Flags
Following up on your question regarding the suspect flags in the salinity rate of change. The results seem to be expected based on the threshold you set of 3.334577679634094e-10 salinity units per second.

I ran the debugger in your inconsistent_data...ipynb notebook and put a breakpoint in the rate of change test to investigate. Here are some screenshots showing what the test is doing. Let me know if you have questions

Example rate of change calculation from the 5th to 6th measurement

fifth_meas = 32.77141571044922
sixth_meas = 32.67338943481445
change_in_sec_from_5th_to_6th_meas = 9
roc_val = abs((sixth_meas - fifth_meas) / change_in_sec_from_5th_to_6th_meas)

print(roc_val)
> 0.010891808403862847

Reading in Data using Different Methods
I'm not so sure about this one and why netCDF4.Dataset and xr.open_dataset resulted in different masked values. Will see if my colleagues have any input here.

leilabbb · 2024-09-18T13:57:20Z

Reading in Data using Different Methods
I'm not so sure about this one and why netCDF4.Dataset and xr.open_dataset resulted in different masked values. Will see if my colleagues have any input here.

@benjwadams figured out that the valid_min and valid_max of the salinity variable were switched and therefore netCDF4.Dataset masked the salinity values.

leilabbb · 2024-09-30T18:57:11Z

Rate of Change Suspect Flags
Following up on your question regarding the suspect flags in the salinity rate of change. The results seem to be expected based on the threshold you set of 3.334577679634094e-10 salinity units per second.

I ran the debugger in your inconsistent_data...ipynb notebook and put a breakpoint in the rate of change test to investigate. Here are some screenshots showing what the test is doing. Let me know if you have questions

I realized that incorrect flag results happen when the time input to the rate_of_change_test function isn't in seconds. See rate_of_change_test flags are corrupted when the time array is in nano seconds #376

iwensu0313 · 2024-10-08T15:57:23Z

@leilabbb - if helpful, here's what my colleague (@kthyng) said about Reading in Data using Different Methods. I also generally use xarray over netcdf as well, just preference.

xarray interprets a lot of attributes under the hood which is usually really helpful but occasionally can be confusing. You can probably find more info about the details in the xarray docs and/or stack overflow.

A major secret piece of information xarray interprets is the encoding. In xarray you can find it under ds.encoding. It has already interpreted it once you’ve read in the Dataset, but you can still see the info there. Stuff like fill values.

mask values being turned into 9.9692100e+36 looks like the netcdf library using the fill value when it does that function.

I know less about how the netcdf library does things but it is generally less under-the-hood and more just on the surface.

leilabbb mentioned this issue Jun 9, 2024

Link to an open ioos_qc issue related to QARTOD QC tests ioos/glider-dac#353

Open

ocefpaf mentioned this issue Jun 18, 2024

GSoC: Workflow Overview and Missing Functionalities for Glider QA/QC #113

Open

Sakshamgupta90 mentioned this issue Aug 13, 2024

Fix flagging logic for missing and unknown data in QARTOD spike test #129

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

inconsistent in data rendering and in QARTOD flags creation #106

inconsistent in data rendering and in QARTOD flags creation #106

leilabbb commented Mar 22, 2024

leilabbb commented Mar 22, 2024

ocefpaf commented Mar 29, 2024

leilabbb commented Apr 1, 2024 •

edited

Loading

ocefpaf commented Apr 2, 2024

leilabbb commented Apr 5, 2024

ocefpaf commented Apr 5, 2024

iwensu0313 commented May 31, 2024

leilabbb commented Jun 8, 2024

iwensu0313 commented Aug 13, 2024 •

edited

Loading

iwensu0313 commented Aug 13, 2024

ocefpaf commented Aug 13, 2024

iwensu0313 commented Sep 3, 2024 •

edited

Loading

leilabbb commented Sep 18, 2024

leilabbb commented Sep 30, 2024

iwensu0313 commented Oct 8, 2024 •

edited

Loading

inconsistent in data rendering and in QARTOD flags creation #106

inconsistent in data rendering and in QARTOD flags creation #106

Comments

leilabbb commented Mar 22, 2024

leilabbb commented Mar 22, 2024

ocefpaf commented Mar 29, 2024

leilabbb commented Apr 1, 2024 • edited Loading

ocefpaf commented Apr 2, 2024

leilabbb commented Apr 5, 2024

ocefpaf commented Apr 5, 2024

iwensu0313 commented May 31, 2024

leilabbb commented Jun 8, 2024

iwensu0313 commented Aug 13, 2024 • edited Loading

iwensu0313 commented Aug 13, 2024

ocefpaf commented Aug 13, 2024

iwensu0313 commented Sep 3, 2024 • edited Loading

leilabbb commented Sep 18, 2024

leilabbb commented Sep 30, 2024

iwensu0313 commented Oct 8, 2024 • edited Loading

leilabbb commented Apr 1, 2024 •

edited

Loading

iwensu0313 commented Aug 13, 2024 •

edited

Loading

iwensu0313 commented Sep 3, 2024 •

edited

Loading

iwensu0313 commented Oct 8, 2024 •

edited

Loading