Added additional validation for connectivity checks. #168
base: main
Conversation
try:
    fill_value = data_array.encoding['_FillValue']
except KeyError:
    return True
Because of the nature of check functions like this, an early return of True can lead to bugs in the future. If another developer comes along and adds another check below this one - just as you've added a check below the existing checks - then that new check will be skipped whenever _FillValue is missing. An early return of False is always valid, as any failed check renders the whole connectivity array invalid. Consider instead:

if '_FillValue' in data_array.encoding:
    fill_value = data_array.encoding['_FillValue']
    ...
    if lower_bound < fill_value < upper_bound:
        warnings.warn(...)
        return False
return True

New checks can be safely added later by appending them below the existing set of checks, without risk of being skipped.
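For example, a minimal sketch of the restructured check, assuming a standalone helper rather than the real method on the checker class (check_connectivity_fill_value is a hypothetical name, and ConventionViolationWarning is stubbed here as a plain UserWarning):

```python
import warnings

import xarray


class ConventionViolationWarning(UserWarning):
    """Stand-in for the project's warning class of the same name."""


def check_connectivity_fill_value(
    data_array: xarray.DataArray, lower_bound: int, upper_bound: int
) -> bool:
    """Return False on any failed check; never return True early.

    Passing checks fall through, so new checks appended at the bottom are
    always reached, even when `_FillValue` is absent from the encoding.
    """
    if '_FillValue' in data_array.encoding:
        fill_value = data_array.encoding['_FillValue']
        # A fill value inside the valid index range would mask real indices.
        if lower_bound <= fill_value <= upper_bound:
            warnings.warn(
                f"Got a connectivity variable {data_array.name!r} with "
                f"a _FillValue inside the valid index range",
                ConventionViolationWarning,
            )
            return False

    # Future checks go here; nothing above can skip them with an early True.

    return True
```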
if lower_bound < fill_value < theoretical_upper_bound:
    warnings.warn(
        f"Got a face_edge_connectivity variable {data_array.name!r} with "
        f"a _FillValue inside the theoretical index range",
        ConventionViolationWarning,
    )
    return False
This shouldn't be checked. The theoretical upper bound is a conservative overestimate, useful when computing our own _FillValue as a value that is guaranteed never to collide. In practice, however, lower values can safely be used as long as the actual _FillValue is above the actual upper bound. This check as written will give superfluous warnings on valid data sets.
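For example, a tiny sketch with invented counts of a valid data set that this check would still flag:

```python
# Made-up counts, purely to illustrate why the theoretical bound is too strict.
face_count = 10
max_node_count = 6
edge_count = 25
start_index = 0

theoretical_upper_bound = face_count * max_node_count  # 60
# Real edge IDs only run from start_index up to roughly edge_count + start_index,
# so a fill value of 40 can never collide with genuine data ...
fill_value = 40

# ... yet the check as written still warns, because 0 < 40 < 60:
print(start_index < fill_value < theoretical_upper_bound)  # True -> superfluous warning
```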
lower_bound = _get_start_index(data_array)
theoretical_upper_bound = self.face_count * self.max_node_count
actual_upper_bound = numpy.nanmax(data_array)
The maximum edge ID can be found more precisely with

    upper_bound = self.edge_count + lower_bound

Because of the exact issue of variables being incorrectly masked and therefore missing valid data, any information gleaned by introspecting the data is potentially suspect. This check as written would fail if _FillValue were exactly the maximum edge ID: numpy.nanmax() would not find that value as it has been masked, and would instead return the second-largest value, which would then incorrectly cause the checks below to pass.
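A small sketch of that failure mode, with invented counts and data:

```python
import numpy

# Made-up topology, purely for illustration.
edge_count = 5
lower_bound = 0  # start index of the connectivity variable
fill_value = 4   # the _FillValue happens to equal the maximum edge ID

# By the time we see the data, the fill value has already been masked to NaN,
# so the genuine edge ID 4 is invisible to any data introspection.
data = numpy.array([[0.0, 1.0], [2.0, 3.0], [numpy.nan, 2.0]])

print(numpy.nanmax(data))  # 3.0 -- under-reports the true maximum edge ID

# The data-derived check then incorrectly passes, because 0 < 4 < 3.0 is False:
print(lower_bound < fill_value < numpy.nanmax(data))  # False -> no warning raised

# Deriving the bound from the counts instead is immune to the masking:
upper_bound = edge_count + lower_bound  # 5
```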
theoretical_upper_bound = self.face_count * self.max_node_count
actual_upper_bound = numpy.nanmax(data_array)

if lower_bound < fill_value < actual_upper_bound:
These should be less-than-or-equal checks. A fill value of 0 combined with a lower bound of 0 is invalid, as it will mask out the first edge ID:

    if lower_bound <= fill_value <= upper_bound:
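A quick illustration of the off-by-one, with invented values:

```python
# Made-up values, purely for illustration.
lower_bound = 0   # start index of the connectivity variable
upper_bound = 24  # largest valid edge ID
fill_value = 0    # collides with the first edge ID, masking it out

# The exclusive check misses the collision:
print(lower_bound < fill_value < upper_bound)    # False -> no warning raised

# The inclusive check catches it:
print(lower_bound <= fill_value <= upper_bound)  # True -> correctly flagged as invalid
```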
Starting with just edge face connectivity to make sure we're on the right track.