Added additional validation for connectivity checks. #168
base: main
@@ -748,6 +748,31 @@ def has_valid_face_edge_connectivity(self) -> bool:
                 )
                 return False

+        try:
+            fill_value = data_array.encoding['_FillValue']
+        except KeyError:
+            return True
+
+        lower_bound = _get_start_index(data_array)
+        theoretical_upper_bound = self.face_count * self.max_node_count
+        actual_upper_bound = numpy.nanmax(data_array)
Review comment: The maximum edge ID can be found more precisely with upper_bound = self.edge_count + lower_bound. Because of the exact issue of variables being incorrectly masked and therefore missing valid data, any information gleaned by introspecting the data is potentially suspect. This check as written would fail if
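To illustrate why a bound read from the data is suspect, here is a minimal sketch under stated assumptions. The connectivity values, `bad_fill_value`, `edge_count`, and `start_index` are hypothetical, and a plain-Python maximum stands in for `numpy.nanmax`. It is an illustration of the reviewer's general point, not a reconstruction of the truncated sentence above.

```python
import math

# Hypothetical face-edge connectivity: 2 faces over 6 edges, zero-based IDs 0..5.
raw = [[0, 1, 2, 5], [2, 3, 4, 5]]
edge_count, start_index = 6, 0
bad_fill_value = 5  # collides with the last valid edge ID

# Decoding replaces every occurrence of the fill value with NaN,
# so the valid edge ID 5 disappears from the data.
decoded = [
    [math.nan if value == bad_fill_value else float(value) for value in row]
    for row in raw
]

# Plain-Python stand-in for numpy.nanmax: the largest value that is not NaN.
observed_max = max(v for row in decoded for v in row if not math.isnan(v))

# Bound derived from the dimension size, independent of the damaged data.
upper_bound = edge_count + start_index

# The data-derived check does not notice the collision; the size-derived one does.
missed_by_data_check = not (start_index < bad_fill_value < observed_max)
caught_by_size_check = start_index <= bad_fill_value <= upper_bound
```

In this toy case the masked data reports a maximum of 4, so a check against the observed maximum passes even though the fill value hides a real edge.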
+
+        if lower_bound < fill_value < actual_upper_bound:
Review comment: These should be less than or equal checks. A fill value of 0 and a lower bound of 0 is invalid as it will mask out the first edge ID: if lower_bound <= fill_value <= upper_bound:
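The difference between the strict and inclusive comparisons can be sketched with two small helper functions. Both names are hypothetical and neither is part of the project. The inclusive version follows the reviewer's suggested expression, with `upper_bound` computed as `edge_count + lower_bound`.

```python
def collides_strict(fill_value: int, start_index: int, edge_count: int) -> bool:
    """Strict comparison, mirroring the shape of the check as written in the diff."""
    lower_bound = start_index
    upper_bound = edge_count + lower_bound
    return lower_bound < fill_value < upper_bound


def collides_inclusive(fill_value: int, start_index: int, edge_count: int) -> bool:
    """Inclusive comparison, as suggested in the review comment."""
    lower_bound = start_index
    upper_bound = edge_count + lower_bound
    return lower_bound <= fill_value <= upper_bound


# A fill value of 0 with zero-based indexing masks out the first edge ID.
first_edge_strict = collides_strict(0, start_index=0, edge_count=10)
first_edge_inclusive = collides_inclusive(0, start_index=0, edge_count=10)

# A negative fill value lies outside the ID range under either comparison.
negative_inclusive = collides_inclusive(-1, start_index=0, edge_count=10)
```

Note that the inclusive form also flags a fill value equal to `upper_bound`, which is one past the last valid ID. That is slightly conservative, but it never lets a real collision through.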
+            warnings.warn(
+                f"Got a face_edge_connectivity variable {data_array.name!r} with "
+                f"a _FillValue inside the actual index range",
+                ConventionViolationWarning,
+            )
+            return False
+
+        if lower_bound < fill_value < theoretical_upper_bound:
+            warnings.warn(
+                f"Got a face_edge_connectivity variable {data_array.name!r} with "
+                f"a _FillValue inside the theoretical index range",
+                ConventionViolationWarning,
+            )
+            return False
Review comment on lines +768 to +774: This shouldn't be checked. The theoretical upper bound is a conservative overestimate, useful when computing our own _FillValue as a value that is guaranteed never to collide. In practice, however, lower values can safely be used as long as the actual _FillValue lies outside the actual index range. This test will give superfluous warnings on valid data sets.
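A small worked example shows how the theoretical bound can flag a harmless fill value. The mesh here is an assumption chosen for illustration: a 2x2 grid of quadrilaterals, which has 4 faces and 12 distinct edges, with zero-based indexing.

```python
# Hypothetical mesh: a 2x2 grid of quadrilaterals with zero-based edge IDs.
face_count, max_node_count = 4, 4
edge_count, start_index = 12, 0

# Conservative overestimate used in the diff: every face has its own edges.
theoretical_upper_bound = face_count * max_node_count

# Bound based on the number of edges that actually exist.
upper_bound = edge_count + start_index

# 15 is above every valid edge ID (0..11), so it cannot mask real data.
fill_value = 15

collides_with_real_ids = start_index <= fill_value <= upper_bound
flagged_by_theoretical_check = start_index < fill_value < theoretical_upper_bound
```

Here the fill value does not collide with any real edge ID, yet the theoretical check would still emit a warning and return False.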
+
         return True

     @cached_property
Review comment: Because of the nature of check functions like this, an early return of True can lead to bugs in the future. If another developer comes along and adds another check below this check - just as you've added a check below the existing checks - then that new check will be skipped if there is a missing _FillValue. An early return of False is always valid as any failed check renders the whole connectivity array invalid. Consider instead:

New checks can be safely added later by appending them below the existing set of checks without risk of the check being skipped.
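One way a check could be structured so that only failures return early is sketched below. This is not the reviewer's original suggestion. The function name `check_fill_value`, its parameters, and the stand-in `ConventionViolationWarning` class are hypothetical, introduced only to keep the example self-contained.

```python
import warnings


class ConventionViolationWarning(UserWarning):
    """Stand-in for the project's warning class, defined here for illustration."""


def check_fill_value(encoding: dict, lower_bound: int, upper_bound: int) -> bool:
    """Return False on the first failed check, True only after every check ran."""
    # A missing _FillValue skips this one check instead of returning True early.
    fill_value = encoding.get('_FillValue')
    if fill_value is not None and lower_bound <= fill_value <= upper_bound:
        warnings.warn(
            "Got a connectivity variable with a _FillValue inside the index range",
            ConventionViolationWarning,
        )
        return False

    # Further checks can be appended here and will always be reached.

    return True
```

With this shape, the single `return True` stays at the bottom of the function, so a check added later cannot be bypassed by a missing `_FillValue`.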