
Schema use of datatype allows safe castable datatypes #353

Closed
braingram opened this issue Jan 16, 2024 · 8 comments · Fixed by #369

Comments

@braingram
Collaborator

braingram commented Jan 16, 2024

rad makes extensive use of datatype validation in schemas. For example, the dq sub-schema defines a datatype:

dq:
  tag: tag:stsci.edu:asdf/core/ndarray-1.0.0
  datatype: uint32
  ndim: 2

The datatype property is not part of standard JSON Schema and is instead validated by asdf.

By default asdf will allow datatypes that can be safely cast to the datatype listed in the schema (assigning a uint8 to dq is valid). This allowance of "safe-castable" datatypes can be disabled by adding exact_datatype: true to the schema containing the datatype definition.
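The "safe-castable" check mirrors numpy's "safe" casting rules; a minimal sketch of the behavior (illustrative, not asdf's actual code):

```python
import numpy as np

# By default asdf accepts any dtype that can be safely cast to the
# schema's datatype. numpy's "safe" casting rules express the same idea:
assert np.can_cast(np.uint8, np.uint32, casting="safe")  # uint8 -> uint32: allowed

# The reverse direction is not safe, so an int64 array would be rejected:
assert not np.can_cast(np.int64, np.uint32, casting="safe")
```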

@schlafly Is this allowance of "safe-castable" datatypes intended or a bug?

@braingram
Collaborator Author

This issue and the solution were noted in spacetelescope/roman_datamodels#27.

@schlafly
Collaborator

Thanks, Brett. Sorry for being slow. I presume the safe casting actually occurs, so that if I make some uint8s and try to put them in dq, the uint8s get silently cast to uint32s? I think we can safely call that a feature rather than a bug. I guess @PaulHuwe is the original reporter, so he may feel differently?

@braingram
Collaborator Author

@schlafly Thanks for the response. Neither roman_datamodels nor asdf will cast the dtype.

To expand on the wfi_image schema example, where datatype is uint32 for dq: assigning a uint8 is valid and results in a uint8 saved to the file:

import numpy as np
import roman_datamodels as rdm

# m is an open WFI image datamodel
m.dq = np.zeros(m.dq.shape, dtype='uint8')  # this is valid
assert m.dq.dtype == 'uint8'  # dq is now uint8
m.save('foo.asdf')
m = rdm.open('foo.asdf')
assert m.dq.dtype == 'uint8'  # and is still uint8 after a write/read

asdf doesn't cast the values; it merely checks that the dtypes can be safely cast.

@schlafly
Collaborator

Ah, thanks. My sense is that we probably want the schema to be right; if we say that there are uint32s in there, that had better be what they are. For these kinds of safe-castable things I'd be on board with the safe casting occurring automatically, but less on board if we are allowed to write out files that aren't what we say they are. I presume "automatic safe cast to correct type" isn't an option, so I guess I'd be arguing for not allowing safe casting?

@braingram
Collaborator Author

Unfortunately, I'm not aware of any way to do an automatic safe cast with asdf or roman_datamodels. Perhaps a plan to address this issue would be:

  • update the schemas to use exact_datatype: true
  • run the regression tests to see if any failures occur due to the more strict dtype requirement
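Concretely, the first step would add one line to each ndarray sub-schema; a sketch for dq (assuming the sub-schema shown above):

```yaml
dq:
  tag: tag:stsci.edu:asdf/core/ndarray-1.0.0
  datatype: uint32
  ndim: 2
  exact_datatype: true
```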

A "complete" fix may be more involved. I just ran a quick test where I added exact_datatype: true to the dq portion of the wfi_image schema. Running the above example:

m.dq = np.zeros(m.dq.shape, dtype='uint8')  # this still passes and likely should not given the validation on assignment
m.save('foo.asdf')  # with exact_datatype this now raises a ValidationError

It turns out things are worse than expected for assignment validation: even float and complex dtypes pass.

m.dq = np.zeros(m.dq.shape, dtype='float64')   # no issue
m.dq = np.zeros(m.dq.shape, dtype='complex64')   # no issue

This appears to be due to roman_datamodels using only one custom type validator plus the default YAML validators (which do not include the datatype validator).
https://github.com/spacetelescope/roman_datamodels/blob/2706ac1e502c9fdd98073145e6213c51b2515c31/src/roman_datamodels/stnode/_node.py#L26-L27

That seems like a separate roman_datamodels issue (it does not do any datatype validation on assignment), and fixing the schemas will at least allow asdf to catch the error when the file is written.
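For reference, wiring a datatype keyword into assignment-time validation could look something like this with jsonschema's extension hook. This is a hypothetical sketch: the function name and dtype logic are illustrative, not roman_datamodels' or asdf's actual implementation.

```python
import jsonschema
import numpy as np

def validate_datatype(validator, datatype, instance, schema):
    """Hypothetical 'datatype' keyword: check an ndarray's dtype."""
    actual = getattr(instance, "dtype", None)
    if actual is None:
        return
    expected = np.dtype(datatype)
    if schema.get("exact_datatype", False):
        ok = actual == expected
    else:
        # default asdf behavior: accept safely castable dtypes
        ok = np.can_cast(actual, expected, casting="safe")
    if not ok:
        yield jsonschema.ValidationError(
            f"dtype {actual} does not satisfy datatype {datatype}"
        )

# Extend a draft-4 validator with the new keyword
DatatypeValidator = jsonschema.validators.extend(
    jsonschema.Draft4Validator, {"datatype": validate_datatype}
)

v = DatatypeValidator({"datatype": "uint32"})
v.is_valid(np.zeros(2, dtype="uint8"))    # True: safe cast allowed
v.is_valid(np.zeros(2, dtype="float64"))  # False: not safely castable
```

With exact_datatype: true in the schema, the same validator would also reject the uint8 assignment, matching what asdf does at write time.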

@schlafly
Collaborator

Thanks for looking at this. For my two cents, I don't really feel strongly about validate-on-assignment vs. validate-on-save; either way it won't be long until the problem bites us. The nicest feature of validate-on-assignment is that when we fail we'll see exactly where we failed, but I wouldn't go through too many contortions to achieve that.

I agree with you that fixing the schemas sounds like the way to go at this point, ignoring the roman_datamodels issue (which also of course is seeing a lot of flux at the moment...).

@PaulHuwe
Collaborator

PaulHuwe commented Feb 6, 2024

I agree with adding the exact_datatype: true line to the appropriate metadata keywords. I have made ticket RAD-148 in Jira for this work.

@stscijgbot-rstdms
Collaborator

This issue is tracked on JIRA as RAD-148.
