Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExposurePipeline uses meta.filename instead of actual filename #818

Open
braingram opened this issue Aug 9, 2023 · 12 comments
Open

ExposurePipeline uses meta.filename instead of actual filename #818

braingram opened this issue Aug 9, 2023 · 12 comments

Comments

@braingram
Copy link
Collaborator

I believe the changes in #802 are not compatible with the test_processing_pipeline_all_saturated regtest.

Here's a recent regtest run with this PR where only the mentioned test was run and stdout was not captured:
https://plwishmaster.stsci.edu:8081/blue/organizations/jenkins/RT%2FRoman-Developers-Pull-Requests/detail/Roman-Developers-Pull-Requests/311/pipeline/205/#step-206-log-97

A few relevant log lines are:

2023-08-08 16:45:59,578 - stpipe.ExposurePipeline - INFO - Starting Roman exposure calibration pipeline ...
2023-08-08 16:45:59,818 - stpipe.ExposurePipeline - INFO - Processing a WFI exposure <roman_datamodels.datamodels._datamodels.ScienceRawModel object at 0x7f0893db6450>
...
2023-08-08 16:51:22,126 - stpipe.ExposurePipeline - INFO - Roman exposure calibration pipeline ending...

2023-08-08 16:51:24,208 - stpipe.ExposurePipeline - INFO - Saved model in r0000101001001001001_01101_0001_WFI01_cal.asdf

Note the filename above does not match the output filename in the regression test:

output = "r0000101001001001001_01101_0001_WFI01_ALL_SATURATED_cal.asdf"

which causes the error

E                   FileNotFoundError: [Errno 2] No such file or directory: '/srv/jenkins/workspace/RT/Roman-Developers-Pull-Requests/clone/test_outputs/popen-gw1/test_processing_pipeline_all_s0/r0000101001001001001_01101_0001_WFI01_ALL_SATURATED_cal.asdf'
/srv/jenkins/workspace/RT/Roman-Developers-Pull-Requests/miniconda/envs/tmp_env0/lib/python3.11/site-packages/asdf/generic_io.py:1150: FileNotFoundError

This seems related to the input filename handling in exposure pipline where when provided with an input as a string, input is overwritten as a datamodel:

file_type = filetype.check(input)
asn = None
if file_type == "asdf":
try:
input = rdm.open(input)
except TypeError:
log.debug("Error opening file:")
return

then added to expos_file:
# Build a list of observations to process
expos_file = []
if file_type == "asdf":
expos_file = [input]
elif file_type == "asn":
for product in asn["products"]:
for member in product["members"]:
expos_file.append(member["expname"])

before this check:
if isinstance(in_file, str):
input_filename = basename(in_file)
log.info(f"Input file name: {input_filename}")
else:
input_filename = None

which means that meta.filename is not overwritten here:
if input_filename:
result.meta.filename = input_filename

and the meta.filename in the input file does not match the filename (r0000101001001001001_01101_0001_WFI01_ALL_SATURATED_uncal.asdf):

In [10]: af['roman']['meta']['filename']
Out[10]: 'r0000101001001001001_01101_0001_WFI01_uncal.asdf'

making the output filename incorrect.

@braingram
Copy link
Collaborator Author

@ddavis-stsci should the truth file be updated so the meta.filename is correct or the code fixed to overwrite meta.filename?

@ddavis-stsci
Copy link
Collaborator

ddavis-stsci commented Aug 9, 2023 via email

@braingram
Copy link
Collaborator Author

Thanks!

So it sounds like this is a new issue introduced by #802 and not an issue with the truth file?

To summarize, prior to #802 meta.filename was overwritten during exposure pipeline, which allowed test_processing_pipeline_all_saturated to generate a file with name r0000101001001001001_01101_0001_WFI01_ALL_SATURATED_cal.asdf matching the one expected by the regression test. After #802, meta.filename is not overwritten so the same test produces a file r0000101001001001001_01101_0001_WFI01_cal.asdf which leads to the FileNotFoundError when the regression tests attempts to open a file that doesn't exist.

@ddavis-stsci
Copy link
Collaborator

ddavis-stsci commented Aug 9, 2023 via email

@braingram
Copy link
Collaborator Author

The first instance of this test failing with FileNotFoundError I see is run 361:
https://plwishmaster.stsci.edu:8081/job/RT/job/Roman-devdeps/361/
which ran with the romancal commit including the changes from #802: df09aa5

I see prior failures (358, 359, 360) of this test but those were not due to FileNotFoundError and instead because of the rad schema changes: spacetelescope/rad#301

357 passed with 1 failure of a different test.

Is there an instance of this test failing with FileNotFoundError prior to #802?

@ddavis-stsci
Copy link
Collaborator

ddavis-stsci commented Aug 9, 2023 via email

@braingram
Copy link
Collaborator Author

Thanks for the update.
I ran just this regtest on jenkins and it's now showing a different error:
https://plwishmaster.stsci.edu:8081/job/RT/job/Roman-devdeps/364/testReport/romancal.regtest/test_wfi_pipeline/_unstable_deps__test_processing_pipeline_all_saturated/

>       assert model.meta.cal_step.linearity == "SKIPPED"
E       AssertionError: assert 'COMPLETE' == 'SKIPPED'
E         - SKIPPED
E         + COMPLETE

/srv/jenkins/workspace/RT/Roman-devdeps/clone/romancal/regtest/test_wfi_pipeline.py:455: AssertionError

Is preferring 'meta.filename' over the actual filename preferred? If so, does this match what happens when files are read from an association? Looking at the code it looks like for those files the filename would take precedence:

if isinstance(in_file, str):
input_filename = basename(in_file)
log.info(f"Input file name: {input_filename}")
else:
input_filename = None
# Open the file
input = rdm.open(in_file)
log.info(f"Processing a WFI exposure {in_file}")
self.dq_init.suffix = "dq_init"
result = self.dq_init(input)
if input_filename:
result.meta.filename = input_filename

@ddavis-stsci
Copy link
Collaborator

ddavis-stsci commented Aug 9, 2023 via email

@ddavis-stsci
Copy link
Collaborator

Brett you might want to close this issue. This is fixed in rcal-631

@braingram
Copy link
Collaborator Author

Thanks for pinging me on this.
I just tested main with r0000101001001001001_01101_0001_WFI01_uncal.asdf by running:

strun romancal.pipeline.ExposurePipeline r0000101001001001001_01101_0001_WFI01_uncal.asdf

This produces a file r0000101001001001001_01101_0001_WFI01_cal.asdf.

However, if I copy r0000101001001001001_01101_0001_WFI01_uncal.asdf to foo_uncal.asdf and then run the same command (using the copied file):

strun romancal.pipeline.ExposurePipeline foo_uncal.asdf

it produces a file with what appears to be an incorrect filename r0000101001001001001_01101_0001_WFI01_cal.asdf. With the relevant log message:

2023-08-29 15:52:01,477 - stpipe.ExposurePipeline - INFO - Saved model in r0000101001001001001_01101_0001_WFI01_cal.asdf

I will update the title of this issue to hopefully a more descriptive title.

@braingram braingram changed the title #802 input_filename changes incompatible with test_processing_pipeline_all_saturated ExposurePipeline uses meta.filename instead of actual filename Aug 29, 2023
@ddavis-stsci
Copy link
Collaborator

You cannot just change the filename you need to update meta.filename to match. So you need to copy r0000101001001001001_01101_0001_WFI01_uncal.asdf to foo_uncal.asdf
and set the meta.filename = 'foo_uncal.asdf'

This is built into stpipe and is not something romancal controls. It looks like in stpipe you can use the input filename as a template for the output filename,

    def default_output_file(self, input_file=None):
        """Create a default filename based on the input name"""
        output_file = input_file

if meta.filename is not present. I haven't played with this.
We need to use meta.filename here for the case that the input file is an association file,
e.g. r00001-o001_image_001_asn.json
we don't really want the output file to be r00001-o001_image_001_cal.json(??)

@braingram
Copy link
Collaborator Author

Thanks for the response.

It appears that the output filename (used by stpipe) is set here in the romancal ExposurePipeline:

self.output_file = input.meta.filename

For a non-association run where the filename (string) is provided as an input. Prior to #802 the input filename was stored to input_filename here:

input_filename = basename(input)

prior to the input being overwritten as the opened model:
input = rdd.open(input)

after the model is opened input_filename is assigned to input.meta.filename which meant that the above run of foo_uncal.asdf would produce a foo_cal.asdf file.

On the current main branch (after #802) the line that assigned input_filename:

input_filename = basename(input)

has no effect as input_filename is overwritten as None before it is used:
input_filename = None

which results in this if conditional failing and input.meta.filename not being assigned:
if input_filename:
result.meta.filename = input_filename

So to summarize, prior to #802 the actual filename was used, after #802 meta.filename was used. This resulted in the regression test failure that caused this issue to be opened. Looking at the code, I suspect that for files in an association the actual filename will be used and not meta.filename of the file in the association but I do not have an association to test this.

I will follow this up with a PR that should hopefully help clarify this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants