Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: SDSS-V SpectrumList format ambiguity, mwmVisit BOSS load fail #1185

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

rileythai
Copy link
Contributor

@rileythai rileythai commented Oct 12, 2024

This PR addresses #1182 , and an issue mentioned in #1107 about SpectrumList loader ambiguity.

Changes are:

  • mwmVisit files now correctly load BOSS-only files.
    • the mwmVisit default loader no longer loads metadata about the "date observed" -- only the MJD of the observation (which is enough)
  • Format is now successfully automatically detected and no longer require a manual specification when loading as a SpectrumList.
    • Forces override of default SDSS-V SpectrumList loaders generated from the Spectrum1D default loaders.
      • BREAKING: if anything previously manually specified any SDSS-V "multi" loaders for SpectrumList, they no longer exist and it will fail.
      • This is only really relevant for sdss_solara's Jdaviz component @havok2063 .
  • Now warns users when using Spectrum1D loader for SDSS-V files with multiple spectra and specific HDU is not specified.
  • Loaders now fetch sdss_id for meta field.
  • Relevant tests in tests/test_sdss_v.py have been adjusted
    • tests now correctly assess whether the mwm filetype is mwmVisit or mwmStar
    • tests have been adjusted to remove temporary file when complete (via os.remove)
    • test cases have been added for BOSS-only mwmVisit and mwmStar files.

- no longer checks for "date_obs"; calculate that yourself
- also adds "sdss_id" to metadata now
…ases

- added new test cases for BOSS-only mwmVisit and mwmStar files
- added new checks to SpectrumList mwmVisit/mwmStar test to check verified filetype is correct
- forced override on default SpectrumList loaders -- now SpectrumList is no longer ambiguous and doesn't require a format specification
  - relevant areas in tests are updated accordingly
- added print warnings to when HDU is not specified on Spectrum1D loaders for files with multiple spectra.
- ensured tests now remove tempfiles with os.remove
  - arguably, this could work better with tmpfile, but i don't know how tests are deployed on the server-side
@pllim
Copy link
Member

pllim commented Oct 12, 2024

Please add a change log.

@weaverba137 , would you be interested to review this or ping someone who would be?

Thanks!

- three points outlining changes listed in PR as per astropy#1185
Copy link

codecov bot commented Oct 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.95%. Comparing base (a34ebb8) to head (87fa13f).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1185      +/-   ##
==========================================
+ Coverage   86.89%   86.95%   +0.05%     
==========================================
  Files          63       63              
  Lines        4564     4568       +4     
==========================================
+ Hits         3966     3972       +6     
+ Misses        598      596       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines +354 to +357
"SDSS-V spec",
identifier=spec_sdss5_identify,
dtype=SpectrumList,
force=True,
Copy link
Contributor

@havok2063 havok2063 Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the multi removed, what is the difference between this data loader and the above data loader at line 318? Same with the mwm data loaders?

With this removal, what would be the correct format to specify to load all data extensions? Is it now the default? Jdaviz only supports using format during data load to provide hints to the type of filepath.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal with this was to simplify the interface. The difference previously was that multi loaded every valid HDU extension, whereas the single SpectrumList ones would load just a specified HDU. Since it's a SpectrumList, I felt the default loader for that type should just load every extension.

Across both the Spectrum1D and SpectrumList loaders there should be a single format (SDSS-V [DATATYPE]), which should be detected automagically and no longer needs to be manually specified on SpectrumList.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue might be mostly in Specviz. I'd like to retain the workflow of passing Specviz a single file and having it load all spectra from all extensions. Specviz uses the following https://github.com/spacetelescope/jdaviz/blob/main/jdaviz/configs/specviz/plugins/parsers.py#L78-L87 to load a specutils object from a file, which first attempts to load a Spectrum1D with a format option, and on fail, tries to load the SpectrumList with a format option.

With the multi option, I can explicitly tell Specviz to load using the SpectrumList loader, which will break will this change. Specviz only allows me to pass in a format keyword to specutils. Previously using SDSS-V mwm multi on the mwmStar files allowed me to load all spectra, since it triggered line 87. With this, it only loads the first spectrum from the first extension found.

In my mind, the easiest fix would be to reintroduce the multi here, but maybe it's more appropriate to fix Specviz instead. @rosteen since you're a maintainer/dev of both specutils and jdaviz, do you have thoughts on the best approach, and/or where the fix should go?

I am testing with the mwmStar and mwmVisit files for id 61731453, which has spectra in both the BOSS and APOGEE extensions. the mwmStar file has 1 spectrum per extension. The mwmVisit file has 1 spectrum in the BOSS extension, and 3 spectra in the APOGEE extension, loaded as a SpectrumList of 2 Spectrum1D objects, with flux.shapes of ((4648,), (3, 8575)).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm planning to catch up on this this afternoon, I might have thoughts then.

specutils/io/default_loaders/sdss_v.py Outdated Show resolved Hide resolved
@@ -462,14 +467,17 @@ def load_sdss_mwm_1d(file_obj, hdu: Optional[int] = None, **kwargs):
for i in range(len(hdulist)):
if hdulist[i].header.get("DATASUM") != "0":
hdu = i
print('HDU not specified. Loading spectrum at (HDU{})'.
Copy link
Contributor

@havok2063 havok2063 Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just a warning or is specifying an HDU required? For production, we use Jdaviz to load the file, and want by default all data extensions loaded, without having to specify an HDU, which I think will be the most common use case by users as well. Does this result in many constant warning messages printed to the user?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how we want to handle this case generally. mwm files contain a max of 4 extensions which can each have N spectra. These shouldn't really be loaded into a single Spectrum1D object because APOGEE/BOSS are at different wavelength ranges.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree that the APOGEE/BOSS extensions should not be loaded into a single Spectrum1D. They should be loaded into a SpectrumList. I think the warning is fine. It's useful for anyone using specutils directly, and will be buried within Specviz and Zora.

@havok2063
Copy link
Contributor

havok2063 commented Oct 14, 2024

I think we might need the multi loader. With this change, Specviz now fails to load all the extensions for the mwmStar file. It now only loads the first data extension found.

from jdaviz import Specviz
s = Specviz()
s.show()
s.load_data("mwmStar-0.6.0-61731453.fits", format='SDSS-V mwm')

Before I could load all extensions with s.load_data("mwmStar-0.6.0-61731453.fits", format='SDSS-V mwm multi')

I can load a mwmVisit file in specutils with ss = Spectrum1D.read('mwmVisit-0.6.0-109635459.fits', format='SDSS-V mwm'). However, when I load into Specviz, it fails

s = Specviz()
s.show()
s.load_data(mwmVisit-0.6.0-109635459.fits, format='SDSS-V mwm')

with

WARNING: UnitsWarning: The unit 'Angstrom' has been deprecated in the VOUnit standard. Suggested: 0.1nm. [astropy.units.format.utils]
WARNING:astropy:UnitsWarning: The unit 'Angstrom' has been deprecated in the VOUnit standard. Suggested: 0.1nm.
[ERROR]: Traceback (most recent call last):
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/var/folders/sg/r2s67pm50tv6ydrrhs1c3rmw0000gn/T/ipykernel_88658/1622468986.py", line 1, in <module>
    spec.load_data(files[-7], format='SDSS-V mwm')
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/configs/specviz/helper.py", line 64, in load_data
    super().load_data(data,
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/core/helpers.py", line 110, in load_data
    self.app.load_data(data, parser_reference=parser_reference, **kwargs)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/app.py", line 828, in load_data
    parser(self, file_obj, **kwargs)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/configs/specviz/plugins/parsers.py", line 149, in specviz_spectrum1d_parser
    app.add_data_to_viewer(spectrum_viewer_reference_name, data_label[i])
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/app.py", line 1547, in add_data_to_viewer
    self.set_data_visibility(viewer_item, data_label, visible=visible, replace=clear_other_data)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/app.py", line 2092, in set_data_visibility
    self.hub.broadcast(add_data_message)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/glue/core/hub.py", line 215, in broadcast
    handler(message)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/app.py", line 427, in _on_add_data_message
    self._on_layers_changed(msg)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/app.py", line 573, in _on_layers_changed
    self.state.layer_icons = {
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/echo/core.py", line 261, in __setattr__
    super(HasCallbackProperties, self).__setattr__(attribute, value)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/echo/core.py", line 76, in __set__
    self.notify(instance, old, new)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/echo/core.py", line 125, in notify
    cback(new)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/core/template_mixin.py", line 3275, in <lambda>
    self.app.state.add_callback('layer_icons', lambda _: self._on_data_changed())
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/core/template_mixin.py", line 3433, in _on_data_changed
    self._apply_default_selection()
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/core/template_mixin.py", line 917, in _apply_default_selection
    self.selected = self.labels[0] if len(self.labels) else default_empty
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/core/template_mixin.py", line 624, in __setattr__
    return setattr(self._plugin, self._plugin_traitlets.get(attr), value)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/traitlets/traitlets.py", line 716, in __set__
    self.set(obj, value)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/traitlets/traitlets.py", line 706, in set
    obj._notify_trait(self.name, old_value, new_value)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/traitlets/traitlets.py", line 1513, in _notify_trait
    self.notify_change(
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/ipywidgets/widgets/widget.py", line 701, in notify_change
    super().notify_change(change)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/traitlets/traitlets.py", line 1525, in notify_change
    return self._notify_observers(change)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/traitlets/traitlets.py", line 1568, in _notify_observers
    c(event)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/core/template_mixin.py", line 2751, in _check_non_finite_uncertainty_mismatch
    spec = self._get_1d_spectrum()
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/configs/default/plugins/model_fitting/model_fitting.py", line 318, in _get_1d_spectrum
    return self.dataset.selected_spectrum_for_spatial_subset(self.spatial_subset_selected,
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/core/template_mixin.py", line 3347, in selected_spectrum_for_spatial_subset
    return self.plugin._specviz_helper.get_data(data_label=self.selected,
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/configs/specviz/helper.py", line 333, in get_data
    return self._get_data(data_label=data_label, spatial_subset=spatial_subset,
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/core/helpers.py", line 558, in _get_data
    data = data.get_object(cls=cls, **object_kwargs)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/glue/core/data.py", line 298, in get_object
    return handler.to_object(self, **kwargs)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/glue_astronomy/translators/nddata.py", line 148, in to_object
    attribute = _get_attribute(attribute, data)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/glue_astronomy/translators/nddata.py", line 23, in _get_attribute
    raise ValueError("Data object has more than one attribute, so "
ValueError: Data object has more than one attribute, so you will need to specify which one to use as the flux for the spectrum using the attribute= keyword argument.

ERROR:sdss_access:Traceback (most recent call last):
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/var/folders/sg/r2s67pm50tv6ydrrhs1c3rmw0000gn/T/ipykernel_88658/1622468986.py", line 1, in <module>
    spec.load_data(files[-7], format='SDSS-V mwm')
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/configs/specviz/helper.py", line 64, in load_data
    super().load_data(data,
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/core/helpers.py", line 110, in load_data
    self.app.load_data(data, parser_reference=parser_reference, **kwargs)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/app.py", line 828, in load_data
    parser(self, file_obj, **kwargs)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/configs/specviz/plugins/parsers.py", line 149, in specviz_spectrum1d_parser
    app.add_data_to_viewer(spectrum_viewer_reference_name, data_label[i])
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/app.py", line 1547, in add_data_to_viewer
    self.set_data_visibility(viewer_item, data_label, visible=visible, replace=clear_other_data)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/app.py", line 2092, in set_data_visibility
    self.hub.broadcast(add_data_message)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/glue/core/hub.py", line 215, in broadcast
    handler(message)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/app.py", line 427, in _on_add_data_message
    self._on_layers_changed(msg)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/app.py", line 573, in _on_layers_changed
    self.state.layer_icons = {
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/echo/core.py", line 261, in __setattr__
    super(HasCallbackProperties, self).__setattr__(attribute, value)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/echo/core.py", line 76, in __set__
    self.notify(instance, old, new)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/echo/core.py", line 125, in notify
    cback(new)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/core/template_mixin.py", line 3275, in <lambda>
    self.app.state.add_callback('layer_icons', lambda _: self._on_data_changed())
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/core/template_mixin.py", line 3433, in _on_data_changed
    self._apply_default_selection()
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/core/template_mixin.py", line 917, in _apply_default_selection
    self.selected = self.labels[0] if len(self.labels) else default_empty
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/core/template_mixin.py", line 624, in __setattr__
    return setattr(self._plugin, self._plugin_traitlets.get(attr), value)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/traitlets/traitlets.py", line 716, in __set__
    self.set(obj, value)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/traitlets/traitlets.py", line 706, in set
    obj._notify_trait(self.name, old_value, new_value)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/traitlets/traitlets.py", line 1513, in _notify_trait
    self.notify_change(
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/ipywidgets/widgets/widget.py", line 701, in notify_change
    super().notify_change(change)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/traitlets/traitlets.py", line 1525, in notify_change
    return self._notify_observers(change)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/traitlets/traitlets.py", line 1568, in _notify_observers
    c(event)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/core/template_mixin.py", line 2751, in _check_non_finite_uncertainty_mismatch
    spec = self._get_1d_spectrum()
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/configs/default/plugins/model_fitting/model_fitting.py", line 318, in _get_1d_spectrum
    return self.dataset.selected_spectrum_for_spatial_subset(self.spatial_subset_selected,
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/core/template_mixin.py", line 3347, in selected_spectrum_for_spatial_subset
    return self.plugin._specviz_helper.get_data(data_label=self.selected,
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/configs/specviz/helper.py", line 333, in get_data
    return self._get_data(data_label=data_label, spatial_subset=spatial_subset,
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/jdaviz/core/helpers.py", line 558, in _get_data
    data = data.get_object(cls=cls, **object_kwargs)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/glue/core/data.py", line 298, in get_object
    return handler.to_object(self, **kwargs)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/glue_astronomy/translators/nddata.py", line 148, in to_object
    attribute = _get_attribute(attribute, data)
  File "/Users/brian/anaconda3/envs/valis_solara/lib/python3.10/site-packages/glue_astronomy/translators/nddata.py", line 23, in _get_attribute
    raise ValueError("Data object has more than one attribute, so "
ValueError: Data object has more than one attribute, so you will need to specify which one to use as the flux for the spectrum using the attribute= keyword argument.

HDU not specified. Loading spectrum at (HDU1)

Co-authored-by: Brian Cherinka <havok2063@hotmail.com>
- all loaders now only load for a single datatype, avoiding prior knowledge of SDSS datatypes
- updated to only load as SpectrumList
- updated to load all visits in mwmVisit files as individual Spectrum1D objects in the SpectrumList
- relevant tests removed
- relevant import __all__ adjusted
- describes changes shown previously
@rileythai
Copy link
Contributor Author

rileythai commented Oct 16, 2024

I can load a mwmVisit file in specutils with ss = Spectrum1D.read('mwmVisit-0.6.0-109635459.fits', format='SDSS-V mwm'). However, when I load into Specviz, it fails

This is again the same issue we run into previously where jdaviz can't load Spectrum1D objects with 2D flux arrays. I've opened a jdaviz issue at spacetelescope/jdaviz#3223 (I think I forgot to last time).

With this change, Specviz now fails to load all the extensions for the mwmStar file. It now only loads the first data extension found.

The helper for Specviz's default I/O will try and open it first as a Spectrum1D, and then it will try do it as a SpectrumList only if it throws an IORegistryError. This means it'll default to a non-specified hdu keyword case and just load that first extension.

For both of these, I've been debating how to keep I/O simple without waiting for any jdaviz fixes, so I've done this:

  1. There is a single SDSS-V loader for each datatype. It is always a SpectrumList because every SDSS-V file contains multiple spectra.
  2. It always loads each spectra in that list as a Spectrum1D with 1D flux (bypass jdaviz issues and can be changed later if necessary/desired)
  3. Users can no longer load Spectrum1D instances with the hdu specification (avoids the complexity of users having to know our datatypes to load as Spectrum1D; just send them to SpectrumList I/O)

This is how the DESI default loaders handle I/O, and this avoids any confusion on a user (student's) part who is unfamiliar with the SDSS-V datatype structure.

I'm not sure if this is exactly what we want for specutils though. Should it be that all files load only as SpectrumList objects if they have multiple spectra in them?

edit: with this change the mwm files can load in Jdaviz as well. @havok2063 , and the print warnings now go through the AstropyUserWarning class.

In [6]: s.load_data('mwmVisit-0.6.0-54459273.fits',format='SDSS-V mwm')

WARNING: WARNING: HDU2 (BOSS/LCO) is empty. [specutils.io.default_loaders.sdss_v]
WARNING:astropy:WARNING: HDU2 (BOSS/LCO) is empty.
WARNING: WARNING: HDU3 (APOGEE/APO) is empty. [specutils.io.default_loaders.sdss_v]
WARNING:astropy:WARNING: HDU3 (APOGEE/APO) is empty.
WARNING: WARNING: HDU4 (APOGEE/LCO) is empty. [specutils.io.default_loaders.sdss_v]
WARNING:astropy:WARNING: HDU4 (APOGEE/LCO) is empty.
WARNING: UnitsWarning: The unit 'Angstrom' has been deprecated in the VOUnit standard. Suggested: 0.1nm. [astropy.units.format.utils]
WARNING:astropy:UnitsWarning: The unit 'Angstrom' has been deprecated in the VOUnit standard. Suggested: 0.1nm.
WARNING: UnitsWarning: The unit 'Angstrom' has been deprecated in the VOUnit standard. Suggested: 0.1nm. [astropy.units.format.utils]
WARNING:astropy:UnitsWarning: The unit 'Angstrom' has been deprecated in the VOUnit standard. Suggested: 0.1nm.
WARNING: UnitsWarning: The unit 'Angstrom' has been deprecated in the VOUnit standard. Suggested: 0.1nm. [astropy.units.format.utils]
WARNING:astropy:UnitsWarning: The unit 'Angstrom' has been deprecated in the VOUnit standard. Suggested: 0.1nm.

@havok2063
Copy link
Contributor

havok2063 commented Oct 17, 2024

The helper for Specviz's default I/O will try and open it first as a Spectrum1D, and then it will try do it as a SpectrumList only if it throws an IORegistryError. This means it'll default to a non-specified hdu keyword case and just load that first extension.

This is exactly the problem that was occurring, and why we need an explicitly different format for a list of spectra, until we can fix the Specviz data loader.

  1. There is a single SDSS-V loader for each datatype. It is always a SpectrumList because every SDSS-V file contains multiple spectra.

This is NOT true. The majority of BHM spectra are single spectra. Only the spec-full files contain multiple spectra for individual exposures, and most users do not use these files.

3. Users can no longer load Spectrum1D instances with the hdu specification (avoids the complexity of users having to know our datatypes to load as Spectrum1D; just send them to SpectrumList I/O)

IMO, we should avoid doing this, and I'm hesitant to remove it. Spectrum1D is the primary user entry point for handling spectra. SpectrumList is rarely mentioned in the specutils documentation, beyond that it exists, and for precisely dealing with loading multiple spectra. Spectrum1D will work for the majority of SDSS spectra, and SpectrumList will work for the Astra outputs.

I agree that it'd be nice to have a single format for each SDSS data product. I think it's ok to have users specify hdu=x to get a specific extension. The default hdu value should be 1. The following examples should be supported

ss = Spectrum1D.read("spec-lite.fits", format='SDSS-V spec') # loads the coadd spectrum in hdu 1
ss = Spectrum1D.read("spec-full.fits", format='SDSS-V spec', hdu=6) # loads the exposure spectrum in hdu 6
sl = SpectrumList.read("spec-full.fits", format='SDSS-V spec') # loads all the extensions 
ss = Spectrum1D.read("mwmStar.fits", format='SDSS-V mwm', hdu=3) # loads the apogee/apo spectrum in hdu 3
sl = SpectrumList.read("mwmStar.fits", format='SDSS-V mwm') # loads all extensions as a list of Spectrum1D
ss = Spectrum1D.read("mwmVisit.fits", format='SDSS-V mwm') # loads only the boss/apo spectrum in hdu 1
sl = SpectrumList.read("mwmVisit.fits", format='SDSS-V mwm') # loads all the extensions

For extensions with many spectra on the same wavelength grid, a single Spectrum1D object with flux [n_spectra, n_wave] is the correct format. Until the Specviz loader can be fixed, your current approach of unpacking any n-dimensional flux arrays into individual Spectrum1D objects within the SpectrumList is the right one here.

It's not ideal, but if we also add back in the explicit multi on the format for SpectrumList, then, with your current fix, we could load into the current Specviz.

edit: with this change the mwm files can load in Jdaviz as well. @havok2063 , and the print warnings now go through the AstropyUserWarning class.

While indeed this does "fix" Specviz data loading, it does so at the expense of usability within the specutils package, and potentially other downstream applications.

Comment on lines 465 to 474
return [
Spectrum1D(
spectral_axis=spectral_axis,
flux=flux[i],
uncertainty=e_flux[i],
mask=mask[i],
meta=metadicts[i],
) for i in range(flux.shape[0])
]

Copy link
Contributor

@havok2063 havok2063 Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not already in place, we will need to do a similar thing for the SDSS-V spec loader, but only for any of the spec-full files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't spec-full visits dispersed across the HDUs past the spall and zall HDU's? Or am I mistaken about the format?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's right. They're appended at the end as

5  MJD_EXP_60052-00    1 BinTableHDU    273   4648R x 8C   [E, E, E, J, J, E, E, E]   
6  MJD_EXP_60052-01    1 BinTableHDU    271   4648R x 8C   [E, E, E, J, J, E, E, E]   
7  MJD_EXP_60052-02    1 BinTableHDU    273   4648R x 8C   [E, E, E, J, J, E, E, E]   
8  MJD_EXP_60052-03    1 BinTableHDU    273   4648R x 8C   [E, E, E, J, J, E, E, E] 
...  

So, yeah, a change isn't needed here since they are 1d spectra in separate extensions.

@rileythai
Copy link
Contributor Author

This is NOT true. The majority of BHM spectra are single spectra. Only the spec-full files contain multiple spectra for individual exposures, and most users do not use these files.

I did forget about the individual spectra. My mistake.

I agree with you on your points. I didn't really consider how Spectrum1D is the main user interface enough when thinking about this implementation.

I'd like to keep the interface consistent -- if we output 1D flux here, we output it everywhere. So users should also have to specify the specific visit as well from the mwm interface for mwmVisit files. For example:

Spectrum1D.load('mwmVisit.fits') # loads first visit in 1st HDU w/ data
Spectrum1D.load('mwmVisit.fits', hdu=2) # loads first visit in HDU2
Spectrum1D.load('mwmVisit.fits', visit=2) # loads 3rd (indexing) visit in 1st HDU w/ data 
Spectrum1D.load('mwmVisit.fits', hdu=2, visit=2) # loads 2nd visit in HDU2

It does get a lil confusing but it keeps our interface supported on all datatypes incl. the man one.

It's not ideal, but if we also add back in the explicit multi on the format for SpectrumList, then, with your current fix, we could load into the current Specviz.

I would really like to keep this change since it fixes a specutils side auto-detection bug: i.e. any other 3rd party code loads SpectrumList won't automatically work. I'm fine to remove it if it's an absolute blocker.

Is it possible in the case of sdss_solara to load them as Spectrum objects in the solara app itself and pass that object in?

- readded print -> warnings conversion
- can specify the visit to load on mwmVisit load.
- added relevant tests for the new mwmVisit case
@rosteen
Copy link
Contributor

rosteen commented Oct 18, 2024


For extensions with many spectra on the same wavelength grid, a single Spectrum1D object with flux [n_spectra, n_wave] is the correct format. Until the Specviz loader can be fixed, your current approach of unpacking any n-dimensional flux arrays into individual Spectrum1D objects within the SpectrumList is the right one here.

It's not ideal, but if we also add back in the explicit multi on the format for SpectrumList, then, with your current fix, we could load into the current Specviz.

edit: with this change the mwm files can load in Jdaviz as well. @havok2063 , and the print warnings now go through the AstropyUserWarning class.

While indeed this does "fix" Specviz data loading, it does so at the expense of usability within the specutils package, and potentially other downstream applications.

I agree with @havok2063's points here - we don't want to sacrifice specutils usability.

If my understanding is correct, at this point the main sticking point seems to be Specviz not handling 2D flux arrays. I opened a PR in Jdaviz with a quick patch to do that, it simply unpacks the 2D array into separate Spectrum1Ds before continuing to load the data. Is this sufficient for this use case for now? I see potential downsides, mainly that with a 2D array unpacked like this, there probably isn't a great way to attach the proper metadata about each of the 1D spectra (we could just copy the full meta to all of the split out spectra). That PR needs a change to also handle a 2D array coming from a file, I can add that if this seems like a direction that will enable your use case here.

@havok2063
Copy link
Contributor

havok2063 commented Oct 18, 2024

I'd like to keep the interface consistent -- if we output 1D flux here, we output it everywhere. So users should also have to specify the specific visit as well from the mwm interface for mwmVisit files. For example:
Spectrum1D.load('mwmVisit.fits', visit=2) # loads 3rd (indexing) visit in 1st HDU w/ data
Spectrum1D.load('mwmVisit.fits', hdu=2, visit=2) # loads 2nd visit in HDU2

That's fine. Although Spectrum1D is also meant to hold many spectra with the same spectral axis, so I don't think it's necessary to force the visit spectrum to be 1d. I would say if someone did Spectrum1D.load('mwmVisit.fits', hdu=2) it loads all visits, and they can optionally specify visit to extract just a single visit. But I don't feel too strongly about it.

Does adding the visit keyword here make it available generally on all Spectrum.load calls, or just the visit specific data loader? It may be confusing for users if visit is now an allowed keyword when loading hst or jwst spectra, for example.

I would really like to keep this change since it fixes a specutils side auto-detection bug: i.e. any other 3rd party code loads SpectrumList won't automatically work. I'm fine to remove it if it's an absolute blocker.

Then we'll also need to change the specviz loading logic at https://github.com/spacetelescope/jdaviz/blob/main/jdaviz/configs/specviz/plugins/parsers.py#L78-L87

Is it possible in the case of sdss_solara to load them as Spectrum objects in the solara app itself and pass that object in?

Yeah it is, but I'd like to avoid that. I'd prefer to pass the file and have jdaviz and specutils handle the data loading. If we move that logic into the solara app, then users would need to do the same thing when loading data locally into jdaviz. Which to me defeats the advantage of jdaviz/specutils.

@havok2063
Copy link
Contributor

@rosteen

If my understanding is correct, at this point the main sticking point seems to be Specviz not handling 2D flux arrays. I opened a PR in Jdaviz with a quick patch to do that, it simply unpacks the 2D array into separate Spectrum1Ds before continuing to load the data. Is this sufficient for this use case for now?

Thanks! I will test this out. If we use your PR, then would we likely need to rollback the equivalent changes here?

The other issue is the logic of the Specviz data loader when the input is a filepath, at https://github.com/spacetelescope/jdaviz/blob/main/jdaviz/configs/specviz/plugins/parsers.py#L78-L87. It always tries Spectrum1D first and only on fail, does it try SpectrumList. Previously we had loader formats SDSS-V mwm and SDSS-V mwm multi for explicitly loading data as Spectrum1D and SpectrumList, respectively. This worked with Specviz because I could specify the multi format and trigger line 87. This PR would consolidate the specutils formats into a single SDSS-V mwm format for both, which would break the loading since line 82 always gets called, since it's always using the Spectrum1D loader first.

I'm not sure the right solution here. Is there a preference in specutils for creating specifically named loader formats for Spectrum1D vs SpectrumList? One option is to go back to different format names here. Another option could be to allow Specviz.load_data to specify a dtype keyword to prefer loading with either Spectrum1D or SpectrumList?

I see potential downsides, mainly that with a 2D array unpacked like this, there probably isn't a great way to attach the proper metadata about each of the 1D spectra (we could just copy the full meta to all of the split out spectra).

I don't have a strong opinion on the metadata. I'd be fine copying the full dict. I'm not aware of any visit-specific metadata but @rileythai might know. @rileythai what are you currently copying over for metadata?

That PR needs a change to also handle a 2D array coming from a file, I can add that if this seems like a direction that will enable your use case here.

Yeah our use case has the nd array coming from an extension in a file.

@rosteen
Copy link
Contributor

rosteen commented Oct 18, 2024

Hm, how about adding a force_load_as_list keyword to the Specviz loader that forces it to use the SpectrumList instead of Spectrum1D read function? That would be a minimal change on our end and I think would be sufficient to replace the multi specification.

@havok2063
Copy link
Contributor

Hm, how about adding a force_load_as_list keyword to the Specviz loader that forces it to use the SpectrumList instead of Spectrum1D read function? That would be a minimal change on our end and I think would be sufficient to replace the multi specification.

That works for me!

@rosteen
Copy link
Contributor

rosteen commented Oct 18, 2024

Great, I implemented it in that PR as load_as_list. Once @rileythai confirms that my PR is working with this I'll move it out of draft to get reviewed.

- no longer unpacks nD spectrum in all cases for mwm
- will still unpack flux from 2D to 1D in the event its single visit or coadd
@rileythai
Copy link
Contributor Author

rileythai commented Oct 19, 2024

@rileythai what are you currently copying over for metadata?

I'm copying over lists of some visit-specific stuff that isn't in the header (SNR, MJD, telescope) that could let users figure out which visits could be bad and where they come from, etc. As of now when imported and copied they'll just become a list of values for each of the imported and unpacked spectra.

image

We could make the 2D-to-1D converter split list-like objects with the same length as spectrum1d.flux.shape[0] across N metadata dictionaries and bind one to each Spectrum1D, like this PR did previously.

Once @rileythai confirms that my PR is working with this I'll move it out of draft to get reviewed.

@rosteen check my PR for your PR, which has some fixes for the edge case and simplifies the data_label handling a bit. It works for our files.

how about adding a force_load_as_list keyword to the Specviz loader that forces it to use the SpectrumList instead of Spectrum1D read function?

While this works for the CLI case, the SDSS-V data loaded through the jdaviz solara webapp (IMPORT DATA button) won't load all the spectra in the file, just those from the first extension. As we said previously:

...then users would need to do the same thing when loading data locally into jdaviz. Which to me defeats the advantage of jdaviz/specutils

There's a minor Specviz side change that could fix this. See my comment about that at spacetelescope/jdaviz#3229 (comment)

rileythai and others added 4 commits October 26, 2024 13:41
- fixed flux array length check so that the 2D mwm visits are checked properly
- removed HDU is empty warning on mwm SpecList, not really worth a warning
- moved test cases which are designed to throw exceptions into their own test function
@rileythai
Copy link
Contributor Author

This PR should be good to go now @rosteen .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants