Skip to content

Spectrum 1D concept & issues

wkerzendorf edited this page Feb 28, 2012 · 17 revisions

Spectrum1D API concepts and issues

Implementation questions

Instantiator

  • two main suggestions: instantiation with a flux & a dispersion array OR a flux array and a WCS object

  • suggested WCS implementation:

https://github.com/wkerzendorf/specutils/blob/specio/specutils/io/io_utils.py

https://github.com/wkerzendorf/specutils/blob/specwcs/specutils/specwcs/specwcs.py

Slicing

  • slicing can work on either dispersion or pixels

  • keyword to switch between both of them

  • getitem overload?

arithmetic

  • operator overload?

if operator overload

  • what operands allowed? scalars, spectrum same dispersion, spectrum different dispersion
  • op-mode setting in spectrum1d?

interpolator

  • for now simple np.interp?
  • thinking about interpolation using wcs?

API questions

Copy vs inplace

  • meta data copy -- time intensive?

  • pythonic to do copy?

methods vs functions

  • when to use functions, when spectrum1d internal methods.

  • argument plotting

  • argument io

textbased keyword arguments vs function names

  • e.g. slice(blah, units='index') vs slice_index(blah)

Issues outside the Spectrum1D discussion

Error

Mask


===================== From the pull request: #1

  1. Instantiator

**Ideas**:
    
    s = Spectrum1D(flux=<flux array>, dispersion=<dispersion array>)
    vs. 
    s = Spectrum1D.from_whatever()

@astrofrog
    
    I agree with this, it does indeed not make sense to initialize from anything other than two 1-D sequences (numpy array, list, tuple), because it is easy enough to specify two columns from a table, from a file, etc. and Spectrum1D should not really be implementing I/O. I think this does not mean that in future, one could not imagine having a 'read' method for well defined 1-D spectrum specific formats that don't have any ambiguities though.
    
@eteq

    I mostly agree with @astrofrog (and @adrn and @demitri) - we don't want any of the io stuff actually implemented here... The from_table is a bit less clear, to me - it seems needlessly verbose to type Spectrum1D(table['wavelength'],table['flux']) when I could type Spectrum1D.from_table(table). But then again, perhaps that also should be considered "io" and a io.from_table function that we make later could accept either a filename of a table OR a Table object.
     
@adrn
    
    @eteq regarding your point about from_table() -- I think the idea here is that Spectrum1D(table['wavelength'],table['flux']) vs. Spectrum1D.from_table(table, columns=[0,1]) (or similar) doesn't save you anything, because if you have non-standard column names you still end up specifying the columns. But regardless, I agree that it probably belongs somewhere in io.

@wkerzendorf
    
    I agree 100% there should only be one way to instantiate the Spectrum1D object. In my opinion this is init(self, data, error=None, mask=None, wcs=None, meta=None, units=None, copy=True, validate=True).

    What I merely did was include the read-in methods, that you guys mentioned, within the Spectrum1D object. Obviously we could put them somewhere in specutils.io, but why? IMHO, they do belong to Spectrum1D as they are only useful with a Spectrum1D object. It would obviously be reflected in the doc_string of Spectrum1D, how to use these read methods.

    A spectrum dispersion solution is in the core a world coordinate system. If you reduce spectra with IRAF or other software for that matter, one often fits some polynomial to get a mapping between pixel and dispersion space. I think it's not a good idea to just force every user to map out his WCS into an array, as there's definitely a loss of information that is important in some cases. 
    In addition, the current implementation (just using flux and dispersion arrays) sees no way of specifying what units the dispersion axis is. For example, I would like to multiply my spectrum with a filter curve and subsequently integrate it to get a flux (I think Adam suggested this in one of the emails as well). I believe all the attributes in NDData are important for Spectrum1D and should therefore be used in Spectrum1D.
    Of course we could include the units in such a implementation, but as the wcs will have that anyways I don't see a reason to duplicate the effort.

    As for my implementation it just set WCS equal to the lookup table (dispersion array), for now. But this is only a temporary measure until there's a proper WCS in place that deals with lookup tables. With a proper WCS class, when .disp is required it would return a lookup table generated from the true WCS.

**Consensus**:
    It seems that the consensus is that there should be only one way to create a Spectrum1D object, and io operations don't belong in the Spectrum1D class. 
  1. Be verbose in naming

**Ideas**:
    
    Abbreviate 
    vs.
    Spell things out

@astrofrog
    
    Also agree, I keep stumbling every time I see 'disp' too!
    
@eteq

    Agreed, except that "dispersion" is a bit long to type for something we may be using a lot (in an interactive/ipython setting). What about having dispersion be the actual name (which we will use in all examples and documentation), and add in a very-short alias like x to make interactive work quicker/easier (with a clear warning somewhere that it should not be used in a code/library setting)? Or is that too much rope to hang oneself with?
     
@adrn
    
     I agree that it is long, but I don't know how I feel about having both dispersion and x. What do others think?

@wkerzendorf
    
    I agree in most cases to be verbose. I think however that disp for now is fine. We spoke in great length about what we wanted to call that axis (and after 'wave' was shot down and that rightly so ;-) ), we decided between dispersion and disp. disp won and I still think that's a good name. It would be at a prominent place in the Spectrum1D docstring. We could have both I guess.

@andycasey

    2 - Verbosity is great, but it should not be forced for those who know what they're doing. Have the documentation all state the argument as dispersion, but just as matplotlib allows me to specify "edgecolor" or "ec", let me specify disp instead of dispersion.
    FYI I have never read "disp" as display in these discussions. Just like you read RCE as rice, I read it as race. Let the user who knows what they're doing use a shortcut - just don't advertise the shortcut.

**Consensus**:
    We should not use 'disp' (use dispersion instead), but whether or not we implement a shortcut '.x' is still up in the air.
  1. Scipy Dependency

Astropy has the requirement that if scipy is not available, it should not break the imports, so if scipy is not available, the interpolate method would have to be disable. Therefore, I am suggesting that for linear interpolation, np.interp should be used so that at least basic interpolation is available if scipy is not installed. But I'm all for allowing more complex interpolation if scipy is installed.
  1. Spectrum1D is not an array

**Ideas**:
    
    Slicing on the Spectrum1D object itself e.g. spectrumObject[100:200]
    vs.
    Slicing methods e.g. spectrumObject.slice_wavelength[1001.41:2000.52]

@adrn

     I think the point is just what you say: we shouldn't allow spectrum[0:100] because (as a general comment) a Spectrum is not an array!

@wkerzendorf
    I believe that was taken a little bit out of context. My slice operations had a switch, what units to use with slicing. The essence of the spectrum1d object is the flux array. I believe it is pretty clear, what is meant by slicing (in the sense that we cut out a bit of the array, not in the what values we are slicing on). I also believe that slicing should always return a new Spectrum1D array. Many other functions in python work that way (slicing ndarrays, slicing lists, ….). I have had great success with overloading the getitem operator and using this as slicing in dispersion units. It's a simple and obvious convention that many people in my old department liked. I propose having the overload in addition to slice.

@adrn But I strongly disagree with overloading getitem -- a Spectrum is not an array, so spectrumObject[100:200] doesn't make sense! It is much less ambiguous to do spectrumObject.slice_index(100,200)