Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

translator for specreduce Trace objects #72

Merged
merged 8 commits into from
Aug 17, 2022

Conversation

kecnry
Copy link
Contributor

@kecnry kecnry commented Jul 20, 2022

This PR implements a new translator for specreduce Trace objects. In doing so, it adds specreduce as a dependency. In order to round-trip and retrieve the same underyling Trace object (instead of always having to export to an ArrayTrace), this currently saves a copy of the input Trace object in the meta of the glue Data object. If this is not preferred, we could create an ArrayTrace from the glue Data object, but we'll likely still need to attach an instance or reference to the input data.

@dhomeier dhomeier added the enhancement New feature or request label Jul 22, 2022
@dhomeier
Copy link
Contributor

Do you think you could set up a test for this with a basic example Data set?

@dhomeier
Copy link
Contributor

dhomeier commented Jul 26, 2022

Thanks for this contribution, this looks all quite good to me, although I am not familiar enough with the Trace format to say if this is the best possible implementation.
It certainly deserves a changelog entry as well, perhaps under 0.5.0 (unreleased) for a new feature.

The main points I see remaining to be settled are

  1. Adding specreduce to requirements; with specutils and spectral-cube already runtime dependencies and specreduce now an Astropy-coordinated package as well, I don't see much problems with that.
  2. As you mentioned, adding the full trace to the meta data might be controversial. As I see it at the moment you are storing a full copy in data.meta['Trace'], while data.meta['Trace'].trace is at the same time kept in data['y'] (this might additionally be checked in the tests BTW). Wondering if it would be preferable to strip the trace itself from the meta copy and add it back in later in to_object to save some space, but I have no idea if it's worth the effort at all for typical sizes of these objects – I guess they are usually (n-1)-D, so not that large? Possibly there is not even a real memory impact if this is a shallow copy.

Leaving those open for discussion with @astrofrog.

Copy link
Collaborator

@eteq eteq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @dhomeier that the dependency doesn't seem a big deal given the status of specreduce now.

I also agree that the memory impact is not serious given that, from my read of the internals of Trace and its sub-classes, both obj.trace and obj.image are references not copies, and this is just carrying along those references. As long as glue itself does not do any copying later on down the line this is probably "free".

obj : `specreduce.tracing.Trace`
The Trace object to convert
"""
data = Data(x=obj.image[0], y=obj.trace)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is implicitly assuming that the 0th dimension is the "spectral"/"x" dimension. I think that's also what specreduce is doing, so that's probably fine right now. But I think it would be better if this is a property of the Trace object that indicates which is the "x" axis.

Maybe I should create an issue in specreduce about this and you can just add a "TODO: change when issue #xxx is addressed" or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are some places in specreduce with general code to handle arbitrary "dispersion_axis", other parts where it's fixed at 0 but stored, and others where it's just assumed to be 0. I agree that this is a specreduce issue that needs to be addressed consistently across all methods, and then generalize back here when that is supported.

Copy link
Contributor

@dhomeier dhomeier Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are some places in specreduce with general code to handle arbitrary "dispersion_axis", other parts where it's fixed at 0 but stored, and others where it's just assumed to be 0. I agree that this is a specreduce issue

What I can see in the different Trace subclasses is using the same throughout (actually axis 1, which is the dimension that you get from obj.image[0]); KosmosTrace has _disp_axis and _crossdisp_axis properties, but currently fixes them at 1 and 0, respectively. Might make sense to check for the presence of obj._disp_axis and assume 1 otherwise, though.

More importantly however I realised that I had confused axes lengths, values and dimensions here, and am now wondering why you'd want obj.image[0] as component 'x' here at all. This is just a (somewhat arbitrary?) row of count values in the detector image, so really bears no particular information on the dispersion or spectral axis, right?
You could construct a pixel count or similar from that, but Data already gives you that automatically in data[data.pixel_component_ids[0]].
Of course if that pixel row is required in your application it can be included by the importer, but then the documentation should add an explanation why this component is added. My impression now is that the most natural structure would be to have only obj.trace as a component (and perhaps also name it 'trace' rather than 'y', which seems more self-explanatory).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, I removed the x-component entirely and that still seems to do what is needed (without needing to worry or document anything about the dispersion axes). Please check to see if this is what you had in mind or if you can think of any scenario this wouldn't work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks; I think this is the most intuitive structure. About potential use scenarios I cannot say much, but in any case if you needed any of the actual image data, you'd still have access through data.meta['Trace'].
Could you rebase; this will hopefully fix the previous test failures?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebased, but I don't have permissions to rerun the CI workflows. Thanks!

if not isinstance(data.meta.get('Trace'), Trace):
raise TypeError("data is not of a valid specreduce Trace object")

return data.meta['Trace']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worries me, because it is completely ignoring what's actually in the data object itself. I think that's counter-intuitive because a likely eventual workflow is to change the y trace by hand, which is probably most naturally done in glue.

So I would suggest the following:

  1. If there's some way that glue data objects can flag "have been modified since created", only do the next step if that one is true.
  2. Do something like:
Suggested change
return data.meta['Trace']
data.meta['Trace'].trace = data.y
return data.meta['Trace']

Now this does introduce an assymetry between x and y. But for a trace I think that makes sense.

Regardless, I think either in to_object or the top-level class, the docstring should be explcit about whether it gets data from glue to create the object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea. We can even include a check to make sure that x has not changed and that y is the correct length. In cases where y has changed, we'll probably want to return an ArrayTrace with the modified trace.y. For example: if the input trace object is a FlatTrace, but then the underlying data is modified once in glue, it no longer (necessarily) makes sense to return another FlatTrace object, right?

In theory, I don't see why Trace needs the input image, it's just convenient to have attached, so if we can make that more flexible on the specreduce end, then we might be able to rebuild the object from scratch instead of relying on these references. But I think in cases where the data isn't modified, it is nice to roundtrip and return the same type of trace, which at least involves storing some more information than the arrays themselves.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I now check to see if the "y"-component matches the trace, and if not, create a new ArrayTrace object with the same image. It turns out that glue does not allow changing the shape of data components, so there is no need for any checks on the lengths, but I do raise an error if the "x"-component no longer matches the dispersion axis of the image.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also agree that the memory impact is not serious given that, from my read of the internals of Trace and its sub-classes, both obj.trace and obj.image are references not copies, and this is just carrying along those references. As long as glue itself does not do any copying later on down the line this is probably "free".

Any data component initialising or being added to a data object is processed by Component.autotyped(), which will typically involve np.asarray(data) and potentially other re-casting operations. So for most ndarray inputs there will probably be no actual copying, but even if, this only involves obj.trace. The meta reference is just an OrderedDict item.

  1. If there's some way that glue data objects can flag "have been modified since created", only do the next step if that one is true.

Wondering if the callback mechanism might be used for this, although I do not see an example where this is applied in a Data component so far, apart from the use in _update_world_components.

@eteq
Copy link
Collaborator

eteq commented Jul 29, 2022

(Oh, and my "If there's some way that glue data objects can flag "have been modified since created", only do the next step if that one is true." comment could use some insight from @astrofrog if @dhomeier doesn't know the answer!)

kecnry and others added 6 commits August 12, 2022 12:39
additional assert statement in test

Co-authored-by: Derek Homeier <dhomeie@gwdg.de>
* and raise error if "x" component is changed
* can rely on the automatic pixel component and recover everything from the original trace object
@kecnry kecnry force-pushed the translator-specreduce-trace branch from 6680612 to 46c65e5 Compare August 12, 2022 16:39
@codecov
Copy link

codecov bot commented Aug 12, 2022

Codecov Report

Merging #72 (aec7ee3) into main (bae4ecf) will decrease coverage by 0.16%.
The diff coverage is 92.68%.

❗ Current head aec7ee3 differs from pull request most recent head 7ce7142. Consider uploading reports for the commit 7ce7142 to get more accurate results

@@            Coverage Diff             @@
##             main      #72      +/-   ##
==========================================
- Coverage   97.63%   97.46%   -0.17%     
==========================================
  Files          15       17       +2     
  Lines        1182     1223      +41     
==========================================
+ Hits         1154     1192      +38     
- Misses         28       31       +3     
Impacted Files Coverage Δ
glue_astronomy/translators/trace.py 85.71% <85.71%> (ø)
glue_astronomy/translators/__init__.py 100.00% <100.00%> (ø)
glue_astronomy/translators/tests/test_trace.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Contributor

@dhomeier dhomeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the minor style comments this should be good to go, thanks again!

@eteq I think your change requests are adequately addressed with this (the question which axis is dispersion in fact no longer arises)?
@astrofrog the only potentially contentious issue I see remaining is the specreduce dependency, but we had a clear majority in favour of that.

glue_astronomy/translators/trace.py Outdated Show resolved Hide resolved
glue_astronomy/translators/trace.py Outdated Show resolved Hide resolved
@dhomeier
Copy link
Contributor

Oh, one last thing: please add a changelog entry, and change the current section heading to 0.5.0 (unreleased)! :)

@dhomeier
Copy link
Contributor

Successful test run, so considering the last commit a [ci skip].

@dhomeier dhomeier merged commit d245375 into glue-viz:main Aug 17, 2022
@eteq
Copy link
Collaborator

eteq commented Aug 17, 2022

sorry I didn't get back to this sooner, by a a post-hoc "yes this is good, thanks!"

@kecnry kecnry deleted the translator-specreduce-trace branch August 17, 2022 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants