-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
translator for specreduce Trace objects #72
Conversation
Do you think you could set up a test for this with a basic example |
Thanks for this contribution, this looks all quite good to me, although I am not familiar enough with the The main points I see remaining to be settled are
Leaving those open for discussion with @astrofrog. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @dhomeier that the dependency doesn't seem a big deal given the status of specreduce
now.
I also agree that the memory impact is not serious given that, from my read of the internals of Trace
and its sub-classes, both obj.trace
and obj.image
are references not copies, and this is just carrying along those references. As long as glue
itself does not do any copying later on down the line this is probably "free".
glue_astronomy/translators/trace.py
Outdated
obj : `specreduce.tracing.Trace` | ||
The Trace object to convert | ||
""" | ||
data = Data(x=obj.image[0], y=obj.trace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is implicitly assuming that the 0th dimension is the "spectral"/"x" dimension. I think that's also what specreduce
is doing, so that's probably fine right now. But I think it would be better if this is a property of the Trace
object that indicates which is the "x" axis.
Maybe I should create an issue in specreduce about this and you can just add a "TODO: change when issue #xxx is addressed" or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there are some places in specreduce with general code to handle arbitrary "dispersion_axis", other parts where it's fixed at 0 but stored, and others where it's just assumed to be 0. I agree that this is a specreduce issue that needs to be addressed consistently across all methods, and then generalize back here when that is supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there are some places in specreduce with general code to handle arbitrary "dispersion_axis", other parts where it's fixed at 0 but stored, and others where it's just assumed to be 0. I agree that this is a specreduce issue
What I can see in the different Trace
subclasses is using the same throughout (actually axis 1, which is the dimension that you get from obj.image[0]
); KosmosTrace
has _disp_axis
and _crossdisp_axis
properties, but currently fixes them at 1 and 0, respectively. Might make sense to check for the presence of obj._disp_axis
and assume 1 otherwise, though.
More importantly however I realised that I had confused axes lengths, values and dimensions here, and am now wondering why you'd want obj.image[0]
as component 'x'
here at all. This is just a (somewhat arbitrary?) row of count values in the detector image, so really bears no particular information on the dispersion or spectral axis, right?
You could construct a pixel count or similar from that, but Data
already gives you that automatically in data[data.pixel_component_ids[0]]
.
Of course if that pixel row is required in your application it can be included by the importer, but then the documentation should add an explanation why this component is added. My impression now is that the most natural structure would be to have only obj.trace
as a component (and perhaps also name it 'trace'
rather than 'y'
, which seems more self-explanatory).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, I removed the x-component entirely and that still seems to do what is needed (without needing to worry or document anything about the dispersion axes). Please check to see if this is what you had in mind or if you can think of any scenario this wouldn't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks; I think this is the most intuitive structure. About potential use scenarios I cannot say much, but in any case if you needed any of the actual image data, you'd still have access through data.meta['Trace']
.
Could you rebase; this will hopefully fix the previous test failures?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rebased, but I don't have permissions to rerun the CI workflows. Thanks!
glue_astronomy/translators/trace.py
Outdated
if not isinstance(data.meta.get('Trace'), Trace): | ||
raise TypeError("data is not of a valid specreduce Trace object") | ||
|
||
return data.meta['Trace'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This worries me, because it is completely ignoring what's actually in the data
object itself. I think that's counter-intuitive because a likely eventual workflow is to change the y
trace by hand, which is probably most naturally done in glue.
So I would suggest the following:
- If there's some way that glue data objects can flag "have been modified since created", only do the next step if that one is true.
- Do something like:
return data.meta['Trace'] | |
data.meta['Trace'].trace = data.y | |
return data.meta['Trace'] |
Now this does introduce an assymetry between x and y. But for a trace I think that makes sense.
Regardless, I think either in to_object
or the top-level class, the docstring should be explcit about whether it gets data from glue to create the object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea. We can even include a check to make sure that x
has not changed and that y
is the correct length. In cases where y
has changed, we'll probably want to return an ArrayTrace
with the modified trace.y
. For example: if the input trace object is a FlatTrace
, but then the underlying data is modified once in glue, it no longer (necessarily) makes sense to return another FlatTrace
object, right?
In theory, I don't see why Trace
needs the input image, it's just convenient to have attached, so if we can make that more flexible on the specreduce end, then we might be able to rebuild the object from scratch instead of relying on these references. But I think in cases where the data isn't modified, it is nice to roundtrip and return the same type of trace, which at least involves storing some more information than the arrays themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: I now check to see if the "y"-component matches the trace, and if not, create a new ArrayTrace
object with the same image. It turns out that glue does not allow changing the shape of data components, so there is no need for any checks on the lengths, but I do raise an error if the "x"-component no longer matches the dispersion axis of the image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also agree that the memory impact is not serious given that, from my read of the internals of
Trace
and its sub-classes, bothobj.trace
andobj.image
are references not copies, and this is just carrying along those references. As long asglue
itself does not do any copying later on down the line this is probably "free".
Any data component initialising or being added to a data
object is processed by Component.autotyped()
, which will typically involve np.asarray(data)
and potentially other re-casting operations. So for most ndarray
inputs there will probably be no actual copying, but even if, this only involves obj.trace
. The meta
reference is just an OrderedDict
item.
- If there's some way that glue data objects can flag "have been modified since created", only do the next step if that one is true.
Wondering if the callback mechanism might be used for this, although I do not see an example where this is applied in a Data
component so far, apart from the use in _update_world_components
.
(Oh, and my "If there's some way that glue data objects can flag "have been modified since created", only do the next step if that one is true." comment could use some insight from @astrofrog if @dhomeier doesn't know the answer!) |
additional assert statement in test Co-authored-by: Derek Homeier <dhomeie@gwdg.de>
* and raise error if "x" component is changed
* can rely on the automatic pixel component and recover everything from the original trace object
6680612
to
46c65e5
Compare
Codecov Report
@@ Coverage Diff @@
## main #72 +/- ##
==========================================
- Coverage 97.63% 97.46% -0.17%
==========================================
Files 15 17 +2
Lines 1182 1223 +41
==========================================
+ Hits 1154 1192 +38
- Misses 28 31 +3
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the minor style comments this should be good to go, thanks again!
@eteq I think your change requests are adequately addressed with this (the question which axis is dispersion in fact no longer arises)?
@astrofrog the only potentially contentious issue I see remaining is the specreduce
dependency, but we had a clear majority in favour of that.
Oh, one last thing: please add a changelog entry, and change the current section heading to |
Successful test run, so considering the last commit a |
sorry I didn't get back to this sooner, by a a post-hoc "yes this is good, thanks!" |
This PR implements a new translator for specreduce Trace objects. In doing so, it adds specreduce as a dependency. In order to round-trip and retrieve the same underyling Trace object (instead of always having to export to an
ArrayTrace
), this currently saves a copy of the input Trace object in themeta
of the glueData
object. If this is not preferred, we could create anArrayTrace
from the glueData
object, but we'll likely still need to attach an instance or reference to the input data.