-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: structure removal order for unions #3397
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pfackeldey - Great! Thanks for providing the fix so quickly! Indeed, iterating over the unique index fixes the issue.
src/awkward/contents/unionarray.py
Outdated
@@ -1574,7 +1574,8 @@ def _remove_structure( | |||
self, backend: Backend, options: RemoveStructureOptions | |||
) -> list[Content]: | |||
out = [] | |||
for i in range(len(self._contents)): | |||
_, unique_index, *_ = numpy.unique_all(self._tags.data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pfackeldey - should it be Numpy-like? To cover the cupy
as a backend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this is to cover all backends (also typetracer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, wait, you're right. I'm using the global numpy
nplike. Let me have a look again to use the nplike of the content class... 🤦
this PR does however make the assumption that the number of unique tags corresponds to the |
@pfackeldey just reading the comments and not the diff -- import awkward as ak
import numpy as np
x = ak.Array([1, "hi", 2, "bye"])
y = x[[0,2]]
assert np.unique(y.layout.tags).size == len(y.layout.contents) |
Ah I see. In this case this implementation is still working correct though: # with this PR
x = ak.Array([1, "hi", 2, "bye"])
y = x[[0,2]]
ak.ravel(y)
# <Array [1, 2] type='2 * int64'> on main this actually fails, which I would consider wrong aswell (so this PR might fix another bug aswell?): x = ak.Array([1, "hi", 2, "bye"])
y = x[[0,2]]
ak.ravel(y)
# ... > AssertionError: cannot merge NumpyArray with ListOffsetArray The loop over the unique ordered tags will pick (in order) the contents where tags point to. This is always less or equal to the length of contents. Do you see any problem with that @agoose77 ? |
@pfackeldey the original code has two bugs (which are the same kind of bug)! This PR fixes one, but not the other I think. The intention of For trivial-unions (e.g. a union of integer and bool), the proper way to flatten the layout in an order-preserving way is to write a kernel that fills a new buffer with Moreover, as we are dealing with a union, to remove the structure probably means we want to simplify a union over the results such that the union disappears and the result is simplified to a common type. So, maybe the proper way to do this is to build a new union via But, I also feel compelled to say -- |
Fixes: #3180