Behavior of .all() is not consistent with numpy #166
Comments
The same goes of course for similar reduction methods like Edit: The last claim is at least true for |
The reducers are consistent with Numpy with an In awkward 1.0, all operations on awkward arrays will be in the module namespace, so Because arbitrary |
Personally I think this default behavior of assuming each reducer is on |
A claim about which default behavior is more useful is subjective because it obviously depends on your use cases. The most common reduction I've used is Incidentally, I think you can meaningfully reduce along an arbitrary axis in |
My example (written in #167) was supposed to be here:

>>> a = numpy.ones((2, 3, 4), dtype=int)
>>> a
array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]])
>>> a.sum(axis=0)
array([[2, 2, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2]])
>>> a.sum(axis=1)
array([[3, 3, 3, 3],
       [3, 3, 3, 3]])
>>> a.sum(axis=2)
array([[4, 4, 4],
       [4, 4, 4]])

Here's a version with some zeros:

>>> b = numpy.array([[[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0]], [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]])
>>> b
array([[[1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0]],

       [[1, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]])
>>> b.sum(axis=0)
array([[2, 1, 1, 1],
       [1, 1, 1, 0],
       [1, 1, 0, 0]])
>>> b.sum(axis=1)
array([[3, 3, 2, 1],
       [1, 0, 0, 0]])
>>> b.sum(axis=2)
array([[4, 3, 2],
       [1, 0, 0]])

Now what if instead of zeros, we had gaps...

>>> c = awkward.fromiter([[[1, 1, 1, 1], [1, 1, 1], [1, 1]], [[1], []], []])
>>> c
<JaggedArray [[[1 1 1 1] [1 1 1] [1 1]] [[1] []] []] at 0x78e211da4400>

Note that I did emptiness three different ways: (1) I turned a It's clear what an

>>> c.sum()
<JaggedArray [[4 3 2] [1 0] []] at 0x78e212497588>

Compare this to

>>> b.sum(axis=-1)
array([[4, 3, 2],
       [1, 0, 0]])

The first row is the same because nonexistent elements are equivalent to zeros in summation and we haven't dealt with any of the tricky empty cases. The second row is different because while So we have an It gets even harder to give a well-reasoned definition for

>>> b.sum(axis=0)
array([[2, 1, 1, 1],
       [1, 1, 1, 0],
       [1, 1, 0, 0]])

For the flat array That's why I decided some time ago that we can't make sense of Muon_pt.max(axis=0) give you an In awkward 1.0, when this is done in C++, we can have a reasonable implementation of an output array that grows each time we see a larger inner array (using

>>> c.pad(c.flatten().counts.max(), axis=1).fillna(0).pad(c.counts.max(), axis=0).fillna(0)
<JaggedArray [[[1 1 1 1] [1 1 1 0] [1 1 0 0]] [[1 0 0 0] [0 0 0 0] [0 0 0 0]]] at 0x7f03145497f0>

before running Numpy's |
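For anyone who wants to try the pad-then-reduce idea without awkward installed, here is a rough sketch on plain Python lists. The helper names `sum_innermost` and `pad_to_regular` are made up for this illustration and are not part of awkward or Numpy; the zero fill is only appropriate for `sum` (other reducers would need their own identity value). Note that in this sketch the empty third entry of `c` becomes an all-zero block, which does not change any of the sums.

```python
import numpy as np

# Same nested data as `c` above, held as plain Python lists.
c = [[[1, 1, 1, 1], [1, 1, 1], [1, 1]], [[1], []], []]


def sum_innermost(nested):
    """Sum each innermost list, keeping the outer jagged structure."""
    return [[sum(inner) for inner in middle] for middle in nested]


def pad_to_regular(nested, fill=0):
    """Pad both jagged levels with `fill` so the result is rectangular.

    Hypothetical helper: zero is the identity for sum, so padding with it
    leaves sums unchanged. It would be wrong for min, max, or prod.
    """
    n_middle = max((len(middle) for middle in nested), default=0)
    n_inner = max((len(inner) for middle in nested for inner in middle), default=0)
    padded = []
    for middle in nested:
        # Pad every existing inner list out to the longest inner length.
        rows = [list(inner) + [fill] * (n_inner - len(inner)) for inner in middle]
        # Add all-fill rows so every outer entry has the same number of rows.
        rows += [[fill] * n_inner for _ in range(n_middle - len(rows))]
        padded.append(rows)
    return np.array(padded, dtype=int)


jagged_sums = sum_innermost(c)      # reduction over the innermost lists only
regular = pad_to_regular(c)         # rectangular, so any Numpy axis works
axis0_sums = regular.sum(axis=0)
```

Once the data is rectangular, `regular.sum(axis=0)` is an ordinary Numpy reduction, at the cost of losing the distinction between a missing element and a zero.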
I'm going to close this here, though the |
So, it turns out that |
Consider the difference between .all() with a numpy.ndarray and a JaggedArray:

One would certainly intuitively expect .all() to give the same result in both of these cases, but for a JaggedArray an array is returned rather than a bool. Numpy does a logical AND over all the dimensions of the input array, whereas awkward seems to only do an AND over the final dimension. Is there a good reason why AwkwardArray.all() doesn't follow the same convention as numpy's default .all()?
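To make the two conventions concrete, here is a small sketch. The Numpy calls are the real `ndarray.all` API; the jagged half uses plain Python lists as a stand-in for a JaggedArray, so it only illustrates the "AND over the final dimension" behavior described above and is not awkward's implementation.

```python
import numpy as np

x = np.array([[True, True], [True, False]])

# Numpy default (axis=None): AND over every dimension, giving one bool.
all_everything = bool(x.all())

# AND over the last axis only, giving one bool per row.
all_last_axis = x.all(axis=-1).tolist()

# Stand-in for a jagged array: inner lists of different lengths.
jagged = [[True, True], [], [True, False]]

# One result per inner list; an empty list gives True (the identity for AND).
per_list = [all(inner) for inner in jagged]

# Reducing those results once more recovers a single bool.
overall = all(per_list)
```

In this sketch a second reduction over the per-list results gives the single bool that Numpy's default returns.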