Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing #1266 by reordering nextparents (in v2; v1 is NOT FIXED!). #1274

Merged
merged 9 commits into from
Feb 18, 2022

Conversation

jpivarski
Copy link
Member

@agoose77, just reordering the parents to be monotonic fixes the case you found, but it breaks others. I'm going to leave this open until we know what to do about it.

I included @ianna in the branch name, thinking that this PR in v1 might involve a kernel equivalent to np.argmax, but it might not.

@jpivarski jpivarski linked an issue Feb 1, 2022 that may be closed by this pull request
@codecov
Copy link

codecov bot commented Feb 1, 2022

Codecov Report

Merging #1274 (2d24309) into main (8e68200) will increase coverage by 1.61%.
The diff coverage is 75.86%.

Impacted Files Coverage Δ
src/awkward/_v2/_connect/numba/arrayview.py 96.24% <ø> (ø)
src/awkward/_v2/_connect/numba/layout.py 87.01% <ø> (ø)
src/awkward/_v2/forms/emptyform.py 78.18% <ø> (-0.25%) ⬇️
src/awkward/_v2/operations/convert/ak_from_cupy.py 25.00% <0.00%> (-50.00%) ⬇️
src/awkward/_v2/operations/describe/ak_type.py 44.11% <0.00%> (ø)
src/awkward/_v2/operations/structure/ak_isclose.py 100.00% <ø> (ø)
src/awkward/_v2/operations/convert/ak_to_cupy.py 8.19% <3.44%> (-66.81%) ⬇️
src/awkward/_v2/operations/describe/ak_backend.py 9.52% <9.52%> (ø)
src/awkward/_v2/contents/unmaskedarray.py 54.95% <25.00%> (-1.12%) ⬇️
src/awkward/_v2/contents/indexedarray.py 58.62% <37.14%> (-2.03%) ⬇️
... and 69 more

@agoose77
Copy link
Collaborator

agoose77 commented Feb 4, 2022

Yes, clearly this isn't as simple a fix as I had first hoped. Maybe I've just missed something, but I'll take another look at some point :)

@agoose77
Copy link
Collaborator

agoose77 commented Feb 11, 2022

This fails on the positional reducers, I believe, because argsort isn't guaranteed to preserve the order between equal elements. A quick check with sorter = np.argsort(nextparents, kind="mergesort") seems to confirm this.

I'd quite like to try re-writing the kernel(s) in listoffsetarray so that we can make use of the guarantee that parents is ordered, but that's a bigger task than I have time for right now.

@agoose77
Copy link
Collaborator

Now the test-failures are just typetracer failures (I think, I haven't run all the tests locally).

@jpivarski
Copy link
Member Author

All the tests pass!

Wait—is it the case that the only thing we needed was to sort the parents with a stable sort? That's all that was wrong?

@agoose77
Copy link
Collaborator

Wait—is it the case that the only thing we needed was to sort the parents with a stable sort? That's all that was wrong?

Yep! I was thinking about it whilst trying to redesign the kernel to compute nextparents and nextcarry directly, and realised that there was probably no guarantee of stability in the sort that we were using. It was the fact that it only failed for positional reducers that made me suspicious though.

@agoose77
Copy link
Collaborator

Note to self - if we do merge this, then we should definitely explain why we are using mergesort.

@jpivarski
Copy link
Member Author

No kidding—so that somebody doesn't just say, "Oh, here's a faster sorting algorithm; I'll use that" (and maybe doesn't get segfaults because they're not on a Mac...).

This is definitely a faceplam moment. I'm still astonished.

So yes, we should think about merging this. I'm still keeping in mind your initial assessment that it's a band-aid on a system that's grown pretty complicated. I was talking with @ianna earlier today—she has a different solution so we should take a look at that.

Also, all of these intermediate arrays have a noticeable impact on performance (ak.sum is significantly slower than np.sum), and I've been thinking for a while that we should perhaps have a separate code path for the axis=-1 case. In #579, I was thinking that some of these arrays could be skipped without a separate code path, but we could skip a lot more of them with one. The downside is that looks like more complexity, having two ways to do something, but that all-important axis=-1 case would be less difficult to understand if it's handled in isolation. (If, for instance, someone is looking at it with a profiling tool, trying to see what can be trimmed.)

Then we'd be presenting only the axis != -1 case as something unavoidably complex and slow. As we've learned this week, it takes great effort to understand what it's doing.

@agoose77
Copy link
Collaborator

agoose77 commented Feb 11, 2022

This is definitely a faceplam moment. I'm still astonished.

I don't think this is a facepalm moment! There is a lot going on here, and it's not always clear what our invariants are.

(and maybe doesn't get segfaults because they're not on a Mac...).

Thankfully I think even if the sorter were changed, this would only break positional reducer results, rather than any allocation-related intermediary arrays. It just so happens that if the sorter isn't stable, the values in the reduction groups are visited in different orders, so positional reducers produce incorrect results. I never expected to care about sort stability!

I agree about performance. I was thinking about it abstractly that each kernel is doing an allocation and some work, and as we get larger and deeper arrays this will add up. If we can do things in one (or fewer) passes then it means we can claw back some performance.

RE optimising axis=-1 reductions. I suspect these are very common. It would be interesting to see how much faster they are than the slow path. My only worry is more code to debug (e.g. this reduction bug that only shows up at n>=4), but I am also cognizant of how even a 10% perf gain is substantial. This is certainly more your area of experience in the Awkward domain, so I'm not going to try and push my opinion around!

`stable` is the proper option for this requirement.
@ianna
Copy link
Collaborator

ianna commented Feb 14, 2022

Great! Well done @agoose77 ! Thanks!

@jpivarski jpivarski marked this pull request as ready for review February 18, 2022 02:20
@jpivarski
Copy link
Member Author

I think we're all happy about this fix. There might be a more efficient solution, but this one is correct. I'll be merging it.

@jpivarski jpivarski enabled auto-merge (squash) February 18, 2022 02:21
@jpivarski jpivarski merged commit d0b383f into main Feb 18, 2022
@jpivarski jpivarski deleted the jpivarski-ianna/fix-4D-reducers branch February 18, 2022 03:23
@agoose77
Copy link
Collaborator

agoose77 commented Feb 19, 2022

Thanks all! Note that we didn't fix v1 here, but maybe that's another PR.

@jpivarski jpivarski changed the title Fixing #1266 (in v1 and v2), possibly by reordering nextparents. Fixing #1266 by reordering nextparents (in v2; v1 is NOT FIXED!). Feb 19, 2022
@jpivarski
Copy link
Member Author

I missed that. Now the title is less misleading.

Of course: to do v1, we'd need to run a sorting algorithm in C++. Maybe one of the argsort kernels can be used for that.

@jpivarski
Copy link
Member Author

But yes, another PR. Not this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ak.sum produces incorrect structure for outer dimension
3 participants