-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Start flatten implementation and add tests #45
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I should add tests directly, both to illustrate corner cases and to illustrate what it should do.
Oh, and you probably don't need a test of |
…scikit-hep/awkward-1.0 into feature/PR045-flatten-operation
I just put in an example of a When you enter the
|
Uh-oh: I didn't intend to merge the whole master branch into this branch, but if I'm reading it right, that's what I did with 5dde95c. If that's a problem, you can roll back that commit—I only did it to make sure that the |
No problem, I'll update my branch |
…scikit-hep/awkward-1.0 into feature/PR045-flatten-operation
@jpivarski - it's not finished yet. I just wanted to bring it up-to-date with my local branch. I'm a bit confused with the indices as you can see from the tests :-) |
@jpivarski - Could you, please, have a look? I'm not sure if I'm going in the right direction. Besides, I need some help here. If I retrieve the indices for the sliced array, I need an offset: array2 = array[2:-1]
- assert awkward1.tolist(array2.flatten()) == [3.3, 4.4, 5.5]
+ #assert awkward1.tolist(array2.flatten()) == [3.3, 4.4, 5.5] array2 = array[2:-1]
> assert awkward1.tolist(array2.flatten()) == [3.3, 4.4, 5.5]
E assert [0.0, 1.1, 2.2] == [3.3, 4.4, 5.5]
E At index 0 diff: 0.0 != 3.3
E Full diff:
E - [0.0, 1.1, 2.2]
E + [3.3, 4.4, 5.5] |
First of all, with the To see this, try running the pure Python >>> flatten(awkward1.tolist(array))
[0.0, 1.1, 2.2, 5.5, 8.8]
>>> flatten(awkward1.tolist(array[2:-1]))
[5.5] Bigger troublesHowever, you're going to have more difficulty when you get to the next block: content = awkward1.layout.NumpyArray(numpy.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=numpy.int64))
starts = awkward1.layout.Index64(numpy.array([4, 999, 0, 0, 1, 7]))
stops = awkward1.layout.Index64(numpy.array([7, 999, 1, 4, 5, 10]))
array = awkward1.layout.ListArray64(starts, stops, content)
assert awkward1.tolist(array) == [[4, 5, 6], [], [0], [0, 1, 2, 3], [1, 2, 3, 4], [7, 8, 9]] This one not only has gaps, but they're out of order and overlap. Maybe you should concentrate on this one first because of its generality. Suggested strategyThere are many ways to implement these functions, and I see that you're following a procedure for For instance, with the first >>> awkward1.tolist(array)
[[0.0, 1.1, 2.2], [], [5.5], [8.8]]
>>> flatten(awkward1.tolist(array))
[0.0, 1.1, 2.2, 5.5, 8.8] Your kernel can create an std::shared_ptr<Content> out = content_.get()->carry(nextcarry) that >>> content[[0, 1, 2, 5, 8]]
<NumpyArray format="d" shape="5" data="0 1.1 2.2 5.5 8.8" at="0x55f493797580"/> ("gather" in SIMD and numpy.take in NumPy). It doesn't matter what kind of an array Back when all of these things were implemented exclusively in NumPy calls (see old implementation of flatten), we used this trick everywhere: an array of non-negative integers is a partial application of a function on any type of data structure. I came up with this explanation why: considering element-selection from an array (i.e. passing an integer between square brackets) as a function call, treating that array of length
Function composition is associative, so we can apply it whenever we like. In So, given an array like content = awkward1.layout.NumpyArray(numpy.array([0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9]))
starts = awkward1.layout.Index64(numpy.array([4, 999, 0, 0, 1, 7]))
stops = awkward1.layout.Index64(numpy.array([7, 999, 1, 4, 5, 10]))
array = awkward1.layout.ListArray64(starts, stops, content)
assert awkward1.tolist(array) == [[4.4, 5.5, 6.6], [], [0.0], [0.0, 1.1, 2.2, 3.3], [1.1, 2.2, 3.3, 4.4], [7.7, 8.8, 9.9]] The task of your new cpu-kernel would be to generate this
and then pass that content_.get()->carry(nextcarry) to get >>> numpy.asarray(content[[4, 5, 6, 0, 0, 1, 2, 3, 1, 2, 3, 4, 7, 8, 9]])
array([4.4, 5.5, 6.6, 0. , 0. , 1.1, 2.2, 3.3, 1.1, 2.2, 3.3, 4.4, 7.7, 8.8, 9.9]) Actually, you'll need two cpu-kernels: the first one adds up all the |
@jpivarski - thanks, using the offsets fixed the test. Please, review the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put a lot of comments inline. The only actual "request changes" is that the flatten_length
cpu-kernel needs starts_.offset()
and stops_.offset()
and NumpyArray::flatten
needs more extreme testing (described inline).
Once those are done, it's up to you whether you want to extend to axis != 0
in this PR or in a new one. Now that we're collaborating, I'd prefer smaller PRs because then we can merge more frequently.
@jpivarski - I think, I'd prefer to open a new PR for The tests with NumpyArray work, but I couldn't figure out how to test it on the stride tricks - are they allowed? >>> a = np.arange(1,10).reshape(3,3)
>>> a
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
>>> np.lib.stride_tricks.as_strided(a, shape=a.shape[::-1], strides=a.strides[::-1])
array([[1, 4, 7],
[2, 5, 8],
[3, 6, 9]]) |
The examples I have in the inline comment, by slicing a e-dimensional array, should provide enough extreme examples of strides. The The only shape/stride combination I can think of that you can't make with slicing, but can with But you don't need the low-level |
Thanks, @jpivarski! I think, I'm done with this PR. What are the next steps? |
I just resolved the merge conflict (version number), removed the Beyond that, I think it would be time to add The two-dimensional or three-dimensional (assuming the For a structure like this, should If, instead, I'm inclined to define |
When you're sure that intermediate commits don't have to be preserved, give me the okay and I'll squash-and-merge. Thanks! |
Thanks, yes, you can squash-and-merge it. |
Bumps the actions group with 2 updates: [pre-commit/action](https://github.com/pre-commit/action) and [codecov/codecov-action](https://github.com/codecov/codecov-action). Updates `pre-commit/action` from 3.0.0 to 3.0.1 - [Release notes](https://github.com/pre-commit/action/releases) - [Commits](pre-commit/action@v3.0.0...v3.0.1) Updates `codecov/codecov-action` from 4.0.1 to 4.3.0 - [Release notes](https://github.com/codecov/codecov-action/releases) - [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md) - [Commits](codecov/codecov-action@v4.0.1...v4.3.0) --- updated-dependencies: - dependency-name: pre-commit/action dependency-type: direct:production update-type: version-update:semver-patch dependency-group: actions - dependency-name: codecov/codecov-action dependency-type: direct:production update-type: version-update:semver-minor dependency-group: actions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
TODO:
*
add a C++ test for a RawArrayOf@jpivarski - please, comment