[RFC] Masked select and Masked fill #429
awesome!
Heh, indeed. I remember learning fancy indexing in numpy the first
PR ready:
The next step would be to integrate those and index_select in the
* index_select should use SomeInteger not SomeNumber
* Overload index_select for arrays and sequences
* Masked Selector overload for openarrays
* Add masked overload for regular arrays and sequences
* Initial support of Numpy fancy indexing: index select
* Fix broadcast operators from #429 using deprecated syntax
* Stash dispatcher, working with types in macros is a minefield nim-lang/Nim#14021
* Masked indexing: closes #400, workaround nim-lang/Nim#14021
* Test for full masked fancy indexing
* Add index_fill
* Tensor mutation via fancy indexing
* Add tests for index mutation via fancy indexing
* Fancy indexing: supports broadcasting a value to a masked assignation
* Detect wrong mask or tensor axis length
* masked axis assign value test
* Add masked assign of broadcastable tensor
* Tag for changelog [skip ci]
This is the first step in supporting NumPy fancy/advanced indexing.
The second step is extending the [] and []= macros to automatically dispatch to either index_select, masked_select, masked_fill or one of the roughly 10 other slicing variations that already exist.
Thanks to Base2 Genomics for sponsoring the work (https://base2genomics.com/)
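To make the planned dispatch in [] and []= more concrete, here is a rough Python model of the idea, not the Nim macro itself. The names fancy_get and masked_select_1d are hypothetical, index_select is reduced to the 1D case, and plain lists stand in for tensors. It only shows how the index type could pick the selector.

```python
def index_select(t, indices):
    """Pick elements of the 1D list `t` at the given integer positions."""
    return [t[i] for i in indices]


def masked_select_1d(t, mask):
    """Keep elements of the 1D list `t` whose mask entry is True."""
    return [x for x, keep in zip(t, mask) if keep]


def fancy_get(t, idx):
    """Hypothetical dispatcher: choose a selector from the type of `idx`."""
    # bool is a subclass of int in Python, so the boolean case is tested first.
    if isinstance(idx, list) and idx and all(isinstance(i, bool) for i in idx):
        return masked_select_1d(t, idx)
    if isinstance(idx, list) and all(isinstance(i, int) for i in idx):
        return index_select(t, idx)
    # Anything else (a single int or a slice) falls back to regular indexing.
    return t[idx]


data = [10, 20, 30, 40]
by_mask = fancy_get(data, [True, False, True, False])
by_index = fancy_get(data, [3, 0])
by_slice = fancy_get(data, slice(1, 3))
```

In a statically typed language the same decision would presumably be made at compile time from the argument types rather than with runtime checks.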
State
More tests to be added and documentation to be improved.
masked_axis_fill with a tensor instead of just a scalar, which is very common for dataframes.
This PR is currently more of an RFC to discuss proc names and clarify documentation.
API design
It turns out that the API is not straightforward at all and it is very easy to get confused. Feedback is welcome and would also significantly help in improving the documentation:
masked_select
Inspiration Numpy: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#boolean-array-indexing
and PyTorch: https://pytorch.org/docs/stable/torch.html#torch.masked_select
In Arraymancer:
Arraymancer/tests/tensor/test_selectors.nim
Lines 43 to 57 in 7db6023
Signature: masked_select(t, mask) -> 1D Tensor
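As an illustration of that signature, here is a minimal pure-Python model of the semantics, assuming 2D nested lists in place of tensors. It is a sketch of the behaviour shared with NumPy and PyTorch, not the Arraymancer implementation. It also shows what a row-by-row filter would give instead.

```python
def masked_select(t, mask):
    """Return the elements of 2D list `t` whose mask entry is True, as a flat list."""
    if len(t) != len(mask) or any(len(r) != len(m) for r, m in zip(t, mask)):
        raise ValueError("mask must have the same shape as t")
    return [x for row, mrow in zip(t, mask) for x, keep in zip(row, mrow) if keep]


t = [[1, 0], [2, 3]]
mask = [[x != 0 for x in row] for row in t]  # [[True, False], [True, True]]

# Selected elements come back flattened, in row-major order.
flat = masked_select(t, mask)

# Filtering each row separately keeps the nesting, but the rows end up
# with different lengths, so the result is not a rectangular tensor.
ragged = [[x for x in row if x != 0] for row in t]
```

Here flat is [1, 2, 3], while ragged is [[1], [2, 3]].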
Tricky part
This returns a 1D Tensor like Numpy and PyTorch. As shown in this Stack Overflow thread: https://stackoverflow.com/questions/51586364/tf-boolean-mask2d-2d-gives-1d-result
some users may want to filter the non-zero elements from [[1,0],[2,3]] and obtain [[1], [2, 3]]. This is not a valid "dense" tensor as [1] does not have the same shape as [2, 3].
Tensorflow provides RaggedTensor (https://www.tensorflow.org/guide/ragged_tensor) and PyTorch is exploring NestedTensor for this case.
masked_axis_select
Inspiration Numpy: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#boolean-array-indexing
and Tensorflow: https://www.tensorflow.org/api_docs/python/tf/boolean_mask
Tricky part
The mask is 1D and corresponds to axis indices that will be retained or dropped.
Tensorflow allows a multi-dimensional mask. I don't know what kind of result to expect from a 2D masking operation combined with an axis, so I assume it is Python overloading flexibility and that combining both will result in an error.
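A small pure-Python model of the 1D-mask-plus-axis behaviour may help here. It assumes 2D nested lists in place of tensors and an argument order of (t, mask, axis), which is an assumption for this sketch only. A mask whose length does not match the chosen axis is rejected.

```python
def masked_axis_select(t, mask, axis):
    """Keep the rows (axis=0) or columns (axis=1) of 2D list `t` where mask is True."""
    if axis not in (0, 1):
        raise ValueError("this sketch only handles 2D data, axis must be 0 or 1")
    axis_len = len(t) if axis == 0 else len(t[0])
    if len(mask) != axis_len:
        raise ValueError("mask length must match the length of the selected axis")
    if axis == 0:
        # One mask entry per row: retained rows are copied whole.
        return [list(row) for row, keep in zip(t, mask) if keep]
    # One mask entry per column: filter every row with the same mask.
    return [[x for x, keep in zip(row, mask) if keep] for row in t]


t = [[4, 99, 2],
     [3, 4, 99],
     [1, 8, 7]]

rows = masked_axis_select(t, [True, False, True], axis=0)
cols = masked_axis_select(t, [False, True, True], axis=1)
```

Unlike masked_select, the result stays rectangular: rows is [[4, 99, 2], [1, 8, 7]] and cols is [[99, 2], [4, 99], [8, 7]].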
masked_fill
Straightforward: take a mutable tensor and a mask of the same shape; wherever the mask element is true, fill with the value.
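For reference, the same behaviour modelled in pure Python, assuming 2D nested lists in place of tensors (a sketch of the semantics, not the Arraymancer code):

```python
def masked_fill(t, mask, value):
    """Set t[i][j] = value in place wherever mask[i][j] is True."""
    if len(t) != len(mask) or any(len(r) != len(m) for r, m in zip(t, mask)):
        raise ValueError("mask must have the same shape as t")
    for row, mrow in zip(t, mask):
        for j, hit in enumerate(mrow):
            if hit:
                row[j] = value


t = [[4, 99, 2],
     [3, 4, 99],
     [1, 8, 7]]
mask = [[x > 50 for x in row] for row in t]  # True where the value exceeds 50

masked_fill(t, mask, -1)  # mutates t, returns nothing
```

After the call, t is [[4, -1, 2], [3, 4, -1], [1, 8, 7]].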
masked_axis_fill
Inspiration Numpy: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#boolean-array-indexing
(from #400 (comment))
Arraymancer/tests/tensor/test_selectors.nim
Lines 91 to 102 in 7db6023
Tricky part
The mask is 1D and corresponds to axis indices that will be filled or skipped.
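A pure-Python model of this, assuming 2D nested lists in place of tensors and an argument order of (t, mask, axis, value), which is an assumption for this sketch only:

```python
def masked_axis_fill(t, mask, axis, value):
    """Fill whole rows (axis=0) or whole columns (axis=1) of `t` where mask is True."""
    if axis not in (0, 1):
        raise ValueError("this sketch only handles 2D data, axis must be 0 or 1")
    axis_len = len(t) if axis == 0 else len(t[0])
    if len(mask) != axis_len:
        raise ValueError("mask length must match the length of the selected axis")
    for i, row in enumerate(t):
        for j in range(len(row)):
            # The mask is indexed by the coordinate along `axis`.
            pos = i if axis == 0 else j
            if mask[pos]:
                row[j] = value


t = [[4, 99, 2],
     [3, 4, 99],
     [1, 8, 7]]

# Fill column 1 entirely: the mask has one entry per column.
masked_axis_fill(t, [False, True, False], axis=1, value=-10)
```

After the call, t is [[4, -10, 2], [3, -10, 99], [1, -10, 7]].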
masked_filled_along_axis
Inspiration: my flawed initial understanding of Numpy boolean masking
It takes an N-D mask, iterates along the axis of a tensor and applies masked_fill on each axis slice.
Arraymancer/tests/tensor/test_selectors.nim
Lines 105 to 116 in 7db6023
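One possible reading of this, modelled in pure Python: the sketch assumes 2D nested lists in place of tensors, and assumes the mask has the shape of a single axis slice (written here as a flat list). Both are assumptions for illustration, not something the referenced test confirms.

```python
def masked_filled_along_axis(t, mask, axis, value):
    """Apply the same mask to every slice taken along `axis` of 2D list `t`.

    axis=0: slices are rows, so the mask has one entry per column.
    axis=1: slices are columns, so the mask has one entry per row.
    """
    if axis not in (0, 1):
        raise ValueError("this sketch only handles 2D data, axis must be 0 or 1")
    slice_len = len(t[0]) if axis == 0 else len(t)
    if len(mask) != slice_len:
        raise ValueError("mask must have the shape of one slice along the axis")
    for i, row in enumerate(t):
        for j in range(len(row)):
            # The mask is indexed by the position inside the slice,
            # that is, by the coordinate on the other axis.
            pos = j if axis == 0 else i
            if mask[pos]:
                row[j] = value


t = [[1, 2, 3],
     [4, 5, 6]]

# Slices along axis 1 are the three columns, each of length 2.
masked_filled_along_axis(t, [True, False], axis=1, value=-1)
```

After the call, t is [[-1, -1, -1], [4, 5, 6]]. Under this reading, axis=1 ends up filling a row, whereas a 1D mask that indexes positions along axis 1 would fill columns.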
Tricky part
Notice how masked_axis_fill and masked_filled_along_axis have very similar names and similar applications, but it is easy to confuse the axes:
Arraymancer/tests/tensor/test_selectors.nim
Line 96 in 7db6023
Arraymancer/tests/tensor/test_selectors.nim
Line 110 in 7db6023