Support for Multidimensional Slicing (Numpy Issue) #541

Closed
calben opened this issue Mar 20, 2014 · 19 comments · Fixed by #1968

@calben
Contributor

calben commented Mar 20, 2014

Numpy has support for multidimensional arrays like so:

>>> x = numpy.array([[x for x in range(10)] for y in range(10)])
>>> x[1:5,::2]
array([[0, 2, 4, 6, 8],
       [0, 2, 4, 6, 8],
       [0, 2, 4, 6, 8],
       [0, 2, 4, 6, 8]])

It's a beautiful way of handling matrices, but the comma makes Python pass a tuple as the slicing argument, which makes it difficult to use one of Numpy's greatest features in Hy.

I don't think this is necessarily a bug, since it's an issue with an external library's extension of Python's indexing syntax, but a macro to fix this issue would be amazing.

And here I once thought Python's slicing notation was beautiful and convenient.
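To make the tuple point concrete, here is a small Python check (a sketch using the same array as the example above): the comma inside the brackets builds one tuple of `slice` objects, and numpy's `__getitem__` receives that single tuple.

```python
import numpy as np

# The same 10x10 array as above: every row holds 0..9.
x = np.array([[c for c in range(10)] for _ in range(10)])

# The comma in x[1:5, ::2] builds one tuple of slice objects,
# and that single tuple is what __getitem__ receives.
key = (slice(1, 5), slice(None, None, 2))
explicit = x.__getitem__(key)

print(explicit.shape)                   # (4, 5)
print((explicit == x[1:5, ::2]).all())  # True
```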

@calben
Contributor Author

calben commented Mar 21, 2014

After a helpful StackOverflow answer pointed out the obvious, I found that multidimensional Numpy arrays can be sliced without [] like so:

    >>> x = np.array([list(range(5)) for _ in range(5)])
    >>> x.__getitem__(slice(1, 4))
    array([[0, 1, 2, 3, 4],
           [0, 1, 2, 3, 4],
           [0, 1, 2, 3, 4]])
    >>> x.__getitem__((slice(1, 4), slice(1, 4)))
    array([[1, 2, 3],
           [1, 2, 3],
           [1, 2, 3]])

Yeah... that turned out to be as simple as it should be.
I asked on StackOverflow after missing the obvious while trawling through C code and Python wrappers to figure it out.

This can be applied in Hy directly using get, as long as the slices are passed as a single tuple:

    => (def x (numpy.array [[1 2 3] [1 2 3] [1 2 3]]))
    => x
    array([[1, 2, 3],
           [1, 2, 3],
           [1, 2, 3]])
    => (list (get x (, (slice 0 3) (slice 1 3))))
    [array([2, 3]), array([2, 3]), array([2, 3])]
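One subtlety worth spelling out: Hy's `get` with several keys chains the lookups (`x[a][b]`), which is not the same as one tuple key (`x[a, b]`). The Python sketch below shows the difference on the same 3x3 array; the Hy forms in the comments assume the usual compilation of multi-key `get` to chained subscripts.

```python
import numpy as np

x = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

# Chained lookup, what (get x (slice 0 3) (slice 1 3)) compiles to:
# take rows 0..2, then rows 1..2 of that result.
chained = x[slice(0, 3)][slice(1, 3)]

# One tuple key, what (get x (, (slice 0 3) (slice 1 3))) compiles to:
# rows 0..2 and columns 1..2 at once.
tupled = x[(slice(0, 3), slice(1, 3))]

print(chained.shape)  # (2, 3)
print(tupled.shape)   # (3, 2)
```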

If support for this is to be added to the standard library, the cut function could be extended to support it, but this might either make calling the cut function more awkward or make its behaviour surprising with this added functionality.

Should another function be added to support this?
Should a note somewhere be added that this is how you perform multi-dimensional cuts and the code left as it is?
Should the cut function support this?

@Womble

Womble commented Aug 13, 2015

I'm not sure if this is the best place to put this, but as the linked issues seem to be more generally about replacing slice, I'll leave it here.

Would it be worth introducing a reader macro for creating complex slices? I've had a look around and there doesn't seem to be an easy way of doing this in Hy at the moment. After a bit of playing around I came up with:

(defreader | [expr]
  (let [[sl (get --builtins-- "slice")]]
    `(get ~(car expr)
          (, ~@(list-comp (if (= (type x) HyExpression)
                              (+ '(sl) x)
                              x)
                          [x (cdr expr)])))))

which takes e.g. #|(arr 0 (1 10 2) Ellipsis 3) and converts it into the equivalent of arr[0,1:10:2,...,3]
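For reference, the Python expression that macro targets is just one tuple key holding an integer, a slice, `Ellipsis` and another integer. A quick sketch on a hypothetical 4-D array (the shape is chosen only so every index position is valid):

```python
import numpy as np

# A hypothetical 4-D array, large enough for every index position.
arr = np.arange(2 * 10 * 3 * 4).reshape(2, 10, 3, 4)

# The key #|(arr 0 (1 10 2) Ellipsis 3) is meant to build.
key = (0, slice(1, 10, 2), Ellipsis, 3)

print(arr[key].shape)                             # (5, 3)
print((arr[key] == arr[0, 1:10:2, ..., 3]).all())  # True
```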

@gilch
Member

gilch commented Aug 13, 2015

It seems worth a reader macro to me. I think the cut form should support this, and be aliased as #:. I'm just not sure how multidimensional slices should look in a Lisp, but the above doesn't look too bad. I don't like that we have to insert Nones, but I'm not sure how else to do it.

@gilch
Member

gilch commented Aug 17, 2015

On second thought, perhaps #: should be a slice literal?

So

(cut arr (, 0 #:(1 10 2) Ellipsis 3))
arr[0,1:10:2,...,3]

Maybe the tuple part should be implied:

(cut arr 0 #:(1 10 2) Ellipsis 3)

But then the more common one-slice cut would be:

(cut arr #:(1 10 2))
arr[1:10:2]
# or equivalently?
arr[slice(1,10,2)]

A better option might be a cuts macro that puts in the tuple, so the current cut is unchanged.

Maybe we could set up an internal reader macro to use :[1 10 2] instead of #:(1 10 2) for slice. This shouldn't interfere with keywords, because [] isn't allowed in keywords anyway.

(cuts arr 0 :[1 10 2]  Ellipsis 3)

Also Ellipsis is a lot harder to type than ..., so perhaps we could have :[] stand for that.

(cuts arr 0 :[1 10 2] :[] 3)
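In Python terms, a `cuts` form only has to pack its arguments into one tuple, which leaves the single-slice case behaving like an ordinary slice. A rough sketch, where `cuts` is a hypothetical helper and not an existing Hy or numpy function:

```python
import numpy as np

def cuts(arr, *keys):
    """Hypothetical helper: index arr with all keys packed into one tuple."""
    return arr[tuple(keys)]

arr = np.arange(2 * 10 * 3 * 4).reshape(2, 10, 3, 4)

# Multidimensional cut, equivalent to arr[0, 1:10:2, ..., 3]
multi = cuts(arr, 0, slice(1, 10, 2), Ellipsis, 3)

# The one-slice case still behaves like vec[1:10:2]
vec = np.arange(20)
single = cuts(vec, slice(1, 10, 2))
print(single.tolist())  # [1, 3, 5, 7, 9]
```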

@jakirkham
Contributor

Related question: how should one go about assigning a value to an array slice?

@refi64
Contributor

refi64 commented Sep 30, 2015

@jakirkham I believe you can do:

(setv (cut mylist a b) [1 2 3])

I do that in HyTest.

@jakirkham
Contributor

Thanks @kirbyfan64, that does work with a numpy.ndarray.

I was also wondering how this might be expanded to more dimensions or would this simply be a drop in replacement of cut with @gilch's proposed changes?
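On the Python side, assignment goes through `__setitem__` with the same kind of tuple key that reading uses, so a multidimensional assignment only needs that tuple. A minimal sketch:

```python
import numpy as np

a = np.zeros((10, 10))

# a[1:3, 4:6] = 7.0 passes the tuple (slice(1, 3), slice(4, 6)) to __setitem__.
a.__setitem__((slice(1, 3), slice(4, 6)), 7.0)

print((a[1:3, 4:6] == 7.0).all())  # True
print(a.sum())                     # 28.0
```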

@jakirkham
Contributor

Related question, given this (setv a (np.zeros (tuple [10 10]))), why does this happen?

=> (slice 1 3)
slice(1L, 3L, None)
=> (cut a (slice 1 3))
IndexError: invalid slice

I thought making it a singleton tuple might help (since numpy frequently takes these).

=> (tuple [(slice 1 3)])
(slice(1L, 3L, None),)
=> (cut a (tuple [(slice 1 3)]))
IndexError: invalid slice

On the other hand, using __getitem__ and __setitem__ works in these cases.

=> (a.__getitem__ (slice 1 3))
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
=> (a.__getitem__ (tuple [(slice 1 3)]))
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

@refi64
Contributor

refi64 commented Sep 30, 2015

@jakirkham

I believe your code is actually compiled to:

a[slice(1, 3):]

which doesn't make much sense. Try:

(cut a 1 3)

Tip: whenever something seems odd, use hy2py (given a Hy script, prints the generated Python code) or hy --spy (prints the Python code after everything entered into the REPL). They're super handy for figuring stuff out.
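The difference can be reproduced in plain Python: a bare slice and a singleton tuple holding a slice both work, while `a[slice(1, 3):]` builds a slice whose start is itself a slice object, which numpy rejects. A sketch; the exception type has varied across numpy versions (IndexError in the session above, while newer releases may raise TypeError), so both are caught.

```python
import numpy as np

a = np.zeros((10, 10))

# A bare slice and a singleton tuple holding a slice both select rows 1..2.
by_slice = a[slice(1, 3)]
by_tuple = a[(slice(1, 3),)]

# a[slice(1, 3):] is a[slice(slice(1, 3), None)]: the start is a slice,
# not an integer, so numpy refuses it.
try:
    a[slice(slice(1, 3), None)]
    rejected = False
except (IndexError, TypeError):
    rejected = True

print(by_slice.shape, by_tuple.shape, rejected)  # (2, 10) (2, 10) True
```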

@gilch
Member

gilch commented Sep 30, 2015

That tip really ought to be in the docs somewhere.

@calben
Contributor Author

calben commented Sep 30, 2015

I'll look into adding a note about this in the docs, but I'm not quite sure where it would go right now.

@gilch
Member

gilch commented Sep 30, 2015

Maybe in docs/tutorial.rst "Protips!".

@troilusc

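One way to do x[1:5,1:2] in Hy is (get x (, (slice 1 5) (slice 1 2))). Note that the slices must go in one tuple: (get x (slice 1 5) (slice 1 2)) chains the lookups and means x[1:5][1:2].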

@Kodiologist
Member

I have a macro geta in Kodhy that lets you write, e.g., (geta x 1 : 2) to mean x[1, :, 2].
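Kodhy's implementation isn't shown here, but the idea translates to Python roughly as follows. This is a hypothetical sketch, not Kodhy's code: the string `":"` stands in for the bare `:` symbol and becomes a full slice.

```python
import numpy as np

def geta(x, *keys):
    """Hypothetical sketch: the string ":" stands for a full slice."""
    key = tuple(
        slice(None) if isinstance(k, str) and k == ":" else k
        for k in keys
    )
    return x[key]

x = np.arange(24).reshape(2, 3, 4)
row = geta(x, 1, ":", 2)  # same as x[1, :, 2]
print(row.tolist())       # [14, 18, 22]
```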

@gilch
Member

gilch commented Jan 4, 2018

I came up with

(deftag $ [expr]
  `(slice ~@(list-comp (if (= x '.) None
                           (= x '...) 'Ellipsis
                           x)
             [x expr])))

It makes slice objects much more concise, e.g., #$[.] instead of (slice None) and #$[. ... -1] instead of (slice None Ellipsis -1). It's still not quite as compact as Python, though. Even with #1481 eliminating the extra tuple, x[1:5, 1:2] would be (. x [#$[1 5] #$[1 2]]).

You could do something similar with a normal macro [Edit: no, dots do weird things to HyExpressions.]

(defmacro $ [&rest args]
  `(slice ~@(list-comp (if (= x '.) None
                           (= x '...) 'Ellipsis
                           x)
             [x args])))

Then x[1:5, 1:2] would be (. x [($ 1 5) ($ 1 2)]). This actually seems pretty good to me, but it's still not as concise as Python. I'm not sure we can do much better.
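The translation the `$` tag and macro perform is easy to mirror in Python, which helps when checking what a given form should select. A sketch with a hypothetical `make_slice` function; the strings `"."` and `"..."` stand in for the Hy symbols.

```python
import numpy as np

def make_slice(*parts):
    """Hypothetical mirror of the $ tag: "." -> None, "..." -> Ellipsis."""
    def convert(p):
        if isinstance(p, str) and p == ".":
            return None
        if isinstance(p, str) and p == "...":
            return Ellipsis
        return p
    return slice(*(convert(p) for p in parts))

x = np.arange(36).reshape(6, 6)

# x[1:5, 1:2], with each slice built the way ($ 1 5) and ($ 1 2) would build it
block = x[make_slice(1, 5), make_slice(1, 2)]

# ($ . . -1) corresponds to the Python slice ::-1
reversed_rows = x[make_slice(".", ".", -1)]
print(block.shape)  # (4, 1)
```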

Perhaps . or : (or both) inside [] could be special-cased in the . DSL macro somehow. (Though . is currently a special form, it shouldn't be.) I'm not sure how it should work though. It needs to be completely unambiguous where one slice starts and another ends.

Perhaps (. x [(: 1 5) (: 1 2)]) or (. x [:[1 5] :[1 2]]). Macros can do this sort of thing, but it's not really better. How do we want this to work?

@gilch
Member

gilch commented Feb 21, 2018

I stumbled upon https://qiita.com/riktor/items/cd914612673fe7828a8d (warning: Japanese). Our slicing is inadequate, so someone fixed it with a macro. That's one possible syntax we could have.

@josiah14

josiah14 commented Feb 24, 2019

Since Lisp is supposed to center on LISt Processing, I think it makes sense for the Hy community to be concerned about having excellent support for Python's #1 vector and matrix processing library (NumPy, and Pandas, which builds on it). I'm happy to see this discussion is still ongoing and hasn't died quite yet.

When coming up with a good slicing syntax to keep machine-learning and data-science-oriented Pythonists happy, I want to stress that we shouldn't forget three things: the third slice parameter, which defines the step (and also the direction); the ability to reference multiple specific indices at once (and even mix ranged and specific indices in the same expression); and the ability to use a mask. These are powerful, oft-used shorthands for a lot of Pythonists and data scientists. I didn't see the "step" parameter referenced at all in the Japanese article, but I'm very glad to see it in an example further up in this discussion and in the original post.

I also think brevity should be emphasized, because for many people who might want to use Hy + Pandas, slicing would be as common as conjunctions and articles are in English. For that reason, the slice-literal idea above is really attractive. I'm a bit new to Lisp (but I'm loving it!), so I don't know whether this can be done in a performant way, but I would like to see a concise syntax that can handle something as complex as the following example from the SciPy Lectures:

>>> import numpy as np
>>> a = np.array([[0, 1, 2, 3, 4, 5],
...               [10, 11, 12, 13, 14, 15],
...               [20, 21, 22, 23, 24, 25],
...               [30, 31, 32, 33, 34, 35],
...               [40, 41, 42, 43, 44, 45],
...               [50, 51, 52, 53, 54, 55]])

>>> a[(0, 1, 2, 3, 4), (1, 2, 3, 4, 5)]
array([ 1, 12, 23, 34, 45])

>>> a[3:, [0, 2, 5]]
array([[30, 32, 35],
       [40, 42, 45],
       [50, 52, 55]])

>>> mask = np.array([1, 0, 1, 0, 0, 1], dtype=bool)
>>> a[mask, 2]
array([ 2, 22, 52])
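Each of these expressions reaches `__getitem__` as a single key (here always a tuple), so whatever syntax Hy settles on only has to build that key. A Python sketch spelling the three keys out explicitly:

```python
import numpy as np

a = np.array([[0, 1, 2, 3, 4, 5],
              [10, 11, 12, 13, 14, 15],
              [20, 21, 22, 23, 24, 25],
              [30, 31, 32, 33, 34, 35],
              [40, 41, 42, 43, 44, 45],
              [50, 51, 52, 53, 54, 55]])
mask = np.array([1, 0, 1, 0, 0, 1], dtype=bool)

# a[(0,1,2,3,4), (1,2,3,4,5)]: a tuple of two index sequences
fancy = a.__getitem__(((0, 1, 2, 3, 4), (1, 2, 3, 4, 5)))

# a[3:, [0,2,5]]: a slice mixed with a list of column indices
mixed = a.__getitem__((slice(3, None), [0, 2, 5]))

# a[mask, 2]: a boolean mask mixed with a plain integer
masked = a.__getitem__((mask, 2))

print(fancy.tolist())   # [1, 12, 23, 34, 45]
print(mixed.tolist())   # [[30, 32, 35], [40, 42, 45], [50, 52, 55]]
print(masked.tolist())  # [2, 22, 52]
```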

What seems like a decent starting point to me for dreaming up a decent syntax might be something like the following:

#:(a (, 0 1 2 3 4) (, 1 2 3 4 5)) ;; indexing

#:(a (: 3 _) (, 0 2 5)) ;; mixing ranges and indexing

#:(a (: 3 10 2)) ;; a range with the step size included

(setv mask (np.array [1 0 1 0 0 1] :dtype bool))
#:(a mask (, 2))  ;; or even better, #:(a mask 2)

Of course, I haven't been super active in the Hy community yet, so I wouldn't blame anyone for completely disregarding my thoughts on this, but I really like Hy and felt the desire to provide my perspective as someone in the big-data analytics world who works closely with data scientists every day.

@guanyilun

Just want to share my implementation (based on the Japanese article):

  (defn parse-colon [sym]
    (list (map (fn [x]
                 (if (empty? x)
                     None
                     (int x)))
               (.split (str sym) ":"))))

  (defn parse-indexing [sym]
    (cond
      [(in ":" (str sym)) `(slice ~@(parse-colon sym))]
      [(in "..." (str sym)) 'Ellipsis]
      [True sym]))

  (defmacro nget [ar &rest keys]
    `(get ~ar (, ~@(map parse-indexing keys))))

It should work for the cases mentioned above. For example,

=> (setv a (.reshape (np.arange 36) (, 6 6)))
=> (setv mask (np.array (, 1 0 1 0 0 1) :dtype bool))
=> a
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])
=> mask
array([ True, False,  True, False, False,  True])
=> (nget a (, 0 1 2 3 4) (, 1 2 3 4 5))
array([ 1,  8, 15, 22, 29])
=> (nget a 3: (, 0 2 5))
array([[18, 20, 23],
       [24, 26, 29],
       [30, 32, 35]])
=> (nget a 1:-1:2 3:5)
array([[ 9, 10],
       [21, 22]])
=> (nget a ::2 3 None)
array([[ 3],
       [15],
       [27]])
=> (nget a ... 0)
array([ 0,  6, 12, 18, 24, 30])
=> (nget a mask 2)
array([ 2, 14, 32])
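To check those outputs outside Hy, the token parsing can be mirrored in Python. This is a rough port for illustration, not guanyilun's code: strings such as "3:" stand in for the Hy symbols, and the function names are hypothetical.

```python
import numpy as np

def parse_colon(token):
    """Split a token like "1:-1:2" into slice arguments; empty fields become None."""
    return [None if part == "" else int(part) for part in token.split(":")]

def parse_index(token):
    """Colon tokens become slices, "..." becomes Ellipsis, anything else passes through."""
    if isinstance(token, str) and ":" in token:
        return slice(*parse_colon(token))
    if isinstance(token, str) and "..." in token:
        return Ellipsis
    return token

def nget(arr, *keys):
    """Index arr with all parsed keys packed into one tuple."""
    return arr[tuple(parse_index(k) for k in keys)]

a = np.arange(36).reshape(6, 6)
mask = np.array([1, 0, 1, 0, 0, 1], dtype=bool)

print(nget(a, "1:-1:2", "3:5").tolist())  # [[9, 10], [21, 22]]
print(nget(a, "...", 0).tolist())         # [0, 6, 12, 18, 24, 30]
```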

@josiah14

josiah14 commented Mar 1, 2019

That solution worked perfectly for me, @guanyilun. Thanks!

I just wrote a little extra code to provide a Tag Macro:

(deftag s [code]
    `(nget ~@code))

(comment example usage, below)
#s(a 3: (, 1 2 3 4))


10 participants