Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syntax extension: list unpacking resp. destructuring #40

Open
fingolfin opened this issue Mar 16, 2015 · 12 comments
Open

Syntax extension: list unpacking resp. destructuring #40

fingolfin opened this issue Mar 16, 2015 · 12 comments
Assignees
Labels
gapdays2015-spring Issues and PRs that arose at https://www.gapdays.de/gapdays2015-spring gapsagedays2016 Issues and PRs that arose at https://www.gapdays.de/gap-sage-days2016 kind: new feature topic: GAP language

Comments

@fingolfin
Copy link
Member

Python (and various other languages) has a nice feature for unpacking a list into multple variables:
a,b,c := [1,2,3];
assigns the values 1,2,3 to the variables a,b,c. This is quite useful for functions returning multiple values as tuples. Say you have a function which returns two values in a list. Then often, you might write code like this:

tmp := QuotientRemainder(17,4);
q := tmp[1];
r := tmp[2];

which you can turn into this with list unpacking:

q,r := QuotientRemainder(17,4);

There are various pitfalls with that, though, once you go beyond plain identifiers on the left side. But I think python already thought about most (?) of those. Here are some example of a Python session (note that Python indexes starting from 0):

>>> x,x = [1,2]; x
2
>>> x,x[1] = [[1,2],42]; x
[1, 42]
>>> y = [17,23]
>>> x,y[x] = [1,2]; y
[17, 2]

But I am not sure whether we can "easily" support that in our parser...?

@fingolfin fingolfin added the kind: enhancement Label for issues suggesting enhancements; and for pull requests implementing enhancements label Mar 16, 2015
@stevelinton
Copy link
Contributor

This is a bit tricky because we don't really have (internally in the parser) a proper notion of an "lvalue" (somewhere that an assignment can store something) or of an "l-expression" essentially an expression that evaluates to an l-value. As a result it is hard to store the multiple left-hand sides of the assignment after you parse them and before you get to the RHS. Obviously it could be done, but it would not be a small change.

@mohamed-barakat mohamed-barakat added the gapdays2015-spring Issues and PRs that arose at https://www.gapdays.de/gapdays2015-spring label Mar 17, 2015
@fingolfin fingolfin added kind: new feature and removed kind: enhancement Label for issues suggesting enhancements; and for pull requests implementing enhancements labels Dec 16, 2015
@fingolfin
Copy link
Member Author

@ChrisJefferson this is one of the issues we discussed earlier today

@fingolfin fingolfin added the gapsagedays2016 Issues and PRs that arose at https://www.gapdays.de/gap-sage-days2016 label Jan 15, 2016
@fingolfin
Copy link
Member Author

To mention some of the things discussed on #457: The only use case I care about, and which I think is useful in many places, is the one were the left side is a list of variable names, i.e.

x,y,z := [1,2,3];
q,r := QuotientRemainder(34,3);

It was pointed out on #457 that dealing with e.g. the following would be tricky, and my initial reaction to this is "well, then let's not support it"

y := [4,5,6];
i := 2;
x := y;
x, i, x[i] := [[1,2,3],1,4];

Of course there still might some usecases of this kind people might like to have, e.g. to assign to records:

res:=rec();
res.q, res.r := QuotientRemainder(34,3);

But (a) if implementing this is hard, I don't feel bad not supporting it (we are not taking away anything from anybody, after all), and (b) of course we can still think about the possibility to relax the restriction to "pure list of variables names on the left hand side" a bit in the future; if we can come up with clear and somewhat intuitive rules for that, which also can be implemented with sane effort, sure, fine, let's do it; but let's not prevent ourselves from implementing a useful extension just because there might be an extension of the extension that we can't bother to deal with...

One minor extension that might be worth discussing is whether there should be a way to "capture the rest" of a list... I.e. one may want to capture the first two entries, and ignore the rest, or assign it all to a new list. Say, if there was a new function PolynomialRingWithIndeterminates which is called like PolynomialRing but returns a list containing the ring as first element, and then the indeterminates as the rest. Then I could write

R, x, y := PolynomialRingWithIndeterminates(Rationals, ["x", "y"]);

But sometimes I may not care about the indeterminates individually, then it would be handy if I could write, say

R, *indets := PolynomialRingWithIndeterminates(Rationals, 10);

to get all 10 indeterminates into indet. Here the * is a gadget to indicate GAP that it should put all remaining list entries into indets, instead of complaining that the number of left hand variables does not match the right hand list length.

This is similar to a feature in Python 3, see https://www.python.org/dev/peps/pep-3132/ (theirs is more generic, though)

@fingolfin
Copy link
Member Author

Regarding "capturing the rest of the list", maybe this syntax would fit better, as it imitates our vararg syntax:

  R, indets... := PolynomialRingWithIndeterminates(Rationals, 10);

@ChrisJefferson
Copy link
Contributor

Just a basic comma seperated list a,b,c := f() doesn't feel that hard to support. I'm not sure how much I would use it myself, but I can see the advantages and would be happy to have it.

@ChrisJefferson
Copy link
Contributor

The notation for R, indets... := should probably line up with #2043, however we do that.

@fingolfin
Copy link
Member Author

Should it, though? In function varargs and also here, foo... is about capturing the "rest" of a list. E.g. head, tail := [1,2,3]; to set head:=1; tail:=[2,3];.

Whereas #2043 is more about a "dual" use case: expanding a list resp interpolation. And unlike the examples here, this interpolation can be used multiple times, eg x:=[*list1,*list2];.

So having very similar syntax for these two related but different concepts is IMHO not necessarily a win...

@ChrisJefferson
Copy link
Contributor

ChrisJefferson commented Feb 22, 2018

That's a reasonable argument!

To be honest, having thought further, I'm happy with either notation.

@fingolfin
Copy link
Member Author

Well, just to clarify, I am also not opposed to using ... for interpolation. Then one could write x:=[list1..., list2...]. I am pretty sure both a prefix and a suffix for this are equally easy to implement, and it wouldn't be too hard to switch; i.e. as long as there is some consensus that we are interested in such a feature, then one could make a prototype PR, and have another vote on which syntax is preferred.

Or perhaps this whole idea should first be brought up on the gap@ list to see what people think.

@stevelinton
Copy link
Contributor

I want this feature to work, but I remain a bit concerned that we will end up with lots of bug reports of the form "Why doesn't x[1],y[2] := QuotientRemainder(...) work?". Essentially it breaks orthogonality -- what can appear on the LHS of a regular assignment, doesn't on the face of it seem to have anything to do with how many LHS expressions there are.

So that opens up all the problems I raised long ago about order of evaluation of LHSs and assignment to them. Even if we could agree a precise and reasonably intuitive solution to that, it would require a fairly major rewrite of the interpreter to fix because we don't really have a notion of lvalue, but we also don't want to do much lookahead.

I can buy the theory that we acknowledge these problems and just stick to the simplest possible case (a list of plain variables) although even then, if one or more of them is an AutoGVar (or a thread-local variable with an initializer) then we have to be clear WHEN the side effects happen (before or after evaluating the RHS, for instance). However, in this case, I think we should keep it as simple as possible, which speaks against any notation for "capturing the remaining arguments".

There is also the question of what is on the other side. If a,b,c := foo(); matches return [1,2,3]; inside foo, then on the one hand we more opportunities to use this syntax with existing functions that return lists. On the other hand,a := foo(); is qualitatively different from a,b := foo() which is untidy. If the counterpart is instead return 1,2,3; then we have to do a little more work to support that, and match up call and return, but we can have a := foo(); set a to 1, which seems nicer to me. Magma has functions that return homomorphism, image, kernel. for instance, and it would be quite common to just want the first one.

Another way to deal with that one, and maybe also make some things a bit cleaner would be [a,b,c] := foo(); (vaguely inspired by PERL) This also allows you to pick up just the second value [,b] := foo(); and makes it clear what [a] := foo(); does. This could be extended to rec(x := a, y := b) := foo(); matching return rec(x := 1, y := 2, z := 3); in foo().

I'm not trying to be awkward, but for something as fundamental as assignment I think it's worth trying to find the "right" answer.

@fingolfin
Copy link
Member Author

I am glad to get feedback and to discuss this extensively, nothing awkward about it :-). I appreciate it, thank you for taking the time!

I find the idea of allowing return 1,2,3 quite interesting, will think more about it. Some ideas how it might "look like" compared to list unpacking:

gap> f := x -> x,1,2;; # errr... assuming we can and want to allow this?
gap> y := f(0);
0
gap> a,b,c := f(0);  # is the following the output we'd want?
0
1
2
gap> a; b; c;
0
1
2
gap> f(0);  # is the following the output we'd want? I.e. a "stack" of return values?
0
1
2

# To capture all return values, perhaps do something like this:
gap> l... := f(0);
[ 0, 1, 2 ]
# resp. this (syntax variant)
gap> *l := f(0);
[ 0, 1, 2 ]

# For comparison, this is what list unpacking would do
gap> g := x -> [x,1,2];;
gap> y := f(0);
[ 0, 1, 2 ]
gap> a,b,c := f(0);
[ 0, 1, 2 ]
gap> a; b; c;
0
1
2
gap> [a,b,c] := g(0);
Syntax error: BLA ???  or should it "work"? what about this:
gap> [a,b,c] := [1,2,3];
????

But it is seems to me to be somewhat orthogonal to (a) all the problems you point out, and (b) the desire to have list destructuring or unpacking, like in Python 3, where I can also do things like this:

>>> head,*tail = [1,2,3]
>>> head
1
>>> tail
[2, 3]

Regarding the "untidy" difference between a,b,c := foo(); vs. a := foo(): While I see the formal point, I don't really think it would be a problem in practice. At least Python 3 also seems to get away with it just fine. If one really wants to unpack a singleton list, then one can write a := foo()[1] or a,*ignore := foo().

As to people asking "Why doesn't x[1],y[2] := QuotientRemainder(...) work?" I'd have no qualms to answer that with "Because there are difficult semantical issues with that, which we so far had no time to resolve. For details, please refer to URL.". Also, it would be fairly easy to document: The LHS can either be a simple sequence ident1, ident2, ..., identN of variable identifiers, or else a variable identifier followed by selectors, as is the case now: var{poss1}[idx]{poss2}. Rather simple to document, and to explain, even when admittedly the reasons why it's not possible to do var1_with_selectors, ..., varN_with_selectors may not be immediately clear.

This leaves the question about auto vars (and thread locals with initializers); I am not afraid of dealing with them, but indeed, we should be very careful to design the semantics so that a future extension (where the LHS can be more general) is not hampered by having to deal with extending "old-style semantics" in an unnatural or super complicated way (i.e.: if we now said "first the RHS is evaluated, then the LHS", and later we want or need to do it differently in at least some cases, that'd be super awkward).

So, yeah, by all means, let's think about it, and see if we can come up with a plan that we consider future proof (whether we support selectors in the LHS var list at first or not is IMHO secondary).

I'll start by pointing out that right now, GAP evaluates the LHS first, then the RHS, simply because we are eager. That is a bit unfortunate, IMHO, as it doesn't match what many other languages do, and it indeed makes it much harder to extend things (this is also the reason we never implemented in-place modification operators like a += b, I think...)

To wit:

gap> x:=[1,2,3];; y:=x;; f:=function() x:=[42]; return 5; end;;
gap> x[2] := f();
5
gap> x; y;
[ 42 ]
[ 1, 5, 3 ]

In contrast, we get this in Python 3:

>>> def f():
...     global x
...     x = [42]
...     return 5
...
>>> x = [1,2,3]; y = x;
>>> x[0] = f()
>>> x
[5]
>>> y
[1, 2, 3]

Trying to assign to x[1] instead gives an error:

>>> x = [1,2,3]; y = x;
>>> x[1] = f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range
>>> x
[42]
>>> y
[1, 2, 3]

Also interesting is to see what Python does here when using +=:

>>> x = 5
>>> x += f()
>>> x
10

So Python really treats this as x = x + f().

In particular, Python first evaluates the RHS, before it starts to evaluate the LHS. Which is what I'd like for GAP, except it's of course far too late for that, I am afraid: we eagerly evaluate the LHS first, then the RHS, in the order we read them.

Getting back to GAP: What should happen if x[1], x[2] := f(); is executed and f() modifies x? Well, I think we should take into account what the above example did: the original list (assuming it was one) stored in x is modified here, even if f changes the value of the global. And since everything (?) else in GAP works strictly from left to right, I'd also expect it here. So, I'd expect this:

gap> x:=[1,2,3];; y:=x;; f:=function() x:=[42]; return [0,[10],20]; end;;
gap> x[1], x, x[1] := f();
[ 0, [ 10 ], 20 ]
gap> x;  # first f is evaluated, but then we assign a new value to x afterwards
[ 10 ]
gap> y;  # f changes x, but both x[1] in the assignment are equivalent to y[1]
[ 20, 2, 3 ]

Regarding AutoGVar: I'd try to emulate whatever happens now. If x is an auto-gvar, what do we do when we execute x:=f(); resp. x[1] := f() assuming f also accesses x (thus causing it to be instantiated)? (I have to leave now, so I am not going to try it out for now).

(Doing that of course means we need to somehow track the LHS references, but having stared extensively at the code for a few days, I think it could be done with moderate effort... of course the devil is in the details, so in the end, we'll just have to try it and risk failure, but I think even failure would teach us valuable things).

Anyway, perhaps with these ideas, we can also revisit the idea of implementing a[1] += f() ?

If the above idea for semantics does not sounds to non-sensical and impossible, and if there is some general interest in at least considering this feature for inclusion (of course assuming the implementation has sane semantics etc. etc.), then I would dare trying to work on a prototype, with which one could hopefully discuss this further and find new obstacles etc. ;-).

Finally: I am not quite sure what the syntax [a,b,c] := f() solves compared to a,b,c := f(), other than the ability to do [,b,] := f(), but then, couldn't one also allow ,b,:=foo() ?
Perhaps to make it more similar to the proposed syntax for records (which is also quite intriguing, but I'd rather not tackle that for now..., and focus on lists)

@fingolfin
Copy link
Member Author

fingolfin commented Feb 23, 2018

Regarding return 1,2,3;: A logical extension then would be to also allow 1,2,3 as a general RHS, and so e.g. allow

x,y := y,x;

to swap the content of x and y, just like one can do in Python. However, the intent here is that one can use this to optionally ignore all values other than the first; so that means we'll get some behavior different from e.g. Python, too:

x := 1,2

would be equivalent to x:=1, while in Python it'd assign the tuple (1,2) to x.

I am a bit unhappy about the behavior of this last example; e.g. for us Germans, we use decimal commas, so it is quite conceivable that somebody mistypes x:=1.2; as x:=1,2; -- with the above approach, they wouldn't get any warning or error. So perhaps one should restrict to return 1,2,3 (though the swap pattern x,y := y,x is quite tempting). Or perhaps one should not implement this "magically drop everything but the first value approach at all?

Another question: If we were to implement this, should we also allow the following? (How does Magma handle this?)

f := x -> x,1,2;
a :=f(0);  # set a to 0;  drop 1 and 2
a,b :=f(0);  # set a to 0, b to 1; drop 2
a,b,c :=f(0);  # set a to 0, b to 1, c to 2
a,b,c,d :=f(0);  # raise error

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gapdays2015-spring Issues and PRs that arose at https://www.gapdays.de/gapdays2015-spring gapsagedays2016 Issues and PRs that arose at https://www.gapdays.de/gap-sage-days2016 kind: new feature topic: GAP language
Projects
None yet
Development

No branches or pull requests

4 participants