Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: Scope of pattern variables, and tuple decomposition #4781

Closed
gafter opened this issue Aug 25, 2015 · 71 comments
Closed

Discussion: Scope of pattern variables, and tuple decomposition #4781

gafter opened this issue Aug 25, 2015 · 71 comments

Comments

@gafter
Copy link
Member

gafter commented Aug 25, 2015

I’m doing a draft reimplementation of pattern matching (#206) now, and trying to figure out the right scoping of pattern variables. I had expected to use whatever we came up with for declaration expressions when those were under development, but it turns out that doesn’t make sense in many situations. For example, given the statement

M(x is int i ? i : 12);

There is no point in the variable i being in scope outside this statement, as it won’t be definitely assigned anywhere else. My current thinking is that a pattern variable introduced in an expression would be limited in scope to the nearest enclosing statement, with the single exception that a pattern variable introduced in the condition of an if statement would not be in scope in the else clause.

For a field initializer, the scope would be limited to that declaration. For an expression-bodied property or method, limited to that declaration. For a ctor-initializer, limited to that ctor-initializer. The scope of a pattern variable introduced in a catch clause is the catch block.

For decomposing a tuple (#347), I anticipate introducing a separate statement form, something like

pattern := expression ;

For example

var tuple = (12, "foo");
(int x, string y) := tuple;

This has the same semantics as the expression

expression is pattern

With the exceptions that

  • It is a statement form, not an expression form;
  • The compiler must be able to prove that the match always succeeds (otherwise it is a compile-time error); and
  • Variables introduced in the pattern remain in scope within the enclosing block

I welcome feedback and comments on this while is it fresh on the drawing board.
@MadsTorgersen @ljw1004 @semihokur @AnthonyDGreen @terrajobst @jaredpar @Pilchie @mattwar @stephentoub @vancem

@svick
Copy link
Contributor

svick commented Aug 25, 2015

What's the reasoning behind using :=? Wouldn't just = (e.g. (int x, string y) = tuple;) work?

Or is it meant to differentiate between pattern decomposition and normal assignment?

Also does this mean that (using examples from #206) Polar(var R, *) := Cartesian(3, 4); would work? Because that would be awesome.

@gafter
Copy link
Member Author

gafter commented Aug 25, 2015

@svick Yes, this is different from "normal" assignment. You're suggesting it could be thought of as an assignment expression inside an expression-statement, but the left-hand-side isn't an lvalue, the assignment itself doesn't appear to have a well-defined type, and the left-hand-side appears to be a type expression (from the tuples proposal #3913) rather than a valid syntactic form for an (lvalue) expression. This would fill in the "Deconstruction syntax" for #3913.

No, Polar(var R, *) := Cartesian(3, 4); would not work because the kinds of things the compiler would need to know to prove it will succeed are outside the scope of what we are likely to specify. Anything that depends on what user-definable code inside the constructor and/or decomposition operator (e.g. Polar.operator is) do would not be assumed to always succeed.

@MgSam
Copy link

MgSam commented Aug 25, 2015

Speaking purely from a usability perspective, adding := to the language just to deconstruct tuples seems odd and likely confusing. You would almost certainly have to add a compiler error/warning for the extremely common case when people type (int x, string y) = tuple; - at which point one would ask why isn't that the default anyway.

I understand it doesn't behave like a normal assignment, but it should still re-use the same = operator.

@dsaf
Copy link

dsaf commented Aug 25, 2015

Looking at current version of the language, this is similar to foreach. Reusing for's variable is a bad practice. Passing a variable into using is also not recommended.

Keeping things local reduces cognitive load. Even in current C# I sometime use curly brackets to make a local variable even more local:

public float BigCalculationThatShouldntBeSplit() 
{
    float result = 0;

    {
        float variableWithSomeMeaning = 0;
        //first leg of calculation
    }

    {
        float variableWithOtherMeaning = 0;
       //second leg of calculation
    }

    return result;
}

@dsaf
Copy link

dsaf commented Aug 25, 2015

@MgSam

Speaking purely from a usability perspective, adding := to the language just to deconstruct tuples seems odd and likely confusing.

What if it will be allowed to define custom operators like in F# and then := simply becomes a variadic generic extension operator on the new ValueTuple<> type (no idea if this is possible)?

public static ..T := (this (..T args) tuple)
{
    foreach (let arg in args)
    {
        yield return arg;
    }
}

@gafter
Copy link
Member Author

gafter commented Aug 25, 2015

There is also the question of how to handle the scoping of pattern variables appearing in the various parts of a query expression. I think they remain local to that expression.

@MgSam
Copy link

MgSam commented Aug 25, 2015

@dsaf Adding custom operators seems outside the scope of the discussion on pattern matching. It also hasn't even been seriously considered for C# 7.0 at this point so I don't think any proposal should incorporate that.

@gafter I think some more examples on how the proposed scoping rules would behave (especially in contrast to the discarded declaration expressions proposal) would be helpful.

@gafter
Copy link
Member Author

gafter commented Aug 25, 2015

@MgSam It is very simple - the variable simply would not be in scope outside the immediately enclosing statement.

M(x is int i ? i : 12); // i is scoped to this statement
Console.WriteLine(i); // error: i not found

In an if statement, it isn't even available in the else clause

if (x is int i)
{
    // int i is in scope
}
else if (x is string i) // int i not in scope here, so we can reuse the name
{
    // string i is in scope
}

Other cases

  1. In a field initializer, it is scoped to the field initializer
  2. In an expression-bodied method or property, scoped to the body
  3. In an expression-bodied lambda, scoped to the body
  4. In a ctor-initializer, scoped to the ctor-initializer
  5. In an attribute, scope to the attribute (but an error because not constant)
  6. In an expression part of a query expression, scoped to that expression part

@HaloFour
Copy link

@gafter

Paren in the wrong spot there?

I recall the argument that erupted from the scoping of declaration expressions. I was under the impression that was part of the reason that the feature was dropped/postponed.

@AdamSpeight2008
Copy link
Contributor

:= is used in named parameters, so to be consistent the identifier should be on the left of :=

@gafter
Copy link
Member Author

gafter commented Aug 25, 2015

@HaloFour The scoping rules for declaration expressions didn't make sense for pattern matching. That was part of the reason we put the feature off. I'm trying to restart the discussion in the context of pattern matching.

@gafter
Copy link
Member Author

gafter commented Aug 25, 2015

@AdamSpeight2008 There is no identifier in the syntax for the decomposition operator. The syntax is

pattern := expression ;

@AdamSpeight2008
Copy link
Contributor

There is also the question of how to handle the scoping of pattern variables appearing in the various parts of a query expression.
They are different variables, but have the same name, since it gets expanded out into function calls.
Would they just be considered as Lets

@gafter So I couldn't decompose a tuple into different variables.

( x, y, z ) = someTuple // x, y, z arerr already defined.
( x:= , y:= , z:= ) = someTuple // x,y,z are defined here

@gafter
Copy link
Member Author

gafter commented Aug 25, 2015

@AdamSpeight2008 I think you're talking about range variables, which have nothing to do with the current discussion.

@AdamSpeight2008
Copy link
Contributor

@gafter

M(x is int i ? i : 12); // i is scoped to this statement
This looks messy and confusing.
x is int a boolean expression
i ? i : 12 a inline condition
How can that work? I'd be spending more of my time trying to understand the syntax, rather doing work.

It needs to be a lot clearer, in its intent.

For example

case ( int, int ) into x, y when ( ( x > 0 ) && ( y > 0 ) ) :
  // foo

This checks the types, decomposes the value into variable named x and y (both of type int).
Then checks the values of them (a guard expression). If it evaluates to true then it proceed to run the code after // foo
Or in code.

If ( (t.Item1 is int) && (t.Item2 is int) )
{
  // (into) section 
  int x = t.Item1;
  int y = t.item2;
  // (when) section
    if( ( x > 0 ) && { y > 0 )
  {
  }
}

@gafter
Copy link
Member Author

gafter commented Aug 25, 2015

@AdamSpeight2008

I'd be spending more of my time trying to understand the syntax, rather doing work.

Yes, when you start seeing a new syntax you have to take time to learn it instead of forging ahead without understanding. Hopefully you would not have to learn a new syntax every time you see it for the rest of your life. This particular new syntax can always be distinguished by the identifier appearing after the type in an is.

The syntax you propose does not extend naturally to recursive pattern matching. I have no idea how it would be used in a method call per the original example. It also detaches the type names used to declare the pattern variables from the appearance of the pattern variables, making the syntax more difficult to comprehend, especially as the use cases get more complex.

@dsaf
Copy link

dsaf commented Aug 25, 2015

@AdamSpeight2008 In my opinion the syntax is as clear as it gets. But I guess you will be able to use the expression form of the switch to achieve the type of readability you are looking for:

M(switch(x) {
    case int i:
        return i;
    default:
        return 12; });

@gafter gafter closed this as completed Aug 25, 2015
@gafter gafter reopened this Aug 25, 2015
@AdamSpeight2008
Copy link
Contributor

In Nemerle is would be something like

M( match(x)
   { 
     | ( int ) into i => i;
     | _ => 12;
   } )

@dsaf The syntax is ambiguous in terms of possible meaning. if (x is int i)

  • it looks like a syntax error. eg a missing operator; maybe ==
    Now something that didn't compile before, does compile.
  • Why isn't i a boolean? Since the type signature is (obj,int)->bool.

@gafter
Copy link
Member Author

gafter commented Aug 25, 2015

@AdamSpeight2008

it looks like a syntax error

Every new feature uses syntax that used to be an error. That's how we manage to extend the language without changing the meaning of existing programs.

In Nemerle is would be something like

M( match(x)
   { 
     | ( int ) into i => i;
     | _ => 12;
   } )

No, it would not. It would be more like

M( match(x)
   { 
     | ( i is int ) => i
     | _ => 12
   } )

@mikedn
Copy link

mikedn commented Aug 26, 2015

@gafter

My current thinking is that a pattern variable introduced in an expression would be limited in scope to the nearest enclosing statement, with the single exception that a pattern variable introduced in the condition of an if statement would not be in scope in the else clause.

If the opposite is done and i is introduced in the enclosing scope then I suppose the following would fail to compile:

M(x is int i ? i : 12); 
M(x is int i ? i : 12); 

Or would the second statement attempt to use the existing i?

If not then I'm sure that there will be cases where it would be useful to have i declared in the enclosing scope but most of the time it would be highly annoying as you have to come up with new variable names for every such statement. So, short of using special syntax to select where i is declared (and having that would be quite weird IMO), the current scoping choice looks good to me.

@gafter
Copy link
Member Author

gafter commented Aug 26, 2015

@mikedn No, it won't reuse an existing variable. If there is already a variable of that name in scope it is an error

If not then I'm sure that there will be cases where it would be useful to have i declared in the enclosing scope

Such cases are rare, as the variable is only definitely assigned when true for the is-expression. Here is one realistic case:

if (!(o is int i)) throw new ArgumentException("o");
// now i is definitely assigned

The proposed scoping rules don't make this legal. On the plus side, they do make this legal:

if (o is string s) { /* code using string s */ }
else if (o is Something s) { /* code using Something s */ }

@whoisj
Copy link

whoisj commented Aug 26, 2015

Just so those of us with a slower uptake understand this.

This (x is int i ? i : 12) is functionally identical to this var i = x is int ? x : 12, ignoring current type issues?

@HaloFour
Copy link

@whoisj

Sorta. If I read this right the variable i would be out of scope outside of the expression, so you'd have to assign the result to another variable with a different name. Since the example is passing it as a parameter to a method called M that is not really a concern. Other than that you're right, the x is int i expression returns a bool so it can be used with ?: and the variable i would be in scope for the remainder of that expression.

@HaloFour
Copy link

@orthoxerox That's right, there are different boxing/unboxing rules for Nullable<T>. Either way, the important bit of this conversation is regarding the scope of those parameters, not necessarily which pattern is being matched. Type checking is just a simple pattern.

@orthoxerox
Copy link
Contributor

@HaloFour And that's why I questioned allowing universal use of pattern matching's is operator, since there's no compelling use case where it is superior to existing language features outside of actual pattern matching.

@MrJul
Copy link
Contributor

MrJul commented Aug 27, 2015

@orthoxerox I don't think the point of this issue is to question pattern matching. If you don't see its use, just don't use it :)

For more complex cases, it becomes superior IMHO:

M(something is Point(int x, int y) ? x * y : 12)

VS

var point = something as Point?;
M(point != null ? point.Value.X * point.Value.Y : 12)

And it really shines in a switch construct, as seen in the proposal.

@HaloFour
Copy link

Might be an argument to use a new keyword like match instead of is, but that probably belongs under #206.

@MrJul
Copy link
Contributor

MrJul commented Aug 27, 2015

@gafter

if (!(o is int i)) throw new ArgumentException("o");
The proposed scoping rules don't make this legal.

I think this could be solved by allowing to match into an existing variable:

int i;
if (!(o is int into i)) throw new ArgumentException("o");
// i is definitely assigned here.

(Please don't focus on the chosen syntax (into): I simply want to discuss using existing variables with pattern matching, even if it won't be the main use case).

@orthoxerox
Copy link
Contributor

@MrJul I didn't say pattern matching was bad, it's very good where it belongs, inside switches. Your more complex example could be rewritten using my proposal to autocast variables inside ifs:

if (something is Point)
    M(something.X * something.Y);
else
    M(12);

or actually using switch:

switch (something) {
    case Point(int x, int y):
        M(x, y);
        break;
    default:
        M(12);
        break;
}

The only use for a compound is+as operator the new is is is the need to get a downcast variable inside the switch, since it has no case-level scopes, e.g.

switch (token) {
    case Literal litToken:
        return litToken.ParseValue(); //can't use token automatically cast as Literal here
    ...
}

And this is once again a special case.

EDIT: I forgot that the pattern-matching switch has case-level scoping. There's absolutely no need to have x is type y constructs.

@HaloFour
Copy link

@MrJul Not a bad idea but I can see how it might be awkward. With the other forms of pattern matching I think it would be a little clearer, the type specific syntax above it would be a little weird.

object o = ...;
int x;
if (o is Point(out x, var y)) { } // reuse x, declare y

@gafter
Copy link
Member Author

gafter commented Aug 27, 2015

@AdamSpeight2008

The recursive pattern you suggest, doesn't indicate how knows each part of the pattern match
is referring to the .Left and .Right properties on the object..
Or that you are match ng against types ( or more abstractly the structure of object) and not against object values.

I suggest you read the whole specification (#206) for the details, not just the example. Or try the prototype. I'm working on reintegrating the prototype into Roslyn now, so it will be easier to experiment.

@alrz
Copy link
Contributor

alrz commented Aug 29, 2015

could KeyValuePair<,> be treated as a tuple and make this possible?

foreach ((var key, var value) in dictionary)
{

}

@gafter
Copy link
Member Author

gafter commented Aug 29, 2015

@alrz Treating KeyValuePair as a tuple doesn't get that behavior. We'd have to change the foreach statement to do decomposition.

@alrz
Copy link
Contributor

alrz commented Aug 30, 2015

@gafter in F#, patterns (which do the decomposition) are everywhere, as in let pattern = expression in ... , fun pattern -> expression and for pattern in enumerable-expression -> ... but still you can't have for key,value in dictionary where dictionary is a Dictionary<,> because KeyValuePair<,> is not a tuple.

@gafter
Copy link
Member Author

gafter commented Aug 30, 2015

@alrz If we had started with pattern matching we might have decomposition everywhere too. But we didn't so we don't.

@whoisj
Copy link

whoisj commented Aug 31, 2015

Treating KeyValuePair as a tuple doesn't get that behavior. We'd have to change the foreach statement to do decomposition.

@gafter can't you just add an implicit conversion to the KeyValuePair type?

@paulomorgado
Copy link

@whoisj, what would be the gain in doing that? Do you want the tuple or the tuple items?

@alrz
Copy link
Contributor

alrz commented Aug 31, 2015

@whoisj it would be a good idea but C# doesn't support generic operators

public static implicit operator (TKey,TValue)<TKey,TValue>(KeyValuePair<TKey,TValue> p)
    => (p.Key,p.Value);

which can be another feature request. also read the latest comments on #98.

@gafter
Copy link
Member Author

gafter commented Aug 31, 2015

@alrz If declared inside the KeyValuePair struct, this conversion operator would not have to be generic.

public static implicit operator (TKey,TValue)(KeyValuePair<TKey,TValue> p) => (p.Key,p.Value);

However, that still would not cause the foreach loop to perform decomposition.

@Eyas
Copy link
Contributor

Eyas commented Aug 31, 2015

@gafter Since we are talking about an entirely new feature anyways (pattern matching/decomposition), what would be the best place to talk about the possibilities of decomposition in a foreach loop? Or has it been discussed and determined to be impractical due to some deep language/compat change that the team is not willing to make?

@whoisj
Copy link

whoisj commented Aug 31, 2015

@whoisj, what would be the gain in doing that? Do you want the tuple or the tuple items?

Cleaner code, which expresses intent and not so much boiler plate.

@whoisj it would be a good idea but C# doesn't support generic operators

Touche.

@gafter
Copy link
Member Author

gafter commented Aug 31, 2015

@Eyas This is a fine place to discuss it, or in the tuple issue (#347). I'm just pointing out that you won't get decomposition in the foreach loop "for free" even if you get both tuples and a decomposition statement.

@alrz
Copy link
Contributor

alrz commented Sep 1, 2015

@gafter since foreach loop uses the assignment statement var current = enumerator.Current; I think it would.

@gafter
Copy link
Member Author

gafter commented Sep 1, 2015

@alrz Assignment and decomposition are different. For example, the pattern object o only matches a non-null value, but an assignment to it always succeeds. Since the elements that a foreach loop operates over may contain null, it must continue to use assignment (not decomposition) semantics.

@KrisVandermotten
Copy link
Contributor

I really dislike the idea the the if branch and the else branch of an if statement do not live in the same scope.

In C# 6, the statement

if ( expr )
{
    statement1;
}
else
{
    statement2;
}

can always legally be transformed into

if ( ! ( expr ) )
{
    statement2;
}
else
{
    statement1;
}

no matter wat expr, statement1 or statement2 really are. There is elegance in this symmetry.

Under the proposed scoping rules, that would no longer be the case.

That looks very weird to me.

Like many others, I like to test my inputs as soon as possible, throwing an exception when they are wrong, afterwards writing the code processing them when they are ok. So I would like to be able to write something like:

if (!(x is int i))
{
     throw new Exception("...")
}
else
{
    // use i
}

The alternative today works, but requires a redundant cast:

if (!(x is int))
{
     throw new Exception("...")
}
else
{
    int i = (int) x;
    // use i
}

@alrz
Copy link
Contributor

alrz commented Sep 1, 2015

@gafter thanks for explanation. but will tuples continue to use the underlying reference type System.Tuple (which is also used by F# and as discussed here it wasn't a good decision)? otherwise, being null wouldn't be the problem.

@gafter
Copy link
Member Author

gafter commented Sep 1, 2015

@alrz No, tuples are likely to be value types. But since the semantics of the foreach loop will continue to use assignment semantics for the reasons discussed, that won't make any difference for that use case.

@gafter
Copy link
Member Author

gafter commented Sep 1, 2015

@KrisVandermotten We are discussing a decomposition statement, something like

let Pattern = expression;

in which the match must be statically verifiable (the compiler can prove it always succeeds), and variables from the pattern are introduced into the enclosing scope. If the match can fail you'd use this form:

let Pattern = expression else statement

In the latter form there is an optional where expression clause after the expression.

This is perfect for the early error detection cases. Your example would be written

let int i = x else throw new Exception("...");

or, if you get paid by the line of code

let
    int i = x
else
{
    throw new Exception("...");
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests