Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Name Lookup for Wildcards #14862

Closed
gafter opened this issue Nov 1, 2016 · 89 comments
Closed

Name Lookup for Wildcards #14862

gafter opened this issue Nov 1, 2016 · 89 comments

Comments

@gafter
Copy link
Member

gafter commented Nov 1, 2016

We are considering a number of small changes to deconstruction, out variables, and pattern-matching to allow the use of wildcards where one could declare a variable, but the user has no intention of using the value captured in that variable. Here we discuss an open issue around name lookup, and propose a resolution.

The summary of this proposal is that the use of a wildcard has no effect on name lookup elsewhere in the code.

Original (LDM) approach

The original approach described by LDM was that the lookup of the identifier _ would treat it as a wildcard if nothing were found:

class Sample
{
    void M()
    {
        M2(out _); // ok wildcard
    }
}

multiple declarations of the identifier _ in the same scope would cause _ to be considered a wildcard:

class Sample
{
    void M()
    {
        int _ = e1;
        int _ = e2;
        M(out _); // ok, wildcard
    }
}

Existing cases, where the identifier is declared once, would still introduce a variable named _:

class Sample
{
    void M(object o)
    {
        if (o is int _)
        {
            Console.WriteLine(_); //, uses declared pattern variable _
        }
    }
}

Criticism of the this approach

This approach treats an unbound identifier _ as a wildcard, a single definition of _ as causing _ to bind to that definition, and multiple definitions again treating the use of _ as a wildcard. It requires careful definition of what "multiple definitions" are. Would a declaration of _ in an outer scope, and redeclaration in an inner scope cause a use to be a wildcard?

class Sample
{
    void M(object o)
    {
        if (o is int _)
        {
            M(out int _); // allowed? wildcard?
        }
    }
}

There are corresponding implementation difficulties that may make this approach unattractive.

Proposed alternative

The proposed alternative is that

  • the declaration of an identifier _ in the following contexts would always be treated as a wildcard, and would have no effect on name lookup elsewhere:
    • an out argument,
    • the left-hand-side of a deconstruction, or
    • a pattern variable
  • the use of the simple identifier _ as an expression as an out argument, or as a variable in the left-hand-side of a deconstruction, would be bound to a variable of that name. If no variable of that name is found, it is treated as a wildcard. This is similar to the treatment of var as a type.
class Sample
{
    void M(object o)
    {
        if (o is int _) // wildcard by 1.
        {
            M(out int _); // wildcard by 1.
        }
        (int x, int _) = e; // wildcard by 1.

        M(out _); // wildcard by 2.
        (o, _) = e; // wildcard by 2.
        (int x, _) = e; // wildcard by 2.

        Console.WriteLine(_); // error: no _ in scope
    }
}

This is much simpler to implement and I believe much easier to explain.

@gafter
Copy link
Member Author

gafter commented Nov 1, 2016

/cc @dotnet/ldm @jcouv @AlekseyTs @jaredpar @VSadov

@HaloFour
Copy link

HaloFour commented Nov 1, 2016

To clarify, if there are no ambiguous overloads of a method declaring out parameters you may use _ as a simple wildcard without the declaration syntax?

//given
void M(out int x) { x = 123; }

// all are equivalent statements?  (assuming no variable _ in scope)
M(out var _);
M(out int _);
M(out _);

If no variable of that name is found, it is treated as a wildcard.

While I do think that this is an improvement on the original proposal I still there that there is too much room for ambiguity and the potential for accidentally overwriting legally declared fields/variables:

void Foo() {
    M(out _); // intended to throw away result
}

private int _; // oops, just added this field and changed the meaning of my program!

I'd argue that for out parameters you could only use a wildcard with the declaration:

void Foo() {
    M(out var _); // no ambiguity, definitely a wildcard
}

As for deconstruction, I'd argue that the use of _ without a declaration should always be considered a compiler error if there is a field/local with the name _ in scope:

private int _;

void Foo() {
    {
        int o, _;
        (o, _) = e; // compiler error, cannot deconstruct into _
        (o, var _) = e; // no ambiguity, definitely a wildcard
    }
    {
        int o;
        (o, _) = e; // no _ in scope, definitely a wildcard
    }
}

I acknowledge that this is a hard-line approach, but in my opinion it's worth being a little more obnoxiously stricter in order to avoid the potential that any legit variable/field is overwritten. For people who don't use the _ identifier there would effectively be no difference.

@vbcodec
Copy link

vbcodec commented Nov 1, 2016

@HaloFour
Forcing to deconstructions like (x, var _, var _, var _) and inability to deconstruct into _ is antithesis of improvement.

For private / protected / external _ there may be warning (where using _) for such cases.

@gafter
Disabling leakage var _ from deconstruction into enclosing scope is further complication already complicated rules of scoping described #12939

Do we need wildcards in typeswitch at all ? imo, it is useless here

Can you explain why characters #, @, $, % can't be used for wildcards ?

While presented alternative approach is simpler, I would stick with previous proposal which is bit more complex, but more aligned with rules for other variables.

@gafter
Copy link
Member Author

gafter commented Nov 1, 2016

Do we need wildcards in typeswitch at all ? imo, it is useless here

Providing the user a way to avoid declaring an identifier allows us to warn when the declared identifier is unused.

Can you explain why characters #, @, $, % can't be used for wildcards ?

We don't think any of those are an improvement over _, and most other languages with a similar concept appear to agree. A wildcard, representing as it does something that the user intends to ignore, should be as low profile as possible, and nothing is lower profile than the Unicode LOW LINE. Users already declare _ when they intend a variable to be ignored, and this choice recognizes and blesses user preference.

@dsaf
Copy link

dsaf commented Nov 1, 2016

Shouldn't that 0.01% of developers who are currently actually using _'s value be somehow punished anyway? I am more worried about general leaking to outer scope #14697.

@vbcodec

Do we need wildcards in typeswitch at all ? imo, it is useless here

Yeah, not sure why we need two ways:

if (o is int _) {}
if (o is int) {}

@gafter
Copy link
Member Author

gafter commented Nov 1, 2016

Do we need wildcards in typeswitch at all ? imo, it is useless here

Yeah, not sure why we need two ways:

if (o is int _) {}
if (o is int) {}

That isn't a typeswitch.

In the context of an is expression, however, this may disambiguate some situations, thereby enabling the use of tuple types:

if (o is (int x, int y) _) {} // test for the type ValueTuple<int, int>; nothing placed in scope
if (o is (int x, int y) {}  // test for any tuple whose value contains two integers; places x and y in scope

@dsaf
Copy link

dsaf commented Nov 1, 2016

It makes sense in such situations I guess. Although the names are probably not relevant in first case:

if (o is (int, int) _) {} 

or

if (o is (int _, int _) _) {} 

@dsaf
Copy link

dsaf commented Nov 1, 2016

Also why not this?

if (o is (int, int)) {} 

@HaloFour
Copy link

HaloFour commented Nov 1, 2016

@vbcodec

Forcing to deconstructions like (x, var _, var _, var _) and inability to deconstruct into _ is antithesis of improvement.

Hey, that's my word! 😁

The only thing worse than a slightly more verbose improvement is a syntax that can accidentally lead the developer into a subtle runtime bug. Allowing deconstruction into an existing _ is a big potential pit of failure.

Note that this would apply specifically to deconstructing into existing variables which I expect will be a minority case. For people using declaration deconstruction syntax you'd only need var (x, _, _, _). I doubt most people would even notice.

For private _ there may be warning for such cases.

It's my opinion that a warning wouldn't be sufficient. You could add a protected field named _ in some base class and have no idea what code you impact further down the pipeline, including in other projects.

@gafter

We don't think any of those are an improvement over _,

I tend to agree. The closest I might suggest is . but I imagine that comes with its own baggage of parsing issues. I still favor * over all, but I understand the type ambiguity that introduces (and I'm jealous of VB.NET for the first time in a very long time).

@dsaf

Shouldn't that 0.01% of developers who are currently actually using _'s value be somehow punished anyway?

You know that the most active "abusers" would be those Fortune 50s with massive code bases that have every other MS VP on speed dial and would call them up screaming if they even suspected that their builds might break. There are some threads on the coreclr repo sharing some interesting stories of just this sort of thing happening with updates to .NET impacting internals that nobody should have ever relied on anyway, updates that had to be rolled back.

Anyhow, MS made _ a legal identifier, why should people who took advantage of it be penalized?

@dsaf
Copy link

dsaf commented Nov 2, 2016

@HaloFour

You know that the most active "abusers" would be those Fortune 50s with massive code bases that have every other MS VP on speed dial and would call them up screaming if they even suspected that their builds might break. There are some threads on the coreclr repo sharing some interesting stories of just this sort of thing happening with updates to .NET impacting internals that nobody should have ever relied on anyway, updates that had to be rolled back.

And they are rushing to use C# vNext? And they will scream you say? Did it help e.g. save Silverlight? Based on what Microsoft is generally doing, enterprise is irrelevant. Unless it uses Node.js with Git and hosts it on Azure.

@vbcodec
Copy link

vbcodec commented Nov 2, 2016

@gafter

We don't think any of those are an improvement over _

But these characters do not conflict with any name, and there won't be needed any tricks to avoid ambiguities. Doesn't this outweigh 'low profile' of _ ?

@HaloFour
Copy link

HaloFour commented Nov 2, 2016

@dsaf

And they are rushing to use C# vNext?

This is also why the team has refused to add any new warnings to the compiler for existing code. Can't risk breaking some build that happens to be riddled with awful code.

https://github.com/dotnet/corefx/issues/1420#issuecomment-96430564

We have had to revert refactorings because customer applications were taking a dependency on the method names in various stacks. Ugh.

Who knows, maybe Microsoft could stand to be significantly more abusive towards their customer base. It seems to work for Apple.

@dsaf
Copy link

dsaf commented Nov 2, 2016

@HaloFour more abusive than the whole Sinofski fiasco? https://blog.ailon.org/how-one-announcement-destroyed-the-net-ecosystem-on-windows-19fb2ad1aa39#.2b9c211pz Renaming a method is kind of easier than switching a GUI framework.

@CyrusNajmabadi
Copy link
Member

Let's keep the conversation on topic. Thanks :)

Doesn't this outweigh 'low profile' of _ ?

Currently we don't think so. "_" as a wildcard fits into how people are generally using that identifier already today. So we think it's nice to be able to just codify that concept, even if it means we do need to put in the due diligence to make sure we don't break any existing code.

@dsaf
Copy link

dsaf commented Nov 2, 2016

@CyrusNajmabadi sorry, I'll stop now :).

@DavidArno
Copy link

DavidArno commented Nov 2, 2016

Regarding the new proposals, I think @HaloFour has nicely summarised what I also see as a better solution than @gafter is proposing. I too think we need very strict - _ variable in scope or _ is a wildcard - dichotomy solution here to minimise the chance of really hard to find bugs.

There is one use-case that's missing from both the OP and @HaloFour's post. I think is important to clarify that following extra wildcard case should also be supported:

_ = e; // wildcard, but not explicitly covered by @gafter's 2nd case

@bondsbw
Copy link

bondsbw commented Nov 2, 2016

How about using ~ (tilde)?

if (o is int ~)
{
    M(out int ~);
}
(int x, int ~) = e;

M(out ~);
(o, ~) = e;
(int x, ~) = e;

It is almost as low profile as _, and has an appearance that conveys the ephemeral nature of wildcards.

@DavidArno
Copy link

@bondsbw,

~ is the bitwise complement operator. Just like -, #, * etc, it therefore already has a meaning. _ is already used by many of us as a pseudo wildcard, so already has that meaning in the minds of 99% (pure guess) of those that use it.

@bondsbw
Copy link

bondsbw commented Nov 2, 2016

What makes ~ better, in my opinion, is that it cannot be used in similar circumstances. Nobody is going to confuse bitwise complement with wildcard.

And to echo @HaloFour's point, it doesn't suddenly change meaning when you are working elsewhere.

@bbarry
Copy link

bbarry commented Nov 2, 2016

If you were considering other symbols, I'd suggest @. It is already a valid character at the start of an identifier (though it doesn't affect the name of the identifier). Reading if (o is int @) as if it were valid c# without a wildcard appears to be a pattern into a variable with no name.

That said, I like _ as a wildcard even if it is ambiguous with existing code. I'd be happy with the rule:

  1. Any pattern or deconstruction syntax involving a wildcard where there is a variable by the name _ in scope is an error.

@CyrusNajmabadi
Copy link
Member

as mentioned already, we are trying to avoid using other symbols. There's no end to the number of syntacticly unambiguous options we have if we go with something else. But that's not the goal here. The goals are:

  1. to use the character today that people already routinely use to mean 'i don't care about this variable'.
  2. to use a variable today that has a familiar meaning for people using other languages that supports this concept.

It's similar to the issue we had when we introduced generics into the language. We could have gone with different syntax than <>. It would have helped us avoid ambiguities in the language. However, we felt that there was enough merit in those characters (esp. with how parametric polymorphism occurs in other languages) to use them, even though they were more difficult to fit into our language.

To that end specified above, '_' is an ideal wildcard character. It just needs to be added to the language in a careful manner as we view back compat as a very high bar.

@vbcodec
Copy link

vbcodec commented Nov 5, 2016

@HaloFour

turning them into compiler errors when (intentionally or unintentionally)

Compiler post error only for technically incorrect statements, which is not case here.
How compiler should know that variable is unintentionally changed ? Compiler do not track how code is written, so it don't know if deconstructions or variable was created as first. That's why I propose only warning here (for deconstruction and calls with out).

Eliminate the short form wildcard from out declarations.

This is asking for breaking compatibility, because "Code like var _ = 123; and M(out _) would suddenly take on a completely new meaning." (citation of your post).

Every dev must learn every new feature, and be responsible while creating code. Team and compiler should not take responsibility for irresponsible devs. There are already features that can introduce 'subtle ambiguities' that are much more serious than changing value of some 'do not care' variable.

@HaloFour
Copy link

HaloFour commented Nov 5, 2016

@vbcodec

Compiler post error only for technically incorrect statements, which is not case here.

This would make that the case here. The compiler already does this in numerous examples today. For example, requiring a break in a switch to avoid unintentional fall-through.

This is asking for breaking compatibility, because "Code like var _ = 123; and M(out _) would suddenly take on a completely new meaning." (citation of your post).

No it wouldn't. That would mean exactly what it has meant in C# 1.0 through C# 6.0. In fact, it would also mean the same thing that it means based on the proposed changes in this proposal.

Every dev must learn every new feature, and be responsible while creating code. Team and compiler should not take responsibility for irresponsible devs.

It's called the "pit of success". The language does have the responsibility of providing an environment that leads developers towards success. The C# team has invoked it on numerous occasions already.

There are already features that can introduce 'subtle ambiguities' that are much more serious than changing value of some 'do not care' variable.

Name 'em. Specifically those that don't cause compiler errors and introduce subtle value changes.

@HaloFour
Copy link

HaloFour commented Nov 5, 2016

@vbcodec

Note that I believe that based on the reconsidered rules above that the following would already fail to compile:

void M(out string s) { s = "456"; }

var _ = 123;
M(out _); // CS1503: cannot convert from "out int" to "out string"

@vbcodec
Copy link

vbcodec commented Nov 5, 2016

@HaloFour

but now correctly code

void M(out string s) { s = "456"; }

var _ = '123';
M(out _); // fine

will fail to compile with your second rule.

@HaloFour
Copy link

HaloFour commented Nov 5, 2016

@vbcodec

You misunderstand. That code would compile just fine, as it does today. What wouldn't compile is M (out _) when there is no variable named _ in scope, which is also not a change from today. If you wanted to use a wildcard you'd use M (out var _) or M (out string _), which would always be a wildcard and never introduce a new variable into scope.

@vbcodec
Copy link

vbcodec commented Nov 5, 2016

@HaloFour

So, there is no need for error. Some devs may protect themselves by placing var before, _, while others may use just _. If someone never use _ for variables, why to force him to always write var ? It destroys most of this feature.

@alrz
Copy link
Contributor

alrz commented Nov 19, 2016

What's the reason for var _? Is there any use case where _ is ambigious and var would help with that?

@iam3yal
Copy link

iam3yal commented Nov 19, 2016

@alrz Not sure but maybe it's slightly less work? I mean otherwise, this introduces a special case, just a hunch. :)

However, I agree, M(out _) is more attractive.

@jcouv
Copy link
Member

jcouv commented Nov 20, 2016

@alrz @eyalsk Yes, var _ is there for consistency with typed discards (int _). It may also be useful to clarify you want a discard when there is already a local named underscore (but do not want to reference that local).
Like you, I expect people will mostly use the short discard (just _ where allowed).

@jaredpar jaredpar modified the milestones: 2.0 (RC.2), 2.0 (RC.3) Dec 8, 2016
@MattGertz MattGertz modified the milestones: 2.0 (RTM), 2.0 (RC.3) Jan 14, 2017
@MattGertz
Copy link
Contributor

@gafter @jaredpar I'm cleaning up the RTW list starting today -- is this one a done deal?

@gafter
Copy link
Member Author

gafter commented Jan 15, 2017

@MattGertz The compiler work is all done. The language specification may require spec text to be written, which is why this LanguageDesign issue is still open.

@jaredpar jaredpar modified the milestones: 2.0 (RTM), Unknown Mar 27, 2017
@gafter
Copy link
Member Author

gafter commented Mar 27, 2017

Issue moved to dotnet/csharplang #364 via ZenHub

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment