Make 𝟏, 𝟎, 𝟙, 𝟘 into valid identifiers for DSLs #26808
Comments
I can see the appeal of the idea, but I think there's too little benefit to justify the potential readability and maintenance costs. Between font variations, (anti-)aliasing, rendering choices and syntax highlighting, the distinctions between the different zeros (0 𝟎 𝟘) or ones (𝟙 1 𝟏) can get pretty blurry. The idea of potential gotchas in such basic entities as 0s and 1s (and the confused Stack Overflow questions resulting from them) is not an appealing prospect.
I think you can make that case about almost all Unicode characters that have a similar ASCII character. Whether it makes sense in a particular case is a very reasonable question, and library specific. Libraries like ApproxFun.jl use symbols like 𝒟, which looks like a D but matches the math notation of using script letters to denote differential operators. The only difference with what I am suggesting is that, right now, library writers don't have the option of creating aliases whose names start with the fancy number-looking characters. If that changed, the discussion could turn to the question you raise: is introducing those aliases a good idea (given that they should never be required)? Your perspective is reasonable, but it may be domain specific.
There are three choices here:
The last option seems confusing and fairly pointless to me (unlike characters like …). The current behavior of disallowing digit variants entirely seems like a waste of potentially nice syntax. I have yet to encounter a font where these digit variants render and are not visually distinguishable from the corresponding digits. That leaves option 2: allowing digit-variant characters to be used as letters, which is what this issue proposes. I can understand that people might not want to use these bindings, which is fine; in that case, don't use them. But why should we prevent people who want to from doing so, especially given that the only other potential use for them is not really sensible?
True, that's why I mentioned "such basic entities as 0s and 1s". 'Is this identifier a 𝒟 or a D' is a very different sort of question from 'is this thing here a literal or an identifier'. It's a small mental cost when going through a codebase, but such costs add up pretty quickly.
I'm a fan of DSLs and would in theory love to have custom infix operators (#16985) and even custom infix named functions, hoping the users use them wisely. But sometimes the guardrails have to be in the language, and in my opinion this is one of those cases.
This is the same reason the code points were restricted in the first place (#5936): code gets passed down and across teams and people, and sometimes it's more important to prevent "crazy things" from being introduced by someone than to provide a minor nicety.
As far as I can tell, any argument that this is confusing applies equally to …
Agreed; we're way past the point of having any sort of policy against potentially confusable characters. I agree with Stefan that when fonts have 𝟘 and 𝟙, they tend to be more distinguishable than some other examples like e and ℯ.
The reason to restrict code points was to allow for implementing sane uses of code points in the future without breaking code, not to prevent people from doing silly things. If people want to write unreadable code, they will, no matter what we do to try to prevent it. I think the de facto policy on potentially confusable characters is that we identify characters that are easily confused both in input and in appearance, so that there's a real chance someone may input one when they intended the other and be unable to tell easily that this has happened. The normal "e" versus Euler's "ℯ" fails this test on both counts: there's little chance that anyone will input "ℯ" by accident when they meant "e", since "e" is on every keyboard and "ℯ" is on none; they also look fairly distinct in most fonts, so even if someone managed to do this somehow, they'd be able to notice what's going on. The case of "μ" and "µ" satisfies this criterion: neither character is on a standard keyboard, some input methods give you one while others give you the other, and they look identical, so it's extremely hard to discover after the fact what has happened. Applying this test to "1" versus "𝟙" leads to the same conclusion as "e" versus "ℯ": they should be considered distinct characters.
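As a concrete illustration of that test, here is a small sketch, assuming Julia 1.3 or later: the parser folds the micro sign into the Greek mu (the easily confused pair), while a double-struck digit parses as an identifier of its own rather than as a number (the behavior adopted for this issue). Only `Meta.parse` from Base is used; string escapes keep the exact code points unambiguous, and the variable names are arbitrary.

```julia
# Sketch; assumes Julia 1.3 or later. String escapes are used so that the
# exact code points are unambiguous, whatever font this is read in.
micro = Meta.parse("\u00b5")     # U+00B5 MICRO SIGN
mu    = Meta.parse("\u03bc")     # U+03BC GREEK SMALL LETTER MU
println(micro === mu)            # true: the parser folds both into one identifier

dbl_one = Meta.parse("\U1d7d9")  # U+1D7D9 MATHEMATICAL DOUBLE-STRUCK DIGIT ONE
println(dbl_one isa Symbol)      # true: parsed as an identifier, not a number
println(Meta.parse("1") isa Int) # true: the ASCII digit is still an integer literal
```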
My concern was more about later readability than about ambiguity during input ("code is read a lot more than it's written" and all that). But since this is probably going in, can we have it so that there's one canonical identifier zero (not multiple) to go alongside the one canonical literal 0 (and similarly for 1)? My vote is for the …
I see no reason to limit this to just one when so many of the "1"s are easily distinguished. No one is going to confuse any of the following with each other or with …
ref: #10762
@JeffBezanson @StefanKarpinski (cc @dlfivefifty) I realized that a feature freeze is coming soon and was wondering whether you would still support a PR that implements this. It would be very nice to sneak it into the 1.3 release.
For the record, 1.3 has a lot of exciting stuff in it already, and so postponing this to 1.4+ makes sense to me.
Oh, for sure. This would not be the highlight of the release by any means! But if it is a low-cost change with a low probability of side effects, it would mean I can write some cool DSLs 6 months earlier.
Triage is ok with this.
Explicitly, triage is ok with option 2: allow digit-variant characters to be used as letters, distinct from the digits they correspond to. Now it merely needs an implementation.
Fixed by #32838
Looking at https://docs.julialang.org/en/latest/manual/unicode-input/#Unicode-Input-1, there are a few characters (𝟏, 𝟎, 𝟙, 𝟘) that would make excellent identifiers for linear algebra and probability DSLs.

Note that this proposal is conservative: it leaves as many of the other Unicode numerals as possible as invalid identifiers. In particular, `\bsanszero` and `\bsansone` look similar but are left as invalid identifiers for now.

The main use case for these is to be able to add automatically reshaping matrices/vectors of 1s and 0s to https://github.com/JuliaArrays/FillArrays.jl, in the spirit of the `UniformScaling` operator, currently denoted by `I`. Of course, that library would not intend to lay claim to the notation, but it would want to use it. The 𝟘 and 𝟙 might be useful for people who wish to write `const 𝟙 = 𝟏` to match their LaTeX notation, or could allow writing new indicator functions, etc. I know I would use `𝟙(a > b)` for that, to match the algebra.
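A minimal sketch of the two uses described in the proposal, assuming Julia 1.3 or later (where, after #32838, bold and double-struck digits are accepted as identifier characters). `AllOnes` and the methods defined on it are hypothetical illustrations of a `UniformScaling`-style object, not the FillArrays.jl API.

```julia
# Sketch only; assumes Julia 1.3 or later, where bold (𝟎 through 𝟗) and
# double-struck (𝟘 through 𝟡) digits are accepted as identifier characters.
# `AllOnes` is a hypothetical type, not the FillArrays.jl API.

# Indicator function: 𝟙(cond) is 1 when cond is true and 0 otherwise.
𝟙(cond::Bool) = cond ? 1 : 0

# A lazy "conformable matrix of ones", in the spirit of `I` (UniformScaling):
# it stores no size and takes its dimensions from the other operand.
struct AllOnes end
const 𝟏 = AllOnes()

# 𝟏 * x for a length-n vector x treats 𝟏 as the n×n all-ones matrix,
# so every entry of the result is sum(x).
Base.:*(::AllOnes, x::AbstractVector) = fill(sum(x), length(x))

# A + 𝟏 adds one to every entry, whatever the size of A.
Base.:+(A::AbstractMatrix, ::AllOnes) = A .+ one(eltype(A))

a, b = 3, 2
println(𝟙(a > b))          # 1
println(𝟙(a < b))          # 0
println(𝟏 * [1, 2, 3])     # [6, 6, 6]
println([1 2; 3 4] + 𝟏)    # [2 3; 4 5]
```

As with `I`, the single name `𝟏` stands for all-ones arrays of any shape because each method reads the dimensions from the array it is combined with.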