Update Unicode tables to 9.0 #34599
r? @eddyb (rust_highfive has picked a reviewer for you, use r? to override) |
cc @SimonSapin, @rust-lang/libs Thoughts on the backwards compatibility implications of a change like this? This seems like something we'd want to do, but if it has bad implications we should think them through first. |
For reference, here are the official Unicode 9.0.0 changes, and the Migration section in particular should help evaluate compatibility. I don't know enough about how these properties are used to answer myself. |
When we've discussed Unicode and compatibility in the past, I recall we've leaned toward giving ourselves leeway to upgrade. @cuviper based on the Unicode changelog, do you know what impact this has on specific Rust functions? If this makes changes to e.g. Unicode identifiers (which it looks like it does), that impacts the Rust language definition. |
@brson Yes, there are changes to the XID tables. I believe these are mostly additions for the new scripts, but I'm not sure of that. The UAX #31 Migration talks about changing the formal definitions of ID/XID, which isn't clear to me either, but I think it's just changing emphasis. |
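One rough way to see whether an XID change matters for a given code point is to probe it directly. The sketch below is only an approximation and does not use Rust's tables: it relies on Python's `str.isidentifier()`, which in CPython checks XID_Start (plus `_`) for the first character and XID_Continue for the rest. The helper names `is_xid_start` and `is_xid_continue` are hypothetical, and the answer for newly added scripts depends on the Unicode version bundled with the interpreter.

```python
import unicodedata


def is_xid_start(ch: str) -> bool:
    """Approximate XID_Start: a one-character identifier other than "_"."""
    # Python also accepts "_" as an identifier start, so exclude it here.
    return ch != "_" and ch.isidentifier()


def is_xid_continue(ch: str) -> bool:
    """Approximate XID_Continue: ch may follow a known-good start character."""
    return ("a" + ch).isidentifier()


if __name__ == "__main__":
    # The Unicode version of this interpreter's tables (varies by Python release).
    print("Unicode data version:", unicodedata.unidata_version)

    # U+1E900 is in the Adlam block, one of the scripts added in Unicode 9.0.
    # The result depends on the interpreter's Unicode version.
    adlam = "\U0001E900"
    print("U+1E900 XID_Start:", is_xid_start(adlam))
    print("U+1E900 XID_Continue:", is_xid_continue(adlam))
```

Running this against interpreters built with different Unicode versions gives a quick feel for which identifiers become valid after a table update.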
To the best of my knowledge, we have no public Unicode functionality in the libraries shipped with rustc, or in the compiler itself, that would be impacted by the move to 9.0.
The only thing we might want to do is check our “easily confused symbols” table and see if it needs adjustment for the new code points (though I doubt it). |
To the PR author: you might need to adjust the script further for new properties and similar changes. To future reviewers: make sure the tables related to new properties are indeed correct and exhaustive. |
@nagisa There is one new property, props = load_properties("PropList.txt",
["White_Space", "Join_Control", "Noncharacter_Code_Point", "Pattern_White_Space"]) And then only So it seems |
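For readers unfamiliar with the data files, here is a minimal sketch of how a loader that keeps only a chosen set of properties might work. This is not the actual `load_properties` from `unicode.py`; the function name `parse_properties` and the inline sample are assumptions for illustration. It relies only on the published layout of `PropList.txt`: a code point or `lo..hi` range, a semicolon, the property name, and an optional `#` comment.

```python
import re
from collections import defaultdict

# "0009..000D ; White_Space" or "0020 ; White_Space" (comment already stripped)
LINE_RE = re.compile(r"^([0-9A-Fa-f]+)(?:\.\.([0-9A-Fa-f]+))?\s*;\s*(\w+)")


def parse_properties(lines, interesting):
    """Return {property: [(lo, hi), ...]} for the properties in `interesting`."""
    wanted = set(interesting)
    props = defaultdict(list)
    for line in lines:
        # Drop trailing comments and skip blank or comment-only lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        lo = int(match.group(1), 16)
        # A single code point is stored as a one-element range.
        hi = int(match.group(2), 16) if match.group(2) else lo
        name = match.group(3)
        if name in wanted:
            props[name].append((lo, hi))
    return dict(props)


# Small excerpt in the PropList.txt layout, used here instead of the real file.
SAMPLE = """\
0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>
0020          ; White_Space # Zs       SPACE
200C..200D    ; Join_Control # Cf   [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER
0021..002F    ; Pattern_Syntax # (other properties are ignored)
"""

if __name__ == "__main__":
    print(parse_properties(SAMPLE.splitlines(), ["White_Space", "Join_Control"]))
```

Because only the named properties are kept, a property that is new in the data file is simply skipped unless it is added to the list, which is the point being made about the allow-list above.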
I was more worried about
but it seems fine too, since we do not load that either. |
Is there anything waiting on me here? Or is this just waiting for a review decision? |
@cuviper ah no it's all on our end, the libs team just needs to discuss this basically. (would love to get @SimonSapin's thoughts as well) |
In general I’m in favor of keeping up to date with Unicode. Hard-coding a Unicode version was one of the big issues of IDNA 2003. And I believe the Unicode Consortium to be mindful of backward compatibility when making changes. http://www.unicode.org/policies/policies.html talks about stability. And for what it’s worth, a second-hand story from The Olden Days: https://tools.ietf.org/html/rfc3629#section-5
That said I haven’t looked at all at what changed in 9.0. http://unicode.org/versions/Unicode9.0.0/#Migration would be the thing to review. |
Ok, cool, thanks @SimonSapin! I suspect that @rust-lang/libs will probably all respond with "lgtm" @cuviper |
lgtm |
I'm happy to delegate to the experts here, so lgtm :) |
Update Unicode tables to 9.0 I just updated `unicode.py`'s generated copyright year, then ran it.