-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
deprecate utf8 for String #16469
deprecate utf8 for String #16469
Conversation
@test ["a","b"].*["c","d"]' == ["ac" "ad"; "bc" "bd"] | ||
|
||
# Make sure NULL pointer are handled consistently by | ||
# `String`, `ascii` and `utf8` | ||
# Make sure NULL pointer are handled consistently by String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pointers are
Nice cleanup! |
3347632
to
be45b8b
Compare
How could the branch be called Highlander 4? There can only be one. |
https://en.wikipedia.org/wiki/Highlander_II:_The_Quickening (not recommended) |
0% on Rotten Tomatoes. Ouch. |
@test_throws UnicodeError utf8(b"\xf0\x80\x80\x80") | ||
# Test ends of long encoded surrogates | ||
@test_throws UnicodeError utf8(b"\xf0\x8d\xa0\x80") | ||
@test_throws UnicodeError utf8(b"\xf0\x8d\xbf\xbf") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these throw if you try to call convert them to String
on themString
? What's the strategy going to be for dealing with invalid UTF8?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the new String type, it will accept any data, valid UTF-8 or not – but doesn't yet, so we're in a slightly awkward intermediate state. I promise I will add lots of tests for all of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like convert(String, data)
currently errors but String(data)
does not. Will additional changes make them both behave the same but then you opt in to validating data with some other function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the conversion stuff is a total mess right now – there's no coherent concept of what conversion means:
julia> String(UInt8[226,136,128])
"∀"
julia> String(UInt[226,136,128])
ERROR: MethodError: Cannot `convert` an object of type Array{UInt64,1} to an object of type String
This may have arisen from a call to the constructor String(...),
since type constructors fall back to convert methods.
in String(::Array{UInt64,1}) at ./sysimg.jl:50
in eval(::Module, ::Any) at ./boot.jl:225
in macro expansion at ./REPL.jl:92 [inlined]
in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:46
julia> String(Int32[226,136,128])
"â\u88\u80"
julia> String(UInt32[226,136,128])
"â\u88\u80"
julia> String(Char[226,136,128])
"â\u88\u80"
Having the meaning of calling String
on a vector of integers depend on the details of the representation of the integers like this is a total disaster. Only String(UInt8[...])
should work since that provides the actual data for the String
object. Perhaps a Vector{Char}
should be convertible to strings but it should be a convert call, not a constructor call. I've added a work item for the next round of Stringapalooza to fix this trainwreck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least the data being tested here is useful, so ok if I add these back under convert
for now? The same data can be tested under whatever the new design ends up being.
Part of #16107.