Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deprecate utf8 for String #16469

Merged
merged 2 commits into from
May 20, 2016
Merged

deprecate utf8 for String #16469

merged 2 commits into from
May 20, 2016

Conversation

StefanKarpinski
Copy link
Member

@StefanKarpinski StefanKarpinski commented May 20, 2016

Part of #16107.

@test ["a","b"].*["c","d"]' == ["ac" "ad"; "bc" "bd"]

# Make sure NULL pointer are handled consistently by
# `String`, `ascii` and `utf8`
# Make sure NULL pointer are handled consistently by String
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pointers are

@nalimilan
Copy link
Member

Nice cleanup!

@StefanKarpinski StefanKarpinski merged commit 00e6129 into master May 20, 2016
@StefanKarpinski StefanKarpinski deleted the sk/highlander4 branch May 20, 2016 16:53
@StefanKarpinski StefanKarpinski mentioned this pull request May 20, 2016
32 tasks
@ararslan
Copy link
Member

How could the branch be called Highlander 4? There can only be one. :trollface:

@StefanKarpinski
Copy link
Member Author

@ararslan
Copy link
Member

0% on Rotten Tomatoes. Ouch.

@test_throws UnicodeError utf8(b"\xf0\x80\x80\x80")
# Test ends of long encoded surrogates
@test_throws UnicodeError utf8(b"\xf0\x8d\xa0\x80")
@test_throws UnicodeError utf8(b"\xf0\x8d\xbf\xbf")
Copy link
Contributor

@tkelman tkelman May 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these throw if you try to call String on them convert them to String? What's the strategy going to be for dealing with invalid UTF8?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the new String type, it will accept any data, valid UTF-8 or not – but doesn't yet, so we're in a slightly awkward intermediate state. I promise I will add lots of tests for all of that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like convert(String, data) currently errors but String(data) does not. Will additional changes make them both behave the same but then you opt in to validating data with some other function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the conversion stuff is a total mess right now – there's no coherent concept of what conversion means:

julia> String(UInt8[226,136,128])
""

julia> String(UInt[226,136,128])
ERROR: MethodError: Cannot `convert` an object of type Array{UInt64,1} to an object of type String
This may have arisen from a call to the constructor String(...),
since type constructors fall back to convert methods.
 in String(::Array{UInt64,1}) at ./sysimg.jl:50
 in eval(::Module, ::Any) at ./boot.jl:225
 in macro expansion at ./REPL.jl:92 [inlined]
 in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:46

julia> String(Int32[226,136,128])
"â\u88\u80"

julia> String(UInt32[226,136,128])
"â\u88\u80"

julia> String(Char[226,136,128])
"â\u88\u80"

Having the meaning of calling String on a vector of integers depend on the details of the representation of the integers like this is a total disaster. Only String(UInt8[...]) should work since that provides the actual data for the String object. Perhaps a Vector{Char} should be convertible to strings but it should be a convert call, not a constructor call. I've added a work item for the next round of Stringapalooza to fix this trainwreck.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least the data being tested here is useful, so ok if I add these back under convert for now? The same data can be tested under whatever the new design ends up being.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants