Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deprecations with copy! and Vector{UInt8}(str) #234

Merged
merged 1 commit into from
Jan 29, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion REQUIRE
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
julia 0.6
Compat 0.44.0
Compat 0.45.0
Nullables 0.0.1
13 changes: 9 additions & 4 deletions src/Parser.jl
Original file line number Diff line number Diff line change
Expand Up @@ -262,6 +262,7 @@ function read_unicode_escape!(ps)
end
end

get_bytes(str) = Vector{UInt8}(@static isdefined(Base, :codeunits) ? codeunits(str) : str)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is not necessary given the changes I suggest below.

However, it would be good to have a Compat.codeunits(s) function that just returns Vector{UInt8}(s) in Julia 0.6.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed a PR for Compat to add codeunits and ncodeunits

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If those are merged soon, then yes, I'll change this to use codeunits.


function parse_string(ps::ParserState)
b = UInt8[]
Expand All @@ -272,7 +273,7 @@ function parse_string(ps::ParserState)
if c == BACKSLASH
c = advance!(ps)
if c == LATIN_U # Unicode escape
append!(b, Vector{UInt8}(string(read_unicode_escape!(ps))))
append!(b, get_bytes(string(read_unicode_escape!(ps))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

append!(b, codeunits(s)) would work here … no need to force a copy to be made as in get_bytes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem was that codeunits wasn't defined in Compat.jl to make it work on v0.6.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even using codeunits is not the best approach here.
Instead of having to create a string just to append it, it would be better to pass b to a read_unicode_escape!(b, ps) function, that directly push!ed the 1-4 bytes from the unicode escape.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question - why do you think my change would force a copy? It uses Vector{UInt8} on v0.6, which doesn't copy (and hence could be unsafe), and codeunits on v0.7, which doesn't copy either.
I think your objection was incorrect.

else
c = get(ESCAPES, c, 0x00)
c == 0x00 && _error(E_BAD_ESCAPE, ps)
Expand Down Expand Up @@ -381,9 +382,11 @@ function unparameterize_type(T::Type)
candidate <: Union{} ? T : candidate
end

function parse(str::AbstractString; dicttype::Type{<:AbstractDict}=Dict{String,Any}, inttype::Type{<:Real}=Int64)
function parse(str::AbstractString;
dicttype::Type{<:AbstractDict}=Dict{String,Any},
inttype::Type{<:Real}=Int64)
pc = ParserContext{unparameterize_type(dicttype), inttype}()
ps = MemoryParserState(Vector{UInt8}(String(str)), 1)
ps = MemoryParserState(get_bytes(str), 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better not to make a copy here.

Since MemoryParserState is only ever created from a String, it would be better to just (a) change the utf8data field to a String field and (b) call codeunit(s, i) to extract a byte from the string (e.g. in the byteat function). Note that codeunit already works in Julia 0.6.

Copy link
Contributor Author

@ScottPJones ScottPJones Jan 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I'm trying to make the minimal changes here (as I was told many times before to do 🙂 ) to make it work on both v0.6 and later.
For a fast JSON parser, I'll probably just do it based on my Strs.jl package, not String.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestions require hardly any more code, and don't make more copies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They don't make less copies either. My change actually requires gets rid of the copies on v0.7, and doesn't add them for v0.6.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your version has makes more copies on 0.7 than 0.6, whereas mine does not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On v0.7, it calls codeunits. Isn't that what your's does on v0.7?

v = parse_value(pc, ps)
chomp_space!(ps)
if hasmore(ps)
Expand All @@ -392,7 +395,9 @@ function parse(str::AbstractString; dicttype::Type{<:AbstractDict}=Dict{String,A
v
end

function parse(io::IO; dicttype::Type{<:AbstractDict}=Dict{String,Any}, inttype::Type{<:Real}=Int64)
function parse(io::IO;
dicttype::Type{<:AbstractDict}=Dict{String,Any},
inttype::Type{<:Real}=Int64)
pc = ParserContext{unparameterize_type(dicttype), inttype}()
ps = StreamingParserState(io)
parse_value(pc, ps)
Expand Down
4 changes: 2 additions & 2 deletions src/specialized.jl
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ function parse_string(ps::MemoryParserState)
# We can just copy the data from the buffer in this case.
if fastpath
s = ps.s
unsafe_copy!(b, 1, ps.utf8data, s + 1, len)
unsafe_copyto!(b, 1, ps.utf8data, s + 1, len)
ps.s = s + len + 2
else
parse_string(ps, b)
Expand Down Expand Up @@ -96,7 +96,7 @@ function parse_string(ps::MemoryParserState, b)
c = ps.utf8data[s]
if c == LATIN_U # Unicode escape
ps.s = s + 1
for c2 in Vector{UInt8}(string(read_unicode_escape!(ps)))
for c2 in get_bytes(string(read_unicode_escape!(ps)))
i += 1
b[i] = c2
end
Expand Down