Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deprecations with copy! and Vector{UInt8}(str) #234

Merged
merged 1 commit into from
Jan 29, 2018

Conversation

ScottPJones
Copy link
Contributor

No description provided.

@@ -272,7 +273,7 @@ function parse_string(ps::ParserState)
if c == BACKSLASH
c = advance!(ps)
if c == LATIN_U # Unicode escape
append!(b, Vector{UInt8}(string(read_unicode_escape!(ps))))
append!(b, get_bytes(string(read_unicode_escape!(ps))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

append!(b, codeunits(s)) would work here … no need to force a copy to be made as in get_bytes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem was that codeunits wasn't defined in Compat.jl to make it work on v0.6.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even using codeunits is not the best approach here.
Instead of having to create a string just to append it, it would be better to pass b to a read_unicode_escape!(b, ps) function, that directly push!ed the 1-4 bytes from the unicode escape.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question - why do you think my change would force a copy? It uses Vector{UInt8} on v0.6, which doesn't copy (and hence could be unsafe), and codeunits on v0.7, which doesn't copy either.
I think your objection was incorrect.

pc = ParserContext{unparameterize_type(dicttype), inttype}()
ps = MemoryParserState(Vector{UInt8}(String(str)), 1)
ps = MemoryParserState(get_bytes(str), 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better not to make a copy here.

Since MemoryParserState is only ever created from a String, it would be better to just (a) change the utf8data field to a String field and (b) call codeunit(s, i) to extract a byte from the string (e.g. in the byteat function). Note that codeunit already works in Julia 0.6.

Copy link
Contributor Author

@ScottPJones ScottPJones Jan 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I'm trying to make the minimal changes here (as I was told many times before to do 🙂 ) to make it work on both v0.6 and later.
For a fast JSON parser, I'll probably just do it based on my Strs.jl package, not String.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestions require hardly any more code, and don't make more copies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They don't make less copies either. My change actually requires gets rid of the copies on v0.7, and doesn't add them for v0.6.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your version has makes more copies on 0.7 than 0.6, whereas mine does not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On v0.7, it calls codeunits. Isn't that what your's does on v0.7?

@@ -262,6 +262,7 @@ function read_unicode_escape!(ps)
end
end

get_bytes(str) = Vector{UInt8}(@static isdefined(Base, :codeunits) ? codeunits(str) : str)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is not necessary given the changes I suggest below.

However, it would be good to have a Compat.codeunits(s) function that just returns Vector{UInt8}(s) in Julia 0.6.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed a PR for Compat to add codeunits and ncodeunits

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If those are merged soon, then yes, I'll change this to use codeunits.

@ScottPJones
Copy link
Contributor Author

I tried to put this bump message on JuliaLang/Compat.jl#474 but wasn't able to:

The failure on Travis is because of something unrelated on master:

Error During Test at /home/travis/.julia/v0.7/Compat/test/runtests.jl:1079
  Test threw an exception of type MethodError
  Expression: Compat.IteratorSize(v) == Base.HasShape()
  MethodError: no method matching Base.HasShape()
  Stacktrace:
   [1] top-level scope at /home/travis/.julia/v0.7/Compat/test/runtests.jl:1079

Could this be merged so that I can fix #234 the way Steven would like?

@TotalVerb
Copy link
Collaborator

TotalVerb commented Jan 29, 2018

Sure, performance can be fixed later. I'd rather not have so many deprecations.

@TotalVerb TotalVerb merged commit 6a61914 into JuliaIO:master Jan 29, 2018
@stevengj
Copy link
Member

stevengj commented Jan 29, 2018

@TotalVerb, I think @ScottPJones was asking for the Compat PR to be merged, not this one.

I really wish this PR had not been merged. Now, instead of noisy deprecations, we have silent performance problems that we will have to remember to chase down later.

@TotalVerb TotalVerb mentioned this pull request Jan 29, 2018
5 tasks
@TotalVerb
Copy link
Collaborator

@stevengj I can revert if necessary, but I intend to revisit MemoryParserState anyway (as would be necessary to fix #232), so I figured it would not hurt that much. We can avoid tagging in the meantime.

@ScottPJones
Copy link
Contributor Author

I just came back to ask when the update to Compat would be tagged, so that I could push an update to this PR to use codeunits and ncodeunits.
As soon as that is tagged, I'll make another PR to JSON to use them (I don't like performance problems either, as you well know!). However, regarding performance problems, there are a lot of much more significant ones in this code, such as creating strings just to calculate the length when there are \u escapes, which is why (in this PR), I didn't worry about fixing all of it's problems (having had keeping PRs focused rather beaten into me!)

@ScottPJones ScottPJones deleted the spj/fixdeps branch January 29, 2018 13:16
@stevengj
Copy link
Member

stevengj commented Jan 29, 2018

Don't blame this on your tendency to raise unrelated issues (which you are doing again). The problem with this PR has nothing to do with minimalism: it updates code which made no copies in 0.6 to code that makes copies in 0.7, rather than keeping the equivalent behavior. It is a regression. I'm not in favor of PRs that introduce regressions in order to silence deprecations, whereas avoiding the regression required few or no additional lines of code.

@stevengj
Copy link
Member

@TotalVerb, I'd prefer that you revert.

TotalVerb added a commit that referenced this pull request Jan 30, 2018
@ScottPJones
Copy link
Contributor Author

ScottPJones commented Jan 30, 2018

it updates code which made no copies in 0.6 to code that makes copies in 0.7, rather than keeping the equivalent behavior

@stevengj: Stop making these incorrect statements, please. This change was NOT a regression.
I made a change that did not introduce any additional copies, and was simpler than your proposed change.

Also, I'd said that I'd change it to use codeunits as soon as that is available in Compat.
Your PR has been merged, but no new release has yet been tagged, so it still cannot be used.

@stevengj
Copy link
Member

stevengj commented Feb 1, 2018

Your change did introduce additional copies. Vector{UInt8}(codeunits(s)) makes a copy in 0.7.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants