Perf: Don't allocate as much in Decoder::string #168
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
On a sample module, time taken just in the Decoder::string method was 15% of the entire parse time, a significant majority of which is in the
extend
call reallocating over and over. This is obviously a hot spot, so it should be optimized a little more.So, instead of consuming words and building up a buffer, do a scan over the bytes, extract the string, and advance the proper length. This does only one allocation, the
.to_string
at the end, as opposed to the log(n) allocations before.This new method is only 5% of the entire parse time. Parse time for the module went from
14.866765ms
to12.630514ms
, a 15% improvement. 1% of that 5% time is scanning for the null byte, another 1% for utf8 validation, and 3% for theto_string
allocation/copy.(Unfortunately there is no std function for null byte scan and utf8 validation at the same time, which would speed this up even more 😢. Also, there's an optimized
memchr
function in std, but frustratingly, it's not exposed anywhere - it's so close right here, but they discard the position :( https://github.com/rust-lang/rust/blob/efbaa413061c2a6e52f06f00a60ee7830fcf3ea5/library/core/src/slice/cmp.rs#L273)