This repository has been archived by the owner on Sep 14, 2023. It is now read-only.

Find and fix possible performance bottlenecks #16

Closed
mre opened this issue Jul 3, 2018 · 5 comments
Labels
enhancement New feature or request mentorship

Comments

@mre
Owner

mre commented Jul 3, 2018

Yesterday I did some profiling using the setup described here.
The resulting callgrind file is attached. This can be opened with qcachegrind on Mac or kcachegrind on Linux.

callgrind.out.35583.zip

If you don't have either of those programs handy, I've added a screenshot of the two main bottlenecks I can see. I'm no expert, but it looks like we spend a lot of time allocating, converting, and dropping the BTreeMap, which is converted to a dictionary and returned to Python at the end.

I guess we could save a lot of time by making this part more efficient, e.g. by copying less and working on references instead. I might be mistaken, though. Help and pull requests are very welcome.
😊
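To illustrate the "work on references" idea, here is a minimal, hypothetical sketch (not hyperjson code; the map contents and the `sum_lengths_by_ref` helper are made up for illustration). Iterating a `BTreeMap` by reference avoids consuming it, so no per-entry moves or drops happen until the caller decides to discard the map:

```rust
use std::collections::BTreeMap;

// Hypothetical helper: walk the map by reference instead of consuming it.
// `iter()` borrows keys and values, so nothing is cloned or dropped here.
fn sum_lengths_by_ref(map: &BTreeMap<String, String>) -> usize {
    map.iter().map(|(k, v)| k.len() + v.len()).sum()
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert("name".to_string(), "hyperjson".to_string());
    map.insert("lang".to_string(), "rust".to_string());
    println!("{}", sum_lengths_by_ref(&map)); // prints 21
    // `map` is still usable afterwards; it is dropped exactly once, here.
    assert_eq!(map.len(), 2);
}
```

The same principle would apply when converting the map into a Python dictionary: borrow the entries and only copy what the target container actually needs, rather than cloning the whole structure first.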

(screenshot: hyperjson-bench profile)

@mre mre added the enhancement New feature or request label Jul 3, 2018
@mre
Owner Author

mre commented Jul 3, 2018

@RSabet, @wdv4758h fyi

@wdv4758h
Contributor

wdv4758h commented Jul 4, 2018

Nice to know, thanks for keeping me posted.

I had already noticed that the performance wasn't great the last time I randomly tried it on my machine with timeit.

@mre
Owner Author

mre commented Jul 4, 2018

Yeah, it's definitely not. I guess that's at least partially because of needless copying. It would be nice to have a tool to visualize allocations and memory usage; I looked around but couldn't find any...

I guess I would start by optimizing this BTreeMap now.
The way I generated the above screenshot was by doing the following:

cargo build
valgrind --tool=callgrind --main-stacksize=1000000000 target/debug/hyperjson-bench
callgrind_annotate --auto=yes callgrind.out.35583 >out.rs
qcachegrind callgrind.out.35583

Running this from a Mac.
Note that I'm using a debug build here as suggested in this article to preserve the symbols in the binary and have "pretty output" in the end.
I also created a stand-alone bench.rs binary for this. Could have used benchmark tests, but that would be one more thing that requires nightly Rust and it wasn't obvious to me how to call cachegrind on the binary created from the benchmark test. We can change that later. For now, I guess it's okay for quickly testing performance improvements.
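The actual bench.rs isn't reproduced here, but a minimal stand-alone sketch of such a binary (purely illustrative; the key count and iteration count are made up) just needs to exercise the allocate/convert/drop path enough for callgrind to attribute cost to it:

```rust
use std::collections::BTreeMap;

// Hypothetical benchmark body: build and drop a BTreeMap many times so the
// allocation, conversion, and drop costs show up clearly in the profile.
fn build_map(n: usize) -> BTreeMap<String, usize> {
    (0..n).map(|i| (format!("key{i}"), i)).collect()
}

fn main() {
    let mut total = 0;
    for _ in 0..1_000 {
        let map = build_map(100);
        total += map.len(); // use the result so the work isn't optimized away
    } // the map is dropped here on every iteration
    println!("{total}"); // prints 100000
}
```

Accumulating into `total` and printing it keeps the compiler from eliminating the loop as dead code, which matters even in debug builds once you switch to profiling release binaries.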

@mre mre added the mentorship label Jul 7, 2018
@mre
Owner Author

mre commented Jul 9, 2018

Oh yeah, and if you profile, don't forget to do that on a release build with debug flags. :trollface:
Add this to your Cargo.toml:

[profile.release]
debug = true

More info:
https://doc.rust-lang.org/book/second-edition/ch14-01-release-profiles.html

@mre
Owner Author

mre commented Nov 19, 2018

This profiling run is outdated by now. Also, the issue isn't really actionable, since profiling and performance improvements will always be an ongoing effort. Therefore I'm closing this to keep the issue tracker clean.
As a note, if you're on Linux and would like to profile, simply run make profile and you'll get a flamegraph. See the instructions in the README.md on how to set up perf for that.
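For reference, a Makefile target like that typically boils down to something along these lines (a sketch, not the repository's actual Makefile; it assumes perf and Brendan Gregg's FlameGraph scripts, stackcollapse-perf.pl and flamegraph.pl, are on the PATH):

```make
# Hypothetical sketch of a "make profile" target: record with perf on a
# release build, then fold the stacks into an SVG flamegraph.
profile:
	cargo build --release
	perf record --call-graph dwarf -o perf.data target/release/hyperjson-bench
	perf script -i perf.data | stackcollapse-perf.pl | flamegraph.pl > flame.svg
```

Recording with `--call-graph dwarf` gives full stack traces even without frame pointers, which is why the release profile needs debug info enabled as described above.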

@mre mre closed this as completed Nov 19, 2018