This repository has been archived by the owner on Sep 14, 2023. It is now read-only.

Find and fix possible performance bottlenecks #16

Closed
mre opened this issue Jul 3, 2018 · 5 comments
Labels
enhancement New feature or request mentorship

Comments

@mre
Owner

mre commented Jul 3, 2018

Yesterday I did some profiling using the setup described here.
The resulting callgrind file is attached. This can be opened with qcachegrind on Mac or kcachegrind on Linux.

callgrind.out.35583.zip

If you don't have either of those programs handy, I've added a screenshot of the two main bottlenecks I can see. I'm no expert, but it looks like we spend a lot of time allocating, converting, and dropping the BTreeMap, which is converted to a dictionary and returned to Python at the end.

I guess we could save a lot of time by making this part more efficient, e.g. by copying less and working on references instead. I might be mistaken, though. Help and pull requests are very welcome.
😊
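To illustrate the "work on references" idea, here is a minimal, hypothetical sketch (not hyperjson code; the map contents and the `sum_lengths_by_ref` helper are made up for illustration). Iterating a `BTreeMap` by reference avoids consuming it, so no per-entry moves or drops happen until the caller decides to discard the map:

```rust
use std::collections::BTreeMap;

// Hypothetical helper: walk the map by reference instead of consuming it.
// `iter()` borrows keys and values, so nothing is cloned or dropped here.
fn sum_lengths_by_ref(map: &BTreeMap<String, String>) -> usize {
    map.iter().map(|(k, v)| k.len() + v.len()).sum()
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert("name".to_string(), "hyperjson".to_string());
    map.insert("lang".to_string(), "rust".to_string());
    println!("{}", sum_lengths_by_ref(&map)); // prints 21
    // `map` is still usable afterwards; it is dropped exactly once, here.
    assert_eq!(map.len(), 2);
}
```

The same principle would apply when converting the map into a Python dictionary: borrow the entries and only copy what the target container actually needs, rather than cloning the whole structure first.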

(screenshot: hyperjson-bench profile)

@mre mre added the enhancement New feature or request label Jul 3, 2018
@mre
Owner Author

mre commented Jul 3, 2018

@RSabet, @wdv4758h fyi

@wdv4758h
Contributor

wdv4758h commented Jul 4, 2018

Nice to know, thanks for keeping me posted.

I had already noticed that the performance wasn't great the last time I randomly tried it on my machine with timeit.

@mre
Owner Author

mre commented Jul 4, 2018

Yeah, it's definitely not. I guess that's at least partially because of needless copying. It would be nice to have a tool to visualize allocations and memory usage; I looked around but couldn't find any...

I guess I would start by optimizing this BTreeMap now.
The way I generated the above screenshot was by doing the following:

cargo build
valgrind --tool=callgrind --main-stacksize=1000000000 target/debug/hyperjson-bench
callgrind_annotate --auto=yes callgrind.out.35583 >out.rs
qcachegrind callgrind.out.35583

Running this from a Mac.
Note that I'm using a debug build here as suggested in this article to preserve the symbols in the binary and have "pretty output" in the end.
I also created a stand-alone bench.rs binary for this. Could have used benchmark tests, but that would be one more thing that requires nightly Rust and it wasn't obvious to me how to call cachegrind on the binary created from the benchmark test. We can change that later. For now, I guess it's okay for quickly testing performance improvements.
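The actual bench.rs isn't reproduced here, but a minimal stand-alone sketch of such a binary (purely illustrative; the key count and iteration count are made up) just needs to exercise the allocate/convert/drop path enough for callgrind to attribute cost to it:

```rust
use std::collections::BTreeMap;

// Hypothetical benchmark body: build and drop a BTreeMap many times so the
// allocation, conversion, and drop costs show up clearly in the profile.
fn build_map(n: usize) -> BTreeMap<String, usize> {
    (0..n).map(|i| (format!("key{i}"), i)).collect()
}

fn main() {
    let mut total = 0;
    for _ in 0..1_000 {
        let map = build_map(100);
        total += map.len(); // use the result so the work isn't optimized away
    } // the map is dropped here on every iteration
    println!("{total}"); // prints 100000
}
```

Accumulating into `total` and printing it keeps the compiler from eliminating the loop as dead code, which matters even in debug builds once you switch to profiling release binaries.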

@mre mre added the mentorship label Jul 7, 2018
@mre
Owner Author

mre commented Jul 9, 2018

Oh yeah, and if you profile, don't forget to do that on a release build with debug flags. :trollface:
Add this to your Cargo.toml:

[profile.release]
debug = true

More info:
https://doc.rust-lang.org/book/second-edition/ch14-01-release-profiles.html

@mre
Owner Author

mre commented Nov 19, 2018

This profiling run is outdated by now. Also, the issue isn't really actionable, since profiling and performance improvements will always be an ongoing effort. Therefore I'm closing this to keep the issue tracker clean.
As a note, if you're on Linux and would like to profile, simply run make profile and you'll get a flamegraph. See the instructions in the README.md on how to set up perf for that.
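For reference, a Makefile target like that typically boils down to something along these lines (a sketch, not the repository's actual Makefile; it assumes perf and Brendan Gregg's FlameGraph scripts, stackcollapse-perf.pl and flamegraph.pl, are on the PATH):

```make
# Hypothetical sketch of a "make profile" target: record with perf on a
# release build, then fold the stacks into an SVG flamegraph.
profile:
	cargo build --release
	perf record --call-graph dwarf -o perf.data target/release/hyperjson-bench
	perf script -i perf.data | stackcollapse-perf.pl | flamegraph.pl > flame.svg
```

Recording with `--call-graph dwarf` gives full stack traces even without frame pointers, which is why the release profile needs debug info enabled as described above.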

@mre mre closed this as completed Nov 19, 2018