Hyperfine benchmarks #163
I noticed that when using …
Ah, great idea with Docker! This way the benchmarks are reproducible. And if something (e.g. compile flags) is unfair, people can correct it.
Yeah, the features of the various clients differ, and every new feature costs a little bit of performance. Tealdeer without custom formatting and without config file support is surely faster than the version we've arrived at by now. fast-tldr states "We strive to be the fastest tldr-client with no compromises regarding features". However, speed and features are always a trade-off 🙂
I tried to test this, but right now fast-tldr complains about a missing glibc:
./fast-tldr: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by ./fast-tldr)
PS: Make sure to initialize all caches, and ensure that the clients don't clash (e.g. if they use the same config path or cache path). If possible, use a client-local cache directory, to ensure proper separation.
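A minimal sketch of what that separation could look like, assuming the clients honor `XDG_CACHE_HOME`/`XDG_CONFIG_HOME` (the binary names and the `--update` flag are placeholders; real clients differ):

```sh
# Hypothetical setup: give every client its own cache and config root so
# they cannot clash, then initialize each cache once before benchmarking.
for client in tealdeer tldr-c fast-tldr; do
    export XDG_CACHE_HOME="/bench/$client/cache"
    export XDG_CONFIG_HOME="/bench/$client/config"
    mkdir -p "$XDG_CACHE_HOME" "$XDG_CONFIG_HOME"
    "$client" --update   # placeholder flag: fetch/initialize the page cache
done
```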
Just so I'm up to date: @niklasmohrin this is still WIP, right? (I'll hold off on another review until you think it's ready.)
@dbrgn Yeah, I am a bit out of the loop myself; I will have to come back to this with the zig build some time soon. I will just request the review once it's ready. About the results: I think it would be fairest if we included some evaluation alongside the results, for example that fast-tldr doesn't really parse the files and just outputs variables as they are read; otherwise we really are just breaking our own legs here. We have some performance PRs on the way, I am excited about #187. Is there any news on #108, or a way to get rid of docopt otherwise?
Yeah, I agree, clients that simply stream a file to stdout aren't really a fair comparison.
I'm working on #108 right now. It would be kind of nice to have benchmarks before and after, but that can also be done after the PR is merged.
Oh, before-and-after seems like a nice thing to have too; well, then I can't rest on that argument anymore 😆
Yeah, but as I mentioned, we can simply backport the hyperfine docker script to an old version of the repository. That's no problem 🙂
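A rough sketch of how such a backport could look (`<old-commit>` and the benchmark file location are placeholders, not the repo's actual layout):

```sh
# Hypothetical: check out an old revision in a separate worktree, then build
# the benchmark image using today's Dockerfile against the old tree.
git worktree add /tmp/tealdeer-old <old-commit>
docker build -f benchmark/Dockerfile -t tealdeer-bench:old /tmp/tealdeer-old
```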
@dbrgn I think I am satisfied with the current state of the Dockerfile. I found a way to drop the disk caches from within the container, so that we can test with cold caches (which is closer to the actual use case). Unless you have something else to add, the only thing left is getting outfieldr to build; I don't know when the dev will respond (I only messaged them earlier). With #187 merged, we are also standing pretty strong against fast-tldr again, only falling 5-10% behind. I am wondering if we can reproduce the 30x speedup that outfieldr claims in their README. Nevertheless, I think having all the results in a table that outlines the trade-offs is fair to everyone.
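For reference, one common way to do this is the pattern from hyperfine's own documentation: drop the page cache in a `--prepare` command before every timed run, which from inside a container requires elevated privileges. Whether this matches the exact approach in the Dockerfile is an assumption; image name and benchmarked command below are placeholders:

```sh
# The container must be privileged so /proc/sys/vm/drop_caches is writable.
docker run --privileged tldr-benchmarks

# Inside the container: write back dirty pages and drop the page cache
# before each timed run, so every measurement starts with a cold disk cache.
hyperfine --prepare 'sync; echo 3 > /proc/sys/vm/drop_caches' 'tldr tar'
```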
Really cool, thanks @niklasmohrin!
I think the Node and Python clients should be listed in the comparison as well, because both are quite commonly used (the Python client is the one packaged in the official Arch repos), and it's important to show the difference in invocation time between interpreted and compiled languages. However, feel free to merge the current state as-is; we can always improve it later on.
Here are my numbers for the benchmarks on a recent Ryzen 5900X system:
| Client | Mean ± σ |
| --- | --- |
| tealdeer | 14.0 ms ± 0.8 ms |
| tldr-c | 29.5 ms ± 1.4 ms |
| tldr-hs | 19.6 ms ± 0.6 ms |
| fast-tldr | 14.3 ms ± 0.7 ms |
| tldr-bash | 25.8 ms ± 0.6 ms |
| outfieldr | 6.1 ms ± 0.5 ms |
| tldr-node | 403.8 ms ± 9.0 ms |
It's interesting that here the tealdeer numbers are better than fast-tldr's; I wonder why that's the case? (Feel free to use those numbers if you want to make us "the second-fastest client on my system" 😜)
PS: I'm wondering whether in the long run the benchmarking should be moved to a separate repository to simplify maintenance (the benchmarks can be interesting as a commonly-maintained reference for all tldr clients), but for now we can keep it inside the tealdeer repository as long as we don't get a "please add/update this implementation in the benchmark" request every week 🙂
From this perspective, having node and python in the list makes sense, yeah. I didn't want to run the node app 500 times, but I have two ideas for how to present that:
I think I'd prefer the first option, because it's fewer (redundant) numbers on the screen and it still presents the main point that node is an order of magnitude slower. What do you think?

Having this code in a separate repo means it is its own thing, whereas here it's just documentation for how the numbers in a readme were generated. I would like to keep the impression of the latter for as long as possible.

I was wondering if we should set up a GitHub Action to run the benchmarks (I think you can run privileged docker containers inside the VM), but the CI runners are afaik quite the opposite of a cold machine, so I'm not sure (it's more work too, ofc). It would add credibility though, since the numbers wouldn't just come from a single person.
50 runs are perfectly fine. Btw, why don't we simply omit the …
Without this, it would do too few runs, leading to too much variance. I also used …
Doesn't hyperfine take the variance into account? I assumed so, but maybe I'm wrong...
I am not sure; maybe it does to a degree, but with the small number of runs it was still somewhat unstable when I tested.
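(For context: the flag under discussion is elided above; presumably it forces a minimum number of runs, along the lines of hyperfine's `--min-runs`. A sketch under that assumption, with the benchmarked command as a placeholder:)

```sh
# hyperfine picks the run count automatically by default; for very fast
# commands that can still leave noticeable run-to-run variance.
# Forcing a floor of 50 runs (plus a few warmup runs) stabilizes the numbers.
hyperfine --warmup 3 --min-runs 50 'tldr tar'
```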
Ok, then let's not hold up this PR any more, we can do additional improvements later. Feel free to merge!
(I gotta add python first though)
@dbrgn I think everything is ready
Can you build the Dockerfile? On my machine, …
Ah, I think it's the …
That works, although I'd pin the version to …
Prior to today, I had it pinned to 8.6, but the build would fail. I think that … Also, I think running on a cold cache is most realistic, what do you think?
Ok, I don't know about that, you might be right. (I personally prefer to pin versions to avoid unexpected incompatibilities when a newer version of the image is pushed.)
Yes, I agree!
Don't forget to squash 🙂
whoops, apparently … I think I would separate the docker and readme changes, because we may want to track the table in the readme separately from the benchmark setup.
Yep, that's the case.
Yeah, that makes sense. I was referring to the fixup commit 🙂 Feel free to merge now.
Closes #129
I started putting together a Dockerfile that gets all the clients and benchmarks them with hyperfine. I decided to do it all in Docker because I didn't want all the clients installed on my system, and because it makes it easy for third parties to verify the results our benchmarks produce.
At the time of writing, only tealdeer and the C client are compared, but I will add more clients.
Eventually, we could think about running benchmarks in CI, but this is not the focus of this PR.
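For anyone wanting to reproduce the numbers, the intended usage is roughly the following; the image tag is a placeholder, and `--privileged` only matters once the cold-cache setup discussed above is in place:

```sh
# Build the image (all clients are fetched/compiled during the build),
# then run the benchmark suite inside the container.
docker build -t tldr-benchmarks .
docker run --rm --privileged tldr-benchmarks
```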