
Hyperfine benchmarks #163

Merged

2 commits merged into tealdeer-rs:master from the hyperfine branch on Jul 30, 2021

Conversation

niklasmohrin (Collaborator)

Closes #129

I started putting together a Dockerfile that fetches all the clients and benchmarks them with hyperfine. I decided to do it all in Docker because I don't want to install all the clients on my system, and because it makes it easy for third parties to verify the results that our benchmarks produce.

At the time of writing, only tealdeer and the C client are compared, but I will add more clients.

Eventually, we could think about running benchmarks in CI, but this is not the focus of this PR.
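To give an idea of the setup, here is a heavily trimmed sketch of such a Dockerfile; the base image, package names, client list, paths, and the final hyperfine invocation are illustrative, not the actual file from this PR:

```dockerfile
FROM debian:bullseye-slim

# Tools for fetching and building the clients, plus hyperfine itself
# (package availability in this base image is assumed).
RUN apt-get update && apt-get install -y git build-essential hyperfine

# One build step per client; the real Dockerfile repeats this for every client.
RUN git clone https://github.com/tldr-pages/tldr-c-client.git /clients/tldr-c \
    && make -C /clients/tldr-c

# Run the same query through every client under identical conditions.
CMD ["hyperfine", "--warmup", "3", \
     "/clients/tldr-c/tldr tar", \
     "/clients/tealdeer/tldr tar"]
```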

niklasmohrin marked this pull request as draft on February 4, 2021 at 15:44
@niklasmohrin (Collaborator, Author)

I noticed that when using fast-tldr (or tldr-hs), the output still contains the {{ and }} markers, which is not what the client specification intends. I am not sure how to compensate for these differences fairly. After all, fast-tldr runs considerably faster than tealdeer in hyperfine.

@dbrgn dbrgn left a comment (Collaborator)

Ah, great idea with Docker! This way the benchmarks are reproducible. And if something (e.g. compile flags) is not fair, people can correct it.

Yeah, the feature sets of the various clients differ. Every new feature costs a little bit of performance. Tealdeer without custom formatting and without support for a config file is surely faster than the version we've arrived at by now. fast-tldr states "We strive to be the fastest tldr-client with no compromises regarding features". However, speed and features are always a trade-off 🙂

I tried to test this, but right now fast-tldr complains about a missing glibc:

./fast-tldr: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by ./fast-tldr)

PS: Make sure to initialize all caches, and ensure that the clients don't clash (e.g. if they use the same config path or cache path). If possible, use a client-local cache directory to ensure proper separation.
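One way to get that separation, sketched with hypothetical paths and assuming the clients honor the XDG base directory spec for their cache location:

```sh
# Give each client its own cache directory so runs cannot interfere.
export XDG_CACHE_HOME=/bench/cache/tealdeer
tldr --update   # populate the cache once, before any timing run
tldr tar        # subsequent runs are served from this isolated cache
```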

@dbrgn (Collaborator) commented May 9, 2021

Just so I'm up to date: @niklasmohrin, this is still WIP, right? (I'll hold off on another review until you think it's ready.)

@niklasmohrin (Collaborator, Author)

@dbrgn Yeah, I am a bit out of the loop myself; I will have to come back to this, along with the Zig build, some time soon. I will request a review once it's ready.

About the results: I think it would be fairest if we included some evaluation alongside the numbers, for example noting that fast-tldr doesn't really parse the files and just outputs variables as they are read; otherwise we are just breaking our own legs here.

We have some performance PRs on the way; I am excited about #187. Is there any news on #108, or another way to get rid of docopt?

@dbrgn (Collaborator) commented May 9, 2021

Yeah, I agree, clients that simply stream a file to stdout aren't really a fair comparison.

@dbrgn (Collaborator) commented May 9, 2021

I'm working on #108 right now. It would be kind of nice to have benchmarks from before and after, but that can also be done once the PR is merged.

@niklasmohrin (Collaborator, Author)

Oh, before-and-after numbers seem like a nice thing to have too; well, then I can't rest on that argument anymore 😆

@dbrgn (Collaborator) commented May 9, 2021

Yeah, but as I mentioned, we can simply backport the hyperfine docker script to an old version of the repository. That's no problem 🙂

niklasmohrin force-pushed the hyperfine branch 2 times, most recently from 7026b2d to a44815e on June 2, 2021 at 22:35
@niklasmohrin (Collaborator, Author)

@dbrgn I think I am satisfied with the current state of the Dockerfile. I found a way to drop the disk caches from within the container, so that we can test with cold caches (which is closer to the actual use case).
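For reference, the mechanism looks roughly like this; it requires a privileged container, because /proc/sys is read-only otherwise (image name hypothetical):

```sh
# The drop_caches knob affects the host kernel, so the container
# must run with --privileged for the write to succeed.
docker run --privileged tldr-benchmarks \
    sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches; hyperfine "tldr tar"'
```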

Unless you have something else to add, the only thing left is getting outfieldr to build; I don't know when the dev will respond (I only messaged them earlier).

With #187 merged, we are also standing pretty strong against fast-tldr again, only falling 5-10% behind. I am wondering if we can reproduce the 30x speedup that outfieldr claims in its README.

Nevertheless, I think having all the results in a table that outlines the trade-offs is fair to everyone.

niklasmohrin force-pushed the hyperfine branch 2 times, most recently from 9ac45da to 32148c5 on June 5, 2021 at 16:08
niklasmohrin requested a review from dbrgn on June 5, 2021 at 16:17
@dbrgn dbrgn left a comment (Collaborator)

Really cool, thanks @niklasmohrin!

I think the Node and Python clients should be listed in the comparison as well, because both are quite commonly used (the Python client is the one packaged in the official Arch repos) and it's important to show the difference in invocation time between interpreted and compiled languages. However, feel free to merge the current state as-is; we can always improve it later on.

Here are my numbers for the benchmarks on a recent Ryzen 5900X system:

tealdeer 14.0 ms ± 0.8 ms
tldr-c 29.5 ms ± 1.4 ms
tldr-hs 19.6 ms ± 0.6 ms
fast-tldr 14.3 ms ± 0.7 ms
tldr-bash 25.8 ms ± 0.6 ms
outfieldr 6.1 ms ± 0.5 ms
tldr-node 403.8 ms ± 9.0 ms

It's interesting that the tealdeer numbers are better than fast-tldr's here; I wonder why that's the case? (Feel free to use those numbers if you want to make us "the second-fastest client on my system" 😜)

PS: I'm wondering whether, in the long run, the benchmarking should be moved to a separate repository to simplify maintenance (the benchmarks could be interesting as a commonly maintained reference for all tldr clients), but for now we can keep it inside the tealdeer repository, as long as we don't get a "please add/update this implementation in the benchmark" request every week 🙂

@niklasmohrin (Collaborator, Author)

From this perspective, having Node and Python in the list makes sense, yeah. I didn't want to run the Node app 500 times, but I have two ideas for how to present that:

  1. We list Node in the table, but with an asterisk noting that the mean is only from 50 runs, or
  2. we use two columns, one for a benchmark with 500 runs and one for a benchmark with only 50 runs; Node just wouldn't appear in the 500 column.

I think I'd prefer the first option, because it puts fewer (redundant) numbers on the screen and still presents the main point that Node is an order of magnitude slower. What do you think?


Having this code in a separate repo means it is its own thing, whereas here it's just documentation for how the numbers in the readme were generated. I would like to keep the impression of the latter for as long as possible. I was wondering if we should set up a GitHub Action to run the benchmarks (I think you can run privileged Docker containers inside the VM), but the CI runners are, afaik, quite the opposite of a cold machine, so I'm not sure (it's more work too, of course). It would add credibility though, since the numbers wouldn't just come from a single person.

@dbrgn (Collaborator) commented Jun 6, 2021

50 runs are perfectly fine. Btw, why don't we simply omit the --runs 400 argument? Hyperfine will try to determine on its own how many runs are required to achieve statistical significance. The number of runs is not really relevant to the user, as long as the numbers are repeatable.

@niklasmohrin (Collaborator, Author)

Without it, hyperfine did too few runs, leading to too much variance. I also tried --min-runs, but that just meant every client ran about that number of times anyway, so I concluded that I could simply fix the number of runs 🤷‍♂️ I'll see how reliable 50 runs are for me and check back in with you.
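For reference, the three variants under discussion (page name illustrative):

```sh
hyperfine 'tldr tar'                 # hyperfine chooses the run count (at least 10 by default)
hyperfine --min-runs 400 'tldr tar'  # at least 400 runs, more if needed
hyperfine --runs 400 'tldr tar'      # exactly 400 runs for every benchmarked command
```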

@dbrgn (Collaborator) commented Jun 10, 2021

Doesn't hyperfine take the variance into account? I assumed so, but maybe I'm wrong...

@niklasmohrin (Collaborator, Author)

I am not sure; maybe it does to a degree, but with a small number of runs it was still somewhat unstable when I tested it.

@dbrgn (Collaborator) commented Jun 11, 2021

Ok, then let's not hold up this PR any more; we can make further improvements later. Feel free to merge!

@niklasmohrin (Collaborator, Author)

(I gotta add Python first, though)

@niklasmohrin (Collaborator, Author)

@dbrgn I think everything is ready

@dbrgn (Collaborator) commented Jul 30, 2021

Can you build the Dockerfile? On my machine, tldr-hs fails every time (even with --no-cache).

Step 10/39 : RUN git clone https://github.com/psibi/tldr-hs.git         && cd tldr-hs         && stack build --install-ghc
 ---> Running in ea506c72af75
Cloning into 'tldr-hs'...
Downloading lts-18.0 build plan ...
RedownloadInvalidResponse Request {
  host                 = "raw.githubusercontent.com"
  port                 = 443
  secure               = True
  requestHeaders       = []
  path                 = "/fpco/lts-haskell/master//lts-18.0.yaml"
  queryString          = ""
  method               = "GET"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutDefault
  requestVersion       = HTTP/1.1
}
 "/root/.stack/build-plan/lts-18.0.yaml" (Response {responseStatus = Status {statusCode = 404, statusMessage = "Not Found"}, responseVersion = HTTP/1.1, responseHeaders = [("Connection","keep-alive"),("Content-Length","14"),("Content-Security-Policy","default-src 'none'; style-src 'unsafe-inline'; sandbox"),("Strict-Transport-Security","max-age=31536000"),("X-Content-Type-Options","nosniff"),("X-Frame-Options","deny"),("X-XSS-Protection","1; mode=block"),("Content-Type","text/plain; charset=utf-8"),("X-GitHub-Request-Id","C986:10058:47268:7522A:6103AE47"),("Accept-Ranges","bytes"),("Date","Fri, 30 Jul 2021 07:46:15 GMT"),("Via","1.1 varnish"),("X-Served-By","cache-fra19120-FRA"),("X-Cache","MISS"),("X-Cache-Hits","0"),("X-Timer","S1627631175.106098,VS0,VE149"),("Vary","Authorization,Accept-Encoding"),("Access-Control-Allow-Origin","*"),("X-Fastly-Request-ID","0468e71bfb3d2f26e693cecec5a72f9a0374bb0b"),("Expires","Fri, 30 Jul 2021 07:51:15 GMT"),("Source-Age","0")], responseBody = (), responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose})
The command '/bin/sh -c git clone https://github.com/psibi/tldr-hs.git         && cd tldr-hs         && stack build --install-ghc' returned a non-zero code: 1
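A full rebuild that bypasses the Docker layer cache, for anyone trying to reproduce this (image tag illustrative):

```sh
docker build --no-cache -t tldr-benchmarks benchmarks/
```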

@niklasmohrin (Collaborator, Author)

Ah, I think it's the haskell:8.6 image; I reverted too far while trying to avoid building half the Haskell ecosystem... I tested a no-cache rebuild (when I still had plain haskell) and it worked there. I'm gonna build again now before pushing :D

@dbrgn (Collaborator) commented Jul 30, 2021

That works, although I'd pin the version to haskell:9.0 instead. Building works for me with that version.

@dbrgn (Collaborator) commented Jul 30, 2021

It's interesting how big the effect of the disk cache is! After running sync; echo 3 | tee /proc/sys/vm/drop_caches, a tealdeer run takes around 14 ms. For subsequent calls, it's only 2-4 ms.

Here are my numbers:

(screenshot of hyperfine benchmark results)
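To keep the cache cold across runs, hyperfine can execute the cache-dropping command before every timed run via its --prepare option; a sketch, with the page name illustrative:

```sh
# Drop the page cache before each timed run so every run starts cold.
hyperfine --prepare 'sync; echo 3 | tee /proc/sys/vm/drop_caches' 'tldr tar'
```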

@niklasmohrin (Collaborator, Author)

That works, although I'd pin the version to haskell:9.0 instead. Building works for me with that version.

Prior to today, I had it pinned to 8.6, but the build would fail. I think that --install-ghc always figures out the needed version, which is why I just had it use the latest

I think running on a cold cache is most realistic, what do you think?

@dbrgn (Collaborator) commented Jul 30, 2021

which is why I just had it use the latest

Ok, I don't know about that, you might be right. (I personally prefer to pin versions to avoid unexpected incompatibilities when a newer version of the image is pushed.)

I think running on a cold cache is most realistic, what do you think?

Yes, I agree!

@dbrgn dbrgn left a comment (Collaborator)

Don't forget to squash 🙂

@niklasmohrin (Collaborator, Author)

whoops, apparently --autosquash does nothing without --interactive 🤔
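For reference, the invocation that actually applies fixup commits (base branch illustrative):

```sh
# --autosquash only takes effect during an interactive rebase.
git rebase --interactive --autosquash master
```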

I think I would separate the Docker and readme changes, because we may want to track the table in the readme separately from the benchmark setup

@dbrgn (Collaborator) commented Jul 30, 2021

whoops, apparently --autosquash does nothing without --interactive 🤔

Yep, that's the case.

I think I would separate the Docker and readme changes, because we may want to track the table in the readme separately from the benchmark setup

Yeah, that makes sense. I was referring to the fixup commit 🙂 Feel free to merge now.

niklasmohrin merged commit 123c809 into tealdeer-rs:master on Jul 30, 2021