Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve GitVersion performance #2885

Closed
candritzky opened this issue Oct 22, 2021 · 14 comments · Fixed by #3585
Closed

Improve GitVersion performance #2885

candritzky opened this issue Oct 22, 2021 · 14 comments · Fixed by #3585

Comments

@candritzky
Copy link
Contributor

GitVersion unnecessarily slows down the build process on solutions with many projects.

Detailed Description

GitVersion.exe has a performance problem when used in large solutions with several hundred projects.
Even though it takes less than a second for each project, this sums up to a couple of avoidable minutes for each (re-)build.
This is not so much of a problem on a build agent, but it is annoying during local development when the Git history hasn't changed but larger parts of the application need to be built due to my code changes.

Significant overhead is caused by the YamlDotNet library which is used for gitversion_cache and config file parsing.
The deserialization code is pretty slow on the first call ("cold start") for each deserialized type (here: Dictionary<string,string>, see VersionVariables.FromFileInternal and Config, see ConfigSerializer.Read). (Obviously it needs some internal "warmup" and caches the generated deserializers.)

Here's a performance trace (taken with JetBrains dotTrace) that shows how slow e. g. the gitversion_cache file parsing currently is:

image

And here's how fast it could be with some very simple/naive, hand-crafted parsing logic (that currently works only for the simply structured gitversion_cache file):

image

A similar performance hit (> 100 ms) can be seen when parsing the GitVersion Config file.

An additional performance gain seems to be achievable by streamlining the startup procedure (CreateHostBuilder and friends).

Context

Performance improvements are important because I'm slowed by down GitVersion by additional build times during development.
gitversion.exe is executed thousands of times per day and should be strongly optimized performance.

Possible Implementation

Replace YamlDotNet with a faster YAML parser, or find a way to somehow pre-compile the internal setup of its deserializers (optimized for the required data types).

@asbjornu
Copy link
Member

Interesting. But perhaps instead of replacing YamlDotNet, we can attempt the fast, naïve parser first, and if it fails, we can fallback to YamlDotNet?

@candritzky
Copy link
Contributor Author

Unfortunately the fast, naïve parser is only capable of parsing the flat structure used by the gitversion_cache file, but it's not (yet) capable of parsing the nested Config structure.

After more thinking I came to the conclusion that the best solution would be to execute the GitVersion logic in-process with MSBuild so that it is tied to the lifetime of the host process (MSBuild.exe) and therefore loaded (=cold-started) and initialized only once during compilation of a solution.

@jbaehr
Copy link
Contributor

jbaehr commented Oct 25, 2021

After more thinking I came to the conclusion that the best solution would be to execute the GitVersion logic in-process with MSBuild so that it is tied to the lifetime of the host process (MSBuild.exe) and therefore loaded (=cold-started) and initialized only once during compilation of a solution.

This sounds somewhat familiar... IIRC it used to be that way, i.e. no separate process but GitVersionTask, linked (and maybe ilmerged) to GitVersion.Core. I also vaguely remember a performance discussion here regarding this separate process when building large solutions; long before GitVersion.MsBuild replaced the GitVersionTask. Unfortunately I don't find it any more. @ruhullahshah, do you still have any references for that?


In addition to getting rid of spawning a process over and over again, I have another idea (no code, sorry) on how we could speed up the config/cache reading part:

  • The config is YAML because it's intended for humans, which is perfectly reasonable. And requires some compatibility guarantees, this this has to stay.
  • The cache however, is a private implementation detail, so this could be switched to some other format, optimized for fast parsing. It does not require to be portable across machines or even GitVersion releases. Going for System.Text.Json could be an option if you want to stay plain-text (and thus aiding a human debugger) but the fastest may be just using System.IO.BinaryWriter with a fixed record format, starting with an integer version number.
  • Parsing the YAML config could be kept to a minimum if we store the parsed content in the cache and use a hash to check whether it's still valid. This up-to-date-check could maybe even be done by LibGit2Sharp.

@candritzky
Copy link
Contributor Author

@jbaehr Here is a (stale) branch in the GitVersion repo that still has GitVersionTask:
https://github.com/GitTools/GitVersion/tree/feature/IntroduceInMemoryGitMetadata/src/GitVersionTask

Please note that the gitversion_cache parsing was just one example of the many performance issues that gitversion.exe has.
Parsing config file is even more expensive because of the more complex structure of the Config data type.
There is also significant overhead caused by the host builder and dependency injection bootstrap process.

@asbjornu
Copy link
Member

asbjornu commented Oct 26, 2021

The config is YAML because it's intended for humans, which is perfectly reasonable. And requires some compatibility guarantees, this this has to stay.

Agreed.

The cache however, is a private implementation detail, so this could be switched to some other format, optimized for fast parsing. It does not require to be portable across machines or even GitVersion releases.

I agree that we can chose a more efficient format for the cache. BinaryPack looks promising, but there's probably more formats and libraries we should consider.

Parsing the YAML config could be kept to a minimum if we store the parsed content in the cache and use a hash to check whether it's still valid. This up-to-date-check could maybe even be done by LibGit2Sharp.

As I just wrote in #2862 (review), I see no reason why Config can't be an immutable singleton, initialized at application start.

@jbaehr
Copy link
Contributor

jbaehr commented Oct 27, 2021

The cache however, is a private implementation detail, so this could be switched to some other format, optimized for fast parsing. It does not require to be portable across machines or even GitVersion releases.

I agree that we can chose a more efficient format for the cache. BinaryPack looks promising, but there's probably more formats and libraries we should consider.

I've no experience with BinaryPack specifically, but at a first glance it raises two concerns:

  • The author itself states in the readme that this library is no longer maintained and he recommends looking at MessagePack-Sharp first.
  • The speed claims focus on throughput and storage size. Both is not an issue for the GitVersion cache as it's only read (and maybe written) once in the lifetime of the process. What is critical though is the initial setup performance. And especially for this use case, reflection based dynamic IL-generation doesn't seem to be the best fit -- this only pays off when using the generated serializer over and over again.

There is also significant overhead caused by the host builder and dependency injection bootstrap process.

Indeed. When looking at the above pictures from @candritzky's benchmarks it seems to be around 300 ms. For a manually started executable I think it's an acceptable tradeoff between performance and maintainability of the code. But for an MSBuild Task that is potentially executed several hundred times per build it's not.

I'd really like to get rid of this external process spawning (which is somewhat expensive on its own) when talking about the MSBuild-integration. That means the MSBuild Task should link against the GitVersionCore (and IL-merged) as it used to be. So that the Task and the CLI are two independent front-ends, where each can make design decisions for it's own use case not affecting the other one. I.e. for the task we can go without a dynamic DI-container and wire up the required components manually; not affecting the structure of the standalone exe.

@asbjornu
Copy link
Member

We did have rather serious issues with the MSBuild task and EXE having very different capabilities because they were essentially completely separate codebases. But I agree that MSBuild should not need to call the EXE (if that's indeed what's happening) in order to get feature parity. The common parts should be moved into Core which can be used by both front-ends, preferably. Do you have any idea how much refactoring this would require?

@jbaehr
Copy link
Contributor

jbaehr commented Oct 27, 2021

Do you have any idea how much refactoring this would require?

Not really... It's been a long time since I poked around in the code base; still in the .NET Framework days. And a lot has changed since.

@stale
Copy link

stale bot commented Mar 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. After 30 days from now, it will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Mar 2, 2022
@candritzky
Copy link
Contributor Author

Comment just to keep the issue open.

@stale stale bot removed the stale label Mar 2, 2022
@github-actions
Copy link

github-actions bot commented Mar 4, 2023

This issue has been automatically marked as stale because it has not had recent activity. After 30 days from now, it will be closed if no further activity occurs.

@github-actions github-actions bot added the stale label Mar 4, 2023
@ChristoWolf
Copy link

Bump.

Is there an option that I am missing which would compzte a single version for the whole solution which would then get injected into each project?
For now I am manually doing this using the GitVersion CLI in my pipelines.
But it would also be nice to achieve this with the package.

@arturcic arturcic removed the stale label Mar 4, 2023
@asbjornu
Copy link
Member

asbjornu commented Mar 4, 2023

@ChristoWolf, no such option exists. Running GitVersion once on the CLI for an entire repository is the recommended workaround.

@arturcic
Copy link
Member

🎉 This issue has been resolved in version 6.0.0-beta.3 🎉
The release is available on:

Your GitReleaseManager bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants