-
-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Jest startup time and test runtime, particularly when running with coverage, by caching micromatch and avoiding recreating RegExp instances #10131
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
@@ -37,8 +37,55 @@ export type TestSelectionConfig = { | |||
watch?: boolean; | |||
}; | |||
|
|||
const globsToMatcher = (globs: Array<Config.Glob>) => (path: Config.Path) => | |||
micromatch([replacePathSepForGlob(path)], globs, {dot: true}).length > 0; | |||
const globsMatchers = new Map< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does this work with watch mode, and changing the globs? I guess since the key is the glob itself and we don't collect all entries in the Map
we won't pull stale data out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's my understanding!
@jonschlinkert hiya, thoughts on this? |
I was profiling some Jest runs at Airbnb and noticed that on my MacBook Pro, we can spend over 2 seconds at Jest startup time in SearchSource getTestPaths. I believe that this will grow as the size of the codebase increases. Looking at the call stacks, it appears to be calling micromatch repeatedly, which calls picomatch, which builds a regex out of the globs. It seems that the parsing and regex building also triggers the garbage collector frequently. Upon testing, I noticed that the globs don't actually change between these calls, so we can save a bunch of work by making a micromatch matcher and reusing that function for all of the paths. micromatch has some logic internally to handle lists of globs that may include negated globs. A naive approach of just checking if it matched any of the globs won't capture that, so I copied and simplified the logic from within micromatch. https://github.com/micromatch/micromatch/blob/fe4858b0/index.js#L27-L77 In my profiling of this change locally, this brings down the time of startRun from about 2000ms to about 200ms.
To avoid having to stack a bunch of separate PRs, I've added another commit to this one to optimize the startup even more. |
After optimizing globsToMatcher, I noticed that there was still a lot of unnecessary overhead at Jest startup time spent recreating the same RegExp instances repeatedly. Thankfully, we can be a little smarter about this and create them all ahead of time and just reuse them. On top of my globsToMatcher optimization, this brings the speed of the ArrayMap in startRun down from about 160ms to about 7ms.
Alright friends, I think I'm done hacking on this now. Please let me know if you'd like me to make any changes. I'm really looking forward to seeing how this performs for us in CI! |
globs: Array<Config.Glob>, | ||
): (path: Config.Path) => boolean { | ||
if (globs.length === 0) { | ||
return (_: Config.Path): boolean => false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these type annotations needed? It's not inferred from the outer function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't expect them to be needed, but vscode showed me an eslint error when I didn't have them:
Missing return type on function. eslint@typescript-eslint/explicit-module-boundary-types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, wonder if that's fixed by the fresh 3.2 release? Either typescript-eslint/typescript-eslint#2169 or typescript-eslint/typescript-eslint#2176
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, seems promising. It looks like Jest is still on v2.30 though, so I think I'd like to say that updating the eslint plugin is out of scope for this PR if that's alright with you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW the breaking changes seem to be pretty okay for this repo: https://github.com/typescript-eslint/typescript-eslint/releases/tag/v3.0.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that's out of scope for this 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SimenB I'm happy to tackle that upgrade, if you want to create an issue and assign it to me :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to start using this in more places to improve performance. Moving it to jest-util seems like a better spot. Now that it is a standalone module, I decided to write some unit tests for this function. In doing so, I uncovered a small difference between the behavior of this function and micromatch when overlapping glob patterns are used which I also fixed.
While incorporating this function into more places, I discovered a discrepancy here with how micromatch works. We can fix this by creating a fast path for when there are no globs at all.
I've been profiling running Jest with code coverage at Airbnb, and noticed that shouldInstrument is called often and is fairly expensive. It also seems to call micromatch and `new RegExp` repeatedly, both of which can be optimized by caching the work to convert globs and strings into matchers and regexes. I profiled this change by running a set of 27 fairly simple tests. Before this change, about 6-7 seconds was spent in shouldInstrument. After this change, only 400-500 ms is spent there. I would expect this delta to increase along with the number of tests and size of their dependency graphs.
I was profiling some Jest runs at Airbnb and noticed that on my MacBook Pro, we can spend over 30 seconds after running Jest with code coverage as the coverage reporter adds all of the untested files. I believe that this will grow as the size of the codebase increases. Looking at the call stacks, it appears to be calling micromatch repeatedly, which calls picomatch, which builds a regex out of the globs. It seems that the parsing and regex building also triggers the garbage collector frequently. Since this is in a tight loop and the globs won't change between checks, we can greatly improve the performance here by using our new and optimized globsToMatcher function, which avoids re-parsing globs unnecessarily. This optimization reduces the block of time here from about 30s to about 10s. The aggregated total time of coverage reporter's onRunComplete goes from 23s to 600ms.
[replacePathSepForGlob(path.relative(config.rootDir, filename))], | ||
options.collectCoverageFrom, | ||
).length === 0 | ||
!globsToMatcher(options.collectCoverageFrom)( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed that this will end up using the dot: true
option where before that wasn't used here (and above in this file). This seems like it should be fine, but I don't really have enough context here to know for sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, seems fine. HasteMap should filter out any dotfiles we don't want already (I think)
} | ||
>(); | ||
|
||
// Every time micromatch is called, it will parse the glob strings and turn them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the windows builds was flaking and there's no way to re-run, so I decided to add some code comments to this file.
amazing job @lencioni :) |
This LGTM, I just want some more people to review it before merging |
The logic here might be a little confusing, so I am adding some comments that I hope will help make it easier for future explorers to understand. While I was doing this, I noticed a small way to simplify this function even more.
Tested this branch on my company project: duration decreased from 11 minutes to 4 minutes. Seems that it is one of the main reasons for the #9457 (comment) regression. @lencioni awesome job, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
Thanks @lencioni! Published as 26.1.0 |
In CI we currently spread our tests across 10 separate machines. When running all tests, the 10 machines on 26.0.1 had runtimes of:
On 26.1.0:
n=1 Not as much as I had wished for, but this is definitely a nice win for us! Thank you for the reviews and the release! |
… with coverage, by caching micromatch and avoiding recreating RegExp instances (jestjs#10131) Co-authored-by: Christoph Nakazawa <cpojer@fb.com>
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
I was profiling some Jest runs at Airbnb and noticed that on my
MacBook Pro, we can spend over 2 seconds at Jest startup time in
SearchSource getTestPaths. I believe that this will grow as the size
of the codebase increases.
Looking at the call stacks, it appears to be calling micromatch
repeatedly, which calls picomatch, which builds a regex out of the
globs. It seems that the parsing and regex building also triggers the
garbage collector frequently.
Upon testing, I noticed that the globs don't actually change between
these calls, so we can save a bunch of work by making a micromatch
matcher and reusing that function for all of the paths.
micromatch has some logic internally to handle lists of globs that
may include negated globs. A naive approach of just checking if it
matched any of the globs won't capture that, so I copied and
simplified the logic from within micromatch.
https://github.com/micromatch/micromatch/blob/fe4858b0/index.js#L27-L77
In my profiling of this change locally, this brings down the time of
startRun from about 2000ms to about 200ms.
Before:
Before, zoomed in:
After:
Avoid recreating RegExp instances in regexToMatcher
After optimizing globsToMatcher, I noticed that there was still a
lot of unnecessary overhead at Jest startup time spent recreating
the same RegExp instances repeatedly. Thankfully, we can be a little
smarter about this and create them all ahead of time and just reuse
them.
On top of my globsToMatcher optimization, this brings the speed of
the ArrayMap in startRun down from about 160ms to about 7ms.
Before:
After:
Optimize micromatch and RegExps in shouldInstrument
I've been profiling running Jest with code coverage at Airbnb, and
noticed that shouldInstrument is called often and is fairly
expensive. It also seems to call micromatch and
new RegExp
repeatedly, both of which can be optimized by caching the work to
convert globs and strings into matchers and regexes.
I profiled this change by running a set of 27 fairly simple tests.
Before this change, about 6-7 seconds was spent in shouldInstrument.
After this change, only 400-500 ms is spent there. I would expect
this delta to increase along with the number of tests and size of
their dependency graphs.
A typical shouldInstrument call before this change:
A typical shouldInstrument call after this change (most are actually too fast to even show up in the profiler now):
I was profiling some Jest runs at Airbnb and noticed that on my
MacBook Pro, we can spend over 30 seconds after running Jest with code
coverage as the coverage reporter adds all of the untested files. I
believe that this will grow as the size of the codebase increases.
Looking at the call stacks, it appears to be calling micromatch
repeatedly, which calls picomatch, which builds a regex out of the
globs. It seems that the parsing and regex building also triggers the
garbage collector frequently.
Since this is in a tight loop and the globs won't change between
checks, we can greatly improve the performance here by using
micromatch.matcher.
This optimization reduces the block of time here from about 30s to
about 10s. The aggregated total time of coverage reporter's
onRunComplete goes from 23s to 600ms.
Before:
Before, zoomed in:
After:
After, zoomed in:
Summary
Motivation: Improve slow Jest startup and runtime speed, particularly when running with coverage
Test plan
I ran jest in the Airbnb frontend monorepo with and without coverage options, with a path argument.