Haste map optimizations #1289

cpojer · 2016-07-13T11:18:17Z

I'm beginning to work on some rewrites of jest-haste-map and I'm streamlining the projects around it a little bit. The HasteResolverContext was misnamed and wrongly wrapped in Promises, so I unwrapped it and fixed everything up.

The overall goal is to improve performance of jest-haste-map and to split up work into smaller chunks. I will send subsequent pull requests with a new implementation of the HasteContext type and new types for the FileMap and ModuleMap objects. I'm planning to experiment with a trie to store the file map to reduce data storage. It should not significantly increase access times for file metadata, so this should be fine. Second, I am working on persisting the file map and module map separately, as test workers only require the module map. Third, I can persist the two different maps at different times and delay some of this work until later and after tests have finished running. Finally, I have a few more optimizations, like for example reducing the path lengths in the module map.

I refactored jest-resolve into jest-resolve-dependencies. The parent thread will need an instance of both (see SearchSource) but child workers will only need an instance of jest-resolve. This should cut the time it takes to read the haste map in a worker by half, which should improve starting up a worker from 300ms to 150ms on a really really large codebase.

I created jest-file-exists. For the longest time I had the goal of improving the best case time to a lookup in the file map, which is always up-to-date. This should work great as we expand on things like snapshot cleanups, where we are testing whether the associated file exists – most of the time it does and we can short-circuit it.

cc @DmitriiAbramov

aaronabramov · 2016-07-13T22:10:31Z

why would you optimize the worker startup time? if we're optimizing the startup time of a single test running in band would be a better option in my opinion

aaronabramov · 2016-07-13T22:24:00Z

packages/jest-cli/src/jest.js

-      } else {
-        throw error;
-      }
+  return Runtime.buildHasteMap(config, {maxWorkers})


this promise chain is super hard to read and most of it are tiny details that are hard to understand in runJest context
It would be much easier if it was composed of separate functions in declarative way

Runtime.buildHasteMap(config, {maxWorkers}) .then(getPaths) .then(runTests) .then(processResults) .then(onComplete) .catch(handleErrors)

I agree this is shitty but out of the scope of this diff to fix it. We should refactor and write tests for this.

cpojer · 2016-07-14T00:25:28Z

So this diff serves as the base for both speeding up single test runs and smaller test runs and I have further optimizations for small test runs planned. At the end, it probably won't make a big difference whether we run 1 or 10 tests at FB, unless of course one of those tests takes multiple seconds longer than the others by itself.

Right now we do:

Read the haste map, check file system delta, read the changed files, update the file map, create a new module map and persist both in one file.
In every worker, read the entire haste map before we can do anything else.

I want to get to:

Only update the module map when changes actually happen.
Persist the module map after (or during) an in-band-test-run so tests aren't blocked by it.
Only read the module map in workers (less than half the size).

This all just means we'll do less work, smarter, which will speed up things. Together with some of the things I mentioned on the top, it should get us into a good place.

aaronabramov · 2016-07-14T00:42:53Z

yeah. that makes sense.
Would it be even possible to not have haste-map in workers at all and use some interface to communicate between worker and master process?
so that worker processes just ask for the list of files and master process sends them just the result

cpojer · 2016-07-14T00:49:14Z

The workers don't need the list of files but they do need the map from module name to file path.

Unfortunately, sending the entire module map for every test is really slow in node. Unfortunately we don't have shared memory – because this data is immutable during a single run.

In February in #599 I tried to actually statically analyze every single test and it's transitive dependencies and only give each worker what they actually needed. All the static analysis actually made the runtime of a test go up from 90 seconds to more than one hour. This could have been sped up, definitely, but what is the point of doing this static analysis when it doesn't actually improve anything and makes tests more strict (like, no dynamic requires in JS tests, which is sometimes useful). It would have required to painfully and manually clean up 1000 test files at FB. You can find more info in this specific commit out of the 41 from that PR: b6eb94c

github-actions · 2021-05-14T14:13:22Z

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.

ghost added the CLA Signed ✔️ label Jul 13, 2016

aaronabramov reviewed Jul 13, 2016
View reviewed changes

cpojer force-pushed the haste-map-optimizations branch 2 times, most recently from 39ae874 to 02c7af0 Compare July 14, 2016 09:16

cpojer added 2 commits July 15, 2016 10:09

Start refactoring jest-haste-map.

0dc9803

Create jest-file-exists and jest-resolve-dependencies.

23ce2ba

cpojer force-pushed the haste-map-optimizations branch from 02c7af0 to 0632d89 Compare July 15, 2016 01:09

Flow fixes.

f2a1d31

cpojer force-pushed the haste-map-optimizations branch from 0632d89 to f2a1d31 Compare July 15, 2016 01:10

cpojer merged commit 4cf8713 into jestjs:master Jul 15, 2016

github-actions bot locked as resolved and limited conversation to collaborators May 14, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Haste map optimizations #1289

Haste map optimizations #1289

cpojer commented Jul 13, 2016

aaronabramov commented Jul 13, 2016

aaronabramov Jul 13, 2016

cpojer Jul 14, 2016

cpojer commented Jul 14, 2016

aaronabramov commented Jul 14, 2016

cpojer commented Jul 14, 2016

github-actions bot commented May 14, 2021

Haste map optimizations #1289

Haste map optimizations #1289

Conversation

cpojer commented Jul 13, 2016

aaronabramov commented Jul 13, 2016

aaronabramov Jul 13, 2016

Choose a reason for hiding this comment

cpojer Jul 14, 2016

Choose a reason for hiding this comment

cpojer commented Jul 14, 2016

aaronabramov commented Jul 14, 2016

cpojer commented Jul 14, 2016

github-actions bot commented May 14, 2021