Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Haste map optimizations #1289

Merged
merged 3 commits into from
Jul 15, 2016
Merged

Conversation

cpojer
Copy link
Member

@cpojer cpojer commented Jul 13, 2016

I'm beginning to work on some rewrites of jest-haste-map and I'm streamlining the projects around it a little bit. The HasteResolverContext was misnamed and wrongly wrapped in Promises, so I unwrapped it and fixed everything up.

The overall goal is to improve performance of jest-haste-map and to split up work into smaller chunks. I will send subsequent pull requests with a new implementation of the HasteContext type and new types for the FileMap and ModuleMap objects. I'm planning to experiment with a trie to store the file map to reduce data storage. It should not significantly increase access times for file metadata, so this should be fine. Second, I am working on persisting the file map and module map separately, as test workers only require the module map. Third, I can persist the two different maps at different times and delay some of this work until later and after tests have finished running. Finally, I have a few more optimizations, like for example reducing the path lengths in the module map.

I refactored jest-resolve into jest-resolve-dependencies. The parent thread will need an instance of both (see SearchSource) but child workers will only need an instance of jest-resolve. This should cut the time it takes to read the haste map in a worker by half, which should improve starting up a worker from 300ms to 150ms on a really really large codebase.

I created jest-file-exists. For the longest time I had the goal of improving the best case time to a lookup in the file map, which is always up-to-date. This should work great as we expand on things like snapshot cleanups, where we are testing whether the associated file exists – most of the time it does and we can short-circuit it.

cc @DmitriiAbramov

@ghost ghost added the CLA Signed ✔️ label Jul 13, 2016
@aaronabramov
Copy link
Contributor

why would you optimize the worker startup time? if we're optimizing the startup time of a single test running in band would be a better option in my opinion

} else {
throw error;
}
return Runtime.buildHasteMap(config, {maxWorkers})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this promise chain is super hard to read and most of it are tiny details that are hard to understand in runJest context
It would be much easier if it was composed of separate functions in declarative way

Runtime.buildHasteMap(config, {maxWorkers})
  .then(getPaths)
  .then(runTests)
  .then(processResults)
  .then(onComplete)
  .catch(handleErrors)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is shitty but out of the scope of this diff to fix it. We should refactor and write tests for this.

@cpojer
Copy link
Member Author

cpojer commented Jul 14, 2016

So this diff serves as the base for both speeding up single test runs and smaller test runs and I have further optimizations for small test runs planned. At the end, it probably won't make a big difference whether we run 1 or 10 tests at FB, unless of course one of those tests takes multiple seconds longer than the others by itself.

Right now we do:

  • Read the haste map, check file system delta, read the changed files, update the file map, create a new module map and persist both in one file.
  • In every worker, read the entire haste map before we can do anything else.

I want to get to:

  • Only update the module map when changes actually happen.
  • Persist the module map after (or during) an in-band-test-run so tests aren't blocked by it.
  • Only read the module map in workers (less than half the size).

This all just means we'll do less work, smarter, which will speed up things. Together with some of the things I mentioned on the top, it should get us into a good place.

@aaronabramov
Copy link
Contributor

yeah. that makes sense.
Would it be even possible to not have haste-map in workers at all and use some interface to communicate between worker and master process?
so that worker processes just ask for the list of files and master process sends them just the result

@cpojer
Copy link
Member Author

cpojer commented Jul 14, 2016

The workers don't need the list of files but they do need the map from module name to file path.

Unfortunately, sending the entire module map for every test is really slow in node. Unfortunately we don't have shared memory – because this data is immutable during a single run.

In February in #599 I tried to actually statically analyze every single test and it's transitive dependencies and only give each worker what they actually needed. All the static analysis actually made the runtime of a test go up from 90 seconds to more than one hour. This could have been sped up, definitely, but what is the point of doing this static analysis when it doesn't actually improve anything and makes tests more strict (like, no dynamic requires in JS tests, which is sometimes useful). It would have required to painfully and manually clean up 1000 test files at FB. You can find more info in this specific commit out of the 41 from that PR: b6eb94c

@cpojer cpojer force-pushed the haste-map-optimizations branch 2 times, most recently from 39ae874 to 02c7af0 Compare July 14, 2016 09:16
@cpojer cpojer merged commit 4cf8713 into jestjs:master Jul 15, 2016
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 14, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants