Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--link-duplicates flag. Creates hardlinks to duplicate files in node_modules #2620

Merged
merged 7 commits into from
Feb 9, 2017

Conversation

bestander
Copy link
Member

Summary
An attempt to mitigate inefficient hoisting for create-react-app and #2306.

Inspired by https://github.com/jeffbski/pkglink, instead of running copy for all repeated modules in node_modules this algorithm does only one copy and then hardlinks every file in the duplicate modules.

The win is in speed and size.

$ yarn install
yarn install v0.21.0-0
[1/4] 🔍  Resolving packages...
[2/4] 🚚  Fetching packages...
[3/4] 🔗  Linking dependencies...
[4/4] 📃  Building fresh packages...
✨  Done in 112.94s.

$ du -sh node_modules/
530M	node_modules/

$ rm -rf node_modules/
$ yarn --link-duplicates
yarn install v0.21.0-0
[1/4] 🔍  Resolving packages...
[2/4] 🚚  Fetching packages...
[3/4] 🔗  Linking dependencies...
[4/4] 📃  Building fresh packages...
✨  Done in 49.44s.

$ du -sh node_modules/
111M	node_modules/

$ npm -v
3.10.8
$ rm -rf node_modules/
$ time npm install
real	1m22.725s
$ du -sh node_modules/
126M	node_modules/

Test plan

  • Added tests.
  • Enabled by default and ran all integration tests
  • tested on create-react-app

@bestander
Copy link
Member Author

Works on Mac and Windows 10 NTFS.
TODO: add test whether hardlinks are supported (should fail on fat FS)

@wtgtybhertgeghgtwtg
Copy link
Contributor

How does this relate to #2306? Is the hoisting algorithm changed, as well?

@bestander
Copy link
Member Author

It does not implement #2306, we'll need to improve hoisting algorithm anyway.
This one just reduces the folder size

@gaearon
Copy link
Contributor

gaearon commented Feb 4, 2017

WOW.
Do you think we could enable this by default in CRA? Are there good reasons not to?

@bestander
Copy link
Member Author

If we don't find any bad side effects I think we could enable it by default everywhere.
Otherwise maybe CRA should have a .yarnrc file that would have this flag turned on.
Also we should fix the root cause (hoisting algorithm) anyway.

Copy link
Member

@Daniel15 Daniel15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good to me :)

src/util/fs.js Outdated
// custom concurrency logic as we're always executing stacks of 4 queue items
// at a time due to the requirement to push items onto the queue
while (queue.length) {
const items = queue.splice(0, 4);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe pull out the 4 as a constant (const CONCURRENT_QUEUE_ITEMS = 4 or something like that, that explains what it means)

src/util/fs.js Outdated
}

// correct hardlink
if (bothFiles && (srcStat.ino === destStat.ino)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work on Windows given Windows doesn't use inodes? Maybe this should check that srcStat.ino is not null, just in case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed there were some issues with ino on Windows back in Node 0.10 and 0.12 times nodejs/node-v0.x-archive#2670.
The tests on Appveyor pass though.
I'll google a bit more to double check if we can rely on ino not being null

src/util/fs.js Outdated
@@ -511,6 +734,22 @@ export async function writeFilePreservingEol(path: string, data: string) : Promi
await promisify(fs.writeFile)(path, data);
}

export async function hardlinksWork(cwd: string): Promise<boolean> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just call the param dir, as it could be called with any directory - It doesn't have to be the current directory.

@@ -1,8 +1,10 @@
/* @flow */

import {getPackageVersion, runInstall} from '../_helpers.js';
import * as fs from '../../../src/util/fs.js';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need to do * as here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean I could only import stat instead of * as fs?
Not sure if that would be better when the number of imports from fs grow.
Everything grouped into one import makes sense rather than a lot functions that need sorting and formatting.
What should be the guideline here?

@bestander
Copy link
Member Author

Updated with fixes

@bestander
Copy link
Member Author

ping @Daniel15 @cpojer.
Good to be merged?

@sebmck sebmck merged commit 3dc9cf4 into yarnpkg:master Feb 9, 2017
@sebmck
Copy link
Contributor

sebmck commented Feb 9, 2017

I'm going to work on a prepass to optimise module hoisting today, this will go nicely in conjunction with it. Thank you so much @bestander!

mnsn pushed a commit to mnsn/yarn that referenced this pull request Feb 15, 2017
…modules (yarnpkg#2620)

* Feature: duplicated dependencies are hardlinked from the first copy

* cleanup and tests

* made linking non default

* added fallback if links are not supported

* lint fix

* nits

* nits
@ghost ghost mentioned this pull request Feb 15, 2017
@STRML
Copy link
Contributor

STRML commented Feb 20, 2017

As this tweet indicates, this may not make it to v21 final - anecdotally, this flag got our React app's install size down by 20%, whereas #2676 only reduced it by about 2%.

Unfortunately, --link-duplicates threw this error at the end:

error An unexpected error occurred: "EEXIST: file already exists, link '/proj/frontend/node_modules/@kadira/storybook/node_modules/enhanced-resolve/node_modules/memory-fs/.gitattributes' -> '/proj/frontend/node_modules/babel-loader/node_modules/enhanced-resolve/node_modules/memory-fs/.gitattributes'".

Please indicate if you'd like me to move this to a separate issue.

@bestander
Copy link
Member Author

@STRML, if there this feature gives benefits to the community then we will keep it, thanks for speaking up.
Could you open a new issue and provide a way to repro?

@STRML
Copy link
Contributor

STRML commented Feb 20, 2017

Sure, I'll try to get a minimal repo going then open the issue.

@STRML STRML mentioned this pull request Feb 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants