Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PERF] switch from realpathSync -> readlinkSync #12

Closed
wants to merge 1 commit into from

Conversation

stefanpenner
Copy link
Contributor

@joliss has already identified nodes performance issues with realpath here

As per @krisselden's suggestion, we can likely just use readlinkSync here instead.

real life example: https://github.com/ember-cli/stress-app
before: 8668.970928ms
after: 7152.094867ms

18% improvement to incremental builds

As realpathSync uses readlinkSync internally, this also reduces calls to readlinkSync
before: 17826 (134.230726ms)
after: 6171 ( 93.061130ms)

@joliss has already identified nodes performance issues with realpath [here](nodejs/node-v0.x-archive#7902)

As per @krisseldens suggestion, we can likely just use readlinkSync here instead.

real life example: https://github.com/ember-cli/stress-app
before: 8668.970928ms
after: 7152.094867ms

18% incremental build improvement

As realpathSync uses readlinkSync internally, this also reduces calls to readlinkSync
before: 17826 (134.230726ms)
after:   6171 ( 93.061130ms)
@joliss
Copy link
Member

joliss commented Sep 2, 2015

readlink just gives you the verbatim path of the symlink target, which may be relative to the symlink location. realpath is needed to turn it into an absolute path.

(One might be tempted to use path.resolve to resolve relative paths, but that isn't enough either, because symlinks inside the symlink path need to be resolved to deal with .. correctly. For example, /Users/stefanpenner/some_symlink is no problem, but /Users/stefanpenner/some_symlinked_directory/some_symlink needs to resolve some_symlinked_directory first.)

I believe most of realpathSync's overhead is in Node land, so if we could get a version that calls the POSIX realpath(3), we might get a similar speedup.

@stefanpenner
Copy link
Contributor Author

Should we just branch if the result is not absolute?

Short of a native extension (which causes portability issues) I'm not sure how to use the posix version

@stefanpenner
Copy link
Contributor Author

It may be an option to create a faster user land variation of realpathSync.

If you can provide a failing test case, it would be handy. That way I can be sure the scenario you describe is handled correctly

@stefanpenner
Copy link
Contributor Author

Ya, I'm pretty sure we can implement a better then node's provided one, in user-space. In addition to that, if we took a similar approach as https://github.com/stefanpenner/mkdirp-bulk We could really cut down on duplicate internal stats and readlinks.

@joliss
Copy link
Member

joliss commented Sep 2, 2015

My preferred solution would be to fix Node to call realpath(3) on POSIX.

@joliss
Copy link
Member

joliss commented Sep 2, 2015

It's tempting to branch on absolute paths, but path.isAbsolute seems buggy on Windows (nodejs/node#2656), so I'd prefer to put the proper fix.

@stefanpenner
Copy link
Contributor Author

Fixing node seems fine, but that actually means months or years until our users gain access to it. I wish adding native extensions wasn't such a hazard.

Someone needs to bite the bullet an fix the node side of things. So future travelers don't have to endure this.

The more I think about the more I think the batching approach will likely yield the best short-term outcome for us. So I will likely explore both a faster user land variation and on top of it a batching version.

@stefanpenner
Copy link
Contributor Author

@joliss i found https://gist.github.com/joliss/3292eda35cd4d564cdd7 i suspect this is the failing scenaro?

If so i'll add it to this suite.

@joliss
Copy link
Member

joliss commented Sep 3, 2015

Yeah that's the scenario.

@stefanpenner
Copy link
Contributor Author

@joliss thanks

@stefanpenner
Copy link
Contributor Author

readlink just gives you the verbatim path of the symlink target, which may be relative to the symlink location. realpath is needed to turn it into an absolute path.

Does it sound reasonable for us to implement both? First use readLink, if the result of readLink is absolute, carry on. Otherwise fallback to realpath.

In our use-cases it is fairly uncommon to hit the relative scenario (although not impossible).

@joliss
Copy link
Member

joliss commented Nov 25, 2015

Does it sound reasonable for us to implement both? First use readLink, if the result of readLink is absolute, carry on. Otherwise fallback to realpath.

Sounds very reasonable. readlinkSync + isAbsolute is more than 10x faster than realpath on my system, so this should give a good speedup.

If you want to give it a try, note that on Windows isAbsolute doesn't handle \foo\bar correctly (nodejs/node#2656). (This matters when src and dest are on different drives.) We can maybe just use a regex to account for \foo\bar cases.

@stefanpenner
Copy link
Contributor Author

Thanks for the Windows tip. I was not aware of that issue

@stefanpenner stefanpenner deleted the perfs branch April 24, 2016 23:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants