Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

win, fs: fix realpath behavior on substed drives #7559

Closed
wants to merge 3 commits into from

Conversation

bzoz
Copy link
Contributor

@bzoz bzoz commented Jul 6, 2016

Checklist
  • make -j4 test (UNIX), or vcbuild test nosign (Windows) passes
  • benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

fs, win

Description of change

Work in progress for issues related to new realpath implementation (#7175).

Before v6 on drives created with subst and network mapped drives realpath used to return filename on the mapped drive. With v6 (b488b19) it now returns filename on the original device or on the network share. This restores the old behavior by:

  1. Restoring old javascript implementation (separate commit just for reviewing here, can be squashed with the next)
  2. Using it when path returned by new implementation is on a different device than the original path

This PR addresses only issues related to the filename resolved being different on v6 and v4 (e.g. #7294). There are other issues, for example related to permissions on Windows like #7192 and other on Linux as described in #7175. Some of those are to be addressed in different patch (comment).

@trevnorris this fixes some of the Windows issues. Can you review to make sure it's compatible with your changes?

bzoz added 2 commits July 6, 2016 12:59
This is a partial revert of nodejs@b488b19
It restores old javascript implementation of realpath and realpathSync.
The names have been changed: added JS and not exported in fs module.
Before v6 on drives created with subst and network mapped drives
realpath used to return filename on the mapped drive. With v6 it now
returns filename on the original device or on the network share. This
restores the old behavior by using old javascript implementation when
path returned by new implementation is on a different device than the
original path.

Fixes: 7294
@nodejs-github-bot nodejs-github-bot added the fs Issues and PRs related to the fs subsystem / file system. label Jul 6, 2016
@Fishrock123
Copy link
Contributor

#7548 Should make this unnecessary?

@bzoz
Copy link
Contributor Author

bzoz commented Jul 6, 2016

No, #7294 is different beast than ELOOP.

@bzoz bzoz changed the title Bartosz restore js realpath win, fs: fix realpath behavior on substed drives Jul 6, 2016
@addaleax addaleax added the windows Issues and PRs related to the Windows platform. label Jul 6, 2016
@@ -1562,6 +1562,8 @@ fs.unwatchFile = function(filename, listener) {
}
};

// Cache for JS real path
var realpathCache = {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One global realpath cache without some kind of cache validation is probably a no-go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a thing that needs to be solved. One global cache is probably not a good idea. But some caching is needed for performance reasons.

I would guess user could add cache object to options. It could be also used on Linux, to cache realpath results (not the partial ones as in JS implementation)

@mscdex
Copy link
Contributor

mscdex commented Jul 6, 2016

It seems like this could be handled better in C++ instead of resorting to reinstating the js realpath implementation?

@Trott Trott added the wip Issues and PRs that are still a work in progress. label Jul 6, 2016
@Trott
Copy link
Member

Trott commented Jul 6, 2016

Labelling as in progress. Of course, if that's no longer the case, please comment or remove the label.

@jasnell
Copy link
Member

jasnell commented Jul 6, 2016

Would definitely prefer to see if there's a fix that can be made in libuv before bringing back the js realpath impl.

@bzoz
Copy link
Contributor Author

bzoz commented Jul 6, 2016

@mscdex, @jasnell: old JS implementation didn't check if the drive itself was not a "link" - e.g. subst or mapped network location. It just assumed its a hard drive, only checking if it exists. I don't think there is a workaround for this in uv - it calls GetFinalPathNameByHandle, and that will always return original drive/network share name and not the mapped one.

One thing that could be done, is to port old js implementation to C++, as a fallback for this corner case, but that would not be an easy task.

@mscdex
Copy link
Contributor

mscdex commented Jul 6, 2016

@bzoz What I had in mind was to check if the drive is mapped when the input path starts with a drive letter that is different than that of the output path. If it is, then simply replace the beginning, common portion of the input and output paths with the drive letter from the input path.

@addaleax
Copy link
Member

addaleax commented Jul 6, 2016

I think the other Windows issues that were arising from the v6 realpath change are a lot easier to fix when building upon this, likely by simple additions to realpathJSNeeded; so for now I’m +1 on this approach.

I’d also be +1 on moving the entire old implementation to C++, but as @bzoz said, that’s something for a different time scale.

@bzoz
Copy link
Contributor Author

bzoz commented Jul 7, 2016

@mscdex: that would unfortunately not work:

  • x: is mapped drive
  • x:\test\ is a link to c:\test\
  • we want realpath of x:\test\file.txt

Old JS would follow the link to C drive, returning c:\test\file.txt, just like native implementation does it now.

@mscdex
Copy link
Contributor

mscdex commented Jul 7, 2016

@bzoz I'm confused now. I thought you said the C++ realpath and js realpath were returning different paths, so why would both return c:\test\file.txt?

@bzoz
Copy link
Contributor Author

bzoz commented Jul 7, 2016

@mscdex: I was referring to this idea, of replacing common parts of paths. With this solution in my example x:\test\file.txt would be returned which would be incorrect.

@bzoz
Copy link
Contributor Author

bzoz commented Jul 7, 2016

Maybe to clarify: return value of realpathSync('x:\test\file.txt)`:

No What is x: What is x:\test\ JS version uv version
1 physical drive mklink /d x:\test c:\test c:\test\file.txt c:\test\file.txt
2 physical drive mkdir x:\test x:\test\file.txt x:\test\file.txt
3 subst x: c:\substed mklink /d c:\substed c:\test c:\substed\file.txt c:\substed\file.txt
4 subst x: c:\substed mkidr x:\test x:\test\file.txt c:\substed\test\file.txt

If we would use only uv version and try to replace the beginning, we would have a bug in case No 3, as it would get us x:\test\file.txt, when both new and old agree it should be c:\substed\file.txt. Hope this helps.

@trevnorris
Copy link
Contributor

This does step on my realpath() fix. Also I feel like this is a little too much copy/paste when I see comments like:

// On windows, check that the root exists. On unix there is no need.

even though this PR only addresses Windows specific behavior. Would it not be possible to resolve the path normally then pass it to a function for fixup afterwards, instead of adding the entire JS implementation again?

@addaleax
Copy link
Member

addaleax commented Jul 7, 2016

@trevnorris #6861 #7044 #7192 #7294 are a couple of Windows issues that were arising from the libuv-based realpath implementation, and I think your PR is orthogonal to fixing them?

@mscdex
Copy link
Contributor

mscdex commented Jul 8, 2016

I think this can be addressed in C++ land by passing the original drive letter to QueryDosDevice(). If that function returns a path starting with \??\ then it's a subst'ed drive and we can simply do the appropriate string manipulation to arrive at the pre-node-v6 result.

@bzoz
Copy link
Contributor Author

bzoz commented Jul 8, 2016

@trevnorris: JS realpath implementation is copy/pasted from before the change to uv. I didn't want to change it so it would be easier to review rest of the changes. But yes, I'll clean it up, remove those if (isWindows) etc.

What I try to do here is pretty much resolving the path normally using the new native realpath, and then running "fixup" if needed (if (realpathJSNeeded(...)). Unfortunately, the only fixup function that I come up with was the old implementation. The idea @mscdex has would be great, but there are corner cases. I can think of one, No 3 from the table above, but maybe there are more.

@addaleax: From the issues you mentioned my PR will only fix #7294, but as you noticed earlier we could extend this fix for the other issues that you mentioned.

@mscdex: Sorry, I just don't see how it could work. As I understand your idea, with No 3:

  1. uv would return c:\substed\file.txt
  2. we check that it is on different drive than x:\test\file.txt, so we need to fix
  3. "common part" is file.txt, so we would use x:\test\ + file.txt
    The problem here is that pre-v6 implementation would return c:\substed\file.txt

@mscdex
Copy link
Contributor

mscdex commented Jul 8, 2016

@bzoz Normal junctions can be detected as well.

@bzoz
Copy link
Contributor Author

bzoz commented Jul 8, 2016

@mscdex I just cannot see how it could be done other than reimplementing JS code in C++. We would have to check every path segment to see if it is not a link.

@trevnorris
Copy link
Contributor

@addaleax

I think your PR is orthogonal to fixing them?

It is, but also I don't believe re-adding the entire JS implementation is necessary to solve this issue.

@bzoz
It seems the key issue here with the new implementation is how it would previously "Skip over roots" (comment in start(), in fs.realpathSync()). Specifically when it came to network drives. So taking @mscdex's suggestion, if a network path such as \\servername\share$\... were returned then the common path would be taken. This removes the issue that you've outlined in (3). What cases wouldn't this cover?

In my initial implementation I actually was implementing a way of resolving arbitrarily long nested symlinks using uv_fs_readlink(). So basically a minimal version of the JS implementation. If we absolutely must do this, I think it would be advantageous to do this in C++ and reap greater benefits.

@bzoz
Copy link
Contributor Author

bzoz commented Jul 8, 2016

I'll try to elaborate on 3: realpathSync('x:\dir_a\dir_b\file');. The x: is substed to c:\x_drive\.

If the dir_a and dir_b are normal directories C++ will return c:\x_drive\dir_a\dir_b\file. We then check that x:\ is c:\x_drive, substitute and get x:\dir_a\dir_b\file - just as pre-v6 did. That is case 4.

Case 3 is when dir_a is a link to c:\x_drive\dir_b. Then C++ will return c:\x_drive\dir_b\file - just as pre-v6. If we now substitute c:\x_drive\ we will get x:\dir_b\file which is wrong.

We could add some conditions to make it work. For example if we take x:\dir_b\file and change x: to c:\x_drive\ we get different path than the original. So we could assume that there was a link.

But when dir_a is a link to x:\dir_b: then C++ will return c:\x_drive\dir_b\file and pre-v6 should return x:\dir_b\file.

Now, we have x:\dir_a\dir_b\file as input, C++ returns c:\x_drive\dir_b\file and we need to return c:\x_drive\dir_b\file or x:\dir_b\file, depending on what dir_a is. There is no other way to do this, than checking each path segment - just like old JS code.

As for performance - isn't disk access the bottleneck anyway? Would it really be faster in C++? Also, here we have a working implementation, with caching etc. Sure, we could reimplement it in C++, but I don't think it would be easy to do so.

const uvResult = binding.realpath(pathModule._makeLong(resolvedPath),
options.encoding);
if (realpathJSNeeded(resolvedPath, uvResult))
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does make lint not catch this?

@saghul
Copy link
Member

saghul commented Jul 13, 2016

IMHO "fixing" case 3 is not correct. It might be how pre-v6 behave, but that doesn't make it correct. I guess it boils down to how we want to interpret drive subst-ing. In my mind it's like a symlink from /c/xthing to /x/ so since symlinks should be resolved by realpath, we shouldn't put the substed drive letter, but keep the resolved one.

As a general note: it might be better to have a meta-issue with the realpath edge cases that need fixing, there are other reasons why might want / need to go back to the JS implementation than this specific one: ELOOP, ramdisks, etc.

@saghul saghul mentioned this pull request Jul 14, 2016
7 tasks
@bzoz
Copy link
Contributor Author

bzoz commented Jul 14, 2016

I also think that "case 3" was pretty much a bug in pre-v6, but from discussion at #7175 I understand, that v6 change didn't suppose to change the way realpath work.

@saghul
Copy link
Member

saghul commented Jul 14, 2016

Please see #7726.

Saúl Ibarra Corretgé
http://bettercallsaghul.com

@bzoz
Copy link
Contributor Author

bzoz commented Aug 10, 2016

This will be addressed when #7899 lands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs Issues and PRs related to the fs subsystem / file system. windows Issues and PRs related to the Windows platform. wip Issues and PRs that are still a work in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants