Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

path.join('C:') = 'C:.' only in windows node LTS version #14405

Closed
XGHeaven opened this issue Jul 21, 2017 · 20 comments
Closed

path.join('C:') = 'C:.' only in windows node LTS version #14405

XGHeaven opened this issue Jul 21, 2017 · 20 comments
Labels
path Issues and PRs related to the path subsystem. windows Issues and PRs related to the Windows platform.

Comments

@XGHeaven
Copy link
Contributor

  • 6.11.1:
  • Window 10 x64:

In windows, path.join('C:') return 'C:.' not 'C:'
In other system, such as linux,macos always return 'C:'

In windows, 'path.join('C:\')' is corrrect, return 'C:\'

But you give path.join more than one args, it's correct.
such as path.join('C:', 'a') == 'C:\a'

here is a interesting thing, if you run path.format(path.parse(path.join('C:'))) in windows,
returns 'C:C:.'

I don't know why...

@TimothyGu
Copy link
Member

In general, the path normalization behavior is platform-dependent. You might find path.win32.* and path.posix.* to be helpful for testing on different platforms.

Now, the issue itself is actually two issues as far as I can tell.

The path.win32.join('C:') === 'C:.' issue is actually coming from path.win32.normalize('C:') === 'C:.'. It should normalize to 'C:\\' IMO.

The join->parse->format issue is more complex:

> path.win32.parse('C:')
{ root: 'C:', dir: 'C:', base: '', ext: '', name: '' }
> path.win32.parse('C:.')
{ root: 'C:', dir: '', base: 'C:.', ext: '', name: 'C:.' } // <-- ISSUE

I would expect 'C:.' to parse to

{ root: 'C:', dir: 'C:', base: '.', ext: '', name: '.' }

Similarly:

> path.win32.parse('C:abcd')
{ root: 'C:', dir: '', base: 'C:abcd', ext: '', name: 'C:abcd' }

seems to be faulty as well.

@TimothyGu TimothyGu added path Issues and PRs related to the path subsystem. windows Issues and PRs related to the Windows platform. labels Jul 21, 2017
@TimothyGu
Copy link
Member

TimothyGu commented Jul 21, 2017

These issues are also reproducible on master.

@tniessen
Copy link
Member

Well, what can I say... Paths on Windows are a mess, and node really isn't dealing with that mess too well yet.

Okay, the colon has three separate meanings under Windows, and it is not trivial to dinstinguish them:

  1. The colon can be used after a drive letter and before a backslash as part of a so called "disk designator", resulting in an absolute path:
C:\>echo A > F:\test\foo
C:\>type F:\test\foo
A
  1. The colon can be used after a drive letter but without a following backslash. Windows will then resolve the path relative to the current working directory on that drive:
C:\>cd F:\test
C:\>echo foo > F:something
C:\>type F:\test\something
foo
  1. The colon can also be used to access NTFS alternate data streams, but only under certain circumstances:
F:\test>echo foo > test
F:\test>echo bar > test:alt
F:\test>streams test

streams v1.60 - Reveal NTFS alternate streams.
Copyright (C) 2005-2016 Mark Russinovich
Sysinternals - www.sysinternals.com

F:\test\test:
             :alt:$DATA 6

This document describes some of these aspects. Consider this paragraph:

A file name is relative to the current directory if it does not begin with one of the following:

  • A UNC name of any format, which always start with two backslash characters ("\"). For more information, see the next section.
  • A disk designator with a backslash, for example "C:\" or "d:\".
  • A single backslash, for example, "\directory" or "\file.txt". This is also referred to as an absolute path.

Note that C: is not an absolute path according to Microsoft. It is quite the opposite:

If a file name begins with only a disk designator but not the backslash after the colon, it is interpreted as a relative path to the current directory on the drive with the specified letter. Note that the current directory may or may not be the root directory depending on what it was set to during the most recent "change directory" operation on that disk. Examples of this format are as follows:

  • "C:tmp.txt" refers to a file named "tmp.txt" in the current directory on drive C.
  • "C:tempdir\tmp.txt" refers to a file in a subdirectory to the current directory on drive C.

We always tried to keep the path module as platform-independent as possible. POSIX paths are easy to work with, they follow simple rules, but Windows paths do not.

Note that the path API has some other platform-specific problems, mostly related to path.xxx.resolve, see #1999, #13738 and #14107. I was hoping to have some time to fix some of the problems soon.


I will try to explain why some of your examples @XGHeaven and @TimothyGu are not as unambiguous as you might think:

path.win32.join('C:')

should either throw, return C: or the current working directory on drive C:.

path.win32.join('C:\\')

should return C:\.

path.win32.join('C:', 'a')

I am not sure whether C:\a is actually correct. Windows will return C:a in this case, even though most users would probably expect C:\a, which Windows will return if you pass in C:\ and a.

path.win32.parse('C:')

Again, this is not an absolute path:

C:\>cd F:\test
C:\>dir F:
 Verzeichnis von F:\test

22.07.2017  17:21    <DIR>          .
22.07.2017  17:21    <DIR>          ..
22.07.2017  17:17                16 foo
22.07.2017  17:20                 6 something
22.07.2017  17:21                 6 test
path.win32.parse('C:abcd')

I am not sure how we would represent this.


tl;dr:
It will be impossible to solve these problems without a very tight connection to the host platform, therefore making it impossible to use the same API on other platforms. Windows paths follow highly complex patterns and cannot be parsed as easily as POSIX paths.

cc @refack

@XGHeaven
Copy link
Contributor Author

XGHeaven commented Jul 22, 2017

path.win32.join('C:')

Whatever it return, I think C:. is absolute wrong. fix this first.

path.win32.join('C:', 'a')

I prefer to return C:a. But if you want to get C:\a, best to use

path.win32.join('C:\\', 'a')
// or
path.win32.join('C:/', 'a') // don't care about using '/' or '\\' when you write cross-platform app

And colon usage case 3 in windows, it never used in development or cmd for myself.
So I think case 3 has a low priority.

@refack
Copy link
Contributor

refack commented Jul 23, 2017

tl;dr:
It will be impossible to solve these problems without a very tight connection to the host platform, therefore making it impossible to use the same API on other platforms. Windows paths follow highly complex patterns and cannot be parsed as easily as POSIX paths.

cc @refack

Sometimes I wish GitHub would add 😭 as a reaction.

@refack
Copy link
Contributor

refack commented Jul 23, 2017

Actually, beside being ugly C:. is correct (as in c: is for the drive, . is for current directory)

d:\code\node-cur$ c:

C:\$ cd windows

C:\WINDOWS$ d:

d:\code\node-cur$ pushd c:.

C:\WINDOWS$ popd

d:\code\node-cur$ pushd c:\

c:\$

But I agree that path.format(path.parse(path.join('C:'))) === 'C:C:.' is a bug.

@refack
Copy link
Contributor

refack commented Jul 23, 2017

We always tried to keep the path module as platform-independent as possible. POSIX paths are easy to work with, they follow simple rules, but Windows paths do not.

path should not only be platform independent, it should also be stateless, performing pure string manipulation with no system calls. So we have two issues here resolving C: is state dependant, and resolving . is state dependant... 🤷‍♂️

@TimothyGu
Copy link
Member

@tniessen It might be helpful to look at what other platforms are doing. Python 3.5.3 does:

>>> import ntpath
>>> ntpath.normpath('C:')
'C:'
>>> ntpath.normpath('C:.')
'C:'
>>> ntpath.normpath('C:\\')
'C:\\'
>>> ntpath.normpath('C:\\abc')
'C:\\abc'
>>> ntpath.normpath('C:abc')
'C:abc'
>>>
>>> ntpath.join('C:')
'C:'
>>> ntpath.join('C:\\')
'C:\\'
>>> ntpath.join('C:', '.')
'C:.'
>>> ntpath.join('C:', 'abc')
'C:abc'
>>> ntpath.join('C:\\', 'abc')
'C:\\abc'
>>> ntpath.join('C:\\abc', '.')
'C:\\abc\\.'

From what I'm reading in this thread, the C:. -> C: normalization should be expected and correct, and it is IMO what we should do as well. On the other hand, Python's ntpath.join doesn't normalize the path after returning, so joining 'C:' and '.' returns 'C:.'.

About parse, I guess now I would still expect C:. to parse to

    { root: 'C:', dir: '', base: '.', ext: '', name: '.' }
not { root: 'C:', dir: '', base: 'C:.', ext: '', name: 'C:.' } (current)

The root is C: alright, while the actual path is dir + path.sep + name so ..

By the same logic C:abc should be

    { root: 'C:', dir: '', base: 'abc', ext: '', name: 'abc' }
not { root: 'C:', dir: '', base: 'C:abc', ext: '', name: 'C:abc' } (current)

@TimothyGu
Copy link
Member

Here's the Python equivalent for parse:

>>> from pathlib import PureWindowsPath
>>> def PrintPath(p):
...   print({ 'drive': p.drive, 'root': p.root, 'anchor': p.anchor, 'parent': p.parent, 'name': p.name })
... 
>>> PrintPath(PureWindowsPath('C:abc'))
{'drive': 'C:', 'anchor': 'C:', 'parent': PureWindowsPath('C:'), 'name': 'abc', 'root': ''}
>>> PrintPath(PureWindowsPath('C:\\abc'))
{'drive': 'C:', 'anchor': 'C:\\', 'parent': PureWindowsPath('C:/'), 'name': 'abc', 'root': '\\'}
>>> PrintPath(PureWindowsPath('C:.'))
{'drive': 'C:', 'anchor': 'C:', 'parent': PureWindowsPath('C:'), 'name': '', 'root': ''}

PureWindowsPath normalizes the path (as seen in the last pair of lines), but otherwise I think it's pretty consistent with what I proposed above. Our root is basically Python's anchor BTW.

@tniessen
Copy link
Member

Actually, beside being ugly C:. is correct (as in c: is for the drive, . is for current directory)

Yes, it is correct, but I am not entirely sure we should prefer that over C: without the dot (see python ↑). Apparently, C:. comes as a surprise to many people. The main question is whether C: behaves the same as C:. when given as input to other API functions.

@TimothyGu Sounds good. Your interpretation of parse seems to be reasonable. We are still stuck with the problem of implementing resolve to deal with these paths...

Currently, the only platform-dependent functions are the resolve implementations, and we won't be able to change that. In retrospect, resolve might have been better suited for os to distinguish it from its platform-independent siblings, but we would still need to use it within path.relative etc., or change the semantics of those functions. My best approach to solving some platform-dependent problems was this variant of resolve (I was about to implement it for #13738, but did not have time yet):

  1. If the API platform is the same as the host platform, use the old implementation.
  2. If cwd is not required to compute the result, use the old implementation.
  3. If the result may be relative (e.g. to use the result in path.relative), let cwd be . and return a relative path.
  4. Otherwise, throw.

In my opinion, this is our best shot at a compromise between correctness and platform-independentness. This behavior will need to be documented precisely. For the vast majority of users, this won't change anything. I hope to get some time within the next days to propose a solution to some of these problems.

@refack
Copy link
Contributor

refack commented Jul 23, 2017

In my opinion, this is our best shot at a compromise between correctness and platform-independentness. This behavior will need to be documented precisely. For the vast majority of users, this won't change anything. I hope to get some time within the next days to propose a solution to some of these problems.

There's also my suggested #13887 (comment) opt-in way, either:

const path = require('path');
path.strict.posix.resolve('.\\gaga') // throws
const { strict: spath } = path;
spath.posix.resolve('.\\gaga') // throws

or

const { Pure } = require('path');
const purePath1 = new Pure('C:\\fakepath')
purePath1.resolve('gaga') === 'C:\\fakepath\\gaga';
const purePath2 = new Pure();
purePath1.resolve('gaga') === Pure.resolve('gaga') // throws or returns just 'gaga'

@TimothyGu
Copy link
Member

@tniessen This issue is about path normalization (which doesn't depend on the local file system), not resolution.

@tniessen
Copy link
Member

@TimothyGu I know, I just thought the other problems were related enough to bring them up here. Sure, this issue can be solved independently.

@tniessen
Copy link
Member

@TimothyGu According to our documentation, root should be a part of dir in your last examples in #14405 (comment).

@TimothyGu
Copy link
Member

@tniessen Cool. I'll implement this in an upcoming PR.

@TimothyGu
Copy link
Member

I just opened #14440 after auditing all path functions, which fixes parse() and adds tests to make sure all of the pure functions work with volume-relative paths. I intentionally did not change path.normalize('C:') === 'C:.' however, since path.normalize('') === '.' under the current Node.js implementation and the current behavior just seems to be consistent with it.

TimothyGu added a commit to TimothyGu/node that referenced this issue Jul 28, 2017
`path.resolve()` and `path.join()` are left alone in this commit due to
the lack of clear semantics.

Fixes: nodejs#14405
@tniessen
Copy link
Member

@XGHeaven Thank you for your report, @TimothyGu fixed the behavior of the path API to correctly work with volume-relative paths.

@XGHeaven
Copy link
Contributor Author

@TimothyGu Thanks. I have a question about if this bug fixed in LTS? I remember this bug just fixed in master.

@tniessen
Copy link
Member

Backporting to release lines can take some time. cc @nodejs/lts

addaleax pushed a commit that referenced this issue Aug 1, 2017
`path.resolve()` and `path.join()` are left alone in this commit due to
the lack of clear semantics.

PR-URL: #14440
Fixes: #14405
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
@gibfahn
Copy link
Member

gibfahn commented Aug 4, 2017

@XGHeaven This will be considered in the next v6.x release. If 2791761 cherry-picks cleanly to v6.x, and the risk of the change is deemed to be low, it should go into the next release. If it doesn't cherry-pick cleanly it'll have to be backported (process here).

v4.x is in maintenance, so most fixes aren't backported.

I suggest you subscribe to #14440, any discussion about backporting to v6.x will happen on there.

MylesBorins pushed a commit that referenced this issue Aug 14, 2017
`path.resolve()` and `path.join()` are left alone in this commit due to
the lack of clear semantics.

Backport-PR-URL: #14787
PR-URL: #14440
Fixes: #14405
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
MylesBorins pushed a commit that referenced this issue Aug 16, 2017
`path.resolve()` and `path.join()` are left alone in this commit due to
the lack of clear semantics.

Backport-PR-URL: #14787
PR-URL: #14440
Fixes: #14405
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
abhishekumar-tyagi pushed a commit to abhishekumar-tyagi/node that referenced this issue May 5, 2024
`path.resolve()` and `path.join()` are left alone in this commit due to
the lack of clear semantics.

PR-URL: nodejs/node#14440
Fixes: nodejs/node#14405
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
path Issues and PRs related to the path subsystem. windows Issues and PRs related to the Windows platform.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants