Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support long path names on all Windows versions #2188

Closed
wants to merge 2 commits into from

Conversation

DemiMarie
Copy link

@DemiMarie DemiMarie changed the title Initial RFC commit Support long path names on all Windows versions Oct 24, 2017
@retep998
Copy link
Member

retep998 commented Oct 25, 2017

Required reading

If you want to be able to provide useful comments on this RFC, it is essential that you actually understand the various forms of paths on Windows. Therefore, please read this article before commenting:

https://googleprojectzero.blogspot.com/2016/02/the-definitive-guide-on-win32-to-nt.html

Thank you.

@chris-morgan
Copy link
Member

Could we expose the transformation function publicly in std::os::windows? Some people are inevitably going to want to call an NT API directly at some point, it’d be nice if we had a function that said “pass your paths through here before calling the NT API directly, to make sure they’ll work on long paths”.

(On the other hand, I don’t think we expose the UTF-16 conversion function that is also needed, and this isn’t quite so useful without that. I do know that I’ve copied code from the standard library to achieve these things before. But exposing that is definitely out of scope for this RFC.)

@gurry
Copy link

gurry commented Oct 26, 2017

I'm building a P2P filesharing application which would be deemed seriously crippled if it cannot support longer paths. So this is very much a needed fix.

@chris-morgan I, for one, would vote for making the std APIs automatically and seemlessly work with longer paths. Requiring the caller to call extra functions for longer paths would be awkward.

@chris-morgan
Copy link
Member

@gurry I do not describe a function required for normal use, but rather a low-level function for cases where the user is calling the NT API directly.

@retep998
Copy link
Member

retep998 commented Oct 26, 2017

Calling NT directly is very special because you're passing NT paths, not win32 paths. A path that starts with \\?\ is a win32 path, not an NT path, and win32 transforms that into a path that starts with \??\ before calling an NT api. However, NT paths are not required to start with \??\ and merely need to start with \ (?? is a directory in the object manager).

However I am still interested in these API additions:

  • Path normalization. On Windows this is different from canonicalization. Normalization involves calling GetFullPathName.
  • Turning an absolute path into a root local device path, by adding the \\?\ prefix as necessary if appropriate.
  • Getting rid of the \\?\ prefix and turning the path into a normal absolute path for compatibility with software that breaks on paths starting with \\?\.

I thought that CoreFX (.NET Core standard library) used `GetFullPathNameW`, but it doesn’t.
@pitdicker
Copy link
Contributor

I once wrote the beginnings of an RFC for this, but never finished it. It goes a different route, doing the conversion ourselves. It is not very difficult, and also works for files that do not (yet) exist in the filesystem. for reference (I did not recently re-read it)

@retep998
Copy link
Member

retep998 commented Nov 6, 2017

It is not very difficult, and also works for files that do not (yet) exist in the filesystem.

GetFullPathName does not touch the filesystem so it works perfectly fine on paths that don't exist.

@DemiMarie
Copy link
Author

As I am looking at implementation strategies, I am finding that:

We don’t really want bug-for-bug compatibility with Windows.

The Windows behavior:

  • Is poorly documented (or sometimes not documented at all).
  • May change between Windows versions, as has already happened with paths that begin with \??\ (the behavior of which changed between XP and Vista).
  • Error-prone: C:\Users\Lightning\CON opening \??\CON (at the NT level) is not expected, and difficult to work around.
  • Limits our options. Windows supports the equivalent of Unix’s openat(2) via NtCreateFile, but that takes NT paths.

So, I propose that Rust do its own normalization. This will require me to write LOTS of test cases!

@ssokolow
Copy link

@DemiMarie

When I read that big statement, my initial expectation was that you were going to continue with a list of points in favour of using the Windows normalization function to avoid introducing bugs where Rust's behaviour diverges from the system behaviour. (ie. "We don't want to be responsible for having to match Windows bug-for-bug to avoid problems.")

@retep998
Copy link
Member

retep998 commented Nov 12, 2017

Regardless of what decision we end up taking, we will need a ridiculous number of test cases.

I also recommend first putting the normalization implementation into an external crate and getting it tested in all sorts of zany setups to make sure it behaves as people expect in the most obscure of corner cases.

@DemiMarie
Copy link
Author

@ssokolow I would much rather use the system implementation. However, parts of Windows pathname handling are such security footguns (such as DOS devices) that I really really don’t want Rust programs to inherit them.

Personally, I think that we should act just like the Windows API, except for a certain list of special cases, each of which needs to be documented and justified. The ones that come to mind are:

  1. Long path names should be transparently supported by libstd. This one is almost certainly the least controversial.
  2. DOS devices should not be recognized transparently, although one can always access them using \\.\ or the like. This is for security reasons.
  3. Named pipe names should not be normalized. Likewise.
  4. Trailing whitespace and periods should not be stripped from filenames. This is so that one can access any filename on the system, and so that one can feed the results from (say) listing a directory directly into the path-opening functions.

2 and 4 can be accomplished by padding the path with a suitable suffix (such as rust or just a) before passing to GetFullPathNameW and then stripping it afterwards. 1 is easily accomplished by ensuring that every path passed to CreateFileW et al has already been normalized and prefixed with \\?\. 3 is a matter of checking (before passing to GetFullPathNameW) if the path is that of a named pipe (\\.\pipe\something or \\server\pipe\something) and skipping the GetFullPathNameW call if it is.

@ssokolow
Copy link

ssokolow commented Nov 13, 2017

I certainly agree.

I was more expressing that, to someone like me who hasn't used Windows on anything other than retro-gaming PCs in over a decade and has only started to learn the depth of the rabbit hole more recently, the degree to which Windows pathname handling is broken boggles the mind. The part about trailing periods and spaces didn't even come to mind for me.

(ie. It was more intended as a very sloppy attempt to indicate that more supporting detail for the rationale wouldn't go amiss.)

@retep998
Copy link
Member

retep998 commented Nov 13, 2017

2 and 4 can be accomplished by padding the path with a suitable suffix (such as rust or just a) before passing to GetFullPathNameW and then stripping it afterwards.

X:\COM1.blah will still get transformed into \\.\COM1 or \??\COM1, so simply tacking on a suffix won't work.

Have I mentioned the sheer number of test cases we're going to need for this?

@DemiMarie
Copy link
Author

DemiMarie commented Nov 13, 2017 via email

@DemiMarie
Copy link
Author

@retep998 I agree with the huge number of test cases!

Here is the algorithm I am leaning towards right now:

  1. If the path begins with \\?\GLOBALROOT\ or \??\GLOBALROOT\, delete everything up to but not including the last backslash in the prefix. Pass the rest directly to NT.
  2. If the path begins with \\?\ or \??\, replace the prefix with \??\ and pass to NT.
  3. If the first four characters match [\\/][\\/].[\\/]: replace prefix with \??\ and pass to NT (??? should we normalize here ???)
  4. If the first two characters match [\\/][\\/]: TODO
  5. TODO

@retep998
Copy link
Member

We don't pass paths directly to NT though. We always call win32 functions so stripping the \\?\GLOBALROOT prefix would result in a path like \foo\bar which win32 interprets as a rooted path relative to the current drive. Also, because \\?\ is the officially documented prefix for verbatim paths, I'd rather normalize to that than \??\.

@Centril Centril added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Dec 5, 2017
@DemiMarie
Copy link
Author

I have thought more on this and done some testing, and the following seems to work for every case I have come across:

Subroutine canon_with_suffix:

This adds a suffix (such as \rust) containing a \ to the end of the path, passes it to GetFullPathNameW, and finally strips the suffix. The suffix is needed to prevent unwanted transformations.

Subroutine num_slashes:

This is easiest to just show in code:

fn num_slashes(path: &[u8]) -> usize {
    let mut count = 0;
    for i in path {
        match i {
            b'\\' | b'/' => { count += 1; }
            _ => break
        }
    }
    count
}
The algorithm:
if path.len() > 4 {
    match path[..4] {
        br"\\?\" | br"\??\" => return Ok(to_u16(path)),
        _                   => (),
    }
}
match num_slashes(path) {
    0 | 1 => canon_with_suffix(path),
    2     => process_unc(path),
    _     => Err("The path is invalid"),
}

I haven’t finished process_unc, and it’s late, so I will fill that in later.

@KodrAus
Copy link
Contributor

KodrAus commented Jan 23, 2018

Just leaving a comment so I can come back shortly and spend some more time digging in to this!

For reference (mostly my own), looks like CoreFX (.NET's std) does use GetFullPathNameW when expanding short path names during normalisation, which in turn is called when getting the full path name, which in turn is called by pretty much everything that uses paths.

I like @retep998's suggestion of getting an implementation out in the wild and iterating on it there.

@alexcrichton
Copy link
Member

When considering a change like this I'd personally be quite curious to explore what other language ecosystems are doing to see if we can learn about the pros/cons of various strategies.

From some googling I've found:

Perhaps some more research could be done into these various ecosystems to see what they're doing and list out some pros/cons of the various strategies?

@retep998
Copy link
Member

Manifests

Pros

  • Works on Windows 10.
  • No changes to user code needed.
  • Even affects C/C++ libraries that are linked in.

Cons

  • Does not work on Windows 8.1 and older.
  • Requires serious design work around how to handle manifests, especially if we don't want to stop the user from using their own manifest.

GetFullPathNameW

Pros

  • Works on all versions of Windows.
  • No changes to user code are needed.

Cons

  • Requires internal modifications to std.
  • Does not automatically affect third party libraries that call system functions themselves.
  • Some care needs to be done to ensure that transforming the path ourselves is no different than what win32 would automatically do when converting the path before passing it on to NT. An extensive test suite would help.

Status quo

Pros

  • Developers of Rust itself can be lazy and not care about Windows users.

Cons

  • Windows users are left by the side of the road with no long path support.

@KodrAus
Copy link
Contributor

KodrAus commented Jan 30, 2018

Thanks @retep998! I think we should include your summary in some form in the Alternatives section of the RFC.

The GetFullPathNameW approach seems like the only immediately viable one to me, since Windows Server 2012 / 8.1 are still in support for a long time, so we should try and support them. I'm feeling like iteration/discussion over an implementation is probably the next step, so perhaps we can keep this RFC intentionally vague on details and move it forward and/or work on a possible implementation in the nursery.

What do you all think?

@alexcrichton
Copy link
Member

@rfcbot fcp close

This RFC has been open for quite some time and I'd like to propose that we close this RFC. My previous findings seem to indicate that there's not necessarily broad consensus among language standard libraires about whether this is a good idea or not, and I'd personally prefer to err on the side of less "magic" in the standard library.

While I think that it's always possible we could support the manifest option more easily in Rust I'd specifically like to propose closing this RFC with the idea of inserting expansion of paths on all system calls in the standard library.

@rfcbot
Copy link
Collaborator

rfcbot commented Aug 23, 2018

Team member @alexcrichton has proposed to close this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-close This RFC is in PFCP or FCP with a disposition to close it. labels Aug 23, 2018
@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Sep 5, 2018
@rfcbot
Copy link
Collaborator

rfcbot commented Sep 5, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot
Copy link
Collaborator

rfcbot commented Sep 15, 2018

The final comment period, with a disposition to close, as per the review above, is now complete.

By the power vested in me by Rust, I hereby close this RFC.

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this RFC. closed This FCP has been closed (as opposed to postponed) and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. disposition-close This RFC is in PFCP or FCP with a disposition to close it. labels Sep 15, 2018
@rfcbot rfcbot closed this Sep 15, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed This FCP has been closed (as opposed to postponed) finished-final-comment-period The final comment period is finished for this RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants