-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Equivalent to Python's os.path.normpath #2208
Comments
Heavens, yes! I can't believe rust doesn't have something like this! I think it ought to be a method on |
cc @rust-lang/libs |
I think I've successfully written a good first draft of this method. My aim was to replicate the version in Python (linked above) and Go. The documentation for the Go version in turn links to the Plan 9 document detailing the high-level algorithm of this sort of normalization, handy! Right now, it's a free function, not in the std::path crate yet. As such, the method takes in an explicit use std::path::Path;
use std::path::PathBuf;
use std::path::Component;
fn normalize(p: &Path) -> PathBuf {
let mut stack: Vec<Component> = vec![];
// We assume .components() removes redundant consecutive path separators.
// Note that .components() also does some normalization of '.' on its own anyways.
// This '.' normalization happens to be compatible with the approach below.
for component in p.components() {
match component {
// Drop CurDir components, do not even push onto the stack.
Component::CurDir => {},
// For ParentDir components, we need to use the contents of the stack.
Component::ParentDir => {
// Look at the top element of stack, if any.
let top = stack.last().cloned();
match top {
// A component is on the stack, need more pattern matching.
Some(c) => {
match c {
// Push the ParentDir on the stack.
Component::Prefix(_) => { stack.push(component); },
// The parent of a RootDir is itself, so drop the ParentDir (no-op).
Component::RootDir => {},
// A CurDir should never be found on the stack, since they are dropped when seen.
Component::CurDir => { unreachable!(); },
// If a ParentDir is found, it must be due to it piling up at the start of a path.
// Push the new ParentDir onto the stack.
Component::ParentDir => { stack.push(component); },
// If a Normal is found, pop it off.
Component::Normal(_) => { let _ = stack.pop(); }
}
},
// Stack is empty, so path is empty, just push.
None => { stack.push(component); }
}
},
// All others, simply push onto the stack.
_ => { stack.push(component); },
}
}
// If an empty PathBuf would be return, instead return CurDir ('.').
if stack.is_empty() {
return PathBuf::from(Component::CurDir.as_ref());
}
let mut norm_path = PathBuf::new();
for item in &stack {
norm_path.push(item.as_ref());
}
norm_path
} Some sample (Unix) file path tests: fn main() {
let mut paths = vec![];
paths.push(Path::new("../../home/thatsgobbles/././music/../code/.."));
paths.push(Path::new("/home//thatsgobbles/music/"));
paths.push(Path::new("/../../home/thatsgobbles/././code/../music/.."));
paths.push(Path::new(".."));
paths.push(Path::new("/.."));
paths.push(Path::new("../"));
paths.push(Path::new("/"));
paths.push(Path::new(""));
// More tests for Windows (especially with drive letters and UNC paths) needed.
for p in &paths {
let np = normalize(&p);
println!("{:?} ==> {:?}", &p, &np);
}
} |
Brute force comparison: https://gist.github.com/ExpHP/3f7d8c03be1a45ebe5abd3ad5a517d73 fn gen_strings(alphabet: &[&str], s: String, depth: u8, emit: fn(&str)) {
emit(&s);
if let Some(deeper) = depth.checked_sub(1) {
for &ch in alphabet {
gen_strings(alphabet, s.clone() + ch, deeper, emit);
}
}
}
fn main() {
gen_strings(&["a", ".", "/"], Default::default(), 12, |s| {
let p = Path::new(&s);
println!("{:>15} -> {:>15}", p.display(), normalize(&p).display());
});
} import os
def gen_strings(alphabet, s, depth):
yield s
if depth > 0:
for ch in alphabet:
yield from gen_strings(alphabet, s + ch, depth - 1)
for s in gen_strings("a./", '', 12):
print("{:>15} -> {:>15}".format(s, os.path.normpath(s))) Generating all strings of "a", ".", and "/" up to length 12 on Arch, there is one behavioral difference between your implementation and normpath, which is that, if a path begins with It'd be nice to survey more languages and know what decisions and tradeoffs are on the table. |
@ExpHP Thank you for the extended tests! I don't think I would have found that corner case without your help. The double forward slash case is quite strange, I admit. I'm surprised that it ends up resolving to just /. I'm even newer at Go than Rust, but I can see if I can whip up a comparable test using the previously linked I'm going to look into installing Rust on my Windows machine tonight, and trying out some tests on there. |
Closing in favor of rust-lang/rust#47402. |
@Centril is this correct to close this issue, given that the rust-lang/rust#47402 is (just?) a "tracking issue", which actually points back to this very issue? I'm really sorry if this is something normal, but as a Rust newbie, I really don't understand what's the status of this in such a situation :( Is it open? closed? worked on? not worked on? what could I do if I wanted to, say, submit a solution proposal? |
I recall that this was associated with a PR that was never finished after a long period of back-and-forth. Looking at the linkbacks, this is it: Since we haven't heard from the author for a long time, I'd reckon it's fair game for anybody who wants to rebase that PR and address the outstanding feedback. Issues on the RFC repository generally aren't regarded as actionable items (people for some reason have been using them as feature requests or like a discussion forum), so their open/close state is largely immaterial beyond the potential to confuse people. I'd always look at the linkbacks. |
Original author here, apologies for the radio silence. It's been a long time since I've had time to work on this, not to mention my inexperience with contributing to open source in general. Aside from that, the one issue I encountered with this was that Windows paths weren't all getting normalized as expected, as per a recommended Windows path test suite (seen in the discussion on the PR rust-lang/rust#47363). My hope was that using |
I'm going to reopen this instead of rust-lang/rust#47402 as the associated PR never landed so unfortunately we don't actually have anything to track at this point (beyond the desire for the feature itself, which is betters suited to the rfcs repo). |
Is this still looking for someone to implement it? I'm happy to give it a go, as long as someone familiar with Windows paths can give me some edge cases to test against. Also, given that this won't access the filesystem (that's |
On Windows |
I wrote a port of some functions I needed from the Go's
|
Has anyone poked the author of the path-clean crate to see if he'd be interested in getting it into It looks to be an MIT/Apache-2.0 pure-Rust implementation of the same Plan 9 algorithm Go implemented for doing this. (It does need a little work but it'd need that either way so, if he's willing to work on it...) |
It uses |
Then I can point you to |
@gdzx I said "where I don't consider it necessary". As far as I'm concerned, this algorithm can be implemented perfectly well, relying only on the If I run out of projects that don't need this functionality before someone contributes code that meets my standards, I'm perfectly willing to implement it myself, with |
Whether So I personally would consider it to be incorrect use of |
That too. I'll freely admit that, as soon as I saw From my perspective, it didn't matter what it was being used for if the algorithm "wasn't supposed to be" using it in the first place. |
FWIW, it can be a pretty severe pain point to deal with paths in some cases because its internal wtf-8 representation isn't exposed. Without exposing it, there will always be a strong temptation to do exactly this sort of thing to get access to it. What I've done in the past to simplify is to use the Unix specific APIs to get the underlying |
@ssokolow There is a beautiful thing on GitHub, it's called "Create an issue". You can call it "Remove uses of unsafe when unnecessary". That would be a great start instead of talking nonsense about "disqualification", on a project that isn't even released yet. Also, it is quite an inconsequential issue compared to actually implementing a lexical path processing algorithm, checking its correctness, and its properties. |
Then let me reestablish the truth: my implementation of |
Maybe @ssokolow was a little harsh, but yes, it would also be a deal breaker for me too. It's not just unnecessary. It's incorrect. See my previous comment. |
@gdzx I was about to apologize for being snippy when @BurntSushi beat me to acknowledging it (I tend to not notice when I'm emotionally compromised until someone points it out). That said, I am still curious why you would think I would spontaneously choose to reply in a communication channel different than the one that we began the interaction in without being explicitly prompted to. As for |
@BurntSushi I'll take "a little harsh" as an apology :D. @ssokolow I don't understand why you would want to "reply in a communication channel different than the one that we began the interaction in". Also I will repeat that you can just take Anyway, if you have any idea for improvement, feel free to open an issue (just like you did on the other crate you linked to which also uses |
You said...
...which, to me, felt like "You should have replied in the bug tracker instead of here. Because you didn't read my mind and understand that, you have done wrong."
If I'd have to copy your code into my project and take responsibility for maintaining a copy of it to avoid having to re-audit the crate every so often, I think I'd prefer to just rewrite the algorithm in a style more intuitive to me: my own. Honestly, thanks to what the
That's the problem. You wrote it as if it were part of If your code were part of |
Right, you can't do that. That's what is incorrect because it relies on an implementation detail that is not part of the stable API. The next release of Rust could silently break your code. And since you're using unsafe, that breakage could result in memory unsafety. |
You really have a disrespectful tone, I don't understand why. I didn't ask you to read my mind, I asked you to do the same as what you did on the other crate to remove the use of
Of course, that's why it is nice to contribute to existing project that already support the use case, have CI, and work on Unix and Windows, so they can improve and eventually release.
Well, I can do it, but not without the limitations you outlined. However, I have the intention to make it part of the standard library, and it will serve as a good-enough PoC (which it still is now, since it doesn't have any release yet). But if you want to propose an improvement to make this crate suit your use-case, remove all use of |
How is it disrespectful to be honest about how I perceived what you'd said? You said something here, I tactlessly replied here, and you responded with what I, rightly or wrongly, perceived as an attack for having responded here rather than in the bug tracker.
I think that might be the point of miscommunication. It wasn't made clear enough that you're intentionally writing it in a form that is not recommended for use outside of |
@ssokolow I can agree on your last point. I will update the README to make that clear, but it's not on crates.io and the v0.1.0 is not released, that's not exactly production-ready. And that doesn't mean it is unsalvageable in the meantime to make it more appropriate for your use-case. Unless of course you want to improve on the Go port, but that is exactly what I did to make this crate. |
@gdzx I have to go shower and get to sleep, but I'll see if I can fit in a look at it tomorrow. |
@ssokolow Thank you ;). |
TL;DR for people like me who came back with 20 new notifications of diverse tones: here is a draft of what an implementation of this lifted to I don't have any kind of authority here, but maybe let's move the discussion about the pros and cons of this draft implementation to the issues of the linked repo, and try not to drown this issue in messages unrelated to the API design and desirability of this idea? :) |
In that case, I just pushed a very early RFC draft for this API. There are some links to implementations in other languages and some explanations. But as I said, it's a first draft. |
I don't have time to look at the code yet, but I've left some inline comments on the commit for the RFC draft. Aside from that, I like what is proposed. It has a bunch of functions, like Heck, I have some ugly hackery for want of |
Uses the normalize_path used in cargo to strip duplicate slashes With a link to a std rfc rust-lang/rfcs#2208 This fixes uutils#1829 This also touches uutils#1768 but does not attempt to fully solve it
* rm: add verbose output and trim multiple slashes Uses the normalize_path used in cargo to strip duplicate slashes With a link to a std rfc rust-lang/rfcs#2208 This fixes #1829 This also touches #1768 but does not attempt to fully solve it
In Python,
os.path.normpath
performs a simple lexical normalization of a path, removing redundant dir separators and cur-dir ('.') parts, and resolving up-dir references ('..'). In addition, on Windows, any forward slashes are converted to backslashes.It seems that Rust's
std::fs::canonicalize
is functionally similar to Python'sos.path.realpath
(with the addition of a check to see if the target exists), and is a superset ofnormpath
. However, I have a case where I only want to normalize the path, and not resolve symlinks.I'm quite new to Rust, but I'd be happy to help contribute a new method to do this to the library if I can.
The text was updated successfully, but these errors were encountered: