-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Notes from a new user #17
Comments
Thanks for opening this issue!
😄
I opened #18. We should probably move it to the bottom and make the docs clear that it is a low-cost implementation detail.
I opened #19. It used to do this but was changed in 0.4.0. This is a bug (thanks!)
I made a comment to this effect in #19. Thanks!
I think the docs could be improved here, but basically this is doing this for two specific reasons: User Surprise: paths are semantically resolved rather than physically:
Performance: the method you specified (silently switching from "real" canonicalization to semantic") would require that we make a syscall for each path segment (except the last) every time See #6 for more discussion of why we made the decisions we did. I don't think you are wrong, but I think the performance tradeoffs are too high to make a method like the one you propose be the default. I would be very open to a pull request to "normalize" a path by following all symlinks (including |
Like I said, my use-case is paths that may or may not exist. More specifically, I'm trying to work with something like a cache: I absolutely know (and ensure) that the path to the cache exists, but then I construct a bunch of paths inside the cache that may or may not exist, and I might create them if they don't. In this case, the path-to-the-cache probably isn't supplied directly by the user; it may come from an environment variable, or a hard-coded convention that may involve absolutifying (there needs to be a better verb for that) path-fragments gathered from other sources, and so it could include any number of As for performance, like I said, I expect all my canonicalization to be done once at program startup and then re-used to create new paths. Paths my program constructs don't have any To be honest, I'm not exactly sure what I want, I'm just trying to come up with something more robust and ergonomic than the Python APIs I'm used to. Perhaps I should play around with my own implementation to form a more educated opinion. |
I've spent some more time thinking about exactly the behaviour that I want, and I've decided to call it "monotonic paths". I describe monotonic paths in (bewildering) detail in the cli-wg issue, but the basic invariant that describes a monotonic path is "after the path's head (i.e. the When it comes to actually creating monotonic paths, plain old normalisation (as currently used by In comparison with other approaches:
While figuring this out, I wrote an implementation of these ideas, complete with a test suite that passes on Linux and Windows 10 (and mostly passes on Wine). If you're interested in taking a look, I'll polish up the docstrings and stick it in a gist or something; I'd love for |
Hey, sounds intriguiging! Ill definitely take a look.
Ill be on vacation for the next few weeks, so its going to take me a little
while.
- sent from my Android
…On Sat, Jun 2, 2018, 1:08 AM Screwtapello ***@***.***> wrote:
I've spent some more time thinking about exactly the behaviour that I
want, and I've decided to call it "monotonic paths". I describe monotonic
paths in (bewildering) detail in the cli-wg issue
<https://github.com/rust-lang-nursery/cli-wg/issues/10#issuecomment-392456222>,
but the basic invariant that describes a monotonic path is "after the
path's head (i.e. the / or C:\ components), all following components are
names, not /./ or /../". Those named components might or might not be
symlinks, they might or might not exist on disk, it doesn't matter.
When it comes to actually *creating* monotonic paths, plain old
normalisation (as currently used by PathArc) fulfils the invariant, but
(as described) it can change the meaning of a path. I've got an algorithm
here that only tries to dereference the component immediately preceding a
/../ segment; if it's not a symlink or it doesn't exist on disk, the
algorithm uses plain old normalisation.
In comparison with other approaches:
- vs. full canonicalisation (std::fs::canonicalize): my algorithm
doesn't require paths to exist, and it can safely be used to talk about
symlinks directly (re: #6
<#6>).
- vs. partial canonicalisation (Python''s
pathlib.Path.resolve(strict=False)): my algorithm issues fewer
syscalls, since it does not try to stat every path component
- vs. normalisation (path_abs, Python's normpath()): should be
equivalently fast in the common case (no /../ components), but in the
uncommon case gives the same results as the kernel itself
- vs. just joining relative paths to the current working directory: my
algorithm produces cleaner, easier-to-understand paths in the face of
/../ segments, and also my algorithm works on Windows.
While figuring this out, I wrote an implementation of these ideas,
complete with a test suite that passes on Linux and Windows 10 (and mostly
passes on Wine). If you're interested in taking a look, I'll polish up the
docstrings and stick it in a gist or something; I'd love for path_abs to
use it.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#17 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AFVCy5eF1uUjRqrlfgfShZafGkSCgcFJks5t4jnsgaJpZM4UL-Eh>
.
|
I polished up my implementation, and stuck it in a gist for you to look at.
Have a great vacation, and I'd love to hear what you think when you get back. |
Hey, thanks for doing all this work! Is it possible you can open a PR? Even just copy/pasting the file into something like |
Adds a "monotonic_path_preview" module containing my monotonic path handling code, for ease of review, as requested in vitiral#17. The module is made public, so the documentation can be viewed with `cargo doc`.
I've created PR #20 as a very quick-and-dirty inclusion of my code into It turns out that while the tests themselves pass, the doctests do not. I prototyped this code in a binary crate rather than a library, and it turns out that not only does |
I think you are no longer a new user, and we can now close this 😄 |
I was pointed to this crate from the comments on a Reddit post I made, and thought I'd take notes about things that confused me while reading the documentation.
Errors always include filenames? That's awesome!
In the list of types introduced in this crate,
PathArc
is first and therefore presumably most important, but it's not actually used anywhere in the examples? So maybe it's just an implementation detail I should ignore?In the crate-level docs,
PathAbs
says "An absolute (not necessarily canonicalized) path that may or may not exist" but the documentation forPathAbs::new()
says "The path must exist or io::Error will be returned."? Must the path exist, or not?I'm interested in working with paths that may or may not exist, but the example at the top of the file only creates these structs with
::create()
, creating them on disk at the same time, so it doesn't particularly help me.I see
PathArc
has anabsolute()
method, which does the broken thing of resolving/../
segments semantically, which means it's almost never safe to use. It's a shame; the other parts of its functionality are pretty useful, and there's other useful cleanups it could perform (like removing empty//
path segments, and normalizing delimiters to/
or\
depending on platform), but while there's a chance that it will corrupt a path I'd rather not use it.The specific functionality I was looking for in my Reddit post, and which I can't seem to find in your crate, was partial canonicalization. Instead of Rust's
canonicalize
which requires the the entire path exist, a method that requires all-but-the-last-segment exists, or even a method that silently switches from real canonicalization to semantic canonicalization at the first segment that does not exist.The text was updated successfully, but these errors were encountered: