Cross-platform file system abstractions #10
Comments
From the etherpad:
|
Maybe also relevant (also from the etherpad):
|
I want to simply dump this here: I'm the author of imag and we have a rather nice abstraction for FS access... maybe you can learn from that. Our abstraction is rather specific for our use-case (each file has a TOML header and a "content" section), but the basic concept (a "store" which holds the files and they can then be borrowed from that store, so that concurrent access to one file is not possible) could be useful for other CLI apps. More information about the |
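To illustrate the "borrow from a store" idea in the comment above, here is a minimal sketch. It is not imag's actual API: `Store`, `Entry`, and `borrow_entry` are hypothetical names, and the sketch only tracks which paths are currently lent out, without doing any file I/O.

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Hypothetical store that remembers which paths are currently lent out.
struct Store {
    borrowed: RefCell<HashSet<PathBuf>>,
}

/// Hypothetical handle to one borrowed file; gives the path back on drop.
struct Entry<'s> {
    store: &'s Store,
    path: PathBuf,
}

impl Store {
    fn new() -> Self {
        Store { borrowed: RefCell::new(HashSet::new()) }
    }

    /// Lend out `path`, or return `None` if it is already borrowed.
    fn borrow_entry(&self, path: &Path) -> Option<Entry<'_>> {
        // `insert` returns false when the path is already in the set.
        if self.borrowed.borrow_mut().insert(path.to_path_buf()) {
            Some(Entry { store: self, path: path.to_path_buf() })
        } else {
            None
        }
    }
}

impl Drop for Entry<'_> {
    fn drop(&mut self) {
        // Returning the entry makes the path available again.
        self.store.borrowed.borrow_mut().remove(&self.path);
    }
}
```

A second `borrow_entry` call for the same path returns `None` until the first `Entry` is dropped, which is the property the comment describes. A real implementation would also need to decide how to treat two different spellings of the same file.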
One thing that we talked about at Mozilla which would be useful in Firefox, as well as possibly useful to expose to the web, would be an API for doing atomic file writes in "small" files. As I understood it, the best way to do truly atomic writes efficiently is to copy the file, while inserting the desired modifications, into a new file. Once the new file has been written issue a rename to the original file name. Alternatively, if the file is small enough and multiple modifications are expected, read the file contents into memory and each time that a modification should be done write out a new file and rename to the desired file destination. The nice thing with this approach is that it can be done without complex journaling files, and without needing to call flush() which can be quite a perf bottleneck. This was something that we were thinking of using for the numerous small configuration/database files that Firefox keeps. I think something like this would fit well with Rust's focus on performance and safety. I'm not sure that this is particularly CLI specific though. But since other filesystem stuff was discussed I figured I'd mention it. |
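The write-then-rename approach described above could look roughly like this in Rust. This is a sketch under assumptions: `write_atomically` is a hypothetical helper name, the temporary file is simply the destination name with `.tmp` appended, and, as in the comment, no explicit sync is issued, so readers never see a half-written file but durability across a crash is not guaranteed.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: replace `dest` with `contents` in one rename step.
fn write_atomically(dest: &Path, contents: &[u8]) -> io::Result<()> {
    // Build a sibling temp path so the rename stays on the same filesystem.
    let mut tmp_name = dest
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp_path = dest.with_file_name(tmp_name);

    // Write the complete new contents to the temp file first.
    {
        let mut tmp_file = File::create(&tmp_path)?;
        tmp_file.write_all(contents)?;
    } // The file handle is closed here, before the rename.

    // Swap the new file into place; an existing `dest` is replaced.
    fs::rename(&tmp_path, dest)
}
```

For the "many small modifications" variant, the caller would keep the contents in memory and call this helper after each change.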
For some reason I thought the APIs in AFAIK it's OK to use verbatim paths in all Windows APIs, so perhaps a crate that provides a small wrapper type would be good enough? Something like (If someone writes such a crate I would also like it to have an easy |
FYI this link 404s now, the code seems to have moved to https://github.com/dherman/verbatim |
...and after actually reading a little, it sounds like that crate is about 90% of what I proposed in my previous comment. :) |
See also:
I would be interested to hear more about how other ecosystems solve this. In particular, how do they deal with relative paths? |
@killercup I believe normalisation only happens on HFS+ on macOS; I don't believe APFS does normalisation on the Mac. |
On Unicode normalization and HFS: BurntSushi/ripgrep#845 |
My thought was to just start with a "normalize_path" crate/function that people can use regardless of what path API they are using, kind of like This crate would handle
Then on top of this we could look into an "easy paths" api that is more like python's pathlib combined with easy_strings to help with the prototyper case (easy to reach for needed utilities, less concern for borrow checker at cost of memory or cpu time). This would call "normalize_path" on any untrusted input. I've been providing feedback on
What is your concern with relative paths? |
|
This is not generally possible in POSIX because of symlinks: if That said, Rust kind of has this already in the form of |
I'm mixed on what I'd expect for that symlink scenario. Either way, I assume that with a But for pathlib, apparently, the symlink bug is always there. I'm surprised though, I thought handling of |
Pathlib should actually be OK: in my example, as it resolves
In POSIX, the kernel just takes each path segment and hands it to the filesystem to resolve, and the filesystem physically records a Plan9 behaves the way you expect, but then it doesn't have symlinks so this whole situation just isn't a problem there. |
One of the issues that would need to be addressed in a better way is A method that either concatenates a path or completely replaces the existing one based on the input ... when did one ever have that use-case? I'd guess that pretty much every piece of code using The problem is not that the method exists, but that the individual operations that this method combines do not exist as their own methods. Considering this and the Windows-related issues, I think this makes a good case for having separate types for absolute and relative paths. |
I've tried to do a quick and dirty summary of this. Please point out where I need to expand it! I know on either users or reddit, I saw complaints about
Anyone know where these were so we can reach out to the people that have concerns? Or add your own thoughts on these topics? |
I'd expect that to be the majority use-case for dealing with user-specified paths.
The wrinkle in the above is unwrapping the result of |
Join is useful to make absolute paths from relative paths, which in turn are useful for nearly all cfg files, for both sharing across OSes and moving cfg files. If you can guarantee some validations of the user-made paths in a cfg (is relative, doesn't use any (not just the current) OS's forbidden characters except '/' and closed quotes, len(cfgpath.parent().join(relativepath)) isn't larger than MAX_PATH_LENGTH on the current platform), you can even ensure it's somewhat OS portable. Though I suppose it's better that all those preconditions are specified in a higher-level cfg abstraction, join could still be useful (in spite of being easy to blow up by passing an absolute path as the suffix). It would have to deal with stuff like I think Java's solution here was to make 'path' iteration be at directory granularity? Or path is not iterable, I don't remember. |
I think it's highly dangerous that
do completely different things. I don't think most people will expect that "joining" one path to an existing path can destroy their existing path. I think a serious path implementation should clearly separate those operations, e. g.
"joining" an absolute path should not even compile. |
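For readers who have not run into this, the sketch below shows the behavior being criticized and one way to separate the two operations at run time. `Path::join` is the real standard-library method; `join_relative` is a hypothetical helper, not an existing API, and it rejects at run time what the comment above would prefer to reject at compile time with distinct types.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: append `tail` to `base` only when `tail` cannot
/// replace any part of `base` (no drive/UNC prefix and no root).
fn join_relative(base: &Path, tail: &Path) -> Option<PathBuf> {
    let replaces_base = tail
        .components()
        .any(|c| matches!(c, Component::Prefix(_) | Component::RootDir));
    if replaces_base {
        None
    } else {
        Some(base.join(tail))
    }
}

/// Demonstrates std's current behavior: on Unix, joining an absolute
/// path discards the base entirely.
fn std_join_example() -> PathBuf {
    Path::new("/etc").join("/bin/sh")
}
```

On a Unix system `std_join_example()` yields `/bin/sh`, while `join_relative` returns `None` for the same inputs and `Some("/etc/passwd")` for a relative `passwd`.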
I am just seeing this now. First, I'd like to announce the release of 0.4.0 of path_abs which now uses "absolute" instead of "canonicalized" paths. This means that you can have a path with symlinks that may or may not exist, but I'm going to just do a checklist of things from this thread and open issues that aren't covered: File path handling
Edit: I missed some
|
Also, ergo_fs is related to this discussion. |
Edit: I moved my comment about weird |
The full glory of handling Windows paths: https://googleprojectzero.blogspot.de/2016/02/the-definitive-guide-on-win32-to-nt.html TL;DR: Depending on how low-level you want the API to be, you have to handle 7 different path types. |
Yep. Managing code where the developer has control over the path and file names is my use case for the library I'm toying with. Such a library would be exactly what's needed for reading/writing/modifying configuration in a cross-platform compatible manner. It's not meant to deal with all the insanities operating systems have invented. If that's your use case, use Rust's path. Getting things right and reliable has its cost. In some cases it makes sense to pay this cost to get the reliability, in some cases it doesn't.
The point is that these things do happen. Chrome had a security issue due to Windows' general path traversal craziness. If one of the largest IT corps with a highly skilled security department can't deal with paths, what's the chance some random developer can? |
@soc I think this conversation should be moved to a separate issue since I think it is a very small part of the other concerns here. |
@vitiral Thanks, agree on that. |
I've been thinking about path-handling this weekend, after I posted a question to Reddit and had a discussion with @vitiral about his path_abs crate. I've come up with a model that covers my needs and expectations, but I'm interested to hear whether it would suit other people too.

Motivation

For tools that take a path on the command-line and just use it immediately (think

For tools that take a path from a config file or a database, tools run as batch jobs, or tools that generate or manipulate paths, relative paths can make problems difficult to diagnose: You get a report saying "File not found: some/relative/path", you look in the place you thought the tool would look and the file is definitely present, so clearly the tool was looking somewhere else, but where?

To avoid confusion, I want every tool I write to use complete paths, so that when I look at a log-line or error message I can tell exactly what it was looking at.

Design

I want to use "monotonic paths" as my standard in-memory representation of a filesystem path. Some definitions:

A path is a sequence of zero-or-more components (in the

A path's head is:
A path's tail is zero-or-more

A monotonic path is one whose tail contains only

A monotonic absolute path is a monotonic path whose head contains (on POSIX) a

A monotonic relative path is a monotonic path whose head contains no components. A monotonic relative path can be blindly appended to another monotonic path without breaking its monotonicity.

Example monotonic paths:
Example non-monotonic paths:
Implementation

Making a path monotonic involves making the head monotonic, and making the tail monotonic.

Making the head monotonic is easy, since you can just pass it to

Making the tail monotonic involves removing the

To remove
You could go even further and resolve as many symlinks as possible instead of just the ones preceding

In Rust, I imagine there would be

Alternatives

Just use whatever path you were given, as-is

As mentioned in the motivation, this can lead to confusion when a user comes up with a path in one context, then gives it to a program that uses it in a different context (a different time, a different host, a different working directory). Turning relative paths into fully-explicit paths makes it clearer what context the program is using.

If you get a relative path, just join it onto the current working directory

Relative paths may contain any number of

A bigger problem is that this approach doesn't work on Windows. If your current working directory is

If you get a relative path, just use
|
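As a rough illustration of the proposal above, here is a check for whether a path is already "monotonic". This is a sketch under a stated assumption: it reads "monotonic" as "after an optional prefix and root, only ordinary named components appear", so any `.` or `..` that survives `Path::components` disqualifies the path. The function name `is_monotonic` is hypothetical, and the check is purely lexical: it does not touch the filesystem or resolve symlinks.

```rust
use std::path::{Component, Path};

/// Hypothetical check: true if the path consists only of an optional
/// prefix/root followed by normal components (assumed reading of
/// "monotonic"; no `.` or `..` anywhere).
fn is_monotonic(path: &Path) -> bool {
    path.components().all(|c| {
        matches!(
            c,
            Component::Prefix(_) | Component::RootDir | Component::Normal(_)
        )
    })
}
```

Under that reading, `/usr/lib` and `some/relative/path` pass, while `../sibling`, `./here`, and `/usr/../etc` do not.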
I played with some of my own ideas in https://github.com/soc/paths-rs. The focus is different though: I'm largely interested in being able to have paths (e. g. in config files) that can be read/used/written/moved across operating systems with the guarantee that a path that has been constructed is valid across all operating systems. |
I discussed this a bit in the Gitter, but a lot of times when I write a CLI with a config file or an environment variable on a Unix/Linux platform I run into the dreaded I know that the shell would traditionally handle these expansions, but people (myself included) do put these shorthands in config files, variables, or use them fairly often with CLI arguments. It would be really nice to have a wrapper crate that I can trust to handle all paths/expansions when dealing with a CLI program. I agree with the above about cross-platform support as well. |
Yes, that's exactly my goal. It's not implemented though, but the idea is to have special tokens like These tokens are intended to exist explicitly in the serialized format (as strings in a config file) as well as in the memory representation and are only resolved when converting |
Just mentioning the following, hope that could be helpful:
Thanks for your time. |
Could you be more specific of what lessons we should learn from go? I'd rather not assume incorrectly and miss the information you are trying to share. |
|
Thanks for that overview, that will be really helpful. I do feel like to certain audiences, POSIX behavior would be surprising, so there is a trade-off between being familiar to POSIX people and not being surprising to non-POSIX people (like if you iterate on
Even Python recognized using strings is broken and now supports bytes as well :). Granted, we do need a good way of interacting with the native Path string type. |
What is the use case for appending an absolute path to an absolute path? I feel like the "replace" is meant to act like CWD handling. I could see having distinct and more semantically meaningful names like
I think this is an area where documentation is needed to clear this up but you can use https://doc.rust-lang.org/std/path/struct.PathBuf.html#impl-Extend%3CP%3E |
In a program to manage OS trees, I have to copy That seems counter-intuitive compared with what I'm used to doing in other languages, where I expect
|
First, not all languages do that. I know that at least Python and C++ have the same behavior as Rust. I suspect the difference is whether the API is meant to be a convenience over string manipulation or if it treats paths as first class objects.
I'm a little confused at how
Oh, right, that does make things more annoying. Definitely something to keep in mind when we get to the "simplified path API" (I plan to get to it after |
I fail to see how replacing is a sensible default behavior but obviously that must be good if Python, C++ and Rust do that. From a security point of view, if I accept a user provided path
EDIT: Just looked at the static file serving example from Rocket. They do
I still don't understand then. Could you give me an example? |
This is a summary of the things I would absolutely need when working with paths in Rust, so I don't have to roll my own validation and sanitization methods to prevent path traversal, I don't have to stray too far from what POSIX defines, and I can work with normalized paths without syscalls.

Normalization with Pure Lexical Processing

Like

path::normalize("/////var/lib/../../etc/mozilla/"); // --> /etc/mozilla

Non-destructive
|
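A purely lexical normalization along the lines requested above might be sketched as follows. `normalize` here is a hypothetical free function, not something `std::path` provides. Because it never consults the filesystem, it can give a different answer than the kernel would when a component before a `..` is a symlink, as discussed earlier in the thread.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical lexical normalizer: drops `.` and repeated separators and
/// resolves `..` against preceding named components, without syscalls.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            // `.` contributes nothing.
            Component::CurDir => {}
            Component::ParentDir => {
                let last = out.components().next_back();
                let can_pop = matches!(last, Some(Component::Normal(_)));
                let at_root =
                    matches!(last, Some(Component::RootDir) | Some(Component::Prefix(_)));
                if can_pop {
                    // `name/..` cancels out lexically.
                    out.pop();
                } else if !at_root {
                    // Nothing to cancel in a relative path: keep the `..`.
                    out.push("..");
                }
                // At the root, `..` is dropped: there is nothing above it.
            }
            // Prefix, root, and normal components are copied through.
            other => out.push(other.as_os_str()),
        }
    }
    out
}
```

With the input from the comment, `normalize(Path::new("/////var/lib/../../etc/mozilla/"))` gives `/etc/mozilla`; a relative input such as `a/../../b` keeps its leading `..` and becomes `../b`.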
There might be some useful notes for this in a soon-to-close RFC: rust-lang/rfcs#2188 |
@gdouezangrard Thanks for this useful overview, I think I'll follow that approach in my paths crate. |
(moderated summary by the WG)
.., .., // not being auto-handled
\\?\ where needed on Windows (rust-lang/rust#32689)
pathlib-like API.
format! a path for a command line argument or mutating a path can cause problems with the non-UTF8 nature of OsStr
@killercup's original post
In the first meeting, we talked a bit about the pain points of dealing with files and paths in a cross-platform manner.
One idea is to create or improve crates that provide higher-level abstractions than std in this area.