Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File::open() on directories does not return Err(), leads to breakage with BufReader #64144

Open
kentfredric opened this issue Sep 4, 2019 · 18 comments
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools A-io Area: std::io, std::fs, std::net and std::path C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@kentfredric
Copy link

I tripped onto some odd behaviour.

I was under the assumption calling File::open() on a directory would Err(), and that I could use that to handle a user specifying a path that was not a file, ( instead of falling prey to race conditions by stat-ing first and then opening second ).

However, ... bad things happened instead.

use std::path::PathBuf;
use std::fs::File;
use std::io::BufRead;
use std::io::BufReader;

fn main() {
    let file = File::open(PathBuf::from("/tmp")).unwrap();
    let buf  = BufReader::with_capacity(100, file);
    let _lines: Vec<std::io::Result<String>> = buf.lines().collect();
}

The last of these lines will run forever, with strace reporting:

read(3, 0x564803253b80, 100)            = -1 EISDIR (Is a directory)

Over and over again ad-infinitum with no end in sight.

Somehow I had a variation of this go crazy and eat 200% of my ram, but I'm having a hard time reproducing that exact case (Though it may have been related to the target-directory in question also being massive in my case).

Digging shows related bug #43504

The profound question that remains unanswered is: "Why is calling File::open on a directory fine?"

And the residual question is "How does one invoke File::open in such a way that it refuses to work on directories".

And importantly, how do I achieve that portably?

And if none of these concerns can be mitigated, the very least that could be done is have this giant foot-gun documented somewhere in File::open.

( As far as I can divine, there's no useful behaviour to be found by allowing File::open to work on directories, as standard read() semantics simply don't work on directory filehandles, you have to use readdir() )

@sfackler
Copy link
Member

sfackler commented Sep 4, 2019

File::open works on directories because the underlying system call open(2) works on directories. You can use O_DIRECTORY to trigger an error if the path is not a directory, but I don't know if there's a way of doing the opposite.

Your program is going crazy because you're collecting a vector of io::Errors.

@kentfredric
Copy link
Author

Perhaps, BufRead et. al needs to internally track if the last call returned EISDIR, and then return None() afterwards?

Trying to continue calling read() after EISDIR is kinda nonsensical.

@kentfredric
Copy link
Author

( That is, EISDIR should be considered equivalent to "Error + EOF" , because a subsequent read() cannot return any more data )

@sfackler
Copy link
Member

sfackler commented Sep 4, 2019

I'm not sure we want to try to catalog the list of all possible terminal errors on every platform.

@kentfredric
Copy link
Author

Are there platforms where invoking read() on a filehandle opened for a directory does something useful?

There may be other function calls useful on directories opened as filehandles, but read()?

@estebank estebank added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Sep 5, 2019
@mgorny
Copy link

mgorny commented Sep 5, 2019

Yes. NetBSD supports reading directory entries this way.

@the8472
Copy link
Member

the8472 commented Sep 5, 2019

As far as I can divine, there's no useful behaviour to be found by allowing File::open to work on directories

On linux there are many uses. It gives you a directory file descriptor (accessible via AsRawFd) which can be used for various other syscalls (not necessarily in the standard library) such as fiemap or the *at syscall family such as renameat.

And the residual question is "How does one invoke File::open in such a way that it refuses to work on directories".

Get the metadata of the file after opening it and check its type. This is often necessary anyway if you only want to operate on regular files and not block devices for example.

@jonas-schievink jonas-schievink added C-enhancement Category: An issue proposing an enhancement or a PR with one. A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools and removed T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 10, 2019
@jonas-schievink
Copy link
Contributor

Re-categorizing as a documentation issue. File::open should mention that it (might) work on directories.

kentfredric added a commit to kentnl-rust/ccache_stats_reader that referenced this issue Sep 11, 2019
This is because BufReader/BufRead doesn't give us the interface we
actually need, because those iterators emit a stream of Some(Err())
repeatedly when given a directory as the underlying fh.

Subsequently, the iterator never terminates, and collect() then becomes
an efficient implementation of a memory exhauster.

Bug: rust-lang/rust#64144
kentfredric added a commit to kentnl-rust/ccache_stats_reader that referenced this issue Sep 11, 2019
This is because BufReader/BufRead doesn't give us the interface we
actually need, because those iterators emit a stream of Some(Err())
repeatedly when given a directory as the underlying fh.

Subsequently, the iterator never terminates, and collect() then becomes
an efficient implementation of a memory exhauster.

Bug: rust-lang/rust#64144
@jonas-schievink jonas-schievink added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Mar 6, 2020
@tertsdiepraam
Copy link

tertsdiepraam commented Mar 22, 2023

At the uutils project we just ran into this issue although triggered in a different way. Basically we have a couple instances of this pattern:

reader.lines().filter_map(Result::ok)

This seems quite innocent, but it causes an infinite loop. The correct code would be:

reader.lines().map_while(Result::ok)

I think documentation is not really enough in this case, because it's an easy mistake to make if the programmer is just looking at the types; filter_map(Result::ok) is the "obvious solution" to turn Iterator<Item = Result<T,E>> into Iterator<Item = T> in general.

Perhaps, BufRead et. al needs to internally track if the last call returned EISDIR, and then return None() afterwards?

This seems like a good solution to me. Though I think it would be more consistent if it returned None after any error. I wonder if anyone relies on the current behaviour? Are there plausible use cases of Lines where it might return an error for a while and then then continues with Ok? Or where the error changes on subsequent calls? In any case it would still technically be a breaking change, so I'd like to suggest at least a clippy lint for filter_map(Result::ok) on Lines, along with better documentation.

Relevant issues: uutils/coreutils#4539, uutils/coreutils#4573, uutils/coreutils#4574

@sylvestre
Copy link
Contributor

@samueltardieu maybe it could be add as a clippy check ^^ (see @tertsdiepraam comment)

@tertsdiepraam
Copy link

tertsdiepraam commented Mar 22, 2023

I just found that the case I presented is essentially a duplicate of #34703 and #43504, which were closed with a reason of "this is expected behavior, you're misusing the API". However, I think that the mistake is too subtle and to easy to make to put the blame entirely on the programmer. Additionally, the fact that this problem showed up in uutils shows that it can cause problems in real code, written by competent Rustaceans.

Edit: For a bit less anecdotal evidence, a search on GitHub also brings up many examples of the same pattern: https://github.com/search?q=.lines()+.filter_map(Result%3A%3Aok)&type=code&ref=advsearch

@samueltardieu
Copy link
Contributor

@samueltardieu maybe it could be add as a clippy check ^^ (see @tertsdiepraam comment)

Good idea: rust-lang/rust-clippy#10534

bors added a commit to rust-lang/rust-clippy that referenced this issue Apr 1, 2023
Flag `bufreader.lines().filter_map(Result::ok)` as suspicious

This lint detects a problem that happened recently in https://github.com/uutils/coreutils and is described in rust-lang/rust#64144.

changelog: [`lines_filter_map_ok`]: new lint
@tertsdiepraam
Copy link

@samueltardieu That looks excellent to me. Thank you!

@DorianCoding
Copy link

DorianCoding commented May 26, 2024

Hello,

Sorry to dig up this issue but it is referenced and for anyone that may fall into it.

If you do want to get the BufReader for instance to count lines like this on a directory :

use std::io::{BufRead,Read};
fn getsizeoffile<T: AsRef<std::path::Path>>(path: T) -> std::io::Result<(usize,usize)> {
    let size2 = std::fs::read_to_string(&path).unwrap_or(String::new()).lines().count();
    let lines = std::io::BufReader::new(File::open(&path)?).take(1000);
    let size = lines.lines().count(); //WILL CRASH
    Ok((size,size2))
}
#[test]
fn testsize() {
    let (size0, size1) = getsizeoffile("/").unwrap();
    assert_eq!(size0,size1);
}

The first std::fs::read_to_string(&path).unwrap_or(String::new()).lines().count(); would unwrap with IsAdirectory however the second bufreader will consume all CPU.
To avoid that, you need to change the line to:

use std::io::{BufRead,Read};
fn getsizeoffile<T: AsRef<std::path::Path>>(path: T) -> std::io::Result<(usize,usize)> {
    let size2 = std::fs::read_to_string(&path).unwrap_or(String::new()).lines().count();
    let lines = std::io::BufReader::new(File::open(&path)?).take(1000);
    let size = lines.lines().map_while(Result::ok).count(); //Count only if lines is okay (else count lines till there)
    Ok((size,size2))
}
#[test]
fn testsize() {
    let (size0, size1) = getsizeoffile("/").unwrap();
    assert_eq!(size0,size1);
}

It is already stated in the above discussion. As well the take limit does nothing here because it does not read anything.

@kaxxa123
Copy link

I am here after troubleshooting a bug that assumed that opening a directory is cross-platform compatible, but in-fact works on Unix platforms but not on Windows. See here:
paradigmxyz/reth#9747

I think this is the fundamental problem i.e. the behavior is platform specific.
Yet this is not documented clearly.

I would appreciate if someone corrects me and points me to documentation on this.
The only place where I found a clear description of the platform differences was ChatGPT.

@the8472
Copy link
Member

the8472 commented Aug 23, 2024

Opening a directory is possible on windows, but you need to pass an extra flag.

The std::fs APIs are rather thin abstractions over platform APIs, so you'll have to consult the platform's documentation for the low level details. The behavior of special files (such as directories, pipes and others) are platform-specific.

@ChrisDenton
Copy link
Member

We do sometimes do things to abstract over platforms differences. Though for this change particularly it's arguable which is the "right" default behaviour and changing it on either platform may be unexpected.

I think a way forward might be to add an option to OpenOptions which can override platform defaults.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools A-io Area: std::io, std::fs, std::net and std::path C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests