-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
std::io::fs::walk_dir has O(n^2) behaviour #13411
Comments
@huonw Would keeping an index be sufficient? Just return |
That might work, but it does mean that the size of the internal vector grows to be the total number of elements ever touched, rather than just the number of children of recently-walked directories. It would also require cloning each of them. (Maybe not, if it stored |
What about this. When we run out of space at the end, check if we are using
|
If you can get that to work, I guess so. But really, it would be better/more correct to just use a proper |
I am going to guess that taking |
"It calls .shift on a vector repeatedly." but it's very slow. |
This isn't multiple items, it needs to shift the |
(@pongad I would be OK with moving deque into a "secret" module inside std and reexporting it from libcollections, however rustdoc doesn't support this nicely just yet.) |
How about something along these lines: /// An iterator which walks over a directory
pub struct Directories {
stack: Vec<std::vec::MoveItems<Path>>,
}
impl Iterator<Path> for Directories {
fn next(&mut self) -> Option<Path> {
enum LoopResult {
AddAndReturn(std::vec::MoveItems<Path,Path),
Return(Path),
Pop,
}
loop {
let loop_result;
match self.stack.last() {
Some(ref mut path_iter.next()) => {
Some(path) {
if (path.is_dir() {
match readdir(&path) {
Ok(dirs) => {
loop_result = AddAndReturn(dirs.move_iter(), path)
},
Err(..) => {
loop_result = Return(path)
},
}
} else {
loop_result = Return(path)
}
},
None => {
loop_result = Pop
},
}
None => {
return None,
}
};
match loop_result {
AddAndReturn(dirs,path) => {
self.stack.push(dirs);
return path
},
Return(path) => {
return path
},
Pop => {
self.stack.pop()
},
};
}
}
} I didn't test the above, removing syntax errors is left as an exercise to the reader. Edit: Never mind, I see this changes the order that walk_dir guarantees. I'll try and rewrite my proposal tonight to make it emulate the current behaviour. |
I'm not sure the current order is important, I don't know though (if it's not important we can probably just use However, if we wish to retain the current order, I really do think the correct solution is to just use a deque, rather than write a complicated specialised data structure for this relatively simple task. (Unless that data structure has other significant benefits, of course. Maybe avoiding the copies from each |
The `walk_dir` iterator was simulating a queue using a vector (in particular, using `shift`), leading to O(n^2) performance. Since the order was not well-specified (see issue rust-lang#13411), the simplest fix is to use the vector as a stack (and thus yield a depth-first traversal). This patch does exactly that. It leaves the order as originally specified -- "some top-down order" -- and adds a test to ensure a top-down traversal. Note that the underlying `readdir` function does not specify any particular order, nor does the system call it uses. Closes rust-lang#13411.
The `walk_dir` iterator was simulating a queue using a vector (in particular, using `shift`), leading to O(n^2) performance. Since the order was not well-specified (see issue #13411), the simplest fix is to use the vector as a stack (and thus yield a depth-first traversal). This patch does exactly that, and adds a test checking for depth-first behavior. Note that the underlying `readdir` function does not specify any particular order, nor does the system call it uses. Closes #13411.
fix rust-lang#13411 The `if_not_else` lint can be fixed automatically, but the issue above reports that there is no implementation to do so. Therefore, this PR implements it. ---- changelog: [`if_not_else`]: make suggestions for modified code
It calls
.shift
on a vector repeatedly.Example:
50000
and100000
are directories with that many files in them. Theoretically this wants a deque.The text was updated successfully, but these errors were encountered: