Clarification on performance improvements in std::fs::read_to_string for Windows
#130600
Comments
PRs welcome!
This should not be the case. The read-loop limits the buffer size passed to the underlying reader regardless of whether you use …
rust/library/std/src/io/mod.rs, lines 501 to 514 in 1a5a224
But the …
rust/library/std/src/io/mod.rs, lines 903 to 906 in 1a5a224
#110650 was not about allocation; it was about a too-large buffer getting passed to the Windows API.
Thank you for pointing that out, and I apologize for not noticing the changes introduced in #118222 earlier.
@workingjubilee so I'm sorry for the confusion. I will close both the issue and the PR.
Hrrm, looking at it again, the heuristics don't apply optimally. Usually file IO APIs will fill the whole buffer, so no short reads will occur, which means it will stay in the doubling regime for a while, which avoids excessive initialization costs. But when it hits EOF there'll be one short read. And since we can't distinguish between short reads and EOF, we have to do one more read to figure that out. At that point the full buffer will be passed because the heuristics think no initialization happens (since we only check the initialization done via …). There should be ways to improve this.
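To make the "one more read to detect EOF" point concrete, here is a minimal sketch that records the buffer length offered to each `read` call during `Read::read_to_end`. The `Recording` wrapper and `trace_read_to_end` are hypothetical helpers written for illustration. Note that this observes the generic `read_to_end` path over an in-memory slice, not the specialized `File` implementation or the buffer handed to the OS, so the sizes it prints only illustrate the general shape of the loop and may differ between Rust versions.

```rust
use std::io::{self, Read};

/// Hypothetical helper: wraps a reader and records, for every `read` call,
/// the buffer length offered and the number of bytes returned.
struct Recording<R> {
    inner: R,
    calls: Vec<(usize, usize)>,
}

impl<R: Read> Read for Recording<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.calls.push((buf.len(), n));
        Ok(n)
    }
}

/// Runs `read_to_end` over `data` and returns the bytes read together with
/// the recorded (offered, returned) pair of each underlying `read` call.
fn trace_read_to_end(data: &[u8]) -> io::Result<(Vec<u8>, Vec<(usize, usize)>)> {
    let mut reader = Recording { inner: data, calls: Vec::new() };
    let mut out = Vec::new();
    reader.read_to_end(&mut out)?;
    Ok((out, reader.calls))
}

fn main() -> io::Result<()> {
    let data = vec![7u8; 100_000];
    let (out, calls) = trace_read_to_end(&data)?;
    assert_eq!(out.len(), data.len());
    for (offered, returned) in &calls {
        println!("offered {offered:>7} bytes, got {returned:>7}");
    }
    Ok(())
}
```

The last recorded call returns 0 bytes: the loop cannot tell a short read from EOF, so it has to issue one further read, and the size of the buffer offered on that final call is what the comment above is concerned with.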
Thanks for bringing it to our attention, there appears to be room for improvement after all!
Hmm, this is becoming a bit difficult for me to follow. Should we close this issue and PR, and instead create a new one focused on performance improvements?
Yes, I think that would be good. My idea would be deferring the … If you like, you can take a stab at it; otherwise I will.
@ChrisDenton btw, did you observe what Windows is doing? Is it zeroing the buffers in the kernel? If that is the case, we could check this in a test by writing some sentinel bytes at the end of a large buffer and checking whether they survive a trip through the read_to_end API.
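A minimal sketch of the sentinel idea, under stated assumptions: `sentinel_probe`, `SENTINEL`, and `BUF_LEN` are hypothetical names, and the probe uses a single `Read::read` call into a pre-filled buffer rather than `read_to_end`, because inspecting a `Vec`'s spare capacity after `read_to_end` would need unsafe code. It only reports how many trailing sentinel bytes are still intact; it makes no claim about what any particular OS does.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical sentinel value used to pre-fill the buffer.
const SENTINEL: u8 = 0xA5;
/// Hypothetical "large" buffer size for the demo (1 MiB).
const BUF_LEN: usize = 1 << 20;

/// Reads once from `path` into a `buf_len`-byte buffer pre-filled with
/// `SENTINEL`. Returns the number of bytes read and how many of the bytes
/// after the data still hold the sentinel value.
fn sentinel_probe(path: &Path, buf_len: usize) -> io::Result<(usize, usize)> {
    let mut buf = vec![SENTINEL; buf_len];
    let mut file = File::open(path)?;
    let n = file.read(&mut buf)?;
    let surviving = buf[n..].iter().filter(|&&b| b == SENTINEL).count();
    Ok((n, surviving))
}

fn main() -> io::Result<()> {
    // Small file, large buffer: the case discussed in this thread.
    let path = std::env::temp_dir().join("sentinel_probe_demo.txt");
    fs::write(&path, b"hello")?;
    let (n, surviving) = sentinel_probe(&path, BUF_LEN)?;
    println!(
        "read {n} bytes; {surviving} of {} trailing sentinel bytes survived",
        BUF_LEN - n
    );
    fs::remove_file(&path)?;
    Ok(())
}
```

If fewer sentinel bytes survive than were left after the data, something between the call and the kernel wrote to the unused part of the buffer.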
Iirc it's aggressively locking every page in the buffer. Which can matter if the buffer is large but the read is relatively small. EDIT: to expand on that, in my notes I have that it does some validation of the pages (e.g. are they accessible?) then marks them as non-pageable and maps them into kernel space. There is an optimization for small reads where it uses an internal buffer then simply copies to the user buffer.
It seems difficult for me to touch the source code.
PR #130670
This is incorrect. The heuristics never apply in this case, as …
Lines 830 to 835 in 2836482
…ChrisDenton delay uncapping the max_read_size in File::read_to_end In rust-lang#130600 (comment) I realized that we're likely still passing too-large buffers to the OS, at least once at the end. Previous issues and PRs: * rust-lang#110650 * rust-lang#110655 * rust-lang#118222 r? ChrisDenton
Rollup merge of rust-lang#130670 - the8472:read-to-end-heuristics, r=ChrisDenton delay uncapping the max_read_size in File::read_to_end In rust-lang#130600 (comment) I realized that we're likely still passing too-large buffers to the OS, at least once at the end. Previous issues and PRs: * rust-lang#110650 * rust-lang#110655 * rust-lang#118222 r? ChrisDenton
delay uncapping the max_read_size in File::read_to_end In rust-lang/rust#130600 (comment) I realized that we're likely still passing too-large buffers to the OS, at least once at the end. Previous issues and PRs: * #110650 * #110655 * #118222 r? ChrisDenton
Location
Function read_to_string in std::fs
https://doc.rust-lang.org/stable/std/fs/fn.read_to_string.html
Summary
The current documentation for std::fs::read_to_string states that it is a convenience function for File::open and std::io::Read::read_to_string, providing fewer imports and no intermediate variables. However, after the recent pull request #110655, it seems that std::fs::read_to_string has received performance improvements specifically for Windows, addressing issues that File::open and std::io::Read::read_to_string still face on this platform.
This implies that using std::fs::read_to_string is not just a matter of convenience, but may also result in better performance on Windows compared to manually combining File::open and std::io::Read::read_to_string.
I believe this performance improvement should be mentioned in the documentation, especially for developers targeting Windows, as the current documentation only emphasizes the convenience aspect without mentioning the performance benefits.
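For reference, the two approaches being compared look roughly like this. The function names `read_convenience` and `read_manual` are hypothetical wrappers added for illustration. The sketch only shows that both return the same contents; it does not measure or demonstrate any performance difference.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

/// The convenience function discussed in this issue.
fn read_convenience(path: &Path) -> io::Result<String> {
    fs::read_to_string(path)
}

/// The manual combination: File::open followed by Read::read_to_string.
fn read_manual(path: &Path) -> io::Result<String> {
    let mut file = File::open(path)?;
    let mut contents = String::new();
    file.read_to_string(&mut contents)?;
    Ok(contents)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("read_to_string_demo.txt");
    fs::write(&path, "hello\nworld\n")?;
    // Both paths should yield identical contents for the same file.
    assert_eq!(read_convenience(&path)?, read_manual(&path)?);
    fs::remove_file(&path)?;
    Ok(())
}
```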
Suggested change:
Add a note to the documentation of std::fs::read_to_string, clarifying that due to recent improvements in pull request #110655, it offers better performance on Windows than manually using File::open and std::io::Read::read_to_string.
Thank you for your consideration!