-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Mmap
to open the rmeta file.
#55556
Conversation
Here's what DHAT said about this, for
In other words, the allocations for these files accounted for 8.23% of all heap-allocated bytes, and 30.3% of the peak heap memory usage, but on average each byte was only read 0.27 times (i.e. most of the data was never even read). I did a local benchmarking run and the results weren't as good as the above profiling suggested they would be.
But
A bit disappointing overall, but still certainly worth landing, IMO. |
@bors try |
🔒 Merge conflict This pull request and the master branch diverged in a way that cannot be automatically merged. Please rebase on top of the latest master branch, and let the reviewer approve again. How do I rebase?Assuming
You may also read Git Rebasing to Resolve Conflicts by Drew Blessing for a short tutorial. Please avoid the "Resolve conflicts" button on GitHub. It uses Sometimes step 4 will complete without asking for resolution. This is usually due to difference between how Error message
|
@bors try |
⌛ Trying commit 8fa874b329057d51e58ecfc4532b170def218d91 with merge 7325ca0693ea3ec184cca9bdf5b6d48f345d6beb... |
This comment has been minimized.
This comment has been minimized.
💔 Test failed - status-travis |
This comment has been minimized.
This comment has been minimized.
@bors try |
Use `Mmap` to open the rmeta file. Because those files are quite large, contribute significantly to peak memory usage, but only a small fraction of the data is ever read. r? @eddyb
// mmap the file, because only a small fraction of it is read. | ||
let file = std::fs::File::open(filename).map_err(|_| | ||
format!("failed to open rmeta metadata: '{}'", filename.display()))?; | ||
let mmap = unsafe { memmap::Mmap::map(&file) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
obligatory "not actually safe to map this because of concurrent modification"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do some simple locking here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the inherent unsafety be clarified? I'm not super familiar with the relation between mmap and concurrent modification, but rustc does sometimes load (and later reject) files which are being concurrently modified. This can happen when cargo fires of a number of rustc builds, and as one is loading dependency information another is writing that information. The one loading may read a halfway-written file, but it'll be guaranteed to reject it at some point later because it's not actually needed (or otherwise Cargo would sequence the compilations differently).
In light of this, though, is this something we actually need to fix in rustc or owrry about from an unsafety point of view?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding of the problem: we mmap the file's contents into a Vec
at T1, and that Vec
's contents are theoretically set. However, in practice they aren't actually set until they are read, at T2, whereupon the contents will be page-faulted in. But it's possible that the file's contents have changed between T1 and T2. (Even worse, there is actually a different T2 for every separate page in the Vec
.)
File locking won't help unless we lock the file for the entire period that its contents may be read, which doesn't seem like a good idea.
Also, even if you use the normal file-reading operations, there's still a chance you'll end up with inconsistent data if the file is overwritten in the middle of that operation. But the window of opportunity is potentially much smaller. (I think this is what @alexcrichton is referring to.)
Given that the gains turned out to be minor, I would be ok with abandoning this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the gains turned out to be minor, I would be ok with abandoning this.
They did? In both time and memory? That's a shame, then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be legal for LLVM to optimize 4 relaxed atomic 1-byte loads into one relaxed atomic 4-byte load. (The same optimization would not be legal for volatile.) Merging loads is okay with atomics, just splitting them is not. I am not sure if LLVM does anything like that though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping it would do something like that, but then my code was 10x slower when I used relaxed atomics :( I didn't spend much time wrestling with it, though, and I'm no optimization expert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I expect that there is lots of unused potential around optimizations for atomics. But of course that doesn't help right now, that would take someone to improve LLVM's analyses.
As far as this PR is concerned, the good news is that this is UB only if mutation actually happens. The fact that this is shared is invisible to the program until something else writes to it. That is unlikely to happen. So the end result is that the compiler may fail in arbitrarily unexpected ways if someone mutates the mmap'ed file, but absent that nothing strange can happen.
There's no way to ask the kernel to do copy-on-write for an mmap'ed file so that memory, once actually mapped (on the first access), is frozen and not affected by others mutating the file -- is there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that I think we have to read things byte by byte anyway, so I have a doubt that AtomicU8
will pessimize anything.
EDIT: other than maybe getting a &[u8]
to read a String
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the whole point of this PR is that typically only 10--20% of the mapped memory in question is actually read. There are no loops here, so the overhead of atomics are probably not high.
☀️ Test successful - status-travis |
@rust-timer build 6554f34 |
Success: Queued 6554f34 with parent de9666f, comparison URL. |
It looks like perf got stuck? It seems to be restarting benchmarks on the same commit again and again. /cc @Mark-Simulacrum |
type Target = [u8]; | ||
|
||
fn deref(&self) -> &[u8] { | ||
self.0.deref() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preferred way to write this is &*self.0
, but YMMV.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I agree, but I think this way is clearer within a deref
method :)
Finished benchmarking try commit 6554f34 |
I still want to land this, so I'll keep it open so we don't forget about it. |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
FWIW I don't intend to draw attention to possible unsafety here and try to make sure this has really strong motivation to land. @eddyb is right in that LLVM is already mmaping rlibs and we've basically never had problems with that (afaik). Also the compiler's predominant use case is not one that's exploiting UB, it just runs the risk of accidentally reading halfway-written files, nothing is being overwritten and truncated, it may just not be as long as we thought it was. (this is already the case today, we can read a file that was halfway written). I am personally very unfamiliar with safety and mmap, but given how we've basically done this forever (via rlibs and LLVM) and it's not hitting the "serious cases" like overwriting data and truncation, this seems fine to land by me. This clearly doesn't have any perf regressions and should be a clear win for larger crates and dependency graphs. |
@nnethercote Can you perhaps file an issue about attempting the replacement of Otherwise, I think we should land this. |
Ping from triage! What is the status of this PR? |
I closed it, @eddyb reopened it. I'm not inclined to pursue it further, because the perf wins were very small and the semantic are complicated. It's fine by me if someone else wants to take it over. |
Because those files are quite large, contribute significantly to peak memory usage, but only a small fraction of the data is ever read.
@bors r+ |
📌 Commit 818257e has been approved by |
@nnethercote Like I said, I think that's just because you weren't testing on workspaces large enough. The semantics aren't anything new, because we already mmap via LLVM for |
Use `Mmap` to open the rmeta file. Because those files are quite large, contribute significantly to peak memory usage, but only a small fraction of the data is ever read.
Rollup of 16 pull requests Successful merges: - #54906 (Reattach all grandchildren when constructing specialization graph.) - #55182 (Redox: Update to new changes) - #55211 (Add BufWriter::buffer method) - #55507 (Add link to std::mem::size_of to size_of intrinsic documentation) - #55530 (Speed up String::from_utf16) - #55556 (Use `Mmap` to open the rmeta file.) - #55622 (NetBSD: link libstd with librt in addition to libpthread) - #55827 (A few tweaks to iterations/collecting) - #55901 (fix various typos in doc comments) - #55926 (Change sidebar selector to fix compatibility with docs.rs) - #55930 (A handful of hir tweaks) - #55932 (core/char: Speed up `to_digit()` for `radix <= 10`) - #55935 (appveyor: Use VS2017 for all our images) - #55936 (save-analysis: be even more aggressive about ignorning macro-generated defs) - #55948 (submodules: update clippy from d8b4269 to 7e0ddef) - #55956 (add tests for some fixed ICEs)
Use `Mmap` to open the rmeta file. Because those files are quite large, contribute significantly to peak memory usage, but only a small fraction of the data is ever read. r? @eddyb
Rollup of 17 pull requests Successful merges: - #55182 (Redox: Update to new changes) - #55211 (Add BufWriter::buffer method) - #55507 (Add link to std::mem::size_of to size_of intrinsic documentation) - #55530 (Speed up String::from_utf16) - #55556 (Use `Mmap` to open the rmeta file.) - #55622 (NetBSD: link libstd with librt in addition to libpthread) - #55750 (Make `NodeId` and `HirLocalId` `newtype_index`) - #55778 (Wrap some query results in `Lrc`.) - #55781 (More precise spans for temps and their drops) - #55785 (Add mem::forget_unsized() for forgetting unsized values) - #55852 (Rewrite `...` as `..=` as a `MachineApplicable` 2018 idiom lint) - #55865 (Unix RwLock: avoid racy access to write_locked) - #55901 (fix various typos in doc comments) - #55926 (Change sidebar selector to fix compatibility with docs.rs) - #55930 (A handful of hir tweaks) - #55932 (core/char: Speed up `to_digit()` for `radix <= 10`) - #55956 (add tests for some fixed ICEs) Failed merges: r? @ghost
Per @eddyb’s request I’ve tried this in Servo. Comparing the first Nightly that contains this change, I run Times in seconds reported by Cargo before: 42.67, 41.73, 42.40, 42.42. Average: 42.305 seconds. Times after: 40.80, 39.81, 39.87, 39.71. Average: 40.0475 seconds. This is a ~5% time reduction. $ git log --merges --pretty=%s 6f93e93..6b9b97b |
Because those files are quite large, contribute significantly to peak
memory usage, but only a small fraction of the data is ever read.
r? @eddyb