Improve ptr::read code in debug builds #81163
Comments
Also Cc #80377 |
Actually, |
If all of |
Yeah, that would work... but we'd have to make them public to the entire |
Any chance of implementing those lowest level functions exclusively with intrinsics? It seems non-ideal to rely on a (non-existing in debug mode) optimizer to turn this into a decent binary code. I would argue that the most fundamental building blocks to low-level applications should never cause unnecessary overhead, even in debug mode. |
Intrinsics carry a significant implementation cost in various layers of the compiler (typechecking, CTFE/Miri, codegen), so IMO we should keep the number of intrinsics to the absolute minimum necessary.
This is not entirely true; some MIR simplification passes do run even in debug mode. |
Fair enough. I still think
If we can get efficient implementations for the mentioned functions another way that would be great. |
That might be a good enough justification to make the fields of OTOH, the volatile operations have to be intrinsics anyway, so potentially much of the code could be shared between |
I may be off the mark here, but might we see net perf gains with certain cheap wrapper functions (e.g. |
|
I thought |
For the old pass manager:
For the new pass manager:
|
It doesn't seem implausible to me that |
I mean slower compared to manual inlining. |
Sure, |
|
FWIW,
I think the main issue here is that using re-exports makes the intrinsics stably callable at |
I don't think we should be trying to solve this at the library level. We are getting rather close to making MIR inlining work everywhere, and in my personal opinion we should look into improving debug mode by adding more |
|
Anyway, this issue is about |
I started working on that optimization last night. |
Lowering |
FWIW, the So together with #81238, |
directly expose copy and copy_nonoverlapping intrinsics This effectively un-does rust-lang#57997. That should help with `ptr::read` codegen in debug builds (and any other of these low-level functions that bottoms out at `copy`/`copy_nonoverlapping`), where the wrapper function will not get inlined. See the discussion in rust-lang#80290 and rust-lang#81163. Cc `@bjorn3` `@therealprof`
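To illustrate the distinction this commit message draws, here is a minimal sketch contrasting a thin wrapper function with a re-export. The module names `wrapped` and `reexported` are hypothetical and are not taken from the standard library source; `std::ptr::copy_nonoverlapping` stands in for the intrinsic. In a build without inlining, the wrapper is an extra call frame on every use, whereas the re-export is only another name for the same item.

```rust
// Hypothetical wrapper layer: without inlining, every call goes through
// this function before reaching the underlying copy routine.
pub mod wrapped {
    use std::ptr;

    /// # Safety
    /// Same contract as `std::ptr::copy_nonoverlapping`.
    pub unsafe fn copy_nonoverlapping<T>(src: *const T, dst: *mut T, count: usize) {
        unsafe { ptr::copy_nonoverlapping(src, dst, count) }
    }
}

// Hypothetical re-export layer: the path is an alias for the same item,
// so no additional function sits between caller and callee.
pub mod reexported {
    pub use std::ptr::copy_nonoverlapping;
}

/// Copies four bytes through the wrapper function.
pub fn copy_via_wrapper(src: [u8; 4]) -> [u8; 4] {
    let mut dst = [0u8; 4];
    unsafe { wrapped::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), src.len()) };
    dst
}

/// Copies four bytes through the re-exported path.
pub fn copy_via_reexport(src: [u8; 4]) -> [u8; 4] {
    let mut dst = [0u8; 4];
    unsafe { reexported::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), src.len()) };
    dst
}

fn main() {
    let src = [1u8, 2, 3, 4];
    assert_eq!(copy_via_wrapper(src), src);
    assert_eq!(copy_via_reexport(src), src);
}
```

Both paths behave identically; the difference only shows up in the generated code when the optimizer does not remove the wrapper.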
What is the status of that? Both |
I can confirm that I'm seeing slight binary size reductions (< 1% in the best case) when comparing some STM32F0 applications using |
@RalfJung I opened #81344 with that optimization but completely forgot that we approved an MCP a few months ago to add a MIR |
FYI #77511 has landed. |
@wesleywiser do you still plan on redoing #81344? |
Thanks for the reminder! I've opened #83785 which shows some small, positive improvements to compilation time. |
rust/library/core/src/ptr/mod.rs Lines 1116 to 1121 in 5fa44b5
Is there anything left to do in this issue? |
For now, considering this fixed by #87827. |
In #80290, some people raised concerns about the quality of the code that `ptr::write` compiles to in debug builds. Given that, to my knowledge, reads are much more common than writes, I would think that one should be much more concerned with the code that `ptr::read` compiles to -- and currently, there's going to be quite a few function calls in there, so without inlining, that code will be pretty slow.

`ptr::read` could be improved with techniques similar to what I did for `ptr::write` (call intrinsics directly, and inline everything else by hand). This would result in (something like) the following implementation: (EDIT see below for why this is wrong)

However, here we have the extra difficulty that `read` is (unstably) a `const fn`, so the above implementation is rejected. `&tmp.init` can be replaced by `&mut tmp.init` and that works (or we wait for a bootstrap bump so we can make use of #80418), but `transmute_copy` is non-`const`, so there's still more work to be done. (`transmute` does not work since the compiler does not recognize that `T` and `ManuallyDrop<T>` have the same size.)

I will stop here, but if someone else strongly cares about `ptr::read` performance/codesize in debug builds, feel free to pick this up and drive it to completion.

Cc @bjorn3 @therealprof @usbalbin
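For readers without the standard library source at hand, the following is a rough sketch of the pieces discussed in this issue. It is not the snippet proposed in the issue, and it is not the actual `core` implementation. `read_sketch` and `unwrap_manually_drop` are hypothetical names, and the code uses only stable public APIs (`MaybeUninit`, `ptr::copy_nonoverlapping`, `mem::transmute_copy`). Each of those calls is a separate function that a debug build may leave un-inlined, which is the overhead the issue is concerned with.

```rust
use std::mem::{self, ManuallyDrop, MaybeUninit};
use std::ptr;

/// Hypothetical stand-in for `ptr::read`, built from public APIs.
///
/// # Safety
/// `src` must be valid for reads, properly aligned, and point to an
/// initialized `T`.
pub unsafe fn read_sketch<T>(src: *const T) -> T {
    // Uninitialized scratch space of the right size and alignment.
    let mut tmp = MaybeUninit::<T>::uninit();
    unsafe {
        // Bitwise copy of one `T` from `src` into the scratch space.
        ptr::copy_nonoverlapping(src, tmp.as_mut_ptr(), 1);
        // The scratch space now holds a valid `T`.
        tmp.assume_init()
    }
}

/// Hypothetical helper: extracts the `T` from a `ManuallyDrop<T>` bitwise.
/// `mem::transmute::<ManuallyDrop<T>, T>` is rejected for a generic `T`
/// because the compiler cannot prove the sizes match, so this sketch uses
/// `transmute_copy`, which performs no compile-time size check.
pub fn unwrap_manually_drop<T>(value: ManuallyDrop<T>) -> T {
    // `value` is never dropped (that is what `ManuallyDrop` means), so the
    // copied-out `T` is the only owner.
    unsafe { mem::transmute_copy::<ManuallyDrop<T>, T>(&value) }
}

fn main() {
    // Wrap the source in `ManuallyDrop` so the String is not dropped twice.
    let original = ManuallyDrop::new(String::from("hello"));
    let copy: String = unsafe { read_sketch(&*original as *const String) };
    assert_eq!(copy, "hello");

    let v = unwrap_manually_drop(ManuallyDrop::new(vec![1, 2, 3]));
    assert_eq!(v, [1, 2, 3]);

    // The sizes do match at runtime for any concrete type.
    assert_eq!(mem::size_of::<ManuallyDrop<u64>>(), mem::size_of::<u64>());
}
```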