-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
std: make HEAP
initializer never inline
#120205
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @ChrisDenton (or someone else) soon. Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (
|
We can use |
That's the original implementation before adding
I've noticed another init method here:
And it seems to run before |
While that is certainly a lot better, I'm still very nervous about this change. Allocation is absolutely fundamental to libstd (as opposed to libcore); it's unusable without it. Having no fallback at all makes this risky should it not get initialized. It's also been a problem in the past that users can run their own code before our pre-main code. We now try to mitigate against that but it's only a "best effort" attempt. |
Now I place the initializer in rust/library/std/src/sys/pal/windows/compat.rs Lines 36 to 39 in a58ec8f
I think |
That was recently clarified: https://doc.rust-lang.org/std/#use-before-and-after-main While there aren't any strict guarantees, we do attempt to make sure everything works pre-main unless platform limitations mean it's not possible. |
This comment has been minimized.
This comment has been minimized.
I think I'll need to sleep on this one. My thoughts at the moment are:
|
Another possible solution, is to move |
Yes, I think it'd be easier for me to accept something like that. Another way might be to wrap the call to |
0c59340
to
936e0fd
Compare
HEAP
only onceHEAP
initializer never inline
OK. I just marked |
Looks good to me @bors r+ rollup |
What the diff with PR? |
Just now, I've noticed that in some cases this PR will not shrink the binary size. I'll turn the PR to draft and may try some other approaches. |
@bors r- |
@mati865: 🔑 Insufficient privileges: Not in reviewers |
@bors r- |
My suggestion would be to try something like: fn process_heap_alloc(flags: c::DWORD, dwBytes: c::SIZE_T) -> c::LPVOID {
let heap = init_or_get_process_heap();
if heap.is_null() {
return ptr::null_mut();
}
unsafe { HeapAlloc(heap, flags, dwBytes) }
} And call |
It shrinks the binary size of course, but I'm a little afraid that it introduces a longer calling chain to call |
I'd be surprised if it did but would be open to being wrong. |
936e0fd
to
27a6e6e
Compare
OK, I finally wrapped P.S. Comparing stripped binary, for each |
From my personal testing this does not appear to affect performance in a way I can discern from noise. So I'll r+ this. In the worse case, it can always be reverted if people do find it does affects them. @bors r+ |
…iaskrgr Rollup of 12 pull requests Successful merges: - rust-lang#103522 (stabilise array methods) - rust-lang#113489 (impl `From<&[T; N]>` for `Cow<[T]>`) - rust-lang#119342 (Emit suggestion when trying to write exclusive ranges as `..<`) - rust-lang#119562 (Rename `pointer` field on `Pin`) - rust-lang#119800 (Document `rustc_index::vec::IndexVec`) - rust-lang#120205 (std: make `HEAP` initializer never inline) - rust-lang#120277 (Normalize field types before checking validity) - rust-lang#120311 (core: add `From<core::ascii::Char>` implementations) - rust-lang#120366 (mark a doctest with UB as no_run) - rust-lang#120378 (always normalize `LoweredTy` in the new solver) - rust-lang#120382 (Classify closure arguments in refutable pattern in argument error) - rust-lang#120389 (Add fmease to the compiler review rotation) r? `@ghost` `@rustbot` modify labels: rollup
Rollup merge of rust-lang#120205 - Berrysoft:windows-alloc-init, r=ChrisDenton std: make `HEAP` initializer never inline The system allocator for Windows calls `init_or_get_process_heap` every time allocating. It generates very much useless code and makes the binary larger. The `HEAP` only needs to initialize once before the main fn. Concerns: * I'm not sure if `init` will be properly called in cdylib. * Do we need to ensure the allocator works if the user enables `no_main`? * Should we panic if `GetProcessHeap` returns null?
Optimize `process_heap_alloc` This optimizes `process_heap_alloc` introduced in rust-lang#120205. From: ``` .text:0000000180027ED0 ; std::sys::pal::windows::alloc::process_heap_alloc::h703a613b3e25ff93 .text:0000000180027ED0 public _ZN3std3sys3pal7windows5alloc18process_heap_alloc17h703a613b3e25ff93E .text:0000000180027ED0 _ZN3std3sys3pal7windows5alloc18process_heap_alloc17h703a613b3e25ff93E proc near .text:0000000180027ED0 ; CODE XREF: std::sys::pal::common::alloc::realloc_fallback::hc4c96b4c24d03e77+23↑p .text:0000000180027ED0 ; std::sys::pal::common::alloc::realloc_fallback::hc4c96b4c24d03e77+55↑p ... .text:0000000180027ED0 push rsi .text:0000000180027ED1 push rdi .text:0000000180027ED2 sub rsp, 28h .text:0000000180027ED6 mov rsi, rdx .text:0000000180027ED9 mov edi, ecx .text:0000000180027EDB mov rcx, cs:_ZN3std3sys3pal7windows5alloc4HEAP17hb53ca4010cc29b62E ; std::sys::pal::windows::alloc::HEAP::hb53ca4010cc29b62 .text:0000000180027EE2 test rcx, rcx .text:0000000180027EE5 jnz short loc_180027EFC .text:0000000180027EE7 call cs:__imp_GetProcessHeap .text:0000000180027EED test rax, rax .text:0000000180027EF0 jz short loc_180027F0E .text:0000000180027EF2 mov rcx, rax .text:0000000180027EF5 mov cs:_ZN3std3sys3pal7windows5alloc4HEAP17hb53ca4010cc29b62E, rax ; std::sys::pal::windows::alloc::HEAP::hb53ca4010cc29b62 .text:0000000180027EFC .text:0000000180027EFC loc_180027EFC: ; CODE XREF: std::sys::pal::windows::alloc::process_heap_alloc::h703a613b3e25ff93+15↑j .text:0000000180027EFC mov edx, edi .text:0000000180027EFE mov r8, rsi .text:0000000180027F01 add rsp, 28h .text:0000000180027F05 pop rdi .text:0000000180027F06 pop rsi .text:0000000180027F07 jmp cs:__imp_HeapAlloc .text:0000000180027F0E ; --------------------------------------------------------------------------- .text:0000000180027F0E .text:0000000180027F0E loc_180027F0E: ; CODE XREF: std::sys::pal::windows::alloc::process_heap_alloc::h703a613b3e25ff93+20↑j .text:0000000180027F0E xor eax, eax .text:0000000180027F10 add rsp, 28h .text:0000000180027F14 pop rdi .text:0000000180027F15 pop rsi .text:0000000180027F16 retn .text:0000000180027F16 _ZN3std3sys3pal7windows5alloc18process_heap_alloc17h703a613b3e25ff93E endp ``` to ``` .text:0000000180027EE0 ; std::sys::pal::windows::alloc::process_heap_alloc::h70f9d61a631e5c16 .text:0000000180027EE0 public _ZN3std3sys3pal7windows5alloc18process_heap_alloc17h70f9d61a631e5c16E .text:0000000180027EE0 _ZN3std3sys3pal7windows5alloc18process_heap_alloc17h70f9d61a631e5c16E proc near .text:0000000180027EE0 ; CODE XREF: std::sys::pal::common::alloc::realloc_fallback::hc4c96b4c24d03e77+23↑p .text:0000000180027EE0 ; std::sys::pal::common::alloc::realloc_fallback::hc4c96b4c24d03e77+54↑p ... .text:0000000180027EE0 mov rcx, cs:_ZN3std3sys3pal7windows5alloc4HEAP17hb53ca4010cc29b62E ; std::sys::pal::windows::alloc::HEAP::hb53ca4010cc29b62 .text:0000000180027EE7 test rcx, rcx .text:0000000180027EEA jz short loc_180027EF3 .text:0000000180027EEC jmp cs:__imp_HeapAlloc .text:0000000180027EF3 ; --------------------------------------------------------------------------- .text:0000000180027EF3 .text:0000000180027EF3 loc_180027EF3: ; CODE XREF: std::sys::pal::windows::alloc::process_heap_alloc::h70f9d61a631e5c16+A↑j .text:0000000180027EF3 mov ecx, edx .text:0000000180027EF5 mov rdx, r8 .text:0000000180027EF8 jmp std__sys__pal__windows__alloc__process_heap_init_and_alloc .text:0000000180027EF8 _ZN3std3sys3pal7windows5alloc18process_heap_alloc17h70f9d61a631e5c16E endp ``` r? `@ChrisDenton`
Rollup merge of rust-lang#122326 - Zoxc:win-alloc-tweak, r=ChrisDenton Optimize `process_heap_alloc` This optimizes `process_heap_alloc` introduced in rust-lang#120205. From: ``` .text:0000000180027ED0 ; std::sys::pal::windows::alloc::process_heap_alloc::h703a613b3e25ff93 .text:0000000180027ED0 public _ZN3std3sys3pal7windows5alloc18process_heap_alloc17h703a613b3e25ff93E .text:0000000180027ED0 _ZN3std3sys3pal7windows5alloc18process_heap_alloc17h703a613b3e25ff93E proc near .text:0000000180027ED0 ; CODE XREF: std::sys::pal::common::alloc::realloc_fallback::hc4c96b4c24d03e77+23↑p .text:0000000180027ED0 ; std::sys::pal::common::alloc::realloc_fallback::hc4c96b4c24d03e77+55↑p ... .text:0000000180027ED0 push rsi .text:0000000180027ED1 push rdi .text:0000000180027ED2 sub rsp, 28h .text:0000000180027ED6 mov rsi, rdx .text:0000000180027ED9 mov edi, ecx .text:0000000180027EDB mov rcx, cs:_ZN3std3sys3pal7windows5alloc4HEAP17hb53ca4010cc29b62E ; std::sys::pal::windows::alloc::HEAP::hb53ca4010cc29b62 .text:0000000180027EE2 test rcx, rcx .text:0000000180027EE5 jnz short loc_180027EFC .text:0000000180027EE7 call cs:__imp_GetProcessHeap .text:0000000180027EED test rax, rax .text:0000000180027EF0 jz short loc_180027F0E .text:0000000180027EF2 mov rcx, rax .text:0000000180027EF5 mov cs:_ZN3std3sys3pal7windows5alloc4HEAP17hb53ca4010cc29b62E, rax ; std::sys::pal::windows::alloc::HEAP::hb53ca4010cc29b62 .text:0000000180027EFC .text:0000000180027EFC loc_180027EFC: ; CODE XREF: std::sys::pal::windows::alloc::process_heap_alloc::h703a613b3e25ff93+15↑j .text:0000000180027EFC mov edx, edi .text:0000000180027EFE mov r8, rsi .text:0000000180027F01 add rsp, 28h .text:0000000180027F05 pop rdi .text:0000000180027F06 pop rsi .text:0000000180027F07 jmp cs:__imp_HeapAlloc .text:0000000180027F0E ; --------------------------------------------------------------------------- .text:0000000180027F0E .text:0000000180027F0E loc_180027F0E: ; CODE XREF: std::sys::pal::windows::alloc::process_heap_alloc::h703a613b3e25ff93+20↑j .text:0000000180027F0E xor eax, eax .text:0000000180027F10 add rsp, 28h .text:0000000180027F14 pop rdi .text:0000000180027F15 pop rsi .text:0000000180027F16 retn .text:0000000180027F16 _ZN3std3sys3pal7windows5alloc18process_heap_alloc17h703a613b3e25ff93E endp ``` to ``` .text:0000000180027EE0 ; std::sys::pal::windows::alloc::process_heap_alloc::h70f9d61a631e5c16 .text:0000000180027EE0 public _ZN3std3sys3pal7windows5alloc18process_heap_alloc17h70f9d61a631e5c16E .text:0000000180027EE0 _ZN3std3sys3pal7windows5alloc18process_heap_alloc17h70f9d61a631e5c16E proc near .text:0000000180027EE0 ; CODE XREF: std::sys::pal::common::alloc::realloc_fallback::hc4c96b4c24d03e77+23↑p .text:0000000180027EE0 ; std::sys::pal::common::alloc::realloc_fallback::hc4c96b4c24d03e77+54↑p ... .text:0000000180027EE0 mov rcx, cs:_ZN3std3sys3pal7windows5alloc4HEAP17hb53ca4010cc29b62E ; std::sys::pal::windows::alloc::HEAP::hb53ca4010cc29b62 .text:0000000180027EE7 test rcx, rcx .text:0000000180027EEA jz short loc_180027EF3 .text:0000000180027EEC jmp cs:__imp_HeapAlloc .text:0000000180027EF3 ; --------------------------------------------------------------------------- .text:0000000180027EF3 .text:0000000180027EF3 loc_180027EF3: ; CODE XREF: std::sys::pal::windows::alloc::process_heap_alloc::h70f9d61a631e5c16+A↑j .text:0000000180027EF3 mov ecx, edx .text:0000000180027EF5 mov rdx, r8 .text:0000000180027EF8 jmp std__sys__pal__windows__alloc__process_heap_init_and_alloc .text:0000000180027EF8 _ZN3std3sys3pal7windows5alloc18process_heap_alloc17h70f9d61a631e5c16E endp ``` r? `@ChrisDenton`
Optimize `process_heap_alloc` This optimizes `process_heap_alloc` introduced in rust-lang/rust#120205. From: ``` .text:0000000180027ED0 ; std::sys::pal::windows::alloc::process_heap_alloc::h703a613b3e25ff93 .text:0000000180027ED0 public _ZN3std3sys3pal7windows5alloc18process_heap_alloc17h703a613b3e25ff93E .text:0000000180027ED0 _ZN3std3sys3pal7windows5alloc18process_heap_alloc17h703a613b3e25ff93E proc near .text:0000000180027ED0 ; CODE XREF: std::sys::pal::common::alloc::realloc_fallback::hc4c96b4c24d03e77+23↑p .text:0000000180027ED0 ; std::sys::pal::common::alloc::realloc_fallback::hc4c96b4c24d03e77+55↑p ... .text:0000000180027ED0 push rsi .text:0000000180027ED1 push rdi .text:0000000180027ED2 sub rsp, 28h .text:0000000180027ED6 mov rsi, rdx .text:0000000180027ED9 mov edi, ecx .text:0000000180027EDB mov rcx, cs:_ZN3std3sys3pal7windows5alloc4HEAP17hb53ca4010cc29b62E ; std::sys::pal::windows::alloc::HEAP::hb53ca4010cc29b62 .text:0000000180027EE2 test rcx, rcx .text:0000000180027EE5 jnz short loc_180027EFC .text:0000000180027EE7 call cs:__imp_GetProcessHeap .text:0000000180027EED test rax, rax .text:0000000180027EF0 jz short loc_180027F0E .text:0000000180027EF2 mov rcx, rax .text:0000000180027EF5 mov cs:_ZN3std3sys3pal7windows5alloc4HEAP17hb53ca4010cc29b62E, rax ; std::sys::pal::windows::alloc::HEAP::hb53ca4010cc29b62 .text:0000000180027EFC .text:0000000180027EFC loc_180027EFC: ; CODE XREF: std::sys::pal::windows::alloc::process_heap_alloc::h703a613b3e25ff93+15↑j .text:0000000180027EFC mov edx, edi .text:0000000180027EFE mov r8, rsi .text:0000000180027F01 add rsp, 28h .text:0000000180027F05 pop rdi .text:0000000180027F06 pop rsi .text:0000000180027F07 jmp cs:__imp_HeapAlloc .text:0000000180027F0E ; --------------------------------------------------------------------------- .text:0000000180027F0E .text:0000000180027F0E loc_180027F0E: ; CODE XREF: std::sys::pal::windows::alloc::process_heap_alloc::h703a613b3e25ff93+20↑j .text:0000000180027F0E xor eax, eax .text:0000000180027F10 add rsp, 28h .text:0000000180027F14 pop rdi .text:0000000180027F15 pop rsi .text:0000000180027F16 retn .text:0000000180027F16 _ZN3std3sys3pal7windows5alloc18process_heap_alloc17h703a613b3e25ff93E endp ``` to ``` .text:0000000180027EE0 ; std::sys::pal::windows::alloc::process_heap_alloc::h70f9d61a631e5c16 .text:0000000180027EE0 public _ZN3std3sys3pal7windows5alloc18process_heap_alloc17h70f9d61a631e5c16E .text:0000000180027EE0 _ZN3std3sys3pal7windows5alloc18process_heap_alloc17h70f9d61a631e5c16E proc near .text:0000000180027EE0 ; CODE XREF: std::sys::pal::common::alloc::realloc_fallback::hc4c96b4c24d03e77+23↑p .text:0000000180027EE0 ; std::sys::pal::common::alloc::realloc_fallback::hc4c96b4c24d03e77+54↑p ... .text:0000000180027EE0 mov rcx, cs:_ZN3std3sys3pal7windows5alloc4HEAP17hb53ca4010cc29b62E ; std::sys::pal::windows::alloc::HEAP::hb53ca4010cc29b62 .text:0000000180027EE7 test rcx, rcx .text:0000000180027EEA jz short loc_180027EF3 .text:0000000180027EEC jmp cs:__imp_HeapAlloc .text:0000000180027EF3 ; --------------------------------------------------------------------------- .text:0000000180027EF3 .text:0000000180027EF3 loc_180027EF3: ; CODE XREF: std::sys::pal::windows::alloc::process_heap_alloc::h70f9d61a631e5c16+A↑j .text:0000000180027EF3 mov ecx, edx .text:0000000180027EF5 mov rdx, r8 .text:0000000180027EF8 jmp std__sys__pal__windows__alloc__process_heap_init_and_alloc .text:0000000180027EF8 _ZN3std3sys3pal7windows5alloc18process_heap_alloc17h70f9d61a631e5c16E endp ``` r? `@ChrisDenton`
The system allocator for Windows calls
init_or_get_process_heap
every time allocating. It generates very much useless code and makes the binary larger. TheHEAP
only needs to initialize once before the main fn.Concerns:
init
will be properly called in cdylib.no_main
?GetProcessHeap
returns null?