Micro-optimize the __morestack fast path #3565
Comments
When all does There has also been a bunch of discussion about possibly ditching segmented stacks?
It's added to every single function, and LLVM does accounting of stack space and growth for us through our
Visiting for triage (email from 2013-09-09). Right now split stacks are turned off since they are not supported in the newrt. I imagine most or all of the suggestions above could still apply to the next implementation, unless we switch to an entirely new strategy (like using guard pages, as suggested by thestinger).
In today's meeting we decided to jettison segmented stacks.
We only use
This is very performance critical code used for growing the stack, and it currently wastes a lot of instructions on the non-allocating fast path. There are a number of distinct optimizations we can identify.
Here's what happens after calling into `__morestack`, on the fast path:

* [...] `upcall_new_stack` clobbers them
* Move the `__morestack` custom calling convention registers to the C calling convention registers used by `upcall_new_stack`
* Call `upcall_new_stack`, through the indirection of the dynamic linker
* Call `get_sp_limit`, an entire assembly function consisting of `movq %fs:112, %rax`
* Compare `sp_limit` to 0 and don't branch to the `rust_get_current_task` slow path. This branch always makes the same decision during a `__morestack` call.
* Get the `task` pointer from the stack limit
* Check that `task->stk->next` is a big enough stack segment to use
* Call `reuse_valgrind_stack` to give valgrind hints
* Call `record_stack_limit` to execute another single instruction
* Return to `__morestack`
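
The segment-reuse check in the middle of that sequence can be modeled roughly as below. This is only a sketch under stated assumptions, not the runtime's actual code: the `stk_seg` layout, its `owner` and `size` fields, and the helpers `new_segment`, `task_from_limit`, and `try_reuse_next` are hypothetical, and a thread-local variable stands in for the `%fs:112` slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct task;

/* Hypothetical segment header; the usable stack bytes follow it directly. */
typedef struct stk_seg {
    struct stk_seg *prev;
    struct stk_seg *next;   /* cached segment from an earlier growth, or NULL */
    struct task *owner;     /* assumed back-pointer to the owning task */
    size_t size;            /* usable bytes in data[] */
    unsigned char data[];
} stk_seg;

typedef struct task {
    stk_seg *stk;           /* segment the task is currently running on */
} task;

/* Stand-in for the %fs:112 slot that holds the stack limit. */
static _Thread_local uintptr_t sp_limit;

static inline uintptr_t get_sp_limit(void) { return sp_limit; }
static inline void record_stack_limit(uintptr_t limit) { sp_limit = limit; }

/* Allocate an unlinked segment with `size` usable bytes. */
stk_seg *new_segment(task *owner, size_t size) {
    stk_seg *seg = malloc(sizeof *seg + size);
    if (seg == NULL) return NULL;
    seg->prev = NULL;
    seg->next = NULL;
    seg->owner = owner;
    seg->size = size;
    return seg;
}

/* Recover the task from the stack limit; a zero limit means "slow path". */
task *task_from_limit(uintptr_t limit) {
    if (limit == 0) return NULL;
    stk_seg *seg = (stk_seg *)(limit - offsetof(stk_seg, data));
    return seg->owner;
}

/* Fast path: switch to the cached next segment if it is big enough.
 * Returns NULL when the caller would have to take the allocating path. */
stk_seg *try_reuse_next(size_t needed) {
    task *t = task_from_limit(get_sp_limit());
    if (t == NULL) return NULL;
    assert(t->stk != NULL);
    stk_seg *next = t->stk->next;
    if (next == NULL || next->size < needed) return NULL;
    t->stk = next;
    record_stack_limit((uintptr_t)next->data);
    return next;
}
```

In this model the whole fast path is a load, a null check, a size comparison, and a store, which is why the surrounding call overhead dominates.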
And returning from the segment:
* Call `upcall_del_stack` through the dynamic linker
* Call `get_sp_limit`, an entire function consisting of `movq %fs:112, %rax`
* Compare `sp_limit` to 0, etc.
* Call `record_stack_limit`
Potential optimizations:
* Inline `get_sp_limit` and `record_stack_limit` (Inline get_sp_limit, set_sp_limit, get_sp runtime functions #2521)
* [...] `upcall_new_stack` and `upcall_del_stack`, hitting new dynamically linked upcalls for the slow path
* [...] `rust_get_current_task` that doesn't have a fallback path for the case when the task pointer can't be retrieved from the stack segment. Use it from `upcall_new_stack/del_stack`.
* [...] `upcall_new_stack` doesn't use xmm registers and remove the xmm saves and restores in `__morestack` (Stop saving floating point registers in __morestack #2043)
* [...] `upcall_del_stack` into `__morestack`
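
To illustrate two of these ideas together (inlined limit accessors and a task lookup with no fallback path), here is a minimal sketch. Everything in it is an assumption made for illustration: the `stk_seg` layout, `fallback_task`, `slow_path_hits`, `get_current_task`, and `get_current_task_nofallback` are hypothetical names, and a thread-local variable models the `%fs:112` slot rather than reading it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct task { int id; } task;

/* Hypothetical segment header placed directly below the usable stack bytes.
 * A fixed-size data array keeps the sketch small. */
typedef struct stk_seg {
    task *owner;
    unsigned char data[64];
} stk_seg;

static _Thread_local uintptr_t sp_limit;   /* models the %fs:112 slot */
static _Thread_local task *fallback_task;  /* models the slower per-thread lookup */
static int slow_path_hits;                 /* counts fallback lookups, for illustration */

/* As static inline functions these compile down to a single load or store
 * at the call site instead of a call through the dynamic linker. */
static inline uintptr_t get_sp_limit(void) { return sp_limit; }
static inline void record_stack_limit(uintptr_t limit) { sp_limit = limit; }

static inline task *task_from_limit(uintptr_t limit) {
    stk_seg *seg = (stk_seg *)(limit - offsetof(stk_seg, data));
    return seg->owner;
}

/* General-purpose lookup: has to cope with a zero stack limit. */
task *get_current_task(void) {
    uintptr_t limit = get_sp_limit();
    if (limit == 0) {
        slow_path_hits++;
        return fallback_task;
    }
    return task_from_limit(limit);
}

/* Variant for callers reached from __morestack, where the limit is
 * nonzero, so the fallback branch can be dropped entirely. */
static inline task *get_current_task_nofallback(void) {
    uintptr_t limit = get_sp_limit();
    assert(limit != 0);
    return task_from_limit(limit);
}
```

The `assert` documents the precondition in debug builds; a release build would compile it out and leave only the pointer arithmetic and one load.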