Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack overflow should abort the process normally, not segfault and dump core #31273

Closed
brson opened this issue Jan 29, 2016 · 7 comments
Closed

Comments

@brson
Copy link
Contributor

brson commented Jan 29, 2016

Today, when you overflow the stack, Rust traps the SIGSEGV and prints an error message, then the program exits with a segfault, exit code 135 on linux, and dumps core.

When we used segstacks for stack overflow the runtime just abort!ed, exiting with an illegal instruction.

That stack overflow protection is implemented by handling a segfault is an implementation detail. Rust should not be exposing it. Instead, when it detects a segfault because of stack overflow it should abort like any other fatal error.

cc https://users.rust-lang.org/t/rust-guarantees-no-segfaults-with-only-safe-code-but-it-segfaults-stack-overflow/4305/20

@jethrogb
Copy link
Contributor

Considering Rust already traps the SIGSEGV, modifying the signal handler to exit at that point should be easy.

@nagisa
Copy link
Member

nagisa commented Jan 29, 2016

SIGSEGV is the default behaviour on stack overflow in C/C++ programs compiled both by gcc and clang.

I’m not sure SIGILL is any better end-result compared to SIGSEGV. They both will dump the core and they both are pretty opaque. Fixing this issue will only serve to make “prevents segfaults” (should be changed to something else regardless of the outcome here) on the front page a little bit true-r.

@lambda
Copy link
Contributor

lambda commented Feb 1, 2016

I think that SIGABRT is fairly reasonable for a stack overflow; it's still treated as a crash, so will dump core or send crash reports on most platforms, but it doesn't cause the confusion that a SIGSEGV would cause by making people think that they need to track down some piece of unsafe code that's chasing a dangling pointer.

I've sent a PR that implements this.

@lambda
Copy link
Contributor

lambda commented Feb 1, 2016

This ticket is related to #30963 (about updating the documentation to mention stack overflow segfaults, which would not be necessary if this issue is fixed) and #26458 (which contains discussion of the most recent change to the current stack overflow behavior).

@nagisa
Copy link
Member

nagisa commented Feb 1, 2016

@lambda there’s no (portable) way to produce SIGABRT in a no-stack condition, I think. SIGSEGV, SIGILL and SIGBUS, I think are an exhaustive set of signals you can make happen without stack.

@lambda
Copy link
Contributor

lambda commented Feb 1, 2016

@nagisa Do you have any reference for that? According to POSIX:

The following table defines a set of functions that shall be either reentrant or non-interruptible by signals and shall be async-signal-safe. Therefore applications may invoke them, without restriction, from signal-catching functions:

  • ...
  • abort()
  • ...
  • raise()

Remember, when we're in our SIGBUS/SIGSEGV handler, we're operating on a small alternative stack, so we actually do have enough stack space to run our signal handler and print out an error message. I believe that should be sufficient for raising SIGABRT as well.

@nagisa
Copy link
Member

nagisa commented Feb 1, 2016

we're operating on a small alternative stack

Yep, I wrote the comment under the assumption we’re restricted to not using any stack, but I had totally forgotten we have signal specific stack for SIGSEGV handler.

lambda added a commit to lambda/rust that referenced this issue Feb 6, 2016
We use guard pages that cause the process to abort to protect against
undefined behavior in the event of stack overflow.  We have a handler
that catches segfaults, prints out an error message if the segfault was
due to a stack overflow, then unregisters itself and returns to allow
the signal to be re-raised and kill the process.

This caused some confusion, as it was unexpected that safe code would be
able to cause a segfault, while it's easy to overflow the stack in safe
code.  To avoid this confusion, when we detect a segfault in the guard
page, abort instead of the previous behavior of re-raising the SIGSEGV.

To test this, we need to adapt the tests for segfault to actually check
the exit status.  Doing so revealed that the existing test for segfault
behavior was actually invalid; LLVM optimizes the explicit null pointer
reference down to an illegal instruction, so the program aborts with
SIGILL instead of SIGSEGV and the test didn't actually trigger the
signal handler at all.  Use a C helper function to get a null pointer
that LLVM can't optimize away, so we get our segfault instead.

This is a [breaking-change] if anyone is relying on the exact signal
raised to kill a process on stack overflow.

Closes rust-lang#31273
bors added a commit that referenced this issue Feb 6, 2016
Abort on stack overflow instead of re-raising SIGSEGV

We use guard pages that cause the process to abort to protect against
undefined behavior in the event of stack overflow.  We have a handler
that catches segfaults, prints out an error message if the segfault was
due to a stack overflow, then unregisters itself and returns to allow
the signal to be re-raised and kill the process.

This caused some confusion, as it was unexpected that safe code would be
able to cause a segfault, while it's easy to overflow the stack in safe
code.  To avoid this confusion, when we detect a segfault in the guard
page, abort instead of the previous behavior of re-raising SIGSEGV.

To test this, we need to adapt the tests for segfault to actually check
the exit status.  Doing so revealed that the existing test for segfault
behavior was actually invalid; LLVM optimizes the explicit null pointer
reference down to an illegal instruction, so the program aborts with
SIGILL instead of SIGSEGV and the test didn't actually trigger the
signal handler at all.  Use a C helper function to get a null pointer
that LLVM can't optimize away, so we get our segfault instead.

This is a [breaking-change] if anyone is relying on the exact signal
raised to kill a process on stack overflow.

Closes #31273
bors added a commit that referenced this issue Feb 6, 2016
Abort on stack overflow instead of re-raising SIGSEGV

We use guard pages that cause the process to abort to protect against
undefined behavior in the event of stack overflow.  We have a handler
that catches segfaults, prints out an error message if the segfault was
due to a stack overflow, then unregisters itself and returns to allow
the signal to be re-raised and kill the process.

This caused some confusion, as it was unexpected that safe code would be
able to cause a segfault, while it's easy to overflow the stack in safe
code.  To avoid this confusion, when we detect a segfault in the guard
page, abort instead of the previous behavior of re-raising SIGSEGV.

To test this, we need to adapt the tests for segfault to actually check
the exit status.  Doing so revealed that the existing test for segfault
behavior was actually invalid; LLVM optimizes the explicit null pointer
reference down to an illegal instruction, so the program aborts with
SIGILL instead of SIGSEGV and the test didn't actually trigger the
signal handler at all.  Use a C helper function to get a null pointer
that LLVM can't optimize away, so we get our segfault instead.

This is a [breaking-change] if anyone is relying on the exact signal
raised to kill a process on stack overflow.

Closes #31273
lambda added a commit to lambda/rust that referenced this issue Feb 6, 2016
intrinsics::abort compiles down to an illegal instruction, which on
Unix-like platforms causes the process to be killed with SIGILL.  A more
appropriate way to kill the process would be SIGABRT; this indicates
better that the runtime has explicitly aborted, rather than some kind of
compiler bug or architecture mismatch that SIGILL might indicate.

For rtassert!, replace this with libc::abort.  libc::abort raises
SIGABRT, but is defined to do so in such a way that it will terminate
the process even if SIGABRT is currently masked or caught by a signal
handler that returns.

On non-Unix platforms, retain the existing behavior.  On Windows we
prefer to avoid depending on the C runtime, and we need a fallback for
any other platforms that may be defined.  An alternative on Windows
would be to call TerminateProcess, but this seems less essential than
switching to using SIGABRT on Unix-like platforms, where it is common
for the process-killing signal to be printed out or logged.

This is a [breaking-change] for any code that depends on the exact
signal raised to abort a process via rtabort!

cc rust-lang#31273
cc rust-lang#31333
lambda added a commit to lambda/rust that referenced this issue May 23, 2016
intrinsics::abort compiles down to an illegal instruction, which on
Unix-like platforms causes the process to be killed with SIGILL.  A more
appropriate way to kill the process would be SIGABRT; this indicates
better that the runtime has explicitly aborted, rather than some kind of
compiler bug or architecture mismatch that SIGILL might indicate.

For rtassert!, replace this with libc::abort.  libc::abort raises
SIGABRT, but is defined to do so in such a way that it will terminate
the process even if SIGABRT is currently masked or caught by a signal
handler that returns.

On non-Unix platforms, retain the existing behavior.  On Windows we
prefer to avoid depending on the C runtime, and we need a fallback for
any other platforms that may be defined.  An alternative on Windows
would be to call TerminateProcess, but this seems less essential than
switching to using SIGABRT on Unix-like platforms, where it is common
for the process-killing signal to be printed out or logged.

This is a [breaking-change] for any code that depends on the exact
signal raised to abort a process via rtabort!

cc rust-lang#31273
cc rust-lang#31333
bors added a commit that referenced this issue May 23, 2016
Use libc::abort, not intrinsics::abort, in rtabort!

intrinsics::abort compiles down to an illegal instruction, which on
Unix-like platforms causes the process to be killed with SIGILL.  A more
appropriate way to kill the process would be SIGABRT; this indicates
better that the runtime has explicitly aborted, rather than some kind of
compiler bug or architecture mismatch that SIGILL might indicate.

For rtassert!, replace this with libc::abort.  libc::abort raises
SIGABRT, but is defined to do so in such a way that it will terminate
the process even if SIGABRT is currently masked or caught by a signal
handler that returns.

On non-Unix platforms, retain the existing behavior.  On Windows we
prefer to avoid depending on the C runtime, and we need a fallback for
any other platforms that may be defined.  An alternative on Windows
would be to call TerminateProcess, but this seems less essential than
switching to using SIGABRT on Unix-like platforms, where it is common
for the process-killing signal to be printed out or logged.

This is a [breaking-change] for any code that depends on the exact
signal raised to abort a process via rtabort!

cc #31273
cc #31333
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants