-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix reporting illegal instruction exception #107414
Fix reporting illegal instruction exception #107414
Conversation
The new EH (like NativeAOT) was not recognizing illegal instruction as valid hardware exceptions, so the reported error and stack trace were unusable to determine the error cause. This change adds proper reporting of this hardware exception. Close dotnet#107145
@@ -96,6 +96,11 @@ internal static class RuntimeExceptionHelpers | |||
FailFast("Access Violation: Attempted to read or write protected memory. This is often an indication that other memory is corrupt. The application will be terminated since this platform does not support throwing an AccessViolationException."); | |||
return null; | |||
|
|||
#pragma warning disable CS0618 // ExecutionEngineException is obsolete | |||
case ExceptionIDs.IllegalInstruction: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if we want to create an exception on the NativeAOT or behave like in the case of Access violation and just failfast here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think PlarformNotSupportedException
would be the best because it will allow us to have same behavior as CoreCLR for valid C#, but invalid use of hardware intrinsics.
For example:
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
Sse41.Blend(Vector128<double>.One, default, default);
I'm not guarding this with Sse41.IsSupported
and the generated code has an unguarded blendpd
instruction. CoreCLR-JIT would generate a PNSE throw instead of the unguarded blendpd
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure whether we want these faults to be catchable exceptions. The JIT assumes that these instructions won't throw, and it may take shortcuts when generating GC info for this IP. Catching and handling this fault would produce a GC hole.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In coreclr, it is not catchable - the exception is created just for reporting the failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure whether we want these faults to be catchable exceptions. The JIT assumes that these instructions won't throw, and it may take shortcuts when generating GC info for this IP. Catching and handling this fault would produce a GC hole.
👍
I think there's also a desire to differentiate between "the user called this API without checking, the JIT did the right thing" and "something in the JIT went wrong and an invalid instruction was emitted". In some cases it might even be invalid on all hardware, such as from a buggy encoding and not simply some encoding that would be valid on new enough hardware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In coreclr, it is not catchable - the exception is created just for reporting the failure.
The exceptions should not be catchable in native aot either. I think it should look more like the access violation handling. I have tried the following with this change on Native AOT (my machine does not have AvxVnni support):
dotnet publish /p:PublishAot=true /p:IlcInstructionSet=avx2
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
Console.WriteLine(AvxVnni.IsSupported);
try
{
AvxVnni.MultiplyWideningAndAdd(default(Vector128<int>), default(Vector128<byte>), default(Vector128<sbyte>)).ToString();
}
catch (Exception e)
{
Console.WriteLine("HANDLED!");
}
It prints:
False
HANDLED!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated it to fail fast on NativeAOT like we do for the access violation.
@@ -96,6 +96,10 @@ internal static class RuntimeExceptionHelpers | |||
FailFast("Access Violation: Attempted to read or write protected memory. This is often an indication that other memory is corrupt. The application will be terminated since this platform does not support throwing an AccessViolationException."); | |||
return null; | |||
|
|||
case ExceptionIDs.IllegalInstruction: | |||
FailFast("Illegal instruction."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message for AccessViolation
is a lot more verbose, to help explain things. Could we do the same here?
AccessViolation
is
Access Violation: Attempted to read or write protected memory. This is often an indication that other memory is corrupt. The application will be terminated since this platform does not support throwing an AccessViolationException.
So, perhaps IllegalInstruction
could be something like:
Illegal Instruction: Attempted to execute an instruction code not defined by the processor. This is often an indication that a code path using hardware intrinsics was not correctly guarded by the corresponding IsSupported check. The application will be terminated since this platform does not support throwing an exception for such failures.
-- I wonder if there are any other hardware exceptions we're not handling that we should be as well: https://learn.microsoft.com/en-us/cpp/cpp/hardware-exceptions?view=msvc-170
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think that the intrinsics are the most likely case of illegal instruction in general so that we would like to skew the message towards those?
I wonder if there are any other hardware exceptions we're not handling that we should be as well
Coreclr considers these exceptions as state corrupting, not allowing them to be handled:
runtime/src/coreclr/vm/excep.cpp
Lines 10253 to 10263 in b7b91cb
case STATUS_ACCESS_VIOLATION: | |
if (throwable != NULL && CoreLibBinder::IsException(throwable->GetMethodTable(), kNullReferenceException)) | |
return FALSE; | |
break; | |
case STATUS_STACK_OVERFLOW: | |
case EXCEPTION_ILLEGAL_INSTRUCTION: | |
case EXCEPTION_IN_PAGE_ERROR: | |
case EXCEPTION_INVALID_DISPOSITION: | |
case EXCEPTION_NONCONTINUABLE_EXCEPTION: | |
case EXCEPTION_PRIV_INSTRUCTION: | |
case STATUS_UNWIND_CONSOLIDATE: |
So we could possibly add handling of all of these to NativeAOT too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think that the intrinsics are the most likely case of illegal instruction in general so that we would like to skew the message towards those?
That should be the only thing causing such cases, outside of JIT or native compiler bugs where they are generating incorrect code bytes themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bad (unsafe) user code can lead to illegal instruction crashes too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍, I should have said for our own codegen.
Users manually trying to emit code bytes and execute it, such as via function pointers, can certainly cause it as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do not even need to emit new code. You can be just doing bad function pointer manipulation or unsafe casts.
The new EH (like NativeAOT) was not recognizing illegal instruction as valid hardware exceptions, so the reported error and stack trace were unusable to determine the error cause.
This change adds proper reporting of this hardware exception.
Close #107145