Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reporting illegal instruction exception #107414

Merged
merged 4 commits into from
Sep 21, 2024

Conversation

janvorli
Copy link
Member

@janvorli janvorli commented Sep 5, 2024

The new EH (like NativeAOT) was not recognizing illegal instruction as valid hardware exceptions, so the reported error and stack trace were unusable to determine the error cause.

This change adds proper reporting of this hardware exception.

Close #107145

The new EH (like NativeAOT) was not recognizing illegal instruction as
valid hardware exceptions, so the reported error and stack trace were
unusable to determine the error cause.

This change adds proper reporting of this hardware exception.

Close dotnet#107145
@@ -96,6 +96,11 @@ internal static class RuntimeExceptionHelpers
FailFast("Access Violation: Attempted to read or write protected memory. This is often an indication that other memory is corrupt. The application will be terminated since this platform does not support throwing an AccessViolationException.");
return null;

#pragma warning disable CS0618 // ExecutionEngineException is obsolete
case ExceptionIDs.IllegalInstruction:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if we want to create an exception on the NativeAOT or behave like in the case of Access violation and just failfast here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think PlarformNotSupportedException would be the best because it will allow us to have same behavior as CoreCLR for valid C#, but invalid use of hardware intrinsics.

For example:

using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

Sse41.Blend(Vector128<double>.One, default, default);

I'm not guarding this with Sse41.IsSupported and the generated code has an unguarded blendpd instruction. CoreCLR-JIT would generate a PNSE throw instead of the unguarded blendpd.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether we want these faults to be catchable exceptions. The JIT assumes that these instructions won't throw, and it may take shortcuts when generating GC info for this IP. Catching and handling this fault would produce a GC hole.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In coreclr, it is not catchable - the exception is created just for reporting the failure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether we want these faults to be catchable exceptions. The JIT assumes that these instructions won't throw, and it may take shortcuts when generating GC info for this IP. Catching and handling this fault would produce a GC hole.

👍

I think there's also a desire to differentiate between "the user called this API without checking, the JIT did the right thing" and "something in the JIT went wrong and an invalid instruction was emitted". In some cases it might even be invalid on all hardware, such as from a buggy encoding and not simply some encoding that would be valid on new enough hardware.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In coreclr, it is not catchable - the exception is created just for reporting the failure.

The exceptions should not be catchable in native aot either. I think it should look more like the access violation handling. I have tried the following with this change on Native AOT (my machine does not have AvxVnni support):
dotnet publish /p:PublishAot=true /p:IlcInstructionSet=avx2

using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

Console.WriteLine(AvxVnni.IsSupported);

try
{
    AvxVnni.MultiplyWideningAndAdd(default(Vector128<int>), default(Vector128<byte>), default(Vector128<sbyte>)).ToString();
}
catch (Exception e)
{
    Console.WriteLine("HANDLED!");
}

It prints:

False
HANDLED!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated it to fail fast on NativeAOT like we do for the access violation.

@@ -96,6 +96,10 @@ internal static class RuntimeExceptionHelpers
FailFast("Access Violation: Attempted to read or write protected memory. This is often an indication that other memory is corrupt. The application will be terminated since this platform does not support throwing an AccessViolationException.");
return null;

case ExceptionIDs.IllegalInstruction:
FailFast("Illegal instruction.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message for AccessViolation is a lot more verbose, to help explain things. Could we do the same here?

AccessViolation is

Access Violation: Attempted to read or write protected memory. This is often an indication that other memory is corrupt. The application will be terminated since this platform does not support throwing an AccessViolationException.

So, perhaps IllegalInstruction could be something like:

Illegal Instruction: Attempted to execute an instruction code not defined by the processor. This is often an indication that a code path using hardware intrinsics was not correctly guarded by the corresponding IsSupported check. The application will be terminated since this platform does not support throwing an exception for such failures.

-- I wonder if there are any other hardware exceptions we're not handling that we should be as well: https://learn.microsoft.com/en-us/cpp/cpp/hardware-exceptions?view=msvc-170

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that the intrinsics are the most likely case of illegal instruction in general so that we would like to skew the message towards those?

I wonder if there are any other hardware exceptions we're not handling that we should be as well

Coreclr considers these exceptions as state corrupting, not allowing them to be handled:

runtime/src/coreclr/vm/excep.cpp

Lines 10253 to 10263 in b7b91cb

case STATUS_ACCESS_VIOLATION:
if (throwable != NULL && CoreLibBinder::IsException(throwable->GetMethodTable(), kNullReferenceException))
return FALSE;
break;
case STATUS_STACK_OVERFLOW:
case EXCEPTION_ILLEGAL_INSTRUCTION:
case EXCEPTION_IN_PAGE_ERROR:
case EXCEPTION_INVALID_DISPOSITION:
case EXCEPTION_NONCONTINUABLE_EXCEPTION:
case EXCEPTION_PRIV_INSTRUCTION:
case STATUS_UNWIND_CONSOLIDATE:

So we could possibly add handling of all of these to NativeAOT too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that the intrinsics are the most likely case of illegal instruction in general so that we would like to skew the message towards those?

That should be the only thing causing such cases, outside of JIT or native compiler bugs where they are generating incorrect code bytes themselves.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad (unsafe) user code can lead to illegal instruction crashes too.

Copy link
Member

@tannergooding tannergooding Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍, I should have said for our own codegen.

Users manually trying to emit code bytes and execute it, such as via function pointers, can certainly cause it as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do not even need to emit new code. You can be just doing bad function pointer manipulation or unsafe casts.

@janvorli janvorli merged commit 529804a into dotnet:main Sep 21, 2024
88 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve error message for unexpected hardware exceptions in managed code with new EH
4 participants