Fix potential 'Floating Point Exception' with speculative execution #93517

grospelliergilles · 2023-10-14T14:06:49Z

With the current version of runtime and version 8.0-rc2, there is a 'Floating Point Exception' (signal SIGFPE) when floating point exception are enabled (via feenableexcept()).

It seems to be due to speculative execution of the division in the check.

This patch remove the check and make sure we do not divide by zero.

Alternatively, because the result of this function is only to check if the fragmentation is high enough, we can rewrite the part of the code to remove all divide operation.

ghost · 2023-10-14T14:07:02Z

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

With the current version of runtime and version 8.0-rc2, there is a 'Floating Point Exception' (signal SIGFPE) when floating point exception are enabled (via feenableexcept()).

It seems to be due to speculative execution of the division in the check.

This patch remove the check and make sure we do not divide by zero.

Alternatively, because the result of this function is only to check if the fragmentation is high enough, we can rewrite the part of the code to remove all divide operation.

Author:	grospelliergilles
Assignees:	-
Labels:	`area-GC-coreclr`, `community-contribution`
Milestone:	-

tannergooding · 2023-10-14T14:43:18Z

Running with floating-point exceptions enabled is explicitly not supported, the behavior that occurs if it is enabled is undefined.

CC. @jkotas

grospelliergilles · 2023-10-14T15:01:53Z

Thanks for your response.
I already had this kind of error in old versions of the runtime (#76257) and this was fixed.
In my application I do not have control on enabling floating point exception. And because it occurs in the garbage collector I can not disable them in this part.
The fix is pretty simple and should not change the results. Do you think it could be merged ?

mangod9 · 2023-11-06T18:14:18Z

@Maoni0 as well.

Maoni0 · 2023-11-06T21:55:29Z

this is a different issue from the fix I made in #76294. if a speculative execution can result in SIGFPE (which seems quite dangerous to me), that means the fix I made would subject to the same problem.

we should decide whether we want to support this speculative execution scenario at all - if we do, we should harden all the places this can happen instead of making one off fixes like this.

I don't have a strong opinion on whether to support it or not, @jkotas, do you?

jkotas · 2023-11-06T22:41:18Z

As I have said in #76257 (comment), .NET runtime and libraries are not compatible with enabled floating point exceptions. If you need floating point exceptions enabled for your C/C++ library, you need to disable around them around the calls to .NET runtime.

I do not think that one-off workarounds like the one proposed in this PR make sense.

grospelliergilles · 2023-11-07T07:24:14Z

Thansk @jkotas for your response. It is difficult to disable FPE around this part of the code because it is called by the garbage collector and I do not know when it will be called. I will try to use GC.RegisterForFullGCNotification() to see if it works.

Alternatively, I think the code may be rewritten without using division because the result is only used in comparison. So instead of checking a>b/c we can rewrite as a*c>b. If you agree I can provide a patch to remove them if you want.

jkotas · 2023-11-07T18:03:26Z

So instead of checking a>b/c we can rewrite as a*c>b

generation_allocator_efficiency is called in number of places. I do not see how you can do this rewrite.

…eration_unusable_fragmentation' to remove floating point usage. - Rename 'generation_allocator_efficiency' to 'generation_allocator_efficiency_percent' because now it returns a number between 0 and 100. - Do not use 'generation_allocator_efficiency' in 'generation_unusable_fragmentation' and do direct calculation.

grospelliergilles · 2023-11-07T21:14:50Z

I have removed floating point usage in the two functions which can throw FPE.
Using only integer division there is no longer potential FPE.

grospelliergilles · 2023-11-07T21:15:48Z

@grospelliergilles please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
@dotnet-policy-service agree [company="{your company}"]
Options:

(default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@dotnet-policy-service agree
(when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@dotnet-policy-service agree company="Microsoft"
Contributor License Agreement
@dotnet-policy-service agree

grospelliergilles · 2023-11-07T21:16:56Z

@dotnet-policy-service agree

src/coreclr/gc/gcpriv.h

…tion.

mangod9 · 2023-12-12T22:13:40Z

@grospelliergilles, there appear to be build breaks on x86. @Maoni0 please review as well.

markples

I feel like there is some readability loss here. Perhaps it can be addressed (see my specific remark on a comment that could help). Also, I think it would help to limit the use of uint64_t to where the large values can occur. generation_allocator_efficiency_percent can have an intermediate large value but returns a percentage. generation_unusable_fragmentation also can have an intermediate large value but returns a size (in fact, this is the cause of the build break because the call site is appropriately expecting a size_t.

However, given this, I wonder if adding something like FLT_EPSILON to the denominator would be a clearer fix. I thought about suggesting that generation_unusable_fragmentation continue to call generation_allocator_efficiency_percent with appropriate scaling, but I see that you were careful to avoid rounding errors due to integer truncation and I don't think we want to try to analyze if percent, per mille, etc., would be sufficient.

markples · 2023-12-12T23:46:01Z

src/coreclr/gc/gcpriv.h

+    uint64_t free_list_space = generation_free_list_space (inst);
+    if (free_obj_space==0)
+      return 0;
+    return (free_obj_space + (free_obj_space * free_list_space) / (free_list_allocated + free_obj_space));


This could use a comment, either that it is based on rewriting of (100 - efficiency)/100 * free_list_space or inefficiency * free_list_space, being careful in both cases to minimize the impact of the integer division.

Thanks for your report. I will make the required change and add the information your want.
If you prefer I revert to the first commit (using FLT_EPSILON) I can also do that.

I would really rather to not have to have the presence of FLT_EPSILON in the GC code at all if I can avoid it :) it's just another thing that someone who's not familiar with FP has to care about.

but I agree with @markples that these should continue to return size_t, instead of uint64_t. I would also add a comment above generation_allocator_efficiency_percent to explain why this is written this way (to accommodate this exception specific to FP which is not a common knowledge AFAIK) instead of the previous way which seems more natural.

Maoni0 · 2023-12-13T08:09:40Z

src/coreclr/gc/gcpriv.h

+    uint64_t free_obj_space = generation_free_obj_space (inst);
+    uint64_t free_list_allocated = generation_free_list_allocated (inst);
+    uint64_t free_list_space = generation_free_list_space (inst);
+    if (free_obj_space==0)


Suggested change

if (free_obj_space==0)

if ((free_list_allocated + free_obj_space) == 0)

I think the idea is that if free_obj_space is zero, then the computation will be zero regardless of the free_list_allocated value. (unless one can be negative...) It's subtle and perhaps not worth the microoptimization or worth a comment. The same patterns occurs in the efficiency method.

Yes if free_obj_space is zero then the result is always zero. The operands are of type size_t and so can not be negative.

yes, I got the idea. but this kind of microoptimization is not worth it at all. I'd much rather have the code be easier to understand.

Thanks for your report. I will do as requested.

Also add comment to explain why we use integer division instead of floating point division.

src/coreclr/gc/gc.cpp

jkotas · 2023-12-21T14:39:04Z

Build break

D:\a\_work\1\s\src\coreclr\gc\gc.cpp(22277): error C2220: the following warning is treated as an error
D:\a\_work\1\s\src\coreclr\gc\gc.cpp(22277): warning C4267: '=': conversion from 'size_t' to 'uint32_t', possible loss of data

…fficiency_percent'.

mangod9 · 2024-01-29T18:16:31Z

@grospelliergilles is this PR ready to review again?

grospelliergilles · 2024-01-29T18:40:32Z

@grospelliergilles is this PR ready to review again?

I have fixed compile errors so I think it is OK.

mangod9 · 2024-01-30T20:36:42Z

@Maoni0 can you please take another look?

Maoni0

LGTM

mangod9 · 2024-01-31T00:35:16Z

thanks @grospelliergilles for your help.

grospelliergilles · 2024-01-31T06:59:07Z

Thank you very much @mangod9 for accepting my PR.

Fix potential 'Floating Point Exception' with speculative execution

89fb671

ghost added the community-contribution Indicates that the PR has been added by a community member label Oct 14, 2023

dotnet-issue-labeler bot added the area-GC-coreclr label Oct 14, 2023

jkotas reviewed Nov 7, 2023

View reviewed changes

src/coreclr/gc/gcpriv.h Outdated Show resolved Hide resolved

This was referenced Nov 8, 2023

Timeout in System.Net.Quic.Functional.Tests #86019

Closed

CI error: System.Net.Quic.QuicException: The connection timed out from inactivity #91757

Closed

Use 'uint64_t' instead of 'size_t' for computing generation fragmenta…

e601d36

…tion.

jkotas requested a review from Maoni0 December 12, 2023 18:05

build-analysis bot mentioned this pull request Dec 12, 2023

[8.0] Failing to build native components #94823

Closed

markples reviewed Dec 12, 2023

View reviewed changes

Maoni0 reviewed Dec 13, 2023

View reviewed changes

Use 'size_t' instead of 'uint64_t' for return type.

fad395d

Also add comment to explain why we use integer division instead of floating point division.

grospelliergilles force-pushed the fix-potental-floating-point-exception branch from 7e164e3 to fad395d Compare December 13, 2023 12:48

build-analysis bot mentioned this pull request Dec 13, 2023

The process cannot access the file '...\NativeAOTFX\.def' because it is being used by another process. #95961

Closed

Improve readability of the test

a2d83a5

jkotas reviewed Dec 14, 2023

View reviewed changes

src/coreclr/gc/gc.cpp Show resolved Hide resolved

Remove cast from 'size_t' to 'int' when using 'generation_allocator_e…

ab750d4

…fficiency_percent'.

grospelliergilles force-pushed the fix-potental-floating-point-exception branch from a2bef7e to ab750d4 Compare December 21, 2023 15:00

This was referenced Dec 21, 2023

Tracking issue for CI build timeouts #76454

Closed

System.Tests.StringTests failing on windows Release NativeAOT_Libraries leg #96279

Closed

jkotas closed this Jan 29, 2024

jkotas reopened this Jan 29, 2024

Maoni0 approved these changes Jan 30, 2024

View reviewed changes

mangod9 merged commit ca46ae5 into dotnet:main Jan 31, 2024
111 checks passed

github-actions bot locked and limited conversation to collaborators Mar 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix potential 'Floating Point Exception' with speculative execution #93517

Fix potential 'Floating Point Exception' with speculative execution #93517

grospelliergilles commented Oct 14, 2023

ghost commented Oct 14, 2023

tannergooding commented Oct 14, 2023

grospelliergilles commented Oct 14, 2023

mangod9 commented Nov 6, 2023

Maoni0 commented Nov 6, 2023

jkotas commented Nov 6, 2023

grospelliergilles commented Nov 7, 2023

jkotas commented Nov 7, 2023

grospelliergilles commented Nov 7, 2023

grospelliergilles commented Nov 7, 2023

grospelliergilles commented Nov 7, 2023

mangod9 commented Dec 12, 2023

markples left a comment

markples Dec 12, 2023

grospelliergilles Dec 13, 2023 •

edited

Loading

Maoni0 Dec 13, 2023

Maoni0 Dec 13, 2023

markples Dec 13, 2023

grospelliergilles Dec 13, 2023

Maoni0 Dec 13, 2023

grospelliergilles Dec 14, 2023

jkotas commented Dec 21, 2023

mangod9 commented Jan 29, 2024

grospelliergilles commented Jan 29, 2024

mangod9 commented Jan 30, 2024

Maoni0 left a comment

mangod9 commented Jan 31, 2024

grospelliergilles commented Jan 31, 2024

	if (free_obj_space==0)
	if ((free_list_allocated + free_obj_space) == 0)

Fix potential 'Floating Point Exception' with speculative execution #93517

Fix potential 'Floating Point Exception' with speculative execution #93517

Conversation

grospelliergilles commented Oct 14, 2023

ghost commented Oct 14, 2023

tannergooding commented Oct 14, 2023

grospelliergilles commented Oct 14, 2023

mangod9 commented Nov 6, 2023

Maoni0 commented Nov 6, 2023

jkotas commented Nov 6, 2023

grospelliergilles commented Nov 7, 2023

jkotas commented Nov 7, 2023

grospelliergilles commented Nov 7, 2023

grospelliergilles commented Nov 7, 2023

grospelliergilles commented Nov 7, 2023

mangod9 commented Dec 12, 2023

markples left a comment

Choose a reason for hiding this comment

markples Dec 12, 2023

Choose a reason for hiding this comment

grospelliergilles Dec 13, 2023 • edited Loading

Choose a reason for hiding this comment

Maoni0 Dec 13, 2023

Choose a reason for hiding this comment

Maoni0 Dec 13, 2023

Choose a reason for hiding this comment

markples Dec 13, 2023

Choose a reason for hiding this comment

grospelliergilles Dec 13, 2023

Choose a reason for hiding this comment

Maoni0 Dec 13, 2023

Choose a reason for hiding this comment

grospelliergilles Dec 14, 2023

Choose a reason for hiding this comment

jkotas commented Dec 21, 2023

mangod9 commented Jan 29, 2024

grospelliergilles commented Jan 29, 2024

mangod9 commented Jan 30, 2024

Maoni0 left a comment

Choose a reason for hiding this comment

mangod9 commented Jan 31, 2024

grospelliergilles commented Jan 31, 2024

grospelliergilles Dec 13, 2023 •

edited

Loading