Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix potential 'Floating Point Exception' with speculative execution #93517

Merged

Conversation

grospelliergilles
Copy link
Contributor

With the current version of runtime and version 8.0-rc2, there is a 'Floating Point Exception' (signal SIGFPE) when floating point exception are enabled (via feenableexcept()).

It seems to be due to speculative execution of the division in the check.

This patch remove the check and make sure we do not divide by zero.

Alternatively, because the result of this function is only to check if the fragmentation is high enough, we can rewrite the part of the code to remove all divide operation.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Oct 14, 2023
@ghost
Copy link

ghost commented Oct 14, 2023

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

With the current version of runtime and version 8.0-rc2, there is a 'Floating Point Exception' (signal SIGFPE) when floating point exception are enabled (via feenableexcept()).

It seems to be due to speculative execution of the division in the check.

This patch remove the check and make sure we do not divide by zero.

Alternatively, because the result of this function is only to check if the fragmentation is high enough, we can rewrite the part of the code to remove all divide operation.

Author: grospelliergilles
Assignees: -
Labels:

area-GC-coreclr, community-contribution

Milestone: -

@tannergooding
Copy link
Member

Running with floating-point exceptions enabled is explicitly not supported, the behavior that occurs if it is enabled is undefined.

CC. @jkotas

@grospelliergilles
Copy link
Contributor Author

Thanks for your response.
I already had this kind of error in old versions of the runtime (#76257) and this was fixed.
In my application I do not have control on enabling floating point exception. And because it occurs in the garbage collector I can not disable them in this part.
The fix is pretty simple and should not change the results. Do you think it could be merged ?

@mangod9
Copy link
Member

mangod9 commented Nov 6, 2023

@Maoni0 as well.

@Maoni0
Copy link
Member

Maoni0 commented Nov 6, 2023

this is a different issue from the fix I made in #76294. if a speculative execution can result in SIGFPE (which seems quite dangerous to me), that means the fix I made would subject to the same problem.

we should decide whether we want to support this speculative execution scenario at all - if we do, we should harden all the places this can happen instead of making one off fixes like this.

I don't have a strong opinion on whether to support it or not, @jkotas, do you?

@jkotas
Copy link
Member

jkotas commented Nov 6, 2023

As I have said in #76257 (comment), .NET runtime and libraries are not compatible with enabled floating point exceptions. If you need floating point exceptions enabled for your C/C++ library, you need to disable around them around the calls to .NET runtime.

I do not think that one-off workarounds like the one proposed in this PR make sense.

@grospelliergilles
Copy link
Contributor Author

Thansk @jkotas for your response. It is difficult to disable FPE around this part of the code because it is called by the garbage collector and I do not know when it will be called. I will try to use GC.RegisterForFullGCNotification() to see if it works.

Alternatively, I think the code may be rewritten without using division because the result is only used in comparison. So instead of checking a>b/c we can rewrite as a*c>b. If you agree I can provide a patch to remove them if you want.

@jkotas
Copy link
Member

jkotas commented Nov 7, 2023

So instead of checking a>b/c we can rewrite as a*c>b

generation_allocator_efficiency is called in number of places. I do not see how you can do this rewrite.

…eration_unusable_fragmentation' to remove floating point usage.

- Rename 'generation_allocator_efficiency' to
  'generation_allocator_efficiency_percent' because now it returns a
  number between 0 and 100.
- Do not use 'generation_allocator_efficiency' in
  'generation_unusable_fragmentation' and do direct calculation.
@grospelliergilles
Copy link
Contributor Author

I have removed floating point usage in the two functions which can throw FPE.
Using only integer division there is no longer potential FPE.

@grospelliergilles
Copy link
Contributor Author

@grospelliergilles please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@dotnet-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@dotnet-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@dotnet-policy-service agree company="Microsoft"

Contributor License Agreement
@dotnet-policy-service agree

@grospelliergilles
Copy link
Contributor Author

@dotnet-policy-service agree

src/coreclr/gc/gcpriv.h Outdated Show resolved Hide resolved
@mangod9
Copy link
Member

mangod9 commented Dec 12, 2023

@grospelliergilles, there appear to be build breaks on x86. @Maoni0 please review as well.

Copy link
Member

@markples markples left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there is some readability loss here. Perhaps it can be addressed (see my specific remark on a comment that could help). Also, I think it would help to limit the use of uint64_t to where the large values can occur. generation_allocator_efficiency_percent can have an intermediate large value but returns a percentage. generation_unusable_fragmentation also can have an intermediate large value but returns a size (in fact, this is the cause of the build break because the call site is appropriately expecting a size_t.

However, given this, I wonder if adding something like FLT_EPSILON to the denominator would be a clearer fix. I thought about suggesting that generation_unusable_fragmentation continue to call generation_allocator_efficiency_percent with appropriate scaling, but I see that you were careful to avoid rounding errors due to integer truncation and I don't think we want to try to analyze if percent, per mille, etc., would be sufficient.

uint64_t free_list_space = generation_free_list_space (inst);
if (free_obj_space==0)
return 0;
return (free_obj_space + (free_obj_space * free_list_space) / (free_list_allocated + free_obj_space));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use a comment, either that it is based on rewriting of (100 - efficiency)/100 * free_list_space or inefficiency * free_list_space, being careful in both cases to minimize the impact of the integer division.

Copy link
Contributor Author

@grospelliergilles grospelliergilles Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your report. I will make the required change and add the information your want.
If you prefer I revert to the first commit (using FLT_EPSILON) I can also do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really rather to not have to have the presence of FLT_EPSILON in the GC code at all if I can avoid it :) it's just another thing that someone who's not familiar with FP has to care about.

but I agree with @markples that these should continue to return size_t, instead of uint64_t. I would also add a comment above generation_allocator_efficiency_percent to explain why this is written this way (to accommodate this exception specific to FP which is not a common knowledge AFAIK) instead of the previous way which seems more natural.

uint64_t free_obj_space = generation_free_obj_space (inst);
uint64_t free_list_allocated = generation_free_list_allocated (inst);
uint64_t free_list_space = generation_free_list_space (inst);
if (free_obj_space==0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (free_obj_space==0)
if ((free_list_allocated + free_obj_space) == 0)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea is that if free_obj_space is zero, then the computation will be zero regardless of the free_list_allocated value. (unless one can be negative...) It's subtle and perhaps not worth the microoptimization or worth a comment. The same patterns occurs in the efficiency method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes if free_obj_space is zero then the result is always zero. The operands are of type size_t and so can not be negative.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I got the idea. but this kind of microoptimization is not worth it at all. I'd much rather have the code be easier to understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your report. I will do as requested.

Also add comment to explain why we use integer division instead of
floating point division.
@jkotas
Copy link
Member

jkotas commented Dec 21, 2023

Build break

D:\a\_work\1\s\src\coreclr\gc\gc.cpp(22277): error C2220: the following warning is treated as an error
D:\a\_work\1\s\src\coreclr\gc\gc.cpp(22277): warning C4267: '=': conversion from 'size_t' to 'uint32_t', possible loss of data

@mangod9
Copy link
Member

mangod9 commented Jan 29, 2024

@grospelliergilles is this PR ready to review again?

@grospelliergilles
Copy link
Contributor Author

@grospelliergilles is this PR ready to review again?

I have fixed compile errors so I think it is OK.

@jkotas jkotas closed this Jan 29, 2024
@jkotas jkotas reopened this Jan 29, 2024
@mangod9
Copy link
Member

mangod9 commented Jan 30, 2024

@Maoni0 can you please take another look?

Copy link
Member

@Maoni0 Maoni0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mangod9 mangod9 merged commit ca46ae5 into dotnet:main Jan 31, 2024
111 checks passed
@mangod9
Copy link
Member

mangod9 commented Jan 31, 2024

thanks @grospelliergilles for your help.

@grospelliergilles
Copy link
Contributor Author

Thank you very much @mangod9 for accepting my PR.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-GC-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants