-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix potential 'Floating Point Exception' with speculative execution #93517
Fix potential 'Floating Point Exception' with speculative execution #93517
Conversation
Tagging subscribers to this area: @dotnet/gc Issue DetailsWith the current version of runtime and version It seems to be due to speculative execution of the division in the check. This patch remove the check and make sure we do not divide by zero. Alternatively, because the result of this function is only to check if the fragmentation is high enough, we can rewrite the part of the code to remove all divide operation.
|
Running with floating-point exceptions enabled is explicitly not supported, the behavior that occurs if it is enabled is undefined. CC. @jkotas |
Thanks for your response. |
@Maoni0 as well. |
this is a different issue from the fix I made in #76294. if a speculative execution can result in SIGFPE (which seems quite dangerous to me), that means the fix I made would subject to the same problem. we should decide whether we want to support this speculative execution scenario at all - if we do, we should harden all the places this can happen instead of making one off fixes like this. I don't have a strong opinion on whether to support it or not, @jkotas, do you? |
As I have said in #76257 (comment), .NET runtime and libraries are not compatible with enabled floating point exceptions. If you need floating point exceptions enabled for your C/C++ library, you need to disable around them around the calls to .NET runtime. I do not think that one-off workarounds like the one proposed in this PR make sense. |
Thansk @jkotas for your response. It is difficult to disable FPE around this part of the code because it is called by the garbage collector and I do not know when it will be called. I will try to use Alternatively, I think the code may be rewritten without using division because the result is only used in comparison. So instead of checking |
|
…eration_unusable_fragmentation' to remove floating point usage. - Rename 'generation_allocator_efficiency' to 'generation_allocator_efficiency_percent' because now it returns a number between 0 and 100. - Do not use 'generation_allocator_efficiency' in 'generation_unusable_fragmentation' and do direct calculation.
I have removed floating point usage in the two functions which can throw FPE. |
|
@dotnet-policy-service agree |
@grospelliergilles, there appear to be build breaks on x86. @Maoni0 please review as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like there is some readability loss here. Perhaps it can be addressed (see my specific remark on a comment that could help). Also, I think it would help to limit the use of uint64_t
to where the large values can occur. generation_allocator_efficiency_percent
can have an intermediate large value but returns a percentage. generation_unusable_fragmentation
also can have an intermediate large value but returns a size (in fact, this is the cause of the build break because the call site is appropriately expecting a size_t
.
However, given this, I wonder if adding something like FLT_EPSILON to the denominator would be a clearer fix. I thought about suggesting that generation_unusable_fragmentation
continue to call generation_allocator_efficiency_percent
with appropriate scaling, but I see that you were careful to avoid rounding errors due to integer truncation and I don't think we want to try to analyze if percent, per mille, etc., would be sufficient.
src/coreclr/gc/gcpriv.h
Outdated
uint64_t free_list_space = generation_free_list_space (inst); | ||
if (free_obj_space==0) | ||
return 0; | ||
return (free_obj_space + (free_obj_space * free_list_space) / (free_list_allocated + free_obj_space)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could use a comment, either that it is based on rewriting of (100 - efficiency)/100 * free_list_space
or inefficiency * free_list_space
, being careful in both cases to minimize the impact of the integer division.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your report. I will make the required change and add the information your want.
If you prefer I revert to the first commit (using FLT_EPSILON) I can also do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would really rather to not have to have the presence of FLT_EPSILON in the GC code at all if I can avoid it :) it's just another thing that someone who's not familiar with FP has to care about.
but I agree with @markples that these should continue to return size_t, instead of uint64_t. I would also add a comment above generation_allocator_efficiency_percent
to explain why this is written this way (to accommodate this exception specific to FP which is not a common knowledge AFAIK) instead of the previous way which seems more natural.
src/coreclr/gc/gcpriv.h
Outdated
uint64_t free_obj_space = generation_free_obj_space (inst); | ||
uint64_t free_list_allocated = generation_free_list_allocated (inst); | ||
uint64_t free_list_space = generation_free_list_space (inst); | ||
if (free_obj_space==0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (free_obj_space==0) | |
if ((free_list_allocated + free_obj_space) == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the idea is that if free_obj_space is zero, then the computation will be zero regardless of the free_list_allocated value. (unless one can be negative...) It's subtle and perhaps not worth the microoptimization or worth a comment. The same patterns occurs in the efficiency method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes if free_obj_space
is zero then the result is always zero. The operands are of type size_t
and so can not be negative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I got the idea. but this kind of microoptimization is not worth it at all. I'd much rather have the code be easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your report. I will do as requested.
Also add comment to explain why we use integer division instead of floating point division.
7e164e3
to
fad395d
Compare
Build break
|
…fficiency_percent'.
a2bef7e
to
ab750d4
Compare
@grospelliergilles is this PR ready to review again? |
I have fixed compile errors so I think it is OK. |
@Maoni0 can you please take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
thanks @grospelliergilles for your help. |
Thank you very much @mangod9 for accepting my PR. |
With the current version of runtime and version
8.0-rc2
, there is a 'Floating Point Exception' (signalSIGFPE
) when floating point exception are enabled (viafeenableexcept()
).It seems to be due to speculative execution of the division in the check.
This patch remove the check and make sure we do not divide by zero.
Alternatively, because the result of this function is only to check if the fragmentation is high enough, we can rewrite the part of the code to remove all divide operation.