-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GetHashCode code generator should produce code that doesn't overflow in checked mode. #24161
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hashCode = hashCode * -1521134295 + base.GetHashCode(); | ||
hashCode = hashCode * -1521134295 + j.GetHashCode(); | ||
hashCode = hashCode ^ base.GetHashCode(); | ||
hashCode = (hashCode << 5) | (hashCode >> 27); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ Did you mean for this to be a rotation? If so, the right shift needs to be unsigned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good question. let me ask the original hash people about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Switched to unsigned right shift. Awesome catch!
@@ -1868,6 +1868,10 @@ public SyntaxNode TupleElementExpression(ITypeSymbol type, string name = null) | |||
/// </summary> | |||
public abstract SyntaxNode BitwiseOrExpression(SyntaxNode left, SyntaxNode right); | |||
|
|||
internal abstract SyntaxNode ExclusiveOrExpression(SyntaxNode left, SyntaxNode right); | |||
internal abstract SyntaxNode LeftShiftExpression(SyntaxNode left, SyntaxNode right); | |||
internal abstract SyntaxNode RightShiftExpression(SyntaxNode left, SyntaxNode right); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: i cannot think of any reason why these should not be exposed publicly. They make sense for both C# and VB. @mattwar Were these intentionally not included? If so, why? if not, then i think we should just make public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hate to say it, but this can, again, throw in a checked environment.
In C#, you could use unchecked((int)((uint)hashCode >> 27))
That's right for all using System;
public class C
{
public static void Main()
{
TestWrappedShiftLeft(0, 5);
TestWrappedShiftLeft(int.MaxValue, 5);
TestWrappedShiftLeft(0b0010_0110_0000_0011_1001_1011_1010_1100, 5);
TestWrappedShiftLeft(int.MinValue, 5); //throws OverflowException
}
static void TestWrappedShiftLeft(int number, int shift)
{
void PrintNumberBinary(string message, int numberToPrint)
=> Console.WriteLine($"{message}: { Convert.ToString(numberToPrint, 2).PadLeft(32, '0')}");
PrintNumberBinary("Orginal", number);
PrintNumberBinary("Shifted", checked((number << shift) | (int)((uint)number >> (32 - shift))));
Console.WriteLine();
}
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked conversion from int
to uint
can throw, so this no longer fixes the original problem. It appears there are two separate things going on:
- Trying to make the generated code not throw an exception in
checked
mode - Changing the hash code algorithm
Given that both hash code algorithms require the use of unchecked
to work in all cases, I recommend restricting this pull request to only include the changes on that front. The discussion on which hash code algorithm to use can and should be separate.
20c6ea2
to
34313be
Compare
Ok. I've changed things. Now, we check if we're in a 'checked' context. If so:
-- Again, in the future, we will detect and call to HashCode.Combine, which will solve the problem nicely. for now, this at least mitigates the problem. |
@@ -1247,4 +1247,7 @@ Sub(<parameterList>) <statement></value> | |||
<data name="Use_IsNot_Nothing_check" xml:space="preserve"> | |||
<value>Use 'IsNot Nothing' check</value> | |||
</data> | |||
<data name="Warning_code_may_overflow_at_runtime_unless_removeintchecks_flag_is_supplied_to_compiler" xml:space="preserve"> | |||
<value>Warning: code may overflow ar runtime unless /removeintchecks flag is passed to compiler</value> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: overflow ar runtime
You basically say "We are only able to provide an algorithm that fails at non deterministic occasions at runtime until you remove a safety net ( which many developers are not allowed to remove). Good luck with that.". I would say that anything, even the most terrible algorithm, is better than obviously buggy code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say that anything, even the most terrible algorithm, is better than obviously buggy code.
As i mentioned in the other thread, i'm happy to take an algorithm that will work for both cases. That said, in the interim perioud, this at least lets the user know about the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. This feature is shipped and needs to be fixed soon. I posted some other ideas in the corresponding issue. Based on those we can make a new PR if we are confident to have a better solution.
Yes. The comment describes the situation as it exists today. |
A simple approach we could take for vb would be to do something like: Return (Me.A, Me.B, Me.C).GetHashCode() |
But this relies on a nuget package being installed on older platforms and the alternatives (anonymous classes, old tuples) allocate. |
Yes. The purpose of IDE features isn't to be perfect, or solve problems for 100% of users in 100% of scenarios. They're to provide a reasonable level of help to a sufficient proportion of users. If the generated code isn't suitable to the user, then they can change things accordingly. The difference here is that currently we are generating code that gives no indication that it can fail at runtime. The comment addresses that. The tuple approach gives a workable solution, if the user uses tuples. And if they don't, then they'll get a clear message that they need to change something. |
In general I like the tuple idea because it checks the right boxes (concise, readily understandable and so on). A minor detail that needs to be considered when this gets implemented, is the limited number of elements (8) so this needs to generated code like this What I don't understand is how this features fallback mechanism should behave as there are so many options and conditions to consider. Options
Some more intressting questions: Are there any plans to change ValueTuple.GetHashCode to use System.HashCode.Combine in the future? Is System.Hashcode distributed as Nuget package so it can be used on older platforms? Conditions
I think it is worth to elaborate this and it this already was started here #17646 and should be discussed there (Especially how many fallback steps there are [this could be up to 4 in theory]). |
I don't think that's the case. The compiler takes care of that for you. You can have a tuple with more than 8 elements, and the compiler will already appropriately create the nested ValueTuple representation for it. |
I think my point is: we really don't have to overthink this that much. We're not the compiler. We're the IDE. It's sufficient to be useful for the common cases, and to also leave it to the user in others. So, for example, if:
It's ok if the user then has to write their own GetHashCode implementation. The bug i'm trying to fix here is that we emit code that can overflow without any indication that that can happen. |
Ok. We fallback to using tuples if they're available to the user. Otherwise, we fallback to adding the comment. |
Also updated PR to use System.HashCode if it is available. |
PR has been updated to generate code that i think works in VB if overflow checks are on. |
I wrote a small application and tested this algorithm. These are my findings:
So from my perspective this seems a reasonable well algorithm. This is what my test suite covered: Take every i1 and combine with every i2 in the range int.MinValue .. int.MaxValue. I recorded the number of collisions for each possible hash code:
I also tracked the combinations that produced the most collisions during the test run (to see whether a certain combinations/hash codes are somehow biased).
This is quite a limited test (It only tested the combination of two hash codes) but better than no test at all. And I also like how the generated code looks like. |
Sorry, not exactly sure how to interpret this part of your post:
Can you clarify what you mean by this? |
I recorded for every hashcode how often it was generated (a collision). The numbers show, that almost all hashcode had 181 collisions. |
How large was your input size? |
I took took about 180 numbers (i1) from different portions of the integer range (see above for the exact ranges). I combined each i1 with all integers (i2). |
And you're saying when you did this, you got around 180 collisions? i.e. you picked a number like 311, and then hashed it against around 2^32 values. And when doing that, you got 180 collisions at the end? |
@CyrusNajmabadi Let me know today if you want this in 15.7 (would require rebase), otherwise I'll merge it for 15.8. 👍 |
What's the eta on 15.8? a few months? or like a year from now? If the former, i can wait. If the latter, i'd be happy to rebase. |
Your guess is as good as mine 😆 |
aa86699
to
b6ae697
Compare
Produce warning for VB code. Fix spelling mistake. Add loc files. Fall back to tuples if they're available when making a hashcode. Use System.HashCode if its available. Use 64bit math when computing VB hashcodes in checked contexts. Use full 32 bits for vb hash. Cleanup
b6ae697
to
8e293c5
Compare
Ok. I rebased to 15.7.x. Thanks! |
@Pilchie for ask mode approval |
@CyrusNajmabadi Why did you request compiler review for this? |
@agocke The request was automatically applied per the policy in CODE_OWNERS. |
@agocke I didn't :) Indeed, i don't think i can actually request reviewers myself. |
@Pilchie for approval |
@sharwell Thanks. I've dropped compilers, since I don't think we're relevant here. |
Approved to merge for 15.7 |
Thanks guys! |
Fixes #24035
Fixes #17646