Performance improvement in Count<T> extension #3548

Merged
6 commits merged into from
Nov 21, 2020

Conversation

Sergio0694
Member

@Sergio0694 Sergio0694 commented Oct 29, 2020

PR Type

What kind of change does this PR introduce?

  • Performance improvement

What is the new behavior?

About 20% improvement on .NET 5 when working on char types (or larger):

[Image: benchmark screenshot, not preserved]

This was done by adding an unrolled loop for the vectorized path of the SIMD accelerated version of Count<T>.
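For readers unfamiliar with the technique, here is a minimal sketch of what an unrolled vectorized counting loop can look like. It is not the code from this PR: the `CountSketch` class, the fixed `int` element type, and the unroll factor of four are assumptions chosen to keep the example short, and it uses only the public `System.Numerics.Vector<T>` API.

```csharp
using System;
using System.Numerics;

// Hypothetical illustration, not the toolkit's actual Count<T> implementation.
public static class CountSketch
{
    public static int Count(ReadOnlySpan<int> span, int value)
    {
        int i = 0, result = 0;

        if (Vector.IsHardwareAccelerated && span.Length >= Vector<int>.Count)
        {
            int width = Vector<int>.Count;
            var target = new Vector<int>(value);
            var partials = Vector<int>.Zero;

            // Unrolled body: process four vectors per iteration (assumed factor).
            for (; i <= span.Length - (width * 4); i += width * 4)
            {
                // Vector.Equals yields -1 (all bits set) in matching lanes,
                // so subtracting the mask adds 1 to each matching lane.
                partials -= Vector.Equals(new Vector<int>(span.Slice(i)), target);
                partials -= Vector.Equals(new Vector<int>(span.Slice(i + width)), target);
                partials -= Vector.Equals(new Vector<int>(span.Slice(i + (width * 2))), target);
                partials -= Vector.Equals(new Vector<int>(span.Slice(i + (width * 3))), target);
            }

            // Remaining full vectors, one at a time.
            for (; i <= span.Length - width; i += width)
            {
                partials -= Vector.Equals(new Vector<int>(span.Slice(i)), target);
            }

            // Horizontal sum of the per-lane counters.
            result = Vector.Dot(partials, Vector<int>.One);
        }

        // Scalar tail (and full fallback when SIMD is unavailable).
        for (; i < span.Length; i++)
        {
            if (span[i] == value) result++;
        }

        return result;
    }
}
```

With 32-bit lanes the per-lane counters cannot overflow for any valid span length; narrower element types would need the accumulation split into chunks, which this sketch leaves out.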

PR Checklist

Please check if your PR fulfills the following requirements:

  • Tested code with current supported SDKs
  • Pull Request has been submitted to the documentation repository instructions. Link:
  • Sample in sample app has been added / updated (for bug fixes / features)
  • Tests for the changes have been added (for bug fixes / features) (if applicable)
  • Header has been added to all new source files (run build/UpdateHeaders.bat)
  • Contains NO breaking changes

@Sergio0694 Sergio0694 added DO NOT MERGE ⚠️ high-performance 🚂 Issues/PRs for the Microsoft.Toolkit.HighPerformance package optimization ☄ Performance or memory usage improvements .NET Components which are .NET based (non UWP specific) labels Oct 29, 2020
@Sergio0694 Sergio0694 added this to the 7.0 milestone Oct 29, 2020
@ghost

ghost commented Oct 29, 2020

Thanks Sergio0694 for opening a Pull Request! The reviewers will test the PR and highlight if there is any conflict or changes required. If the PR is approved we will proceed to merge the pull request 🙌

@ghost ghost requested review from michael-hawker, azchohfi and Kyaa-dost October 29, 2020 00:09
@Rosuavio
Contributor

Rosuavio commented Nov 16, 2020

I like the optimization, but could the code be written in a more maintainable way so that RyuJIT would unroll it for us? There seems to be some ongoing discussion (dotnet/runtime#8107) on expanding loop unrolling optimizations.

@michael-hawker
Member

Thanks @RosarioPulella! @Sergio0694 would it make sense to find/file dotnet issues for any hand-optimizations like this you're doing that could maybe be improved in the compiler?

At least then we could mark out the section of code and link to the issue so we could remove it later when it gets optimized in the runtime?

@Sergio0694
Member Author

Sergio0694 commented Nov 19, 2020

Oh, whoops, forgot to reply to this, I'm sorry! 😅

@RosarioPulella The discussion is ongoing, but as you can see it's still planned for the future (not even .NET 6). Furthermore, this case is particularly tricky because the JIT would need to unroll a loop that is itself already using vectorized instructions, which adds another layer of complexity for it to handle. I would argue this is worth it for a few reasons:

  • Regardless of when the runtime will get support for this, this is a clean ~20% perf improvement we can have today.
  • It doesn't really add much complexity to the rest of the codebase, or that class in particular.
  • We only need this to trigger for specific T instantiations, so doing this manually gives us better control anyway. Even if that were supported, I wouldn't want the JIT to always unroll this: when T is sbyte, the unrolled loop does not help at all (I've profiled it to verify, which is why I added the check that excludes that type and enables the unrolled path only for types >= 2 bytes in size).
  • Since we're targeting multiple platforms, I'd argue it makes sense to include as many optimizations as we can upstream, so that consumers will be able to benefit from them on all platforms. I'm doing the same elsewhere too, so that devs not using .NET 5 will not be penalized (e.g. by not having access to specific JIT intrinsics).
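The per-type gating mentioned in the third point can be sketched roughly as follows. This is an illustration only: `UnrollPolicy` and `ShouldUnroll<T>` are hypothetical names, and the size-based condition is an assumption modeled on the ">= 2 bytes" description above, not the exact check used in the PR.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper illustrating a per-instantiation check.
public static class UnrollPolicy
{
    // For value-type instantiations the JIT can generally treat
    // Unsafe.SizeOf<T>() as a constant, so a branch on this result
    // is expected to be removed in the specialized code.
    public static bool ShouldUnroll<T>()
        where T : unmanaged
    {
        // Assumed rule: only unroll for element types of 2 bytes or more.
        return Unsafe.SizeOf<T>() >= 2;
    }
}
```

A caller would wrap the unrolled loop in `if (UnrollPolicy.ShouldUnroll<T>()) { ... }` and fall through to the plain vectorized loop otherwise, so 1-byte types such as sbyte skip the unrolled path.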

@michael-hawker I've already opened a few issues related to things I'm doing in the HighPerformance package (or rather, things I discovered while working on it), I can definitely start also linking to issues from comments in the future 😊

Contributor

@Rosuavio Rosuavio left a comment

@Sergio0694 You make some really good points and I see the value in these changes.

@Sergio0694
Member Author

@RosarioPulella Awesome! Glad we all agree - and those were some absolutely valid questions! 😄

@ghost

ghost commented Nov 20, 2020

Hello @michael-hawker!

Because this pull request has the auto merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit 82f86e8 into CommunityToolkit:master Nov 21, 2020
This pull request was closed.
Labels
auto merge ⚡ high-performance 🚂 Issues/PRs for the Microsoft.Toolkit.HighPerformance package .NET Components which are .NET based (non UWP specific) optimization ☄ Performance or memory usage improvements