Fix all regressions relative to v1.1 #116
Conversation
Note that I tried to keep this PR minimal, applying only changes that had a significant impact on the benchmarks. That means, for example, that I didn't copy over the
Ready for review.
Great work :)
* Make bitmask-with-rejection non-recursive
* INLINE some uniformRM implementations
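The bitmask-with-rejection technique named in the first bullet can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: `nextWord64`, `toMask`, and `uniformBounded` are made-up names, and the toy LCG only stands in for the library's real generator so the example is self-contained.

```haskell
import Data.Bits ((.&.), (.|.), shiftR)
import Data.Word (Word64)

-- Toy LCG standing in for the library's real generator step;
-- only here so the sketch is self-contained and runnable.
nextWord64 :: Word64 -> (Word64, Word64)
nextWord64 s =
  let s' = s * 6364136223846793005 + 1442695040888963407
  in (s', s')

-- Smallest mask of the form 2^k - 1 that covers the bound.
toMask :: Word64 -> Word64
toMask w0 =
  let a = w0 .|. (w0 `shiftR` 1)
      b = a  .|. (a  `shiftR` 2)
      c = b  .|. (b  `shiftR` 4)
      d = c  .|. (c  `shiftR` 8)
      e = d  .|. (d  `shiftR` 16)
  in e .|. (e `shiftR` 32)

-- Bitmask-with-rejection: draw a word, mask it down to the
-- covering power-of-two range, retry on overshoot. The retry
-- loop lives in the local 'go' worker, so the top-level binding
-- itself is non-recursive and GHC remains free to inline it.
uniformBounded :: Word64 -> Word64 -> (Word64, Word64)
uniformBounded bound s0 = go s0
  where
    m = toMask bound
    go s =
      let (w, s') = nextWord64 s
          w' = w .&. m
      in if w' > bound then go s' else (w', s')
{-# INLINE uniformBounded #-}

main :: IO ()
main = do
  let (v, _) = uniformBounded 10 42
  print v  -- some value in [0, 10]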
Improve uniform `ShortByteString`
This combines changes to the bitmask-rejection methods from #103 with a few benchmark-driven INLINE pragmas.

Context
In interface-to-performance right now, there are performance regressions relative to v1.1.

This PR: no more regressions relative to v1.1

This PR speeds up the slower generators such that every benchmarked function runs faster than on v1.1 by 1400% or more.
Comparison between v1.1 (reference) and this branch:
Some regressions relative to interface-to-performance

Strangely, while speeding up many generator functions significantly, this PR introduces some mild regressions relative to interface-to-performance.

Comparison between interface-to-performance (reference) and this branch:

As an additional observation, adding one more INLINE, e.g. for Int8's uniformRM, actually makes nextWord32 and nextWord64 slower! My guess is that GHC has some sort of "inlining budget", and that adding another INLINE pragma somewhere leads to less inlining elsewhere. For this reason, I've only added INLINE pragmas where they improved the benchmark results significantly, and removed them where they led to slower generated code.

Conclusion
While I can't fully explain the non-local inlining effects I observed, this PR does objectively remove all regressions we previously had relative to v1.1, so I suggest we merge.
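For concreteness, the kind of INLINE placement discussed above looks like this. Everything here is illustrative and simplified, not the library's actual uniformRM instance: `step` is a toy generator, `uniformInt8R` is a made-up name, and the modulo-based sampling is slightly biased; the point is only where the pragma goes.

```haskell
import Data.Int (Int8)
import Data.Word (Word64)

-- Toy generator step; a stand-in for the real PRNG.
step :: Word64 -> (Word64, Word64)
step s =
  let s' = s * 6364136223846793005 + 1442695040888963407
  in (s', s')
{-# INLINE step #-}

-- Simplified range sampler for Int8. The INLINE pragma lets
-- GHC specialise and unbox the sampler at each call site --
-- but, per the observation above, each pragma also consumes
-- inlining headroom, so placement should be benchmark-driven.
uniformInt8R :: (Int8, Int8) -> Word64 -> (Int8, Word64)
uniformInt8R (lo, hi) s =
  let (w, s')  = step s
      range    = fromIntegral hi - fromIntegral lo + 1 :: Word64
      offset   = fromIntegral (w `mod` range) :: Int8
  in (lo + offset, s')
{-# INLINE uniformInt8R #-}

main :: IO ()
main = do
  let (v, _) = uniformInt8R (-5, 5) 12345
  print v  -- some value in [-5, 5]
```

The wraparound arithmetic on `range` and `offset` works out modularly even for the full Int8 range, which keeps the sketch small enough that inlining it is plausible in the first place.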