Comparison operators #702
https://quick-bench.com/q/1_xA8G_jXci_yeOKt0WgCc6eGN4 seems to get to 1.3x and 1.4x. ;]
But it's still a "bad" test -- most users aren't conjuring booleans, they're branching. Most of the extra slowness is materializing the `0` and `1` value here. It'll be harder to measure the difference when used in real-world code. But it's still a difference, and so it may be something we have to pay attention to here long-term.
The calculation I'm using here is:
((correct operation benchmark time) - (no-op benchmark time)) /
((converting operation benchmark time) - (no-op benchmark time))
Using that calculation, I get 2.0x and 2.5x for the two operations in your revised approach rather than the 1.3x and 1.4x as reported by quick-bench. Even though these are similar to the numbers I had before, I've switched to your quick-bench link because I think your branchless implementation seems preferable in general. I previously reported 2.8x and 1.8x, but I'm now realizing that the 1.8x was cheating because it involved a branch that happened to be always taken, that skipped half of the comparison. And while the 2.8x involved a branch that was taken half the time, that was in predictable batches of 500 occurrences branching each way.
I've also added results for a case where we branch on the result of the computation, where the performance delta does seem smaller -- and the delta disappears entirely for one of the two operations; perhaps we're getting one of the two tests for free somehow. I expect that's still not really representative, but perhaps it's closer to being so.
Doh! Good catch on the no-op baseline.
Probably spent too much time on this, but I've got a re-worked benchmark that I like a bit better:
https://godbolt.org/z/xj3n6PKfb
You can't run it on quick-bench.com because it needs Abseil, but it actually uses a controllable distribution of random values. As pasted it should use non-negative values that fit into an i64.
The result is, at least on my Skylake VM, identical performance for the branchless version and the converting version. The branching version behaves a bit differently. Note that these nearly never find "true" for the equality test, so that biases things a bit here.
If you shift the dataset a bit to, for example, include negative integers, then it skews very differently: the non-converting cases are actually much faster. Why? Because they can short-circuit by only looking at one input when that input is negative! But we shouldn't expect this to be representative of anything, which is why I skewed the distribution back to make the negative test essentially always fail -- if that test ever succeeds, then the converting case will be slower.
It also seems to show much tighter behavior with floating point, but I've not spent much time analyzing that.
Happy to iterate a bit here and get a benchmark that we can actually include in the repository if useful (and it seems like it might be) to ensure we're doing a good job of setting expectations here.
But regardless, this shouldn't block the proposal. Feel free to merge.
Oh, to compile the above locally on Linux, I think the following will work: