Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizing the rest of bool's Ord implementation #114721

Merged
merged 1 commit into from
Aug 16, 2023

Conversation

danflapjax
Copy link
Contributor

@danflapjax danflapjax commented Aug 11, 2023

After coming across issue #66780, I realized that the other functions provided by Ord (min, max, and clamp) were similarly inefficient for bool. This change provides implementations for them in terms of boolean operators, resulting in much simpler assembly and faster code.
Fixes issue #114653

Comparison on Godbolt

max assembly before:

example::max:
        mov     eax, edi
        mov     ecx, eax
        neg     cl
        mov     edx, esi
        not     dl
        cmp     dl, cl
        cmove   eax, esi
        ret

max assembly after:

example::max:
        mov     eax, edi
        or      eax, esi
        ret

clamp assembly before:

example::clamp:
        mov     eax, esi
        sub     al, dl
        inc     al
        cmp     al, 2
        jae     .LBB1_1
        mov     eax, edi
        sub     al, sil
        movzx   ecx, dil
        sub     dil, dl
        cmp     dil, 1
        movzx   edx, dl
        cmovne  edx, ecx
        cmp     al, -1
        movzx   eax, sil
        cmovne  eax, edx
        ret
.LBB1_1:
        ; identical assert! code

clamp assembly after:

example::clamp:
        test    edx, edx
        jne     .LBB1_2
        test    sil, sil
        jne     .LBB1_3
.LBB1_2:
        or      dil, sil
        and     dil, dl
        mov     eax, edi
        ret
.LBB1_3:
        ; identical assert! code

@rustbot
Copy link
Collaborator

rustbot commented Aug 11, 2023

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @cuviper (or someone else) soon.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Aug 11, 2023
@danflapjax
Copy link
Contributor Author

danflapjax commented Aug 11, 2023

Haha, I actually found this over two weeks ago and just got around to implementing it, but it looks like somebody else came across it in the meantime too: #114653 #114695

@joboet joboet mentioned this pull request Aug 11, 2023
@cuviper
Copy link
Member

cuviper commented Aug 11, 2023

I'm not sure whether the perf run will exercise this, but let's see:
@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Aug 11, 2023
@bors
Copy link
Contributor

bors commented Aug 11, 2023

⌛ Trying commit b75351e with merge 8104587bead96343e80021e6a1277aba1825b518...

@cuviper
Copy link
Member

cuviper commented Aug 11, 2023

example::clamp:
        test    edx, edx
        jne     .LBB1_2
        test    sil, sil
        jne     .LBB1_3
.LBB1_2:
        or      dil, sil
        and     dil, dl
        mov     eax, edi
        ret

FWIW, cg-gcc improves the assertion to a single branch:

example::clamp:
        cmp     dl, sil
        jb      .L6
        or      esi, edi
        mov     eax, esi
        and     eax, edx
        ret
example::clamp.cold:
.L6:
        ; panic code...

@bors
Copy link
Contributor

bors commented Aug 11, 2023

☀️ Try build successful - checks-actions
Build commit: 8104587bead96343e80021e6a1277aba1825b518 (8104587bead96343e80021e6a1277aba1825b518)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (8104587bead96343e80021e6a1277aba1825b518): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.0% [2.0%, 2.0%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-4.4% [-4.4%, -4.4%] 1
All ❌✅ (primary) - - 0

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 631.671s -> 631.905s (0.04%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Aug 11, 2023
@scottmcm
Copy link
Member

Ah, right, like #105840 found out, LLVM is really bad at understanding the subtraction-Ord::cmp, so it's really good for full cmp but really bad for downstream optimization of anything else 😞

@cuviper
Copy link
Member

cuviper commented Aug 15, 2023

cg-gcc improves the assertion to a single branch

This difference is also visible in C: https://godbolt.org/z/WcWvx1rYM

Anyway, the Rust codegen is still an improvement, and being neutral on rust-timer is fine, just apparently unused.

@bors r+ rollup

@bors
Copy link
Contributor

bors commented Aug 15, 2023

📌 Commit b75351e has been approved by cuviper

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 15, 2023
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Aug 15, 2023
…r=cuviper

Optimizing the rest of bool's Ord implementation

After coming across issue rust-lang#66780, I realized that the other functions provided by Ord (`min`, `max`, and `clamp`) were similarly inefficient for bool. This change provides implementations for them in terms of boolean operators, resulting in much simpler assembly and faster code.
Fixes issue rust-lang#114653

[Comparison on Godbolt](https://rust.godbolt.org/z/5nb5P8e8j)

`max` assembly before:
```assembly
example::max:
        mov     eax, edi
        mov     ecx, eax
        neg     cl
        mov     edx, esi
        not     dl
        cmp     dl, cl
        cmove   eax, esi
        ret
```
`max` assembly after:
```assembly
example::max:
        mov     eax, edi
        or      eax, esi
        ret
```
`clamp` assembly before:
```assembly
example::clamp:
        mov     eax, esi
        sub     al, dl
        inc     al
        cmp     al, 2
        jae     .LBB1_1
        mov     eax, edi
        sub     al, sil
        movzx   ecx, dil
        sub     dil, dl
        cmp     dil, 1
        movzx   edx, dl
        cmovne  edx, ecx
        cmp     al, -1
        movzx   eax, sil
        cmovne  eax, edx
        ret
.LBB1_1:
        ; identical assert! code
```
`clamp` assembly after:
```assembly
example::clamp:
        test    edx, edx
        jne     .LBB1_2
        test    sil, sil
        jne     .LBB1_3
.LBB1_2:
        or      dil, sil
        and     dil, dl
        mov     eax, edi
        ret
.LBB1_3:
        ; identical assert! code
```
bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 16, 2023
…iaskrgr

Rollup of 7 pull requests

Successful merges:

 - rust-lang#114721 (Optimizing the rest of bool's Ord implementation)
 - rust-lang#114746 (Don't add associated type bound for non-types)
 - rust-lang#114779 (Add check before suggest removing parens)
 - rust-lang#114859 (Add trait related queries to SMIR's rustc_internal)
 - rust-lang#114861 (fix typo: affect -> effect)
 - rust-lang#114867 ([nit] Fix a comment typo.)
 - rust-lang#114871 (Update the link in the docs of `std::intrinsics`)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 4b2d87d into rust-lang:master Aug 16, 2023
@rustbot rustbot added this to the 1.73.0 milestone Aug 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants