Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asserts the maximum value that can be returned from Vec::len #131193

Merged
merged 2 commits into from
Dec 22, 2024

Conversation

EFanZh
Copy link
Contributor

@EFanZh EFanZh commented Oct 3, 2024

Currently, casting Vec<i32> to Vec<u32> takes O(1) time:

// See <https://godbolt.org/z/hxq3hnYKG> for assembly output.
pub fn cast(vec: Vec<i32>) -> Vec<u32> {
    vec.into_iter().map(|e| e as _).collect()
}

But the generated assembly is not the same as the identity function, which prevents us from casting Vec<Vec<i32>> to Vec<Vec<u32>> within O(1) time:

// See <https://godbolt.org/z/7n48bxd9f> for assembly output.
pub fn cast(vec: Vec<Vec<i32>>) -> Vec<Vec<u32>> {
    vec.into_iter()
        .map(|e| e.into_iter().map(|e| e as _).collect())
        .collect()
}

This change tries to fix the problem. You can see the comparison here: https://godbolt.org/z/jdManrKvx.

@rustbot
Copy link
Collaborator

rustbot commented Oct 3, 2024

r? @tgross35

rustbot has assigned @tgross35.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Oct 3, 2024
@rust-log-analyzer

This comment has been minimized.

@the8472
Copy link
Member

the8472 commented Oct 3, 2024

Have you figured out which expression fails to optimize? Maybe unchecked arithmetic could provide a more localized fix than adding an assume everywhere lengths are involved.

Well, let's see what the perf impact is, maybe the assume is fine.

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 3, 2024
@the8472 the8472 assigned the8472 and unassigned tgross35 Oct 3, 2024
@bors
Copy link
Contributor

bors commented Oct 3, 2024

⌛ Trying commit 6f0772b with merge 0fb2e37...

bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 3, 2024
Asserts the maximum value that can be returned from `Vec::len`

Currently, casting `Vec<i32>` to `Vec<u32>` takes O(1) time:

```rust
// See <https://godbolt.org/z/hxq3hnYKG> for assembly output.
pub fn cast(vec: Vec<i32>) -> Vec<u32> {
    vec.into_iter().map(|e| e as _).collect()
}
```

But the generated assembly is not the same as the identity function, which prevents us from casting `Vec<Vec<i32>>` to `Vec<Vec<u32>>` within O(1) time:

```rust
// See <https://godbolt.org/z/7n48bxd9f> for assembly output.
pub fn cast(vec: Vec<Vec<i32>>) -> Vec<Vec<u32>> {
    vec.into_iter()
        .map(|e| e.into_iter().map(|e| e as _).collect())
        .collect()
}
```

This change tries to fix the problem. You can see the comparison here: <https://godbolt.org/z/jdManrKvx>.
@tgross35
Copy link
Contributor

tgross35 commented Oct 3, 2024

r? the8472 (oh, you did that already)

Fixing this seems desirable, but putting the check in len seems a bit suboptimal since it is really a type invariant. I wonder if an internal LeSignedMax<T> wrapper type similar to NonZero would be useful in more places than just this.

I think @scottmcm may have looked at optimizing on these invariants at some point.

@rustbot

This comment was marked as outdated.

@EFanZh
Copy link
Contributor Author

EFanZh commented Oct 3, 2024

@the8472:

Have you figured out which expression fails to optimize?

I suspect the multiplication and division here, but not very sure:

inner.cap.unchecked_mul(mem::size_of::<I::Src>()) / mem::size_of::<T>(),

Well, let's see what the perf impact is, maybe the assume is fine.

If performance is a concern, there is also a possible optimization to precompute the length upper bound, maybe something like:

pub trait SizedTypeProperties: Sized {
    const IS_ZST: bool = size_of::<Self>() == 0;
    const LAYOUT: Layout = Layout::new::<Self>();

    const MAX_VEC_LEN: usize =
            if Self::IS_ZST { usize::MAX } else { isize::MAX as usize / size_of::<Self>() };
}

impl Vec<T> {
    pub fn len(&self) -> usize {
        unsafe { core::hint::assert_unchecked(self.len <= T::MAX_VEC_LEN) };

        self.len
    }
}

Also, there still is a remaining CHECK-NOT: call deleted in tests/codegen/vec_pop_push_noop.rs. This one seems to generate different output under different configurations, which I am not able to come up with a solution that keeps the call check and passes all tests.

@the8472
Copy link
Member

the8472 commented Oct 3, 2024

Fixing this seems desirable, but putting the check in len seems a bit suboptimal since it is really a type invariant. I wonder if an internal LeSignedMax<T> wrapper type similar to NonZero would be useful in more places than just this.

It's not, since the maximum is the size in bytes but the len field is tracking the number of entries.

@bors
Copy link
Contributor

bors commented Oct 3, 2024

☀️ Try build successful - checks-actions
Build commit: 0fb2e37 (0fb2e372662301038c04a6eaf7a34fe2d99f631d)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (0fb2e37): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.6% [0.3%, 1.7%] 25
Regressions ❌
(secondary)
1.2% [0.6%, 1.9%] 2
Improvements ✅
(primary)
-0.2% [-0.3%, -0.2%] 5
Improvements ✅
(secondary)
-0.2% [-0.2%, -0.2%] 3
All ❌✅ (primary) 0.5% [-0.3%, 1.7%] 30

Max RSS (memory usage)

Results (primary 1.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.3% [1.1%, 12.1%] 10
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.7% [-2.9%, -2.4%] 3
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 1.9% [-2.9%, 12.1%] 13

Cycles

Results (primary 1.4%, secondary 4.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.4% [1.2%, 1.5%] 2
Regressions ❌
(secondary)
4.2% [2.3%, 5.4%] 5
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 1.4% [1.2%, 1.5%] 2

Binary size

Results (primary 0.1%, secondary 0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.2% [0.0%, 1.9%] 57
Regressions ❌
(secondary)
0.6% [0.0%, 2.4%] 4
Improvements ✅
(primary)
-0.2% [-0.4%, -0.0%] 8
Improvements ✅
(secondary)
-0.1% [-0.2%, -0.0%] 15
All ❌✅ (primary) 0.1% [-0.4%, 1.9%] 65

Bootstrap: 772.929s -> 774.321s (0.18%)
Artifact size: 341.95 MiB -> 341.91 MiB (-0.01%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Oct 3, 2024
@the8472
Copy link
Member

the8472 commented Oct 3, 2024

Well, that's mostly a regression, so a more localized or lightweight fix would be preferable.

I suspect the multiplication and division here, but not very sure

The size is constant (so no need for a zero check) and the mul is unchecked so it already asserts it doesn't overflow. So I suspect it's probably some other arithmetic.

@scottmcm
Copy link
Member

scottmcm commented Oct 3, 2024

If performance is a concern, there is also a possible optimization to precompute the length upper bound

Indeed, I had exactly code for this in

https://github.com/rust-lang/rust/pull/123575/files#diff-02049689ae3bb7ac98af3418ad8fdca219bfa1227cb6cad31afc72bdbae30e27R1239-R1248

You might get less of an impact by doing this change as

let len = self.len;
intrinsics::assume(len <= T::MAX_SLICE_LEN);
len

No branch, no duplicate projecting, and no UbCheck generated from hint::assert_unchecked.


Though really, the good way to do this is to allow range assume operand bundles in LLVM, so that we can get the comparison and call out of the normal instruction stream. LLVM really doesn't like range-assumes.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 5, 2024
@bors
Copy link
Contributor

bors commented Dec 5, 2024

⌛ Testing commit f23b312 with merge 186335e...

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 5, 2024
Asserts the maximum value that can be returned from `Vec::len`

Currently, casting `Vec<i32>` to `Vec<u32>` takes O(1) time:

```rust
// See <https://godbolt.org/z/hxq3hnYKG> for assembly output.
pub fn cast(vec: Vec<i32>) -> Vec<u32> {
    vec.into_iter().map(|e| e as _).collect()
}
```

But the generated assembly is not the same as the identity function, which prevents us from casting `Vec<Vec<i32>>` to `Vec<Vec<u32>>` within O(1) time:

```rust
// See <https://godbolt.org/z/7n48bxd9f> for assembly output.
pub fn cast(vec: Vec<Vec<i32>>) -> Vec<Vec<u32>> {
    vec.into_iter()
        .map(|e| e.into_iter().map(|e| e as _).collect())
        .collect()
}
```

This change tries to fix the problem. You can see the comparison here: <https://godbolt.org/z/jdManrKvx>.
@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Dec 5, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Dec 5, 2024
@EFanZh
Copy link
Contributor Author

EFanZh commented Dec 15, 2024

It seems that the codegen result of tests/codegen/vec_pop_push_noop.rs is different with wasm32-unknown-wasip1 target. With i686-unknown-linux-gnu target, the codegen of noop function is:

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: read, inaccessiblemem: write) uwtable
define void @noop(ptr noalias nocapture noundef readonly align 4 dereferenceable(12) %v) unnamed_addr #0 personality ptr @rust_eh_personality {
start:
  %0 = getelementptr inbounds i8, ptr %v, i32 8
  %_2.i = load i32, ptr %0, align 4, !alias.scope !2, !noundef !5
  %.not = icmp eq i32 %_2.i, 0
  br i1 %.not, label %bb3, label %"_ZN5alloc3vec16Vec$LT$T$C$A$GT$4push17h7892da45aee13302E.exit"

"_ZN5alloc3vec16Vec$LT$T$C$A$GT$4push17h7892da45aee13302E.exit": ; preds = %start
  %1 = add i32 %_2.i, -1
  %2 = load i32, ptr %v, align 4, !noundef !5
  %cond.i = icmp ult i32 %1, %2
  tail call void @llvm.assume(i1 %cond.i)
  br label %bb3

bb3:                                              ; preds = %start, %"_ZN5alloc3vec16Vec$LT$T$C$A$GT$4push17h7892da45aee13302E.exit"
  ret void
}

With wasm32-unknown-wasip1 target, the codegen is:

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: read, inaccessiblemem: write)
define dso_local void @noop(ptr noalias nocapture noundef readonly align 4 dereferenceable(12) %v) unnamed_addr #0 {
start:
  %0 = getelementptr inbounds i8, ptr %v, i32 8
  %_2.i = load i32, ptr %0, align 4, !alias.scope !1, !noundef !4
  %.not = icmp eq i32 %_2.i, 0
  br i1 %.not, label %bb3, label %bb7

bb3:                                              ; preds = %start, %bb7
  ret void

bb7:                                              ; preds = %start
  %1 = add i32 %_2.i, -1
  %2 = load i32, ptr %v, align 4, !noundef !4
  %cond.i = icmp ult i32 %1, %2
  tail call void @llvm.assume(i1 %cond.i)
  br label %bb3
}

It seems the order of the last two basic blocks is reversed with wasm32-unknown-wasip1 target.

I see two fixes for codegen tests:

  • Remove the last // CHECK: ret line.
  • Replace // CHECK: ret with // CHECK: {{ret|[}]}}.

@EFanZh
Copy link
Contributor Author

EFanZh commented Dec 22, 2024

Hi @the8472, I have fixed the CI failure, would you like to try again?

@the8472
Copy link
Member

the8472 commented Dec 22, 2024

@bors r+

@bors
Copy link
Contributor

bors commented Dec 22, 2024

📌 Commit 7d450bb has been approved by the8472

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 22, 2024
@bors
Copy link
Contributor

bors commented Dec 22, 2024

⌛ Testing commit 7d450bb with merge 303e8bd...

@bors
Copy link
Contributor

bors commented Dec 22, 2024

☀️ Test successful - checks-actions
Approved by: the8472
Pushing 303e8bd to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Dec 22, 2024
@bors bors merged commit 303e8bd into rust-lang:master Dec 22, 2024
7 checks passed
@rustbot rustbot added this to the 1.85.0 milestone Dec 22, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (303e8bd): comparison URL.

Overall result: ❌ regressions - please read the text below

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

  • If the regression was expected or you think it can be justified,
    please write a comment with sufficient written justification, and add
    @rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
  • If you think that you know of a way to resolve the regression, try to create
    a new PR with a fix for the regression.
  • If you do not understand the regression or you think that it is just noise,
    you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group
    were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
0.3% [0.2%, 1.2%] 104
Regressions ❌
(secondary)
0.3% [0.1%, 0.8%] 38
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.3% [0.2%, 1.2%] 104

Max RSS (memory usage)

Results (primary -0.0%, secondary 2.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.4% [2.0%, 4.8%] 2
Regressions ❌
(secondary)
2.2% [2.2%, 2.2%] 1
Improvements ✅
(primary)
-6.9% [-6.9%, -6.9%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.0% [-6.9%, 4.8%] 3

Cycles

Results (primary -0.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.8% [0.7%, 0.8%] 2
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-4.1% [-4.1%, -4.1%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.9% [-4.1%, 0.8%] 3

Binary size

Results (primary 0.0%, secondary -0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.1% [0.0%, 0.5%] 29
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.1% [-0.3%, -0.0%] 12
Improvements ✅
(secondary)
-0.1% [-0.1%, -0.1%] 1
All ❌✅ (primary) 0.0% [-0.3%, 0.5%] 41

Bootstrap: 762.237s -> 765.312s (0.40%)
Artifact size: 330.65 MiB -> 330.59 MiB (-0.02%)

@EFanZh EFanZh deleted the asserts-vec-len branch December 23, 2024 00:26
@Kobzol
Copy link
Contributor

Kobzol commented Dec 23, 2024

The post-merge regression seems to be much larger than the previous runs. Is this regression expected? It seems to be mostly in opt builds, where it was probably expected but also in check builds.

@scottmcm
Copy link
Member

I'm surprised the pre-merge one was so small. The similar things I'd tried in the past, like #122926 (comment) and #123575 (comment) both had results like the post-merge one here.

Interesting that for the opt-full regression on image, the biggest icount one, the wall time came out way better due to permuting the codegen schedule:
image

@Kobzol
Copy link
Contributor

Kobzol commented Dec 24, 2024

Thanks for taking a look. I agree that this is something that is expected to regress (especially opt) performance, while potentially giving better codegen.

@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Dec 24, 2024
@scottmcm
Copy link
Member

scottmcm commented Dec 24, 2024

FWIW, I didn't end up merging my previous attempts at these things, so I don't know if the icount change is acceptable here. This kind of "slower to compile but hopefully-faster at runtime" we don't have good data about.

This being entirely red even for check builds -- which wouldn't have the same expected codegen impact -- is less than great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.