Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix target-cpu fpu features on Armv8-R. #130295

Merged
merged 1 commit into from
Sep 15, 2024

Conversation

chrisnc
Copy link
Contributor

@chrisnc chrisnc commented Sep 13, 2024

This is a follow-up to #123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work properly. Now that this change exists in rustc's llvm, we can fix Armv8-R's default fpu features. In Armv8-R's case, the default features from LLVM for floating-point are sufficient, because there is no integer-only variant of this architecture.

@rustbot
Copy link
Collaborator

rustbot commented Sep 13, 2024

r? @petrochenkov

rustbot has assigned @petrochenkov.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 13, 2024
@rustbot
Copy link
Collaborator

rustbot commented Sep 13, 2024

These commits modify compiler targets.
(See the Target Tier Policy.)

@jieyouxu jieyouxu added the relnotes Marks issues that should be documented in the release notes of the next release. label Sep 13, 2024
@jieyouxu jieyouxu added O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. labels Sep 13, 2024
@jieyouxu
Copy link
Member

cc @workingjubilee since you looked at #123159, I'll reroll this PR to you too.

r? @workingjubilee

@jieyouxu
Copy link
Member

And since it might help bisection in the future,
@bors rollup=never

@chrisnc
Copy link
Contributor Author

chrisnc commented Sep 13, 2024

Confirmed that with this change, -C target-cpu=cortex-r52 now allows double-precision and neon instructions to be generated, according to the additional features implied by this CPU in LLVM.

#![no_main]
#![no_std]

#[no_mangle]
pub fn foo(x: f64, y: f64) -> f64 {
    x + y
}

#[no_mangle]
pub fn bar(x: &mut [f32; 4], y: &[f32; 4]) {
    for i in 0..4 {
        x[i] += y[i];
    }
}

use core::panic::PanicInfo;

#[panic_handler]
fn panic(_panic: &PanicInfo<'_>) -> ! {
    loop {}
}
$ build/aarch64-apple-darwin/stage1/bin/rustc --emit=obj --target armv8r-none-eabihf foo.rs -C target-cpu=cortex-r52 -C opt-level=3

polus:rust$ objdump -d foo.o

foo.o:  file format elf32-littlearm

Disassembly of section .text.foo:

00000000 <foo>:
       0: ee300b01      vadd.f64        d0, d0, d1
       4: e12fff1e      bx      lr

Disassembly of section .text.bar:

00000000 <bar>:
       0: f4610a8f      vld1.32 {d16, d17}, [r1]
       4: f4602a8f      vld1.32 {d18, d19}, [r0]
       8: f2400de2      vadd.f32        q8, q8, q9
       c: f4400a8f      vst1.32 {d16, d17}, [r0]
      10: e12fff1e      bx      lr

Disassembly of section .text.rust_begin_unwind:

00000000 <rust_begin_unwind>:
       0: eafffffe      b       0x0 <rust_begin_unwind> @ imm = #-8

Output on the immediate ancestor commit:

$ build/aarch64-apple-darwin/stage1/bin/rustc --emit=obj --target armv8r-none-eabihf foo.rs -C target-cpu=cortex-r52 -C opt-level=3

$ objdump -d foo.o

foo.o:  file format elf32-littlearm

Disassembly of section .text.foo:

00000000 <foo>:
       0: e92d4800      push    {r11, lr}
       4: ec510b10      vmov    r0, r1, d0
       8: ec532b11      vmov    r2, r3, d1
       c: ebfffffe      bl      0xc <foo+0xc>           @ imm = #-8
      10: ec410b10      vmov    d0, r0, r1
      14: e8bd8800      pop     {r11, pc}

Disassembly of section .text.bar:

00000000 <bar>:
       0: ed910a00      vldr    s0, [r1]
       4: ed911a01      vldr    s2, [r1, #4]
       8: ed912a02      vldr    s4, [r1, #8]
       c: ed913a03      vldr    s6, [r1, #12]
      10: ed904a00      vldr    s8, [r0]
      14: ed905a01      vldr    s10, [r0, #4]
      18: ed906a02      vldr    s12, [r0, #8]
      1c: ed907a03      vldr    s14, [r0, #12]
      20: ee300a04      vadd.f32        s0, s0, s8
      24: ee311a05      vadd.f32        s2, s2, s10
      28: ee322a06      vadd.f32        s4, s4, s12
      2c: ee333a07      vadd.f32        s6, s6, s14
      30: ed800a00      vstr    s0, [r0]
      34: ed801a01      vstr    s2, [r0, #4]
      38: ed802a02      vstr    s4, [r0, #8]
      3c: ed803a03      vstr    s6, [r0, #12]
      40: e12fff1e      bx      lr

Disassembly of section .text.rust_begin_unwind:

00000000 <rust_begin_unwind>:
       0: eafffffe      b       0x0 <rust_begin_unwind> @ imm = #-8

@workingjubilee
Copy link
Member

@chrisnc My apologies, for your good deeds will not go unpunished, as now I have to ask:

Can you add that as a test?

@chrisnc
Copy link
Contributor Author

chrisnc commented Sep 14, 2024

Punishment accepted. Could you point me to an example that would be good to follow? The tests I see in tests/assembly seem to not have any operators, and adding the code like above in one of those tests requires turning on a bunch of lang items and I can't seem to get it to work...

Here's roughly where I got, but it doesn't compile because no_core disables everything, and if I try to use core, it won't be available in the test. I assume I'm doing this wrong.

// Test that Neon instructions are emitted when a target-cpu supporting them is specified.

//@ assembly-output: emit-asm
//@ compile-flags: --target armv8r-none-eabihf
//@ compile-flags: -C target-cpu=cortex-r52
//@ compile-flags: -C opt-level=3
//@ needs-llvm-components: arm

#![feature(no_core, lang_items)]
#![no_std]
#![no_core]
#![crate_type = "lib"]

#[lang = "sized"]
trait Sized {}

#[lang = "copy"]
trait Copy {}

// CHECK-LABEL: vadd_q:
#[no_mangle]
pub fn vadd_q(x: &mut [f32; 4], y: &[f32; 4]) {
    // CHECK: vld{{.*}}
    // CHECK: vld{{.*}}
    // CHECK: vadd.f32 {{.*}}
    // CHECK: vst{{.*}}
    // CHECK: bx lr
    let mut i = 0usize;
    while i < 4usize {
        x[i] += y[i];
        i += 1usize;
    }
}

// CHECK-LABEL: vadd_f64:
#[no_mangle]
pub fn vadd_f64(x: f64, y: f64) -> f64 {
    // CHECK: vadd.f64 d0, d0, d1
    // CHECK: bx lr
    x + y
}

output:

error[E0369]: binary operation `<` cannot be applied to type `usize`
  --> /Users/chrisnc/src/rust/tests/assembly/armv8r-target-cpu.rs:29:13
   |
29 |     while i < 4usize {
   |           - ^ ------ usize
   |           |
   |           usize

error[E0608]: cannot index into a value of type `&mut [f32; 4]`
  --> /Users/chrisnc/src/rust/tests/assembly/armv8r-target-cpu.rs:30:10
   |
30 |         x[i] += y[i];
   |          ^^^

error[E0608]: cannot index into a value of type `&[f32; 4]`
  --> /Users/chrisnc/src/rust/tests/assembly/armv8r-target-cpu.rs:30:18
   |
30 |         x[i] += y[i];
   |                  ^^^

error[E0368]: binary assignment operation `+=` cannot be applied to type `usize`
  --> /Users/chrisnc/src/rust/tests/assembly/armv8r-target-cpu.rs:31:9
   |
31 |         i += 1usize;
   |         -^^^^^^^^^^
   |         |
   |         cannot use `+=` on type `usize`

error[E0369]: cannot add `f64` to `f64`
  --> /Users/chrisnc/src/rust/tests/assembly/armv8r-target-cpu.rs:40:7
   |
40 |     x + y
   |     - ^ - f64
   |     |
   |     f64

error: aborting due to 5 previous errors

@workingjubilee
Copy link
Member

workingjubilee commented Sep 14, 2024

The problem is that this situation is cross-compiling, but I don't believe we actually have a satisfactory way to run cross-compiling of assembly tests beyond the no_core case which is impractical for this example, so I think the actually most satisfactory way is unfortunately probably rmake and filecheck.

@rustbot rustbot added the A-run-make Area: port run-make Makefiles to rmake.rs label Sep 14, 2024
@rustbot
Copy link
Collaborator

rustbot commented Sep 14, 2024

This PR modifies tests/run-make/. If this PR is trying to port a Makefile
run-make test to use rmake.rs, please update the
run-make port tracking issue
so we can track our progress. You can either modify the tracking issue
directly, or you can comment on the tracking issue and link this PR.

cc @jieyouxu

@chrisnc
Copy link
Contributor Author

chrisnc commented Sep 14, 2024

Okay, the rmake test is working locally now at least. I'm not sure how to have the same test run for multiple targets, but I've set it up so it will extend to the other target+cpu combinations easily (tested thumbv8m.main-none-eabihf and cortex-m85 also...).

@workingjubilee
Copy link
Member

no need to test every single possible subarchitecture, but yes, this looks nice.

I will see if we can't implement actual cross-compiled assembly testing, but for now, this will do quite well, thank you.

@bors r+ rollup

@bors
Copy link
Contributor

bors commented Sep 14, 2024

📌 Commit d7b2790 has been approved by workingjubilee

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 14, 2024
@chrisnc
Copy link
Contributor Author

chrisnc commented Sep 14, 2024

Great! Will this actually run in CI? I had to provide a matching target with:
./x test --target=armv8r-none-eabihf tests/run-make/arm-target-cpu-fp for this to run properly, otherwise it would complain about missing core. Is there somewhere I need to specify the invocation needed for this test?

@chrisnc
Copy link
Contributor Author

chrisnc commented Sep 14, 2024

Sorry, needed to amend because I realized two things:
my .checks file had leading "//" which are superfluous.

I re-read my comments on the older PR from a few months ago and was reminded that the plan for Armv8-R is that the feature list should be empty, because Arm defines Armv8-R to always have these floating-point features, and so does LLVM, so we don't need to specify them. I've removed the feature setting and updated the comment and commit message to reflect that.

@workingjubilee
Copy link
Member

@bors r-

@bors
Copy link
Contributor

bors commented Sep 15, 2024

⌛ Trying commit 1df34e4 with merge 1cdb483...

bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 15, 2024
Fix target-cpu fpu features on Armv8-R.

This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work properly. Now that this change exists in rustc's llvm, we can fix Armv8-R's default fpu features. In Armv8-R's case, the default features from LLVM for floating-point are sufficient, because there is no integer-only variant of this architecture.

Add a run-make test that target-cpu=cortex-r52 enables double-precision and neon.

try-job: dist-aarch64-linux
try-job: dist-various-1
try-job: dist-various-2
try-job: dist-x86_64-linux
try-job: dist-x86_64-msvc
@bors
Copy link
Contributor

bors commented Sep 15, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 15, 2024
@rust-log-analyzer

This comment has been minimized.

@chrisnc
Copy link
Contributor Author

chrisnc commented Sep 15, 2024

:(
Is there a way to build core from within the rmake.rs so that this will work...?

And would it be okay to break off the test part to another PR and merge the Armv8-R change as-is?

@workingjubilee
Copy link
Member

And would it be okay to break off the test part to another PR and merge the Armv8-R change as-is?

Yeah let's do that.

@workingjubilee
Copy link
Member

I guess we'll be stuck just straight-up invoking what amounts to -Zbuild-std=core,compiler-builtins for this.

This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work
properly. Now that this change exists in rustc's llvm, we can fix
Armv8-R's default fpu features. In Armv8-R's case, the default features
from LLVM for floating-point are sufficient, because there is no
integer-only variant of this architecture.
@chrisnc
Copy link
Contributor Author

chrisnc commented Sep 15, 2024

Done. Created #130382 for the test.

@workingjubilee
Copy link
Member

coolio.

@bors r+ rollup

@bors
Copy link
Contributor

bors commented Sep 15, 2024

📌 Commit 1a0ba01 has been approved by workingjubilee

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 15, 2024
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Sep 15, 2024
…ingjubilee

Fix target-cpu fpu features on Armv8-R.

This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work properly. Now that this change exists in rustc's llvm, we can fix Armv8-R's default fpu features. In Armv8-R's case, the default features from LLVM for floating-point are sufficient, because there is no integer-only variant of this architecture.
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Sep 15, 2024
…ingjubilee

Fix target-cpu fpu features on Armv8-R.

This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work properly. Now that this change exists in rustc's llvm, we can fix Armv8-R's default fpu features. In Armv8-R's case, the default features from LLVM for floating-point are sufficient, because there is no integer-only variant of this architecture.
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Sep 15, 2024
…ingjubilee

Fix target-cpu fpu features on Armv8-R.

This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work properly. Now that this change exists in rustc's llvm, we can fix Armv8-R's default fpu features. In Armv8-R's case, the default features from LLVM for floating-point are sufficient, because there is no integer-only variant of this architecture.
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 15, 2024
…kingjubilee

Rollup of 3 pull requests

Successful merges:

 - rust-lang#130295 (Fix target-cpu fpu features on Armv8-R.)
 - rust-lang#130325 (Use -0.0 in `intrinsics::simd::reduce_add_unordered`)
 - rust-lang#130371 (Correctly account for niche-optimized tags in rustc_transmute)

r? `@ghost`
`@rustbot` modify labels: rollup
@jieyouxu
Copy link
Member

I guess we'll be stuck just straight-up invoking what amounts to -Zbuild-std=core,compiler-builtins for this.

These ones are really unfortunate, there's a couple of run-make tests that seem to need to use bootstrap cargo to -Zbuild-std, but those are really slow. Oh well.

bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 15, 2024
…iaskrgr

Rollup of 5 pull requests

Successful merges:

 - rust-lang#129195 (Stabilize `&mut` (and `*mut`) as well as `&Cell` (and `*const Cell`) in const)
 - rust-lang#130118 (move Option::unwrap_unchecked into const_option feature gate)
 - rust-lang#130295 (Fix target-cpu fpu features on Armv8-R.)
 - rust-lang#130371 (Correctly account for niche-optimized tags in rustc_transmute)
 - rust-lang#130381 (library: Compute Rust exception class from its string repr)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 3225bd5 into rust-lang:master Sep 15, 2024
6 checks passed
@rustbot rustbot added this to the 1.83.0 milestone Sep 15, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Sep 15, 2024
Rollup merge of rust-lang#130295 - chrisnc:armv8r-feature-fix, r=workingjubilee

Fix target-cpu fpu features on Armv8-R.

This is a follow-up to rust-lang#123159, but applied to Armv8-R.

This required llvm/llvm-project#88287 to work properly. Now that this change exists in rustc's llvm, we can fix Armv8-R's default fpu features. In Armv8-R's case, the default features from LLVM for floating-point are sufficient, because there is no integer-only variant of this architecture.
@chrisnc chrisnc deleted the armv8r-feature-fix branch September 16, 2024 01:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-run-make Area: port run-make Makefiles to rmake.rs A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

7 participants