implement ptr::write without dedicated intrinsic #80290

RalfJung · 2020-12-22T12:08:21Z

This makes ptr::write more consistent with ptr::write_unaligned, ptr::read, ptr::read_unaligned, all of which are implemented in terms of copy_nonoverlapping.

This means we can also remove move_val_init implementations in codegen and Miri, and its special handling in the borrow checker.

Also see this Zulip discussion.

rust-highfive · 2020-12-22T12:08:25Z

r? @m-ou-se

(rust-highfive has picked a reviewer for you, use r? to override)

RalfJung · 2020-12-22T12:09:22Z

Let's see if perf says anything.

@bors try

bors · 2020-12-22T12:09:32Z

⌛ Trying commit e10fd772a10fa762806af973306b1513d40ac1c1 with merge 035e759b99e57ac8055a4d0c71b48e2ceb0beb36...

RalfJung · 2020-12-22T12:11:42Z

@rust-timer queue

rust-timer · 2020-12-22T12:11:44Z

Awaiting bors try build completion.

rust-log-analyzer · 2020-12-22T12:28:04Z

The job x86_64-gnu-llvm-9 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

.................................................................................................... 9000/11196
.................................................................................................... 9100/11196
......................................................................................i.......i..... 9200/11196
.................................................................................................... 9300/11196
.........................iiiiii..iiiiii.i........................................................... 9400/11196
.................................................................................................... 9600/11196
.................................................................................................... 9700/11196
.................................................test [ui] ui/issues/issue-74564-if-expr-stack-overflow.rs has been running for over 60 seconds
................................................... 9800/11196
---
failures:

---- [ui] ui/intrinsics/intrinsic-move-val-cleanups.rs stdout ----

error: test compilation failed although it shouldn't!
status: exit code: 1
command: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/src/test/ui/intrinsics/intrinsic-move-val-cleanups.rs" "-Zthreads=1" "--target=x86_64-unknown-linux-gnu" "--error-format" "json" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "-Zemit-future-incompat-report" "-C" "prefer-dynamic" "-o" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/intrinsics/intrinsic-move-val-cleanups/a" "-Crpath" "-O" "-Cdebuginfo=0" "-Zunstable-options" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/intrinsics/intrinsic-move-val-cleanups/auxiliary"
------------------------------------------

------------------------------------------
stderr:
stderr:
------------------------------------------
error[E0425]: cannot find function `move_val_init` in module `intrinsics`
   |
   |
LL |         intrinsics::move_val_init({ LogOnDrop(&acq, "drop temp LOD", 2); &mut dest_a },
   |                     ^^^^^^^^^^^^^ not found in `intrinsics`

error[E0425]: cannot find function `move_val_init` in module `intrinsics`
   |
   |
LL |         intrinsics::move_val_init(&mut dest_b, { LogOnDrop(&acq, "drop temp LOD", 3);
   |                     ^^^^^^^^^^^^^ not found in `intrinsics`

error[E0425]: cannot find function `move_val_init` in module `intrinsics`
   |
   |
LL |         intrinsics::move_val_init({ panic!("every test ends in a panic") },
   |                     ^^^^^^^^^^^^^ not found in `intrinsics`

error[E0425]: cannot find function `move_val_init` in module `intrinsics`
   |
   |
LL |         intrinsics::move_val_init({ LogOnDrop(&acq, "drop temp LOD", 2); &mut dest_a },
   |                     ^^^^^^^^^^^^^ not found in `intrinsics`

error[E0425]: cannot find function `move_val_init` in module `intrinsics`
   |
   |
LL |         intrinsics::move_val_init({ LogOnDrop(&acq, "drop temp LOD", 2); &mut dest_a },
   |                     ^^^^^^^^^^^^^ not found in `intrinsics`
error: aborting due to 5 previous errors

For more information about this error, try `rustc --explain E0425`.


------------------------------------------


---- [ui] ui/unsafe/unsafe-move-val-init.rs stdout ----
diff of stderr:

- error[E0133]: dereference of raw pointer is unsafe and requires unsafe function or block
-   --> $DIR/unsafe-move-val-init.rs:8:5
+ error[E0425]: cannot find function `move_val_init` in module `intrinsics`
3    |
3    |
4 LL |     intrinsics::move_val_init(1 as *mut u32, 1);
-    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dereference of raw pointer
-    |
-    = note: raw pointers may be NULL, dangling or unaligned; they can violate aliasing rules and cause data races: all of these are undefined behavior
+    |                 ^^^^^^^^^^^^^ not found in `intrinsics`
9 error: aborting due to previous error
10 

- For more information about this error, try `rustc --explain E0133`.
- For more information about this error, try `rustc --explain E0133`.
+ For more information about this error, try `rustc --explain E0425`.
12 


The actual stderr differed from the expected stderr.
Actual stderr saved to /checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/unsafe/unsafe-move-val-init/unsafe-move-val-init.stderr
To update references, rerun the tests and pass the `--bless` flag
To only update this specific test, also pass `--test-args unsafe/unsafe-move-val-init.rs`
error: 1 errors occurred comparing output.
status: exit code: 1
status: exit code: 1
command: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/src/test/ui/unsafe/unsafe-move-val-init.rs" "-Zthreads=1" "--target=x86_64-unknown-linux-gnu" "--error-format" "json" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "-Zemit-future-incompat-report" "--emit" "metadata" "-C" "prefer-dynamic" "--out-dir" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/unsafe/unsafe-move-val-init" "-A" "unused" "-Crpath" "-O" "-Cdebuginfo=0" "-Zunstable-options" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/unsafe/unsafe-move-val-init/auxiliary"
------------------------------------------

------------------------------------------
stderr:
stderr:
------------------------------------------
error[E0425]: cannot find function `move_val_init` in module `intrinsics`
   |
   |
LL |     intrinsics::move_val_init(1 as *mut u32, 1);
   |                 ^^^^^^^^^^^^^ not found in `intrinsics`
error: aborting due to previous error

For more information about this error, try `rustc --explain E0425`.

---

Some tests failed in compiletest suite=ui mode=ui host=x86_64-unknown-linux-gnu target=x86_64-unknown-linux-gnu


command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/ui" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--suite" "ui" "--mode" "ui" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-9/bin/FileCheck" "--nodejs" "/usr/bin/node" "--host-rustcflags" "-Crpath -O -Cdebuginfo=0 -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--target-rustcflags" "-Crpath -O -Cdebuginfo=0 -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "9.0.0" "--llvm-components" "aarch64 aarch64asmparser aarch64codegen aarch64desc aarch64disassembler aarch64info aarch64utils aggressiveinstcombine all all-targets amdgpu amdgpuasmparser amdgpucodegen amdgpudesc amdgpudisassembler amdgpuinfo amdgpuutils analysis arm armasmparser armcodegen armdesc armdisassembler arminfo armutils asmparser asmprinter avr avrasmparser avrcodegen avrdesc avrdisassembler avrinfo binaryformat bitreader bitstreamreader bitwriter bpf bpfasmparser bpfcodegen bpfdesc bpfdisassembler bpfinfo codegen core coroutines coverage debuginfocodeview debuginfodwarf debuginfogsym debuginfomsf debuginfopdb demangle dlltooldriver engine executionengine fuzzmutate globalisel hexagon hexagonasmparser hexagoncodegen hexagondesc hexagondisassembler hexagoninfo instcombine instrumentation interpreter ipo irreader jitlink lanai lanaiasmparser lanaicodegen lanaidesc lanaidisassembler lanaiinfo libdriver lineeditor linker lto mc mca mcdisassembler mcjit mcparser mips mipsasmparser mipscodegen mipsdesc mipsdisassembler mipsinfo mirparser msp430 msp430asmparser msp430codegen msp430desc msp430disassembler msp430info native nativecodegen nvptx nvptxcodegen nvptxdesc nvptxinfo objcarcopts object objectyaml option orcjit passes perfjitevents powerpc powerpcasmparser powerpccodegen powerpcdesc powerpcdisassembler powerpcinfo profiledata remarks riscv riscvasmparser riscvcodegen riscvdesc riscvdisassembler riscvinfo riscvutils runtimedyld scalaropts selectiondag sparc sparcasmparser sparccodegen sparcdesc sparcdisassembler sparcinfo support symbolize systemz systemzasmparser systemzcodegen systemzdesc systemzdisassembler systemzinfo tablegen target textapi transformutils vectorize webassembly webassemblyasmparser webassemblycodegen webassemblydesc webassemblydisassembler webassemblyinfo windowsmanifest x86 x86asmparser x86codegen x86desc x86disassembler x86info x86utils xcore xcorecodegen xcoredesc xcoredisassembler xcoreinfo xray" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"


failed to run: /checkout/obj/build/bootstrap/debug/bootstrap --stage 2 test --exclude src/tools/tidy
Build completed unsuccessfully in 0:17:10

bors · 2020-12-22T12:56:28Z

☀️ Try build successful - checks-actions
Build commit: 035e759b99e57ac8055a4d0c71b48e2ceb0beb36 (035e759b99e57ac8055a4d0c71b48e2ceb0beb36)

rust-timer · 2020-12-22T12:56:30Z

Queued 035e759b99e57ac8055a4d0c71b48e2ceb0beb36 with parent 793931f, future comparison URL.

@rustbot label: +S-waiting-on-perf

rust-timer · 2020-12-22T14:38:19Z

Finished benchmarking try commit (035e759b99e57ac8055a4d0c71b48e2ceb0beb36): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

RalfJung · 2020-12-22T14:45:53Z

Let's see what happens if we only call intrinsics directly in write.

@bors try @rust-timer queue

rust-timer · 2020-12-22T14:45:54Z

Awaiting bors try build completion.

bors · 2020-12-22T14:46:03Z

⌛ Trying commit 9b2ca1ac97baf9624f2c5fb8bff9959e6e91d712 with merge 63c71d4cdc96bd631e4a4da9b77587035aac443b...

rust-log-analyzer · 2020-12-22T15:04:54Z

The job x86_64-gnu-llvm-9 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

.................................................................................................... 9000/11196
.................................................................................................... 9100/11196
......................................................................................i......i...... 9200/11196
.................................................................................................... 9300/11196
.........................iiiiii..iiiiii.i........................................................... 9400/11196
.................................................................................................... 9600/11196
.................................................................................................... 9700/11196
.................................................................................................... 9800/11196
.................................................................................................... 9900/11196
---
failures:

---- [ui] ui/intrinsics/intrinsic-move-val-cleanups.rs stdout ----

error: test compilation failed although it shouldn't!
status: exit code: 1
command: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/src/test/ui/intrinsics/intrinsic-move-val-cleanups.rs" "-Zthreads=1" "--target=x86_64-unknown-linux-gnu" "--error-format" "json" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "-Zemit-future-incompat-report" "-C" "prefer-dynamic" "-o" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/intrinsics/intrinsic-move-val-cleanups/a" "-Crpath" "-O" "-Cdebuginfo=0" "-Zunstable-options" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/intrinsics/intrinsic-move-val-cleanups/auxiliary"
------------------------------------------

------------------------------------------
stderr:
stderr:
------------------------------------------
error[E0425]: cannot find function `move_val_init` in module `intrinsics`
   |
   |
LL |         intrinsics::move_val_init({ LogOnDrop(&acq, "drop temp LOD", 2); &mut dest_a },
   |                     ^^^^^^^^^^^^^ not found in `intrinsics`

error[E0425]: cannot find function `move_val_init` in module `intrinsics`
   |
   |
LL |         intrinsics::move_val_init(&mut dest_b, { LogOnDrop(&acq, "drop temp LOD", 3);
   |                     ^^^^^^^^^^^^^ not found in `intrinsics`

error[E0425]: cannot find function `move_val_init` in module `intrinsics`
   |
   |
LL |         intrinsics::move_val_init({ panic!("every test ends in a panic") },
   |                     ^^^^^^^^^^^^^ not found in `intrinsics`

error[E0425]: cannot find function `move_val_init` in module `intrinsics`
   |
   |
LL |         intrinsics::move_val_init({ LogOnDrop(&acq, "drop temp LOD", 2); &mut dest_a },
   |                     ^^^^^^^^^^^^^ not found in `intrinsics`

error[E0425]: cannot find function `move_val_init` in module `intrinsics`
   |
   |
LL |         intrinsics::move_val_init({ LogOnDrop(&acq, "drop temp LOD", 2); &mut dest_a },
   |                     ^^^^^^^^^^^^^ not found in `intrinsics`
error: aborting due to 5 previous errors

For more information about this error, try `rustc --explain E0425`.


------------------------------------------


---- [ui] ui/unsafe/unsafe-move-val-init.rs stdout ----
diff of stderr:

- error[E0133]: dereference of raw pointer is unsafe and requires unsafe function or block
-   --> $DIR/unsafe-move-val-init.rs:8:5
+ error[E0425]: cannot find function `move_val_init` in module `intrinsics`
3    |
3    |
4 LL |     intrinsics::move_val_init(1 as *mut u32, 1);
-    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dereference of raw pointer
-    |
-    = note: raw pointers may be NULL, dangling or unaligned; they can violate aliasing rules and cause data races: all of these are undefined behavior
+    |                 ^^^^^^^^^^^^^ not found in `intrinsics`
9 error: aborting due to previous error
10 

- For more information about this error, try `rustc --explain E0133`.
- For more information about this error, try `rustc --explain E0133`.
+ For more information about this error, try `rustc --explain E0425`.
12 


The actual stderr differed from the expected stderr.
Actual stderr saved to /checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/unsafe/unsafe-move-val-init/unsafe-move-val-init.stderr
To update references, rerun the tests and pass the `--bless` flag
To only update this specific test, also pass `--test-args unsafe/unsafe-move-val-init.rs`
error: 1 errors occurred comparing output.
status: exit code: 1
status: exit code: 1
command: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/src/test/ui/unsafe/unsafe-move-val-init.rs" "-Zthreads=1" "--target=x86_64-unknown-linux-gnu" "--error-format" "json" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "-Zemit-future-incompat-report" "--emit" "metadata" "-C" "prefer-dynamic" "--out-dir" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/unsafe/unsafe-move-val-init" "-A" "unused" "-Crpath" "-O" "-Cdebuginfo=0" "-Zunstable-options" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/unsafe/unsafe-move-val-init/auxiliary"
------------------------------------------

------------------------------------------
stderr:
stderr:
------------------------------------------
error[E0425]: cannot find function `move_val_init` in module `intrinsics`
   |
   |
LL |     intrinsics::move_val_init(1 as *mut u32, 1);
   |                 ^^^^^^^^^^^^^ not found in `intrinsics`
error: aborting due to previous error

For more information about this error, try `rustc --explain E0425`.

---

Some tests failed in compiletest suite=ui mode=ui host=x86_64-unknown-linux-gnu target=x86_64-unknown-linux-gnu


command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/ui" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--suite" "ui" "--mode" "ui" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-9/bin/FileCheck" "--nodejs" "/usr/bin/node" "--host-rustcflags" "-Crpath -O -Cdebuginfo=0 -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--target-rustcflags" "-Crpath -O -Cdebuginfo=0 -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "9.0.0" "--llvm-components" "aarch64 aarch64asmparser aarch64codegen aarch64desc aarch64disassembler aarch64info aarch64utils aggressiveinstcombine all all-targets amdgpu amdgpuasmparser amdgpucodegen amdgpudesc amdgpudisassembler amdgpuinfo amdgpuutils analysis arm armasmparser armcodegen armdesc armdisassembler arminfo armutils asmparser asmprinter avr avrasmparser avrcodegen avrdesc avrdisassembler avrinfo binaryformat bitreader bitstreamreader bitwriter bpf bpfasmparser bpfcodegen bpfdesc bpfdisassembler bpfinfo codegen core coroutines coverage debuginfocodeview debuginfodwarf debuginfogsym debuginfomsf debuginfopdb demangle dlltooldriver engine executionengine fuzzmutate globalisel hexagon hexagonasmparser hexagoncodegen hexagondesc hexagondisassembler hexagoninfo instcombine instrumentation interpreter ipo irreader jitlink lanai lanaiasmparser lanaicodegen lanaidesc lanaidisassembler lanaiinfo libdriver lineeditor linker lto mc mca mcdisassembler mcjit mcparser mips mipsasmparser mipscodegen mipsdesc mipsdisassembler mipsinfo mirparser msp430 msp430asmparser msp430codegen msp430desc msp430disassembler msp430info native nativecodegen nvptx nvptxcodegen nvptxdesc nvptxinfo objcarcopts object objectyaml option orcjit passes perfjitevents powerpc powerpcasmparser powerpccodegen powerpcdesc powerpcdisassembler powerpcinfo profiledata remarks riscv riscvasmparser riscvcodegen riscvdesc riscvdisassembler riscvinfo riscvutils runtimedyld scalaropts selectiondag sparc sparcasmparser sparccodegen sparcdesc sparcdisassembler sparcinfo support symbolize systemz systemzasmparser systemzcodegen systemzdesc systemzdisassembler systemzinfo tablegen target textapi transformutils vectorize webassembly webassemblyasmparser webassemblycodegen webassemblydesc webassemblydisassembler webassemblyinfo windowsmanifest x86 x86asmparser x86codegen x86desc x86disassembler x86info x86utils xcore xcorecodegen xcoredesc xcoredisassembler xcoreinfo xray" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"


failed to run: /checkout/obj/build/bootstrap/debug/bootstrap --stage 2 test --exclude src/tools/tidy
Build completed unsuccessfully in 0:17:08

bors · 2020-12-22T15:34:30Z

☀️ Try build successful - checks-actions
Build commit: 63c71d4cdc96bd631e4a4da9b77587035aac443b (63c71d4cdc96bd631e4a4da9b77587035aac443b)

rust-timer · 2020-12-22T15:34:31Z

Queued 63c71d4cdc96bd631e4a4da9b77587035aac443b with parent 75e1acb, future comparison URL.

@rustbot label: +S-waiting-on-perf

rust-timer · 2020-12-22T18:01:05Z

Finished benchmarking try commit (63c71d4cdc96bd631e4a4da9b77587035aac443b): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

RalfJung · 2020-12-22T19:46:33Z

Now perf is looking clean (but @bjorn3 remarked that the impact would likely be largest when running debug builds).

bjorn3 · 2020-12-25T16:03:05Z

Perf results for ebobby/simple-raytracer@804a7a2:

Benchmark #1: ./raytracer_75e1acb
  Time (mean ± σ):      8.040 s ±  0.019 s    [User: 8.037 s, System: 0.004 s]
  Range (min … max):    8.013 s …  8.065 s    10 runs
 
Benchmark #2: ./raytracer_63c71d4
  Time (mean ± σ):      8.086 s ±  0.022 s    [User: 8.083 s, System: 0.003 s]
  Range (min … max):    8.061 s …  8.138 s    10 runs
 
Summary
  './raytracer_75e1acb' ran
    1.01 ± 0.00 times faster than './raytracer_63c71d4'

The perf difference is so small as simple-raytracer only spends a tiny bit of time in std::ptr::write. The codegen is noticably worse though.

Before:

       │    000000000007ab40 <core::ptr::write>:
       │    _ZN4core3ptr5write17h8616c78d24e53297E():
 32,01 │      sub    $0x18,%rsp
 19,97 │      mov    %rdi,0x8(%rsp)
 16,00 │      mov    %rsi,0x10(%rsp)
 32,02 │      mov    %rsi,(%rdi)
       │      add    $0x18,%rsp
       │    ← retq

After:

       │    000000000007ac00 <core::ptr::write>:
       │    _ZN4core3ptr5write17h8616c78d24e53297E():
 18,93 │      sub    $0x28,%rsp
  8,13 │      mov    %rsi,(%rsp)
 18,96 │      mov    %rdi,0x10(%rsp)
       │      movb   $0x0,0xf(%rsp)
  5,42 │      movb   $0x1,0xf(%rsp)
 10,72 │      mov    (%rsp),%rax
 29,71 │      mov    %rax,(%rdi)
  8,13 │      movb   $0x0,0xf(%rsp)
       │      add    $0x28,%rsp
       │    ← retq

library/core/src/ptr/mod.rs

RalfJung · 2020-12-26T10:00:49Z

@bjorn3 what kind of build of the raytracer is that (debug/release, which codegen backend)?

copy_nonoverlapping should directly lower to a memcpy. OTOH, looks like move_val_init is already lowered to an assignment during MIR building (but separate from @tmiasko's new pass). Looks like MIR assignments are lowered better than copy_nonoverlapping?

library/core/src/ptr/mod.rs

bjorn3 · 2020-12-26T11:02:15Z

@bjorn3 what kind of build of the raytracer is that (debug/release, which codegen backend)?

Debug build using cg_llvm.

copy_nonoverlapping should directly lower to a memcpy. OTOH, looks like move_val_init is already lowered to an assignment during MIR building (but separate from @tmiasko's new pass). Looks like MIR assignments are lowered better than copy_nonoverlapping?

Much better. copy_nonoverlapping results in several assignments and an extra taken reference in addition to the intrinsic call. It is even considered as capable of unwinding.

#![feature(core_intrinsics)]
pub unsafe fn write<T>(dst: *mut T, src: T) {
    std::intrinsics::move_val_init(&mut *dst, src)
}

fn write(_1: *mut T, _2: T) -> () {
    debug dst => _1;
    debug src => _2;
    let mut _0: ();
    let mut _3: *mut T;
    let mut _4: &mut T;

    bb0: {
        StorageLive(_4);
        _4 = &mut (*_1);
        _3 = &raw mut (*_4);
        (*_3) = move _2;
        StorageDead(_4);
        return;
    }
}

#![feature(core_intrinsics)]
pub unsafe fn write<T>(dst: *mut T, src: T) {
    std::intrinsics::copy_nonoverlapping(&src as *const T, dst, 1);
    std::intrinsics::forget(src);
}

fn write(_1: *mut T, _2: T) -> () {
    debug dst => _1;
    debug src => _2;
    let mut _0: ();
    let _3: ();
    let mut _4: *const T;
    let _5: &T;
    let mut _6: *mut T;
    let mut _7: bool;

    bb0: {
        _7 = const false;
        _7 = const true;
        StorageLive(_3);
        StorageLive(_4);
        StorageLive(_5);
        _5 = &_2;
        _4 = &raw const (*_5);
        StorageLive(_6);
        _6 = _1;
        _3 = copy_nonoverlapping::<T>(move _4, move _6, const 1_usize) -> [return: bb1, unwind: bb4];

    bb1: {
        StorageDead(_6);
        StorageDead(_4);
        StorageDead(_5);
        StorageDead(_3);
        _7 = const false;
        _0 = const ();
        return;
    }

    bb2 (cleanup): {
        resume;
    }

    bb3 (cleanup): {
        drop(_2) -> bb2;
    }

    bb4 (cleanup): {
        switchInt(_7) -> [false: bb2, otherwise: bb3];
    }
}

bors · 2021-01-16T17:28:37Z

⌛ Testing commit a5b89a0 with merge 492b83c...

bors · 2021-01-16T20:26:13Z

☀️ Test successful - checks-actions
Approved by: lcnr
Pushing 492b83c to master...

therealprof · 2021-01-18T10:03:39Z

There's a noticeable binary size regression between rustc 1.51.0-nightly (e38fb306b 2021-01-14) and rustc 1.51.0-nightly (4253153db 2021-01-17) on thumbv6m-none-eabi in dev mode. However looking at the assembly I have troubles accounting for all of it:

Section size:
   text    data     bss     dec     hex filename
   9820       0       4    9824    2660 blinky

vs

Section size:
   text    data     bss     dec     hex filename
   9884       0       4    9888    26a0 blinky

The only observable difference I can spot is:

 08000936 <core::ptr::write>:
- 8000936:      b083            sub     sp, #12
- 8000938:      9001            str     r0, [sp, #4]
- 800093a:      9102            str     r1, [sp, #8]
- 800093c:      6001            str     r1, [r0, #0]
- 800093e:      b003            add     sp, #12
- 8000940:      4770            bx      lr
+ 8000936:      b082            sub     sp, #8
+ 8000938:      9100            str     r1, [sp, #0]
+ 800093a:      9001            str     r0, [sp, #4]
+ 800093c:      9900            ldr     r1, [sp, #0]
+ 800093e:      6001            str     r1, [r0, #0]
+ 8000940:      b002            add     sp, #8
+ 8000942:      4770            bx      lr

RalfJung · 2021-01-18T10:15:57Z

@therealprof so is the binary of rustc itself bigger, or is some rustc-generated binary bigger? Is this a debug build or a release build?

https://github.com/kennytm/rustup-toolchain-install-master could be used to confirm that it is this PR vs some other PR that landed that day.

therealprof · 2021-01-18T10:27:24Z

@therealprof so is the binary of rustc itself bigger, or is some rustc-generated binary bigger? Is this a debug build or a release build?

Generated binaries in dev (or debug mode if you prefer) are larger.

https://github.com/kennytm/rustup-toolchain-install-master could be used to confirm that it is this PR vs some other PR that landed that day.

Sorry, I don't have time to bisect this.

I was just browsing the recently merged PRs and the mention of regressions piqued my interest so I decided to run my tools (https://github.com/stm32-rs/stm32f0xx-hal/blob/master/tools/capture_nightly_example_bloat.sh) on the latest nightly and sure enough I can see regressions happening between rustc 1.51.0-nightly (e38fb306b 2021-01-14) and rustc 1.51.0-nightly (4253153db 2021-01-17).

RalfJung · 2021-01-18T10:44:51Z

Some regression for debug builds was expected with this PR. Depending on how much this matters, I sketched an idea for how to mitigate this:

The debug performance regression could likely be fixed by adding a post-drop-elaboration MIR optimization pass which replaces copy_nonoverlapping(src, dst, 1) by *dst = *src.

therealprof · 2021-01-18T10:59:55Z

Well, debug mode in Rust is atrocious in every regard compared to other languages used for embedded development. The binaries are huge (often too big to fit into the flash of smaller microcontrollers) and slow (some peripherals like USB can't even be used since the generated code is some order of magnitudes too slow to react to USB events).

Every little improvement helps! I'd be happy to test and benchmark any changes once they've landed in nightly. I've collected a nice dataset over different versions in the current format (each stable release back to 1.41 and a quite a few more nightlies in between).

RalfJung · 2021-01-18T11:32:39Z

The reason this PR was deemed acceptable is that the issue should only affect debug builds. Anybody who cares about such issues would use release (or size-optimized) builds, we figured, and those should not be affected.

Why does the size of debug builds matter so much to you?

(Btw, discussions in a closed PR are bound to get lost, so if you think something should be done here or if you think there's something worth tracking, please open an issue.)

Every little improvement helps! I'd be happy to test and benchmark any changes once they've landed in nightly.

I won't have time to work on this, but maybe someone from @rust-lang/wg-mir-opt would be interested. Cc @tmiasko who recently added a closely related intrinsic lowering MIR pass.

therealprof · 2021-01-18T12:49:00Z

The reason this PR was deemed acceptable is that the issue should only affect debug builds. Anybody who cares about such issues would use release (or size-optimized) builds, we figured, and those should not be affected.

I can confirm release builds are not affected.

Why does the size of debug builds matter so much to you?

Well, only debug builds are really debuggable (in embedded context) for starters and debugging with a debugger has some extra relevance due to limited interaction capabilities compared with a regular application. Release builds are also built with all optimisation features Rust has to offer which really make them really slow to compile. And then there's the usual trap for young players that the default build mode is dev mode, so starters are frequently ending up in debug mode; best case they'll get a linker error telling them about their mistake...

(Btw, discussions in a closed PR are bound to get lost, so if you think something should be done here or if you think there's something worth tracking, please open an issue.)

Sure thing.

rylev · 2021-01-20T12:13:34Z

@RalfJung @lcnr There seems to be a small compile time perf regression after all in regex-debug full builds. Looks like LLVM_module_codegen_emit_obj regressed.Thoughts on if addressing this is worth it?

oli-obk · 2021-01-20T12:26:20Z

Considering that LLVM_module_optimize_module_passes now takes half the time, I think LLVM now fails to optimize something that then causes more llvm object code to be emitted. This could also have a runtime impact due to missing some optimizations.

Now... this is only for regex-debug, and we generally don't put too much effort into making debug builds be efficient, so the runtime perf loss is acceptable, but it is still not great that we lost some compile-time. I guess regex uses ptr::write a lot (or at least "for a lot of different T").

I don't think it is possible to regain this perf directly. We may get it back via always-run MIR optimizations that turn this ptr::write into basic MIR statements.

therealprof · 2021-01-20T12:35:40Z

cf. #81163 for ongoing discussion

Now... this is only for regex-debug, and we generally don't put too much effort into making debug builds be efficient, so the runtime perf loss is acceptable,

I very much disagree with this statement. The performance and binary size of debug builds are a huge problem for embedded Rust already; regressions are definitely not acceptable for us.

RalfJung · 2021-01-20T12:47:16Z

Also see #80290 (comment) for an approach that should regain perf. write wasn't itself an intrinsic before, so I don't think it is necessary to make it an intrinsic now.

rylev · 2021-01-20T13:52:11Z

Should we create an issue for #80290 (comment) to make sure it's not lost?

RalfJung · 2021-01-20T14:17:13Z

Yeah if people care enough, there should be a version of #81163 for ptr::write.

directly expose copy and copy_nonoverlapping intrinsics This effectively un-does rust-lang#57997. That should help with `ptr::read` codegen in debug builds (and any other of these low-level functions that bottoms out at `copy`/`copy_nonoverlapping`), where the wrapper function will not get inlined. See the discussion in rust-lang#80290 and rust-lang#81163. Cc `@bjorn3` `@therealprof`

directly expose copy and copy_nonoverlapping intrinsics This effectively un-does rust-lang/rust#57997. That should help with `ptr::read` codegen in debug builds (and any other of these low-level functions that bottoms out at `copy`/`copy_nonoverlapping`), where the wrapper function will not get inlined. See the discussion in rust-lang/rust#80290 and rust-lang/rust#81163. Cc `@bjorn3` `@therealprof`

The test mentioned by this comment was deleted long ago by <rust-lang#80290>.

rust-highfive assigned m-ou-se Dec 22, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 22, 2020

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 22, 2020

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 22, 2020

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 22, 2020

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 22, 2020

kpp reviewed Dec 25, 2020

View reviewed changes

library/core/src/ptr/mod.rs Show resolved Hide resolved

newpavlov reviewed Dec 26, 2020

View reviewed changes

library/core/src/ptr/mod.rs Show resolved Hide resolved

bors added the merged-by-bors This PR was explicitly merged by bors. label Jan 16, 2021

bors merged commit 492b83c into rust-lang:master Jan 16, 2021

rustbot added this to the 1.51.0 milestone Jan 16, 2021

RalfJung deleted the less-intrinsic-write branch January 18, 2021 10:15

This was referenced Jan 18, 2021

Improve ptr::read code in debug builds #81163

Closed

Tracking Issue for const_ptr_read #80377

Closed

usbalbin mentioned this pull request Jan 18, 2021

Make ptr::write const #81167

Merged

RalfJung mentioned this pull request Jan 21, 2021

directly expose copy and copy_nonoverlapping intrinsics #81238

Merged

nbdd0121 mentioned this pull request Apr 24, 2023

Moving #[rustc_box] to move_val_init intrinsic #110715

Closed

est31 mentioned this pull request Apr 30, 2023

Make mem::replace simpler in codegen #111010

Merged

Zalathar added a commit to Zalathar/rust that referenced this pull request Jul 25, 2024

Remove an obsolete comment

31f31aa

The test mentioned by this comment was deleted long ago by <rust-lang#80290>.

implement ptr::write without dedicated intrinsic #80290

implement ptr::write without dedicated intrinsic #80290

Uh oh!

Conversation

RalfJung commented Dec 22, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rust-highfive commented Dec 22, 2020

Uh oh!

RalfJung commented Dec 22, 2020

Uh oh!

bors commented Dec 22, 2020

Uh oh!

RalfJung commented Dec 22, 2020

Uh oh!

rust-timer commented Dec 22, 2020

Uh oh!

rust-log-analyzer commented Dec 22, 2020

Uh oh!

bors commented Dec 22, 2020

Uh oh!

rust-timer commented Dec 22, 2020

Uh oh!

rust-timer commented Dec 22, 2020

Uh oh!

RalfJung commented Dec 22, 2020

Uh oh!

rust-timer commented Dec 22, 2020

Uh oh!

bors commented Dec 22, 2020

Uh oh!

rust-log-analyzer commented Dec 22, 2020

Uh oh!

bors commented Dec 22, 2020

Uh oh!

rust-timer commented Dec 22, 2020

Uh oh!

rust-timer commented Dec 22, 2020

Uh oh!

RalfJung commented Dec 22, 2020

Uh oh!

bjorn3 commented Dec 25, 2020

Uh oh!

Uh oh!

RalfJung commented Dec 26, 2020

Uh oh!

Uh oh!

bjorn3 commented Dec 26, 2020

Uh oh!

bors commented Jan 16, 2021

Uh oh!

bors commented Jan 16, 2021

Uh oh!

therealprof commented Jan 18, 2021

Uh oh!

RalfJung commented Jan 18, 2021

Uh oh!

therealprof commented Jan 18, 2021

Uh oh!

RalfJung commented Jan 18, 2021

Uh oh!

therealprof commented Jan 18, 2021

Uh oh!

RalfJung commented Jan 18, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

therealprof commented Jan 18, 2021

Uh oh!

rylev commented Jan 20, 2021

Uh oh!

oli-obk commented Jan 20, 2021

Uh oh!

therealprof commented Jan 20, 2021

Uh oh!

RalfJung commented Jan 20, 2021

Uh oh!

rylev commented Jan 20, 2021

Uh oh!

RalfJung commented Jan 20, 2021

Uh oh!

RalfJung commented Dec 22, 2020 •

edited

Loading

RalfJung commented Jan 18, 2021 •

edited

Loading