[WIP] Use unaligned read/writes for core::mem::swap on x86_64 #98892

Commits on May 16, 2023

  1. Use unaligned read/writes for core::mem::swap on x86_64

    This generates better ASM: https://godbolt.org/z/Mr4rWfoad
    Misaligned accesses on modern x86_64 processors are fast (see the docs for `core::mem::swap_chunked`).
    
    The main difference is that swapping `#[repr(packed)]` or 1-byte-aligned types now uses bigger chunks via `movq` and `movl` instructions (see the first sketch after the commit details).
    
    Also, bigger types (e.g. types at least as large as an XMM register) now use SIMD more effectively. The old code used SIMD inefficiently: it copied data into a register, spilled it to the stack, and then read it back. That caused unnecessary memory reads and writes and erased the benefit of SSE, because the instruction count ended up similar to a plain `usize` chunked swap. The new code instead keeps the temporary SIMD chunks entirely in registers, using either 4 XMM registers, 2 YMM registers, or 2 XMM registers depending on the type size and compiler flags (see the second sketch below).
    
    Also, lowered the size limit in the condition for choosing the chunked swap so that types like `std::vec::Vec<T>` (and especially `std::string::String`) benefit from the new optimizations (see the usage example at the end).
    AngelicosPhosphoros committed May 16, 2023
    Commit 4607601
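
As a rough illustration of the chunked-swap idea described in the commit message (a minimal standalone sketch, not the actual `core::mem::swap_chunked` implementation; the function name, the fixed 8-byte chunk size, and the loop structure are assumptions): swapping through unaligned 8-byte loads and stores lets even 1-byte-aligned data move with full-width `movq` instructions instead of byte-by-byte copies.

```rust
use core::ptr;

/// Illustrative sketch only: swap the bytes of two equal-length,
/// non-overlapping buffers in unaligned 8-byte chunks, so even
/// 1-byte-aligned (`#[repr(packed)]`) data is moved with full-width
/// loads and stores rather than one byte at a time.
fn swap_bytes_unaligned(x: &mut [u8], y: &mut [u8]) {
    assert_eq!(x.len(), y.len());
    let len = x.len();
    let (px, py) = (x.as_mut_ptr(), y.as_mut_ptr());
    let mut i = 0;
    // Full 8-byte chunks: unaligned loads into registers, then crossed stores.
    while i + 8 <= len {
        // SAFETY: i + 8 <= len for both buffers, the buffers do not overlap,
        // and unaligned access is explicitly allowed by read/write_unaligned.
        unsafe {
            let a = ptr::read_unaligned(px.add(i) as *const u64);
            let b = ptr::read_unaligned(py.add(i) as *const u64);
            ptr::write_unaligned(px.add(i) as *mut u64, b);
            ptr::write_unaligned(py.add(i) as *mut u64, a);
        }
        i += 8;
    }
    // Swap the remaining tail with plain byte-wise swapping.
    x[i..].swap_with_slice(&mut y[i..]);
}
```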
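Similarly, a hedged sketch of keeping SIMD-sized temporaries entirely in locals to avoid the stack round-trip the commit describes (the function name and the fixed 32-byte size are illustrative assumptions; the real code picks XMM/YMM chunk counts based on type size and compiler flags): reading both sides into locals before writing anything back gives the compiler the chance to keep the temporaries in vector registers.

```rust
use core::ptr;

/// Illustrative sketch only: swap two non-overlapping 32-byte regions by
/// reading both sides into locals before writing anything back. With SSE/AVX
/// enabled, the compiler can keep these `[u8; 16]` temporaries in XMM
/// registers (or combine them into YMM registers) instead of spilling them
/// to the stack and reading them back.
///
/// # Safety
/// `x` and `y` must each be valid for 32 bytes of reads and writes and must
/// not overlap.
unsafe fn swap_32_bytes(x: *mut u8, y: *mut u8) {
    unsafe {
        // Load all four 16-byte chunks into locals first...
        let x0 = ptr::read_unaligned(x as *const [u8; 16]);
        let x1 = ptr::read_unaligned(x.add(16) as *const [u8; 16]);
        let y0 = ptr::read_unaligned(y as *const [u8; 16]);
        let y1 = ptr::read_unaligned(y.add(16) as *const [u8; 16]);
        // ...then store them crossed, without any intermediate stack traffic.
        ptr::write_unaligned(x as *mut [u8; 16], y0);
        ptr::write_unaligned(x.add(16) as *mut [u8; 16], y1);
        ptr::write_unaligned(y as *mut [u8; 16], x0);
        ptr::write_unaligned(y.add(16) as *mut [u8; 16], x1);
    }
}
```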
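Finally, a usage-level example of what the lowered size threshold is meant to cover (the exact new limit is not stated in the commit message, so treating `String` as covered is an assumption based on the description above):

```rust
use std::mem;

fn main() {
    // `String` is three pointer-sized fields (24 bytes on x86_64): small
    // enough that, with a lowered size limit, `mem::swap` can take the
    // chunked path instead of a byte-wise or field-by-field swap.
    let mut a = String::from("first");
    let mut b = String::from("second");
    mem::swap(&mut a, &mut b);
    assert_eq!(a, "second");
    assert_eq!(b, "first");
}
```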