[WIP] Use unaligned read/writes for core::mem::swap on x86_64 #98892

Commits on May 16, 2023

  1. Use unaligned read/writes for core::mem::swap on x86_64

    This generates better ASM: https://godbolt.org/z/Mr4rWfoad
    Misaligned accesses on modern x86_64 processors are fast (see the docs for `core::mem::swap_chunked`).
    
    The main difference is that swapping `#[repr(packed)]` or 1-byte-aligned types now uses bigger chunks via `movq` and `movl` instructions (see the first sketch after the commit details).
    
    Also, bigger types (e.g. types at least as large as an XMM register) now use SIMD more effectively. The old code used SIMD inefficiently: it copied data into a register, spilled it to the stack, and then read it back. That caused unnecessary memory reads and writes and erased the benefit of SSE, because the instruction count ended up similar to a plain `usize` chunked swap. The new code instead keeps the temporary SIMD chunks entirely in registers, using either 4 XMM registers, 2 YMM registers, or 2 XMM registers depending on the type size and compiler flags (see the second sketch below).
    
    Also, lowered the size limit in the condition for choosing the chunked swap so that types like `std::vec::Vec<T>` (and especially `std::string::String`) benefit from the new optimizations (see the usage example at the end).
    AngelicosPhosphoros committed May 16, 2023
    Commit 4607601
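
As a rough illustration of the chunked-swap idea described in the commit message (a minimal standalone sketch, not the actual `core::mem::swap_chunked` implementation; the function name, the fixed 8-byte chunk size, and the loop structure are assumptions): swapping through unaligned 8-byte loads and stores lets even 1-byte-aligned data move with full-width `movq` instructions instead of byte-by-byte copies.

```rust
use core::ptr;

/// Illustrative sketch only: swap the bytes of two equal-length,
/// non-overlapping buffers in unaligned 8-byte chunks, so even
/// 1-byte-aligned (`#[repr(packed)]`) data is moved with full-width
/// loads and stores rather than one byte at a time.
fn swap_bytes_unaligned(x: &mut [u8], y: &mut [u8]) {
    assert_eq!(x.len(), y.len());
    let len = x.len();
    let (px, py) = (x.as_mut_ptr(), y.as_mut_ptr());
    let mut i = 0;
    // Full 8-byte chunks: unaligned loads into registers, then crossed stores.
    while i + 8 <= len {
        // SAFETY: i + 8 <= len for both buffers, the buffers do not overlap,
        // and unaligned access is explicitly allowed by read/write_unaligned.
        unsafe {
            let a = ptr::read_unaligned(px.add(i) as *const u64);
            let b = ptr::read_unaligned(py.add(i) as *const u64);
            ptr::write_unaligned(px.add(i) as *mut u64, b);
            ptr::write_unaligned(py.add(i) as *mut u64, a);
        }
        i += 8;
    }
    // Swap the remaining tail with plain byte-wise swapping.
    x[i..].swap_with_slice(&mut y[i..]);
}
```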
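Similarly, a hedged sketch of keeping SIMD-sized temporaries entirely in locals to avoid the stack round-trip the commit describes (the function name and the fixed 32-byte size are illustrative assumptions; the real code picks XMM/YMM chunk counts based on type size and compiler flags): reading both sides into locals before writing anything back gives the compiler the chance to keep the temporaries in vector registers.

```rust
use core::ptr;

/// Illustrative sketch only: swap two non-overlapping 32-byte regions by
/// reading both sides into locals before writing anything back. With SSE/AVX
/// enabled, the compiler can keep these `[u8; 16]` temporaries in XMM
/// registers (or combine them into YMM registers) instead of spilling them
/// to the stack and reading them back.
///
/// # Safety
/// `x` and `y` must each be valid for 32 bytes of reads and writes and must
/// not overlap.
unsafe fn swap_32_bytes(x: *mut u8, y: *mut u8) {
    unsafe {
        // Load all four 16-byte chunks into locals first...
        let x0 = ptr::read_unaligned(x as *const [u8; 16]);
        let x1 = ptr::read_unaligned(x.add(16) as *const [u8; 16]);
        let y0 = ptr::read_unaligned(y as *const [u8; 16]);
        let y1 = ptr::read_unaligned(y.add(16) as *const [u8; 16]);
        // ...then store them crossed, without any intermediate stack traffic.
        ptr::write_unaligned(x as *mut [u8; 16], y0);
        ptr::write_unaligned(x.add(16) as *mut [u8; 16], y1);
        ptr::write_unaligned(y as *mut [u8; 16], x0);
        ptr::write_unaligned(y.add(16) as *mut [u8; 16], x1);
    }
}
```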
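Finally, a usage-level example of what the lowered size threshold is meant to cover (the exact new limit is not stated in the commit message, so treating `String` as covered is an assumption based on the description above):

```rust
use std::mem;

fn main() {
    // `String` is three pointer-sized fields (24 bytes on x86_64): small
    // enough that, with a lowered size limit, `mem::swap` can take the
    // chunked path instead of a byte-wise or field-by-field swap.
    let mut a = String::from("first");
    let mut b = String::from("second");
    mem::swap(&mut a, &mut b);
    assert_eq!(a, "second");
    assert_eq!(b, "first");
}
```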