Remove spin dependency on x86 and x86_64 targets. #2331
Conversation
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff            @@
##              main    #2331    +/-   ##
=========================================
  Coverage    96.63%   96.63%
=========================================
  Files          175      176       +1
  Lines        21551    21597      +46
  Branches       526      527       +1
=========================================
+ Hits         20825    20871      +46
- Misses         611      613       +2
+ Partials       115      113       -2
```

View full report in Codecov by Sentry.
Force-pushed from 28009d2 to e5482c8.
See the added README.txt for details.
Inline `get_or_try_init` into `get_or_init` since we never use `get_or_try_init`.
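As a rough sketch of what that inlining amounts to, assuming a once_cell::race-style `OnceNonZeroUsize` (the structure below is illustrative, not the exact code added in this PR):

```rust
use core::num::NonZeroUsize;
use core::sync::atomic::{AtomicUsize, Ordering};

pub struct OnceNonZeroUsize {
    // Zero means "not yet initialized"; any nonzero value is the payload.
    inner: AtomicUsize,
}

impl OnceNonZeroUsize {
    pub const fn new() -> Self {
        Self { inner: AtomicUsize::new(0) }
    }

    pub fn get(&self) -> Option<NonZeroUsize> {
        NonZeroUsize::new(self.inner.load(Ordering::Acquire))
    }

    // Previously this would forward to a fallible `get_or_try_init`,
    // wrapping the closure's result in `Ok` and unwrapping it; with no
    // fallible callers, the logic lives directly in `get_or_init`.
    pub fn get_or_init(&self, f: impl FnOnce() -> NonZeroUsize) -> NonZeroUsize {
        match self.get() {
            Some(value) => value,
            None => self.init(f),
        }
    }

    #[cold]
    fn init(&self, f: impl FnOnce() -> NonZeroUsize) -> NonZeroUsize {
        let value = f();
        // First-writer-wins race: if another thread initialized the value
        // while we computed ours, keep the winner's value.
        match self.inner.compare_exchange(
            0,
            value.get(),
            Ordering::AcqRel,
            Ordering::Acquire,
        ) {
            Ok(_) => value,
            Err(prev) => NonZeroUsize::new(prev).unwrap(),
        }
    }
}
```

This shape matches the assembly diff later in the thread, where the fast path is `match NonZeroUsize::new(val)` and the slow path is a single call to `OnceNonZeroUsize::init`.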
Force-pushed from 50008d7 to 7e92df5.
This was rebased on top of #2338 and #2337 to eliminate some tiny regressions in CPU feature dispatching performance due to reloads. The new version incorporates matklad/once_cell#273. PR #2338 seems to obviate the need for adding …
Force-pushed from 7e92df5 to f514079.
With the reduced size of the CPU feature detection state, we can use simpler logic for storing and initializing it. This removes the `spin` dependency on x86/x86_64 targets. Here is a representative example of how this changes the generated assembly:

```
@@ -36,14 +36,11 @@
  lea r12, [rcx - 4]
      Acquire => intrinsics::atomic_load_acquire(dst),
- mov rax, qword ptr [rip + ring::cpu::intel::featureflags::FEATURES@GOTPCREL]
- movzx eax, byte ptr [rax + 4]
- match self.status.load(Ordering::Acquire) {
- cmp al, 2
- if let Some(value) = self.get() {
- jne .LBB85_2
+ mov rax, qword ptr [rip + ring::cpu::intel::featureflags::FEATURES]
+ match NonZeroUsize::new(val) {
+ test rax, rax
+ je .LBB85_2
  movabs rax, 274877906881
  if input.len() > MAX_IN_OUT_LEN {
@@ -104,10 +101,9 @@
  if in_out.len() >= SSE_MIN_LEN {
  cmp r12, 129
  jb .LBB85_8
- *features
- mov rax, qword ptr [rip + ring::cpu::intel::featureflags::FEATURES@GOTPCREL]
- mov eax, dword ptr [rax]
+ Acquire => intrinsics::atomic_load_acquire(dst),
+ mov rax, qword ptr [rip + ring::cpu::intel::featureflags::FEATURES]
  if (self.values() & MASK) == MASK {
  test eax, 256
@@ -216,9 +212,9 @@
.LBB85_2:
  .cfi_def_cfa rbp, 16
  mov r13, r8
- self.try_call_once_slow(f)
- call spin::once::Once<T,R>::try_call_once_slow
+ None => self.init(f),
+ call ring::polyfill::once_cell::race::OnceNonZeroUsize::init
  mov r8, r13
  movabs rax, 274877906881
```
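For context, here is a minimal sketch of the dispatch pattern the diff above reflects. The names `FEATURES`, `MASK`, and `SSE_MIN_LEN` appear in the diff; the bit layout, the sentinel bit, and the threshold value are assumptions read off the generated code, not ring's actual definitions:

```rust
use core::num::NonZeroUsize;
use core::sync::atomic::{AtomicUsize, Ordering};

// One word of state: zero means "not yet detected"; any nonzero value
// encodes the CPU feature bits. A sentinel bit (assumed here) keeps the
// stored value nonzero even on a CPU with no optional features.
static FEATURES: AtomicUsize = AtomicUsize::new(0);

const INITIALIZED: usize = 1 << 0; // assumed sentinel bit
const SSE41: usize = 1 << 8; // assumed layout; matches `test eax, 256`

fn detect() -> usize {
    // Placeholder for real CPUID-based detection.
    INITIALIZED | SSE41
}

fn features() -> usize {
    // Fast path: one acquire load and one zero test (the `test rax, rax`
    // in the diff). Detection is idempotent, so a plain racy store is
    // fine if two threads initialize concurrently.
    match NonZeroUsize::new(FEATURES.load(Ordering::Acquire)) {
        Some(value) => value.get(),
        None => {
            let value = detect();
            FEATURES.store(value, Ordering::Release);
            value
        }
    }
}

fn dispatch(in_out: &mut [u8]) {
    const MASK: usize = SSE41;
    const SSE_MIN_LEN: usize = 129; // assumed from `cmp r12, 129`
    if in_out.len() >= SSE_MIN_LEN && (features() & MASK) == MASK {
        // sse41_path(in_out)
    } else {
        // fallback_path(in_out)
    }
}
```

Because the whole state fits in one word, the feature check needs no `@GOTPCREL` indirection or separate status byte: the single load doubles as both the "initialized?" check and the feature-mask source.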
Force-pushed from f514079 to 96e2676.
Here's a representative example of how this changes the generated assembly in callers: