
Remove spin dependency on x86 and x86_64 targets. #2331

Merged Feb 8, 2025 (5 commits merged into main)

Conversation

@briansmith (Owner) commented Feb 5, 2025

Here's a representative example of how this changes the generated assembly in callers:

```
@@ -36,14 +36,11 @@
 	lea r12, [rcx - 4]
 		Acquire => intrinsics::atomic_load_acquire(dst),
-	mov rax, qword ptr [rip + ring::cpu::intel::featureflags::FEATURES@GOTPCREL]
-	movzx eax, byte ptr [rax + 4]
-		match self.status.load(Ordering::Acquire) {
-	cmp al, 2
-		if let Some(value) = self.get() {
-	jne .LBB85_2
+	mov rax, qword ptr [rip + ring::cpu::intel::featureflags::FEATURES]
+		match NonZeroUsize::new(val) {
+	test rax, rax
+	je .LBB85_2
 	movabs rax, 274877906881
 		if input.len() > MAX_IN_OUT_LEN {
@@ -104,10 +101,9 @@
 		if in_out.len() >= SSE_MIN_LEN {
 	cmp r12, 129
 	jb .LBB85_8
-		*features
-	mov rax, qword ptr [rip + ring::cpu::intel::featureflags::FEATURES@GOTPCREL]
-	mov eax, dword ptr [rax]
+		Acquire => intrinsics::atomic_load_acquire(dst),
+	mov rax, qword ptr [rip + ring::cpu::intel::featureflags::FEATURES]
 		if (self.values() & MASK) == MASK {
 	test eax, 256
@@ -216,9 +212,9 @@
 .LBB85_2:
 	.cfi_def_cfa rbp, 16
 	mov r13, r8
-		self.try_call_once_slow(f)
-	call spin::once::Once<T,R>::try_call_once_slow
+		None => self.init(f),
+	call ring::polyfill::once_cell::race::OnceNonZeroUsize::init
 	mov r8, r13
 	movabs rax, 274877906881
```
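The diff above references `OnceNonZeroUsize::get` and `OnceNonZeroUsize::init`. Here is a minimal sketch of the race-based once-cell pattern, modeled on `once_cell::race` (a simplification for illustration, not ring's actual implementation):

```rust
use std::num::NonZeroUsize;
use std::sync::atomic::{AtomicUsize, Ordering};

/// A lock-free once-cell for a non-zero usize. Zero means "uninitialized",
/// so the fast path is a single Acquire load plus a zero test -- the
/// `test rax, rax` / `je` pair in the diff above.
struct OnceNonZeroUsize {
    inner: AtomicUsize,
}

impl OnceNonZeroUsize {
    const fn new() -> Self {
        Self { inner: AtomicUsize::new(0) }
    }

    fn get(&self) -> Option<NonZeroUsize> {
        NonZeroUsize::new(self.inner.load(Ordering::Acquire))
    }

    fn get_or_init(&self, f: impl FnOnce() -> NonZeroUsize) -> NonZeroUsize {
        match self.get() {
            Some(v) => v,
            None => self.init(f),
        }
    }

    /// Slow path: racing initialization. If two threads race, both run `f`,
    /// but compare_exchange ensures exactly one value is stored and both
    /// threads return the winner. No spinning, no blocking.
    #[cold]
    fn init(&self, f: impl FnOnce() -> NonZeroUsize) -> NonZeroUsize {
        let val = f().get();
        match self
            .inner
            .compare_exchange(0, val, Ordering::AcqRel, Ordering::Acquire)
        {
            Ok(_) => NonZeroUsize::new(val).unwrap(),
            Err(old) => NonZeroUsize::new(old).unwrap(),
        }
    }
}

fn main() {
    let cell = OnceNonZeroUsize::new();
    assert!(cell.get().is_none());
    assert_eq!(cell.get_or_init(|| NonZeroUsize::new(65).unwrap()).get(), 65);
    // A later init attempt loses: the stored value wins.
    assert_eq!(cell.get_or_init(|| NonZeroUsize::new(99).unwrap()).get(), 65);
}
```

Unlike `spin::Once`, this design needs no status byte and no spin loop, which is why the generated code above shrinks to a plain load and test.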

@briansmith self-assigned this Feb 5, 2025

codecov bot commented Feb 5, 2025

Codecov Report

Attention: Patch coverage is 90.38462% with 5 lines in your changes missing coverage. Please review.

Project coverage is 96.63%. Comparing base (be72a40) to head (96e2676).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
src/polyfill/once_cell/race.rs 82.14% 5 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2331   +/-   ##
=======================================
  Coverage   96.63%   96.63%           
=======================================
  Files         175      176    +1     
  Lines       21551    21597   +46     
  Branches      526      527    +1     
=======================================
+ Hits        20825    20871   +46     
- Misses        611      613    +2     
+ Partials      115      113    -2     


@briansmith briansmith force-pushed the b/oncelock branch 4 times, most recently from 28009d2 to e5482c8 Compare February 6, 2025 23:39
See the added README.txt for details.
Inline `get_or_try_init` into `get_or_init` since we never use `get_or_try_init`.
@briansmith briansmith force-pushed the b/oncelock branch 2 times, most recently from 50008d7 to 7e92df5 Compare February 7, 2025 22:42
@briansmith (Owner, Author) commented:

This was rebased on top of #2338 and #2337 to eliminate some tiny regressions in CPU feature dispatching performance due to reloads.

The new version incorporates matklad/once_cell#273. PR #2338 seems to obviate the need for adding `get_unchecked`, but I did start the process of working with upstream on that at matklad/once_cell#274. I also filed a documentation clarification at rust-lang/rust#136669 to support the once_cell change.
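The reload regressions mentioned above come from re-reading the feature static inside each callee. A hedged sketch of the copy-and-pass-down approach (the `Features` name and bit positions here are illustrative, not ring's actual layout):

```rust
// Read the packed feature word once, then pass the Copy value down;
// callees see a plain register value and never re-load the static.
#[derive(Clone, Copy)]
struct Features(usize);

// Illustrative bit positions, not ring's actual assignments.
const SSE41: usize = 1 << 8;
const AVX2: usize = 1 << 9;

impl Features {
    fn has(self, mask: usize) -> bool {
        self.0 & mask == mask
    }
}

fn pick_impl(f: Features) -> &'static str {
    // `f` lives in a register here; no atomic reload per check.
    if f.has(AVX2) {
        "avx2"
    } else if f.has(SSE41) {
        "sse4.1"
    } else {
        "fallback"
    }
}

fn main() {
    let f = Features(SSE41); // pretend detection already ran
    assert_eq!(pick_impl(f), "sse4.1");
    assert_eq!(pick_impl(Features(SSE41 | AVX2)), "avx2");
}
```

Passing the value instead of the static's address keeps the optimizer from conservatively re-loading it across calls it cannot see through.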

With the reduced size of the CPU feature detection state, we can use
simpler logic for storing and initializing it. This removes the `spin`
dependency on x86/x86_64 targets.

Here is a representative example of how this changes the generated assembly:

```
@@ -36,14 +36,11 @@
 	lea r12, [rcx - 4]
 		Acquire => intrinsics::atomic_load_acquire(dst),
-	mov rax, qword ptr [rip + ring::cpu::intel::featureflags::FEATURES@GOTPCREL]
-	movzx eax, byte ptr [rax + 4]
-		match self.status.load(Ordering::Acquire) {
-	cmp al, 2
-		if let Some(value) = self.get() {
-	jne .LBB85_2
+	mov rax, qword ptr [rip + ring::cpu::intel::featureflags::FEATURES]
+		match NonZeroUsize::new(val) {
+	test rax, rax
+	je .LBB85_2
 	movabs rax, 274877906881
 		if input.len() > MAX_IN_OUT_LEN {
@@ -104,10 +101,9 @@
 		if in_out.len() >= SSE_MIN_LEN {
 	cmp r12, 129
 	jb .LBB85_8
-		*features
-	mov rax, qword ptr [rip + ring::cpu::intel::featureflags::FEATURES@GOTPCREL]
-	mov eax, dword ptr [rax]
+		Acquire => intrinsics::atomic_load_acquire(dst),
+	mov rax, qword ptr [rip + ring::cpu::intel::featureflags::FEATURES]
 		if (self.values() & MASK) == MASK {
 	test eax, 256
@@ -216,9 +212,9 @@
 .LBB85_2:
 	.cfi_def_cfa rbp, 16
 	mov r13, r8
-		self.try_call_once_slow(f)
-	call spin::once::Once<T,R>::try_call_once_slow
+		None => self.init(f),
+	call ring::polyfill::once_cell::race::OnceNonZeroUsize::init
 	mov r8, r13
 	movabs rax, 274877906881
```
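The key enabler for the single load-and-test above is packing the feature state into a value that is never zero once initialized. A sketch of that idea (the sentinel bit and feature bits are hypothetical, not ring's actual encoding):

```rust
use std::num::NonZeroUsize;

// Hypothetical layout: one always-set sentinel bit plus feature bits, so
// "detection ran but found no features" is still distinguishable from
// the all-zero "uninitialized" state.
const INIT: usize = 1;
const SSE41: usize = 1 << 8;

fn pack(sse41: bool) -> NonZeroUsize {
    let mut bits = INIT; // sentinel keeps the packed value non-zero
    if sse41 {
        bits |= SSE41;
    }
    NonZeroUsize::new(bits).unwrap()
}

fn has(features: NonZeroUsize, mask: usize) -> bool {
    // Mirrors `if (self.values() & MASK) == MASK` in the diff above.
    features.get() & mask == mask
}

fn main() {
    assert!(has(pack(true), SSE41));
    assert!(!has(pack(false), SSE41));
    assert_ne!(pack(false).get(), 0); // sentinel guarantees non-zero
}
```

Because the whole state fits in one `NonZeroUsize`, initialized-ness and the feature bits come back from a single atomic load, with no separate status field to check.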
@briansmith briansmith merged commit 06f42e8 into main Feb 8, 2025
173 of 174 checks passed
@briansmith briansmith deleted the b/oncelock branch February 8, 2025 01:01