Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perf: investigate manual inlining in layout.rs #99117

Closed
CAD97 opened this issue Jul 10, 2022 · 5 comments
Closed

Perf: investigate manual inlining in layout.rs #99117

CAD97 opened this issue Jul 10, 2022 · 5 comments
Labels
A-allocators Area: Custom and system allocators I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@CAD97
Copy link
Contributor

CAD97 commented Jul 10, 2022

Due to how many different times the layout code is instantiated, seemingly inconsequential changes can have noticeable impact on compile perf.

It may be beneficial to investigate some manual inlining (open coding) of code in the layout code (e.g. replacing ? and other option/result combinators with match/return) to reduce the amount of IR LLVM has to deal with.

@nnethercote
Copy link
Contributor

Another idea: Layout::from_size_align does two checks. One for alignment, one for size. I think the places where it was used in #95295 only need the size check. So an alternative unsafe version of Layout::from_size_align that does just the size check might help.

@scottmcm
Copy link
Member

mem::ValidAlign represents the alignment check in the type system, so I think it could even be safe but still only have the one check.

@CAD97
Copy link
Contributor Author

CAD97 commented Jul 10, 2022

If only tidy hadn't slowed me down I could've posted the PR before you said that 😛 (GitHub says the difference was 24 seconds)

bors added a commit to rust-lang-ci/rust that referenced this issue Jul 13, 2022
Take advantage of known-valid-align in layout.rs

An attempt to improve perf by `@nnethercote's` approach suggested in rust-lang#99117
bors added a commit to rust-lang-ci/rust that referenced this issue Aug 10, 2022
…oshtriplett

Reoptimize layout array

This way it's one check instead of two, so hopefully (cc rust-lang#99117) it'll be simpler for rustc perf too 🤞

Quick demonstration:
```rust
pub fn demo(n: usize) -> Option<Layout> {
    Layout::array::<i32>(n).ok()
}
```

Nightly: <https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=e97bf33508aa03f38968101cdeb5322d>
```nasm
	mov	rax, rdi
	mov	ecx, 4
	mul	rcx
	seto	cl
	movabs	rdx, 9223372036854775805
	xor	esi, esi
	cmp	rax, rdx
	setb	sil
	shl	rsi, 2
	xor	edx, edx
	test	cl, cl
	cmove	rdx, rsi
	ret
```

This PR (note no `mul`, in addition to being much shorter):
```nasm
	xor	edx, edx
	lea	rax, [4*rcx]
	shr	rcx, 61
	sete	dl
	shl	rdx, 2
	ret
```

This is built atop `@CAD97` 's rust-lang#99136; the new changes are cb8aba66ef6a0e17f08a0574e4820653e31b45a0.

I added a bunch more tests for `Layout::from_size_align` and `Layout::array` too.
@jyn514
Copy link
Member

jyn514 commented Apr 15, 2023

It looks like #99136 implemented this? Is there more work left to do?

@jyn514 jyn514 added I-slow Issue: Problems and improvements with respect to performance of generated code. A-allocators Area: Custom and system allocators T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 15, 2023
@nnethercote
Copy link
Contributor

I think it's time to declare victory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-allocators Area: Custom and system allocators I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants