common: apply two stage copy to aarch64 #3145
On aarch64, ZSTD_wildcopy uses a simple loop that copies memory 16 bytes at a time. An optimized two-stage copy already exists and achieves better performance. Applying it to aarch64 gives an observed ~1% uplift on the silesia corpus.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: Ic1253308e7a8a7df2d08963ba544e086c81ce8be
Hi @Cyan4973, the change has been benchmarked on the Arm N1/A72/A57 platforms, and a similar uplift was observed on each.
I can't remember why this code was added here. It could be that, with From what I can see, the second formulation just separates the first branch from later ones, so that it can have its own statistics (as opposed to being merged with other loop iterations). Such a construction is expected to be rather good in the context of So I'm gonna make an educated guess here and state that this PR seems to improve the situation, on top of simplifying it by removing a weird and poorly documented corner case.
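For readers following along, the two formulations discussed above can be sketched roughly as below. This is a simplified illustration, not the actual zstd source: the names `copy16`, `wildcopy_simple`, `wildcopy_two_stage` and `WILDCOPY_SLACK` are hypothetical, the buffers are assumed not to overlap, and both buffers are assumed to have at least 32 bytes of slack past `length`, because a wildcopy deliberately reads and writes beyond the requested end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slack (in bytes) that callers must keep after both buffers. */
#define WILDCOPY_SLACK 32

/* Copy exactly 16 bytes; compilers typically lower this to one vector load/store. */
static void copy16(unsigned char* dst, const unsigned char* src)
{
    memcpy(dst, src, 16);
}

/* Simple formulation: one loop, 16 bytes per iteration.
 * May write up to 15 bytes past dst + length (16 when length is 0). */
static void wildcopy_simple(unsigned char* dst, const unsigned char* src, size_t length)
{
    unsigned char* const oend = dst + length;
    assert(dst != NULL && src != NULL);
    do {
        copy16(dst, src);
        dst += 16;
        src += 16;
    } while (dst < oend);
}

/* Two-stage formulation: the first 16 bytes are copied unconditionally and
 * checked with their own branch, so short copies exit early and that branch
 * is predicted separately from the loop. Longer copies then proceed
 * 32 bytes per iteration. May write up to 31 bytes past dst + length. */
static void wildcopy_two_stage(unsigned char* dst, const unsigned char* src, size_t length)
{
    unsigned char* const oend = dst + length;
    assert(dst != NULL && src != NULL);
    copy16(dst, src);
    if (length <= 16) return;   /* stage 1: short copy, done */
    dst += 16;
    src += 16;
    do {                        /* stage 2: unrolled 2 x 16 bytes */
        copy16(dst, src);
        dst += 16;
        src += 16;
        copy16(dst, src);
        dst += 16;
        src += 16;
    } while (dst < oend);
}
```

Both functions produce the same first `length` bytes in `dst`; they differ only in how the work is split across branches and in how far past the end they may write.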
Thanks for the PR @JunHe77!