Currently, shuffle1_dyn on u32x8 is not optimized: a fallback is used, which results in a whole mess of extract intrinsics. It is not very fast.
Can we please add support for _mm256_permutevar8x32_epi32 and similar variants at the u32x8 (and f32x8, etc.) levels? It is a fairly large speedup.
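For reference, here is a minimal sketch of the kind of lowering being asked for, written against std::arch rather than the library's internals. The function names (permute_scalar, permute_avx2, permute_u32x8) are hypothetical and only for illustration; the assumption is that shuffle1_dyn means out[i] = values[indices[i]]. Note that the intrinsic only looks at the low 3 bits of each index, so the scalar reference masks with 7 to match.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{
    __m256i, _mm256_loadu_si256, _mm256_permutevar8x32_epi32, _mm256_storeu_si256,
};

/// Scalar reference: out[i] = values[indices[i] & 7].
/// The mask mirrors the intrinsic, which uses only the low 3 bits of each index.
fn permute_scalar(values: [u32; 8], indices: [u32; 8]) -> [u32; 8] {
    let mut out = [0u32; 8];
    for i in 0..8 {
        out[i] = values[(indices[i] & 7) as usize];
    }
    out
}

/// AVX2 path: a single cross-lane permute instead of eight extracts.
/// Safety: the caller must ensure the CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn permute_avx2(values: [u32; 8], indices: [u32; 8]) -> [u32; 8] {
    let mut out = [0u32; 8];
    unsafe {
        // Unaligned loads: plain arrays are not guaranteed to be 32-byte aligned.
        let v = _mm256_loadu_si256(values.as_ptr() as *const __m256i);
        let idx = _mm256_loadu_si256(indices.as_ptr() as *const __m256i);
        let r = _mm256_permutevar8x32_epi32(v, idx);
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, r);
    }
    out
}

/// Hypothetical dispatcher: use AVX2 when detected at runtime, else the scalar loop.
pub fn permute_u32x8(values: [u32; 8], indices: [u32; 8]) -> [u32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Sound because AVX2 support was just checked.
            return unsafe { permute_avx2(values, indices) };
        }
    }
    permute_scalar(values, indices)
}
```

Calling permute_u32x8 with indices [7, 6, 5, 4, 3, 2, 1, 0] reverses the eight lanes. The f32x8 case would presumably use _mm256_permutevar8x32_ps in the same way.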
Thanks
Wondering about this as well (it's 30x slower than it should be, and the user gets no warning).
(should this be posted to stdsimd repo?)
Yes, all development has moved there.