added f32 and f64 unaligned stores and loads from avx512f set #873

Merged
merged 6 commits into rust-lang:master from avx_512_unaligned_load_store
Jul 11, 2020

khodzha
Contributor

khodzha commented Jul 9, 2020

No description provided.

@rust-highfive

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @gnzlbg (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

khodzha force-pushed the avx_512_unaligned_load_store branch from 051049d to eeadf05 on July 9, 2020 21:15
// This intrinsic has no corresponding instruction.
pub unsafe fn _mm512_undefined_ps() -> __m512 {
// FIXME: this function should return MaybeUninit<__m512>
mem::MaybeUninit::<__m512>::uninit().assume_init()
Member

This is UB.

Member

The undefined intrinsics should return zero-initialized vectors like clang does.

Contributor Author

_mm_undefined_ps and _mm256_undefined_p{d,s} do the same thing; should I change them?

Member

Yes, those should be fixed as well.
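
To make the suggestion in this thread concrete, here is a minimal sketch of an "undefined" constructor that returns a zero-initialized value instead of calling `assume_init()` on uninitialized memory. `Vec512` and `undefined_ps` are hypothetical stand-ins for `__m512` and `_mm512_undefined_ps`, chosen so the example compiles on any target; it is not the code that was merged.

```rust
use std::mem;

/// Hypothetical stand-in for `__m512`: 16 `f32` lanes, 64-byte aligned.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(64))]
pub struct Vec512([f32; 16]);

/// Hypothetical counterpart of an "undefined" intrinsic.
/// Callers must not rely on the contents, but the value is still fully
/// initialized, so reading it is not undefined behavior.
pub fn undefined_ps() -> Vec512 {
    // SAFETY: an all-zero bit pattern is a valid `[f32; 16]`.
    unsafe { mem::zeroed() }
}
```

The caller-facing contract stays the same (the contents are unspecified), but the function no longer produces an uninitialized value.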

&mut dst as *mut __m512d as *mut u8,
mem::size_of::<__m512d>(),
);
dst
Member

Does mem::transmute work?

Contributor Author

I used Amanieu's version.

&mut dst as *mut __m512d as *mut u8,
mem::size_of::<__m512d>(),
);
dst
Member

Simpler version: ptr::read_unaligned(mem_addr as *const __m512d)
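
As a rough illustration of the `ptr::read_unaligned` suggestion, the sketch below replaces a byte-wise copy into a temporary with a single unaligned read. `Vec512d` and `loadu_pd` are hypothetical stand-ins for `__m512d` and the unaligned load intrinsic, so the example does not depend on AVX-512 support.

```rust
use std::ptr;

/// Hypothetical stand-in for `__m512d`: 8 `f64` lanes, 64-byte aligned.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(64))]
pub struct Vec512d([f64; 8]);

/// Hypothetical unaligned load.
///
/// # Safety
/// `mem_addr` must be valid for reading 64 bytes. It does not need to be
/// 64-byte aligned.
pub unsafe fn loadu_pd(mem_addr: *const f64) -> Vec512d {
    // `read_unaligned` copies the bytes without assuming the pointer
    // satisfies the alignment of `Vec512d`.
    unsafe { ptr::read_unaligned(mem_addr as *const Vec512d) }
}
```

Compared with `copy_nonoverlapping` into a local, this needs no temporary and no explicit `size_of` call.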

@khodzha
Contributor Author

khodzha commented Jul 10, 2020

Thanks, updated with the feedback addressed.

#[target_feature(enable = "avx512f")]
// This intrinsic has no corresponding instruction.
pub unsafe fn _mm512_undefined_pd() -> __m512d {
_mm512_set1_pd(0.0)
Member

This is a zero float. I think it should be literal zero bytes. Not sure though.

Member

A zero float happens to be encoded with all bits zeroed.
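
The claim is easy to check with `f64::to_bits`. The helper name below is hypothetical and only for illustration. Note that the claim holds for positive zero only: negative zero has the sign bit set.

```rust
/// Returns true if the IEEE 754 encoding of `x` consists only of zero bits.
/// (Hypothetical helper, used here only to check the claim.)
pub fn has_all_zero_bits(x: f64) -> bool {
    x.to_bits() == 0
}
```

So a vector filled with `0.0` has the same bytes as a zeroed buffer, while one filled with `-0.0` would not.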

&a as *const __m512d as *const u8,
mem_addr as *mut u8,
mem::size_of::<__m512d>(),
);
Member

You can use ptr::write_unaligned here just like the loads.

Contributor Author

thanks
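
For symmetry with the loads, a store based on `ptr::write_unaligned` might look like the sketch below. `Vec512d` and `storeu_pd` are again hypothetical stand-ins for `__m512d` and the unaligned store intrinsic, not the merged implementation.

```rust
use std::ptr;

/// Hypothetical stand-in for `__m512d`: 8 `f64` lanes, 64-byte aligned.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(64))]
pub struct Vec512d([f64; 8]);

/// Hypothetical unaligned store.
///
/// # Safety
/// `mem_addr` must be valid for writing 64 bytes. It does not need to be
/// 64-byte aligned.
pub unsafe fn storeu_pd(mem_addr: *mut f64, a: Vec512d) {
    // `write_unaligned` writes all 64 bytes without assuming the
    // destination satisfies the alignment of `Vec512d`.
    unsafe { ptr::write_unaligned(mem_addr as *mut Vec512d, a) }
}
```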

khodzha force-pushed the avx_512_unaligned_load_store branch from e5986e5 to 75c427e on July 10, 2020 20:17
@Amanieu
Member

Amanieu commented Jul 10, 2020

For the CI failures, just change the assert_instr to use vmovups. The two instructions (vmovupd and vmovups) are equivalent anyway.

@khodzha
Contributor Author

khodzha commented Jul 10, 2020

Interesting, there are these comments:

#[cfg_attr(test, assert_instr(vmovups))] // FIXME vmovupd expected

#[cfg_attr(test, assert_instr(vmovups))] // FIXME vmovupd expected

Amanieu merged commit 9faced9 into rust-lang:master on Jul 11, 2020
@Amanieu
Member

Amanieu commented Jul 11, 2020

Thanks!

khodzha deleted the avx_512_unaligned_load_store branch on July 11, 2020 08:07
5 participants