-
Notifications
You must be signed in to change notification settings - Fork 277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added f32 and f64 unaligned stores and loads from avx512f set #873
Merged
Amanieu
merged 6 commits into
rust-lang:master
from
khodzha:avx_512_unaligned_load_store
Jul 11, 2020
Merged
Changes from 2 commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
eeadf05
added f32 and f64 unaligned stores and loads from avx512f set
khodzha abf64fb
fixed UB in _mm512_undefined_p{d,s}; improved _mm512_loads
khodzha eeeeef9
fixed UB in _mm256_undefined_p{d,s} and _mm_undefined_ps
khodzha 75c427e
replaced _mm512 storeus impls with ptr::write_unaligned
khodzha 31f4b24
replaced vmovupd with vmovups in assert_instr on _mm512_{load,store}u_pd
khodzha eb77316
moved _mm512_set{,r}_pd to x86 mod
khodzha File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
use crate::{ | ||
core_arch::{simd::*, simd_llvm::*, x86::*}, | ||
mem::{self, transmute}, | ||
ptr, | ||
}; | ||
|
||
#[cfg(test)] | ||
|
@@ -1633,6 +1634,83 @@ pub unsafe fn _mm512_mask_cmp_epi64_mask( | |
transmute(r) | ||
} | ||
|
||
/// Returns vector of type `__m512d` with undefined elements. | ||
/// | ||
/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_undefined_pd) | ||
#[inline] | ||
#[target_feature(enable = "avx512f")] | ||
// This intrinsic has no corresponding instruction. | ||
pub unsafe fn _mm512_undefined_pd() -> __m512d { | ||
_mm512_set1_pd(0.0) | ||
} | ||
|
||
/// Returns vector of type `__m512` with undefined elements. | ||
/// | ||
/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_undefined_ps) | ||
#[inline] | ||
#[target_feature(enable = "avx512f")] | ||
// This intrinsic has no corresponding instruction. | ||
pub unsafe fn _mm512_undefined_ps() -> __m512 { | ||
_mm512_set1_ps(0.0) | ||
} | ||
|
||
/// Loads 512-bits (composed of 8 packed double-precision (64-bit) | ||
/// floating-point elements) from memory into result. | ||
/// `mem_addr` does not need to be aligned on any particular boundary. | ||
/// | ||
/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_loadu_pd) | ||
#[inline] | ||
#[target_feature(enable = "avx512f")] | ||
#[cfg_attr(test, assert_instr(vmovupd))] | ||
pub unsafe fn _mm512_loadu_pd(mem_addr: *const f64) -> __m512d { | ||
ptr::read_unaligned(mem_addr as *const __m512d) | ||
} | ||
|
||
/// Stores 512-bits (composed of 8 packed double-precision (64-bit) | ||
/// floating-point elements) from `a` into memory. | ||
/// `mem_addr` does not need to be aligned on any particular boundary. | ||
/// | ||
/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_storeu_pd) | ||
#[inline] | ||
#[target_feature(enable = "avx512f")] | ||
#[cfg_attr(test, assert_instr(vmovupd))] | ||
pub unsafe fn _mm512_storeu_pd(mem_addr: *mut f64, a: __m512d) { | ||
ptr::copy_nonoverlapping( | ||
&a as *const __m512d as *const u8, | ||
mem_addr as *mut u8, | ||
mem::size_of::<__m512d>(), | ||
); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You can use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. thanks |
||
} | ||
|
||
/// Loads 512-bits (composed of 16 packed single-precision (32-bit) | ||
/// floating-point elements) from memory into result. | ||
/// `mem_addr` does not need to be aligned on any particular boundary. | ||
/// | ||
/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_loadu_ps) | ||
#[inline] | ||
#[target_feature(enable = "avx512f")] | ||
#[cfg_attr(test, assert_instr(vmovups))] | ||
pub unsafe fn _mm512_loadu_ps(mem_addr: *const f32) -> __m512 { | ||
ptr::read_unaligned(mem_addr as *const __m512) | ||
} | ||
|
||
/// Stores 512-bits (composed of 16 packed single-precision (32-bit) | ||
/// floating-point elements) from `a` into memory. | ||
/// `mem_addr` does not need to be aligned on any particular boundary. | ||
/// | ||
/// [Intel's documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_storeu_ps) | ||
#[inline] | ||
#[target_feature(enable = "avx512f")] | ||
#[cfg_attr(test, assert_instr(vmovups))] | ||
#[stable(feature = "simd_x86", since = "1.27.0")] | ||
pub unsafe fn _mm512_storeu_ps(mem_addr: *mut f32, a: __m512) { | ||
ptr::copy_nonoverlapping( | ||
&a as *const __m512 as *const u8, | ||
mem_addr as *mut u8, | ||
mem::size_of::<__m512>(), | ||
); | ||
} | ||
|
||
/// Equal | ||
pub const _MM_CMPINT_EQ: _MM_CMPINT_ENUM = 0x00; | ||
/// Less-than | ||
|
@@ -1702,6 +1780,8 @@ mod tests { | |
use stdarch_test::simd_test; | ||
|
||
use crate::core_arch::x86::*; | ||
use crate::core_arch::x86_64::_mm512_setr_pd; | ||
use crate::hint::black_box; | ||
|
||
#[simd_test(enable = "avx512f")] | ||
unsafe fn test_mm512_abs_epi32() { | ||
|
@@ -2326,4 +2406,42 @@ mod tests { | |
unsafe fn test_mm512_setzero_ps() { | ||
assert_eq_m512(_mm512_setzero_ps(), _mm512_set1_ps(0.)); | ||
} | ||
|
||
#[simd_test(enable = "avx512f")] | ||
unsafe fn test_mm512_loadu_pd() { | ||
let a = &[4., 3., 2., 5., 8., 9., 64., 50.]; | ||
let p = a.as_ptr(); | ||
let r = _mm512_loadu_pd(black_box(p)); | ||
let e = _mm512_setr_pd(4., 3., 2., 5., 8., 9., 64., 50.); | ||
assert_eq_m512d(r, e); | ||
} | ||
|
||
#[simd_test(enable = "avx512f")] | ||
unsafe fn test_mm512_storeu_pd() { | ||
let a = _mm512_set1_pd(9.); | ||
let mut r = _mm512_undefined_pd(); | ||
_mm512_storeu_pd(&mut r as *mut _ as *mut f64, a); | ||
assert_eq_m512d(r, a); | ||
} | ||
|
||
#[simd_test(enable = "avx512f")] | ||
unsafe fn test_mm512_loadu_ps() { | ||
let a = &[ | ||
4., 3., 2., 5., 8., 9., 64., 50., -4., -3., -2., -5., -8., -9., -64., -50., | ||
]; | ||
let p = a.as_ptr(); | ||
let r = _mm512_loadu_ps(black_box(p)); | ||
let e = _mm512_setr_ps( | ||
4., 3., 2., 5., 8., 9., 64., 50., -4., -3., -2., -5., -8., -9., -64., -50., | ||
); | ||
assert_eq_m512(r, e); | ||
} | ||
|
||
#[simd_test(enable = "avx512f")] | ||
unsafe fn test_mm512_storeu_ps() { | ||
let a = _mm512_set1_ps(9.); | ||
let mut r = _mm512_undefined_ps(); | ||
_mm512_storeu_ps(&mut r as *mut _ as *mut f32, a); | ||
assert_eq_m512(r, a); | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a zero float. I think it should be literal zero bytes. Not sure though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A zero float happens to be encoded with all bits zeroed.