Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorized field trait #216

Merged
merged 2 commits into from
Sep 4, 2021
Merged

Vectorized field trait #216

merged 2 commits into from
Sep 4, 2021

Conversation

nbgl
Copy link
Contributor

@nbgl nbgl commented Sep 3, 2021

Update:
I’m breaking this massive PR up into a few smaller ones to ease review. This one is for the PackedField and Packable traits, as well as the Singleton struct. A PackedField is a wrapper around a vector of field elements, permitting a subset of field operations. Packable extends Field with the default PackedField for a particular Field. It defaults to Singleton a dummy container for a scalar, and is specialized per platform.

Original comment:
This PR introduces PackedFields, which permit SIMD field arithmetic. The default packing defers to scalar code, but can be overridden for particular architecture/field combinations.

I implemented Crandall arithmetic in AVX2, and rewrote fft_classic in vector as a test case. The vectorized FFT is 30-ish% faster (needs more thorough benchmarks) on Skylake.

While functional, this branch needs a cleanup before it can be reviewed. There’s some debug logging/unused imports/confusing names/undocumented tricks. But I welcome big-picture comments!

@nbgl
Copy link
Contributor Author

nbgl commented Sep 3, 2021

It looks like the build can’t compile num-bigint, so I suspect it would fail on main as well

@dlubarov
Copy link
Contributor

dlubarov commented Sep 3, 2021

Yep the latest nightly seems to break main. I'll look into it

@dlubarov
Copy link
Contributor

dlubarov commented Sep 3, 2021

Looks like we'll need to wait for a change in num -- see rust-num/num-bigint/issues/218. It affects lots of users so hopefully it will come soon.

@nbgl nbgl mentioned this pull request Sep 3, 2021

fn broadcast(x: Self::FieldType) -> Self;

fn from_arr(arr: &[Self::FieldType]) -> Self;
Copy link
Contributor

@dlubarov dlubarov Sep 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this from_slice since it technically takes a slice? Or have it take [Self::FieldType; WIDTH]

Similarly for the method below, I'd either call it to_vec or have it return [Self::FieldType; WIDTH]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this is already on the to-do list 😊


#[derive(Copy, Clone)]
#[repr(C, align(4))]
pub struct PackedCrandallAVX2(pub [u64; 4]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm would it be simpler to wrap the __m256i? Seems like we convert to it rather than doing anything directly with the [u64; 4]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an awful trick to stop Rust assuming 32-byte alignment. I’ve added the following comment to explain it:

// PackedCrandallAVX2 wraps an array of four u64s, with the new and get methods to convert that
// array to and from __m256i, which is the type we actually operate on. This indirection is a
// terrible trick to change PackedCrandallAVX2's alignment.
//   We'd like to be able to cast slices of CrandallField to slices of PackedCrandallAVX2. Rust
// aligns __m256i to 32 bytes but CrandallField has a lower alignment. That alignment extends to
// PackedCrandallAVX2 and it appears that it cannot be lowered with #[repr(C, blah)]. It is
// important for Rust not to assume 32-byte alignment, so we cannot wrap __m256i directly.
//   There are two versions of vectorized load/store instructions on x86: aligned (vmovaps and
// friends) and unaligned (vmovups etc.). The difference between them is that aligned loads and
// stores are permitted to segfault on unaligned accesses. Historically, the aligned instructions
// were faster, and although this is no longer the case, compilers prefer the aligned versions if
// they know that the address is aligned. Using aligned instructions on unaligned addresses leads to
// bugs that can be frustrating to diagnose. Hence, we can't have Rust assuming alignment, and
// therefore PackedCrandallAVX2 wraps [u64; 4] and not __m256i.

Please let me know if you can think of a nicer way to do this 😊

@nbgl nbgl changed the title WIP: Vectorized fields Vectorized field trait Sep 3, 2021
@nbgl nbgl requested a review from dlubarov September 3, 2021 22:39
Copy link
Contributor

@dlubarov dlubarov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nbgl nbgl merged commit 032e2fe into main Sep 4, 2021
@nbgl nbgl deleted the jakub/vector-field-trait branch September 4, 2021 00:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants