🐙 Optimise PPACK #2067
Can confirm this works on CPU SIMD, with a 5% speedup.
Could you merge master? Easier to see what's new that way.
Sure!
"[[maybe_unused]] arb_size_type {0}index_constraints_n_contiguous = pp->index_constraints.n_contiguous;\\\n"
"[[maybe_unused]] arb_size_type {0}index_constraints_n_constant = pp->index_constraints.n_constant;\\\n"
"[[maybe_unused]] arb_size_type {0}index_constraints_n_independent = pp->index_constraints.n_independent;\\\n"
"[[maybe_unused]] arb_size_type {0}index_constraints_n_none = pp->index_constraints.n_none;\\\n"
"[[maybe_unused]] arb_index_type* __restrict__ {0}index_constraints_contiguous = pp->index_constraints.contiguous;\\\n"
"[[maybe_unused]] arb_index_type* __restrict__ {0}index_constraints_constant = pp->index_constraints.constant;\\\n"
"[[maybe_unused]] arb_index_type* __restrict__ {0}index_constraints_independent = pp->index_constraints.independent;\\\n"
"[[maybe_unused]] arb_index_type* __restrict__ {0}index_constraints_none = pp->index_constraints.none;\\\n"),
Are these index constraints and their representation documented somewhere?
It's an internal indicator, basically used to multi-version the SIMDised loops.
- contiguous: this bundle of N indices is {k, k+1, ..., k+N-1}
- constant: the bundle is {k, k, ..., k}
- independent and none: these seem to be the same; we know nothing about the set
These are then used to drive the ld/st operations from/to SIMD types, so that the actual loop can operate on full-width SIMD types.
I do not know whether this has been documented; it was already here when I came ;)
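To make the three cases above concrete, here is a minimal scalar sketch of how such a classification could select the load path. Everything in it is hypothetical and for illustration only: the names `bundle`, `index_constraint`, `classify` and `load`, the width `N = 4`, and the use of `std::array` in place of a real SIMD type are assumptions, not the project's actual code. Following the comment above, independent is folded into none.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical illustration only; none of these names come from the project.
constexpr std::size_t N = 4;                 // assumed SIMD width
using index_type = int;
using bundle = std::array<index_type, N>;    // one bundle of N indices
using simd_value = std::array<double, N>;    // stand-in for a SIMD register

// 'independent' is folded into 'none', as in the comment above.
enum class index_constraint { contiguous, constant, none };

// Classify one bundle of N indices.
inline index_constraint classify(const bundle& idx) {
    bool contiguous = true;
    bool constant = true;
    for (std::size_t i = 1; i < N; ++i) {
        contiguous = contiguous && idx[i] == idx[0] + static_cast<index_type>(i);
        constant = constant && idx[i] == idx[0];
    }
    if (contiguous) return index_constraint::contiguous;
    if (constant) return index_constraint::constant;
    return index_constraint::none;
}

// Load N values from data according to the constraint on the bundle.
// On real SIMD hardware the three branches would map to a plain vector
// load, a broadcast, and a gather respectively.
inline simd_value load(const double* data, const bundle& idx, index_constraint c) {
    assert(data != nullptr);
    simd_value v{};
    const std::size_t k = static_cast<std::size_t>(idx[0]);
    switch (c) {
    case index_constraint::contiguous:
        // {k, k+1, ..., k+N-1}: one full-width load starting at k.
        for (std::size_t i = 0; i < N; ++i) v[i] = data[k + i];
        break;
    case index_constraint::constant:
        // {k, k, ..., k}: read one element and broadcast it.
        v.fill(data[k]);
        break;
    case index_constraint::none:
        // Nothing known about the set: element-wise gather.
        for (std::size_t i = 0; i < N; ++i) v[i] = data[static_cast<std::size_t>(idx[i])];
        break;
    }
    return v;
}
```

The point of multi-versioning is that the cheap cases (plain load, broadcast) never pay for the general gather; the generated loops can then be emitted once per constraint class and run over the corresponding list of bundles.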
I like the explicit types instead of
Ideally one would hope for a warning about implicit conversions as well.
- PPACK_IFACE_BLOCK will be __restrict__
- ved_di is not used anymore, gone and ABI bumped
- PPACK_IFACE_BLOCK
NB. This builds on earlier work to test performance metrics; to be deleted and cherrypicked.