Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐙 Optimise PPACK #2067

Merged

Conversation

thorstenhater
Copy link
Contributor

@thorstenhater thorstenhater commented Dec 14, 2022

  • pointers in PPACK_IFACE_BLOCK will be __restrict__
    • this is expected to produce better binary code for the scalar case
    • for GPU we expect this to better utilise data caches as per CUDA docs
  • ved_di is not used anymore, gone and ABI bumped
  • GPU code never used the indexing structs, so remove from PPACK_IFACE_BLOCK

NB. This builds on earlier work to test performance metrics; to be deleted and cherrypicked.

@thorstenhater thorstenhater requested review from boeschf and removed request for boeschf December 14, 2022 20:03
@thorstenhater thorstenhater marked this pull request as draft December 14, 2022 20:04
@llandsmeer
Copy link
Contributor

Can confirm this works on CPU SIMD, with a 5% speedup

@thorstenhater thorstenhater requested review from bcumming and boeschf and removed request for bcumming December 21, 2022 12:13
@thorstenhater thorstenhater marked this pull request as ready for review December 21, 2022 12:13
@boeschf
Copy link
Contributor

boeschf commented Dec 21, 2022

could you merge master? easier to see what's new that way

@thorstenhater
Copy link
Contributor Author

Sure!

Comment on lines +268 to +275
"[[maybe_unused]] arb_size_type {0}index_constraints_n_contiguous = pp->index_constraints.n_contiguous;\\\n"
"[[maybe_unused]] arb_size_type {0}index_constraints_n_constant = pp->index_constraints.n_constant;\\\n"
"[[maybe_unused]] arb_size_type {0}index_constraints_n_independent = pp->index_constraints.n_independent;\\\n"
"[[maybe_unused]] arb_size_type {0}index_constraints_n_none = pp->index_constraints.n_none;\\\n"
"[[maybe_unused]] arb_index_type* __restrict__ {0}index_constraints_contiguous = pp->index_constraints.contiguous;\\\n"
"[[maybe_unused]] arb_index_type* __restrict__ {0}index_constraints_constant = pp->index_constraints.constant;\\\n"
"[[maybe_unused]] arb_index_type* __restrict__ {0}index_constraints_independent = pp->index_constraints.independent;\\\n"
"[[maybe_unused]] arb_index_type* __restrict__ {0}index_constraints_none = pp->index_constraints.none;\\\n"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these index constraints and their representation documented somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an internal indicator to basically multi-version the SIMDised loops.

  • contiguous: this bundle of N indices is {k, k+1, ..., k+N-1}
  • constant: {k, k, ..., k}
  • independent and none: seem to be the same, we know nothing about the set

These are then used to drive the ld/st operations from/to SIMD types, so that the
actual loop can operator on full-width SIMD types.

I do not know whether this has been documented, it was already here when I came ;)

@boeschf
Copy link
Contributor

boeschf commented Jan 12, 2023

I like the explicit types instead of auto in the generated code.

@thorstenhater
Copy link
Contributor Author

Ideally one would hope for a warning about implicit conversions as well.

@thorstenhater thorstenhater requested a review from boeschf January 19, 2023 10:59
@thorstenhater thorstenhater merged commit 8087cb8 into arbor-sim:master Feb 2, 2023
@thorstenhater thorstenhater deleted the modcc/restrict-all-the-things branch February 2, 2023 11:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants