Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enabling reduce_then_scan for "Set" family of scan APIs #1879

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

danhoeflinger
Copy link
Contributor

@danhoeflinger danhoeflinger commented Sep 30, 2024

Enabling reduce then scan algorithm to be used for set family APIs:
set_intersection
set_union
set_difference
set_symmetric_difference

Adds high-level helpers required to convert set family to use reduce_then_scan on GPU with size 32 subgroups.

Also makes set consistent with the other scan algorithms to select algorithm in the __parallel_* level of function call.

This provides significant performance improvements, especially at small sizes of n. At larger sizes of n, this is an improvement over the existing algorithm, but not by a lot.

There is still significant room for improvement here, this algorithm is very inefficient.


Three future things to try (from easiest to hardest) to improve the set family within the reduce then scan alg:

  1. We could explore our alternative __pstl_lower_bound, __shars_lower_bound to see if it provides benefit.
  2. We could propagate a "hint" minimum search index between blocks (this provides a lower bound and better "blocks" the second set, in addition to the blocking we do inherently on the first set)
  3. We could propagate a similar "hint" minimum search index subgroup from one iteration to the next. (This prevents redundant computation for consecutive iterations for a subgroup. Since the sets are ordered, subsequent searches will never go backward as long as the element we are searching's index in the first increases, which it does).

Right now, we are wasting lots of memory accesses and comparisons on sections of the second set which should be able to ruled out from knowledge we have access to at the time we are making the search (in the "Reduce" predicate). We also lose the "blocking" cache advantage of reduce_then_scan, because the second set always searches from the start of the list.

Copy link
Contributor

@adamfidel adamfidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass of comments. I think I generally understand the algorithm and the logic seems good to me.

Comment on lines 847 to 848
bool bres =
_IsOpDifference::value; //initialization in true in case of difference operation; false - intersection.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool bres =
_IsOpDifference::value; //initialization in true in case of difference operation; false - intersection.
constexpr bool bres =
_IsOpDifference::value; //initialization is true in case of difference operation; false - intersection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i reverted the constexpr part of this suggestion since it is not used as a constexpr other than in this initialization.

danhoeflinger and others added 12 commits October 14, 2024 13:47
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
wording and constexpr

Co-authored-by: Adam Fidel <110841220+adamfidel@users.noreply.github.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
It is only initialized with a constexpr, later updated at runtime

Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
@danhoeflinger danhoeflinger force-pushed the dev/shared/reduce_then_scan_set_family branch from 01ee3a9 to adc6cc5 Compare October 14, 2024 18:13
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
@danhoeflinger
Copy link
Contributor Author

I realize I never expanded the set of reviewers from a small group, though I know we agreed not to attempt to spend too much effort on perfecting this approach (for performance) before merging. I've now added a few more for visibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants