Remove multi-pass mechanism #173

psalz · 2023-04-13T12:53:21Z

While the multi-pass approach to executing CGFs has served us well in getting the Celerity idea to work early on, it has also caused us a lot of problems with regard to reference captures and lifetimes. This PR removes the multi-pass execution and replaces it with an accessor "hydration" mechanism for an overall safer API that is more in line with SYCL.

Highlights:

- Instead of storing the entire CGF, we now only store the inner "command function" lambda
- Accessors captured into command function closures are "hydrated" before launching the kernel
- Buffers and host objects can (and should) now be captured by reference into CGFs; added deprecation warnings
- Accessors, side-effects and reductions may now be created from non-const buffers
- Deprecate allow_by_ref
- Introduce new mechanism to tie buffer and host object lifetimes to tasks (required as we no longer store captured copies inside the CGF)
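To make the "hydration" idea concrete, here is a minimal standalone sketch. It is not Celerity's implementation: `toy_hydrator`, `toy_accessor` and `hydrate_closure` are hypothetical names, and the model assumes (as the discussion in this PR suggests) that hydration happens when the closure is copied. An accessor starts out holding only a placeholder id, and copying the closure while the hydrator is armed replaces that placeholder with a real pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for the runtime component that knows the real buffer pointers.
class toy_hydrator {
  public:
    static toy_hydrator& instance() {
        thread_local toy_hydrator inst;
        return inst;
    }
    void arm(std::vector<int*> ptrs) {
        m_ptrs = std::move(ptrs);
        m_armed = true;
    }
    void disarm() {
        m_ptrs.clear();
        m_armed = false;
    }
    bool is_armed() const { return m_armed; }
    int* lookup(std::size_t id) const {
        assert(m_armed);
        return m_ptrs.at(id); // throws std::out_of_range for unknown ids
    }

  private:
    std::vector<int*> m_ptrs;
    bool m_armed = false;
};

// Hypothetical accessor: created with a placeholder id, hydrated on copy.
class toy_accessor {
  public:
    explicit toy_accessor(std::size_t id) : m_id(id) {}

    // Copying while the hydrator is armed swaps the placeholder for a real pointer.
    toy_accessor(const toy_accessor& other) : m_id(other.m_id), m_ptr(other.m_ptr) {
        auto& hydrator = toy_hydrator::instance();
        if(hydrator.is_armed() && m_ptr == nullptr) { m_ptr = hydrator.lookup(m_id); }
    }

    int& operator[](std::size_t i) const {
        if(m_ptr == nullptr) { throw std::logic_error("accessor used before hydration"); }
        return m_ptr[i];
    }

  private:
    std::size_t m_id;
    int* m_ptr = nullptr;
};

// Copies the closure while the hydrator is armed, so every accessor captured
// by value runs its hydrating copy constructor.
template <typename Closure>
Closure hydrate_closure(const Closure& unhydrated, std::vector<int*> ptrs) {
    auto& hydrator = toy_hydrator::instance();
    hydrator.arm(std::move(ptrs));
    struct disarm_on_exit {
        toy_hydrator& h;
        ~disarm_on_exit() { h.disarm(); }
    } guard{hydrator};
    return Closure(unhydrated);
}
```

In this model, a lambda that captures a `toy_accessor` by value throws if it is invoked directly, and works once it has gone through `hydrate_closure`. An accessor captured by reference would never be copied and therefore never hydrated, which is why by-reference captures of accessors are a problem.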

Additionally, this introduces a new "CGF diagnostics" utility type for catching common errors early. It currently provides two types of diagnostics:

- Check whether accessor target matches kernel type (we had this before, but now it is checked during CGF submission and throws synchronously in the main thread).
- Check whether all accessors (and side effects) are being copied into a kernel. Fewer accessors being copied than expected either means there are unused accessors (potential performance bug), or accessors are being captured by reference (dangling reference - very bad!).
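The second diagnostic can be approximated by counting copy-constructor calls while the kernel closure is copied once. The following is a rough standalone sketch under that assumption; `counted_accessor`, `check_all_accessors_copied` and the two thread-local counters are hypothetical and not part of the Celerity API.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical per-thread state, only active while a check is running.
thread_local bool g_counting = false;
thread_local std::size_t g_copies = 0;

// Hypothetical accessor that records when it is copied during a check.
struct counted_accessor {
    counted_accessor() = default;
    counted_accessor(const counted_accessor&) {
        if(g_counting) { ++g_copies; }
    }
};

// Copies the kernel once and compares the number of observed accessor copies
// against the number of accessors created in the command group.
template <typename Kernel>
void check_all_accessors_copied(const Kernel& kernel, std::size_t expected) {
    assert(!g_counting); // nested checks are not supported in this sketch
    g_copies = 0;
    g_counting = true;
    {
        Kernel probe(kernel); // copies every by-value capture, but no by-reference capture
        (void)probe;
    }
    g_counting = false;
    if(g_copies < expected) {
        throw std::runtime_error("expected " + std::to_string(expected) + " accessor copies, observed "
                                 + std::to_string(g_copies) + ": an accessor is unused or captured by reference");
    }
}
```

A kernel declared as `[a, b] { ... }` passes the check with `expected == 2`, while `[a, &b] { ... }` produces only one copy and is rejected.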

github-actions

clang-tidy made some suggestions

examples/reduction/reduction.cc

test/runtime_deprecation_tests.cc

test/runtime_tests.cc

PeterTh

Wonderful stuff!
I am both elated and saddened by getting rid of the prepass, it's the end of an era for Celerity ;)

One thing I noticed while reading through this is that we might be a bit inconsistent in our error handling. Mostly, it seems to follow the principle that implementation issues are checked using assert, and probably user errors are reported using exceptions (if it's not possible at compile time). But sometimes I think we assert things potentially caused by user errors.

include/accessor.h

include/buffer.h

test/graph_compaction_tests.cc

PeterTh · 2023-04-13T14:56:55Z

Oh, and please do include updated benchmark results before merging (even though nothing much should happen).

fknorr

We're making really quick progress here, I love it!

I have a couple of concerns nonetheless.

batch_sycl_reduction_maker seems both overly complicated and also not sane to me (the correct index assignment in .make(reductions...) depends on function argument evaluation order). It should be possible to implement this functionality much more simply, without any helper class, by using a matching size_t parameter pack like so:

template <typename KernelFlavor, typename KernelName, int Dims, typename Kernel, size_t... ReductionIndices, typename... Reductions>
auto make_device_kernel_launcher(const range<Dims>& global_range, const id<Dims>& global_offset,
    typename detail::kernel_flavor_traits<KernelFlavor, Dims>::local_size_type local_range, Kernel kernel,
    std::index_sequence<ReductionIndices...>, Reductions... reductions) {
    // ...
        detail::invoke_sycl_parallel_for<KernelName>(cgh, sycl_global_range,
            make_sycl_reduction(reductions, static_cast<DataT*>(m_ptrs[ReductionIndices]))...,
            detail::bind_simple_kernel(hydrated_kernel, global_range, global_offset, detail::id_cast<Dims>(execution_sr.offset)));
    // ...
}

// call site:
auto launcher = make_device_kernel_launcher<KernelFlavor, KernelName, Dims>(global_range, global_offset, local_range, kernel,
        std::index_sequence_for<Reductions...>(), reductions...);

By deprecating allow_by_ref, there are no backstops to dangling reference captures inside host task kernels. Can we somehow emit a warning when a user relies on ref-captures instead of side effects? Ideally we would disallow these altogether in the future, but it would be best not to break the interface here.
Host kernels can capture arbitrary expensive-to-copy data. Should we maybe make accessor(accessor &&) hydrating as well to keep expensive kernel lambdas movable, or do we decide that this is not worth the effort (also a valid choice in my book!)?
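The parameter-pack suggestion in the first concern relies on expanding an index pack and a value pack in lockstep, so that the i-th reduction is always paired with the i-th slot no matter in which order the arguments are evaluated. Here is a standalone sketch of just that technique; `bound_reduction`, `bind_reductions` and `bind_reductions_impl` are hypothetical names, and plain `int*` slots stand in for the reduction result pointers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>

// Hypothetical stand-in for "a reduction bound to its result slot".
template <typename T>
struct bound_reduction {
    T reduction;
    int* slot;
};

template <std::size_t N, std::size_t... Indices, typename... Reductions>
auto bind_reductions_impl(const std::array<int*, N>& slots, std::index_sequence<Indices...>, Reductions... reductions) {
    assert(((slots[Indices] != nullptr) && ...));
    // Both packs expand together: element i of `reductions` is paired with slots[i].
    // The pairing is fixed by the expansion, not by evaluation order.
    return std::make_tuple(bound_reduction<Reductions>{reductions, slots[Indices]}...);
}

template <std::size_t N, typename... Reductions>
auto bind_reductions(const std::array<int*, N>& slots, Reductions... reductions) {
    static_assert(N == sizeof...(Reductions), "one slot per reduction");
    return bind_reductions_impl(slots, std::index_sequence_for<Reductions...>{}, reductions...);
}
```

Calling `bind_reductions(slots, r0, r1)` yields a tuple whose first element refers to `slots[0]` and whose second element refers to `slots[1]`.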

.clang-tidy

examples/reduction/reduction.cc

include/accessor.h

include/host_object.h

include/task.h

test/accessor_tests.cc

fknorr

Looking good now, thanks!

From live discussion with @psalz: We decided to abandon the allow_by_ref safeguard altogether, which means that ref-capturing anything into host kernels will lead to UB without any reasonable way for us to detect it. Even if we could somehow forward the information that submit was called with allow_by_ref, using std::is_standard_layout on the host lambda would mean that the user cannot value-capture any type that contains references, which seemed overly restrictive to us.
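The restriction mentioned above can be seen with plain structs. Whether a lambda's closure type is standard-layout is left unspecified by the C++ standard, so this sketch uses hand-written types (all names are made up for illustration) to show that a reference member anywhere in a value-captured object would trip a `std::is_standard_layout` check.

```cpp
#include <type_traits>

// A type that is safe to capture by value and holds no references.
struct plain_capture {
    int value;
};

// A type with a reference member is never standard-layout.
struct holds_reference {
    int& ref;
};

// Containing such a type by value makes the outer type non-standard-layout too,
// even though the outer object itself would be captured by value.
struct wraps_reference_holder {
    holds_reference inner;
    int value;
};

static_assert(std::is_standard_layout_v<plain_capture>);
static_assert(!std::is_standard_layout_v<holds_reference>);
static_assert(!std::is_standard_layout_v<wraps_reference_holder>);
```

A check based on this trait would therefore reject legitimate value captures such as `wraps_reference_holder`, which matches the "overly restrictive" assessment.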

About hydration-on-move, this appears difficult to achieve since we could not report user errors through exceptions out of the noexcept move constructors. Since we need to copy lambdas for hydration at least on oversubscribed tasks, the use of this is probably very limited anyway.

include/closure_hydrator.h

github-actions

clang-tidy made some suggestions

include/handler.h

- Instead of storing the entire CGF, we now only store the inner "command function" lambda
- Accessors captured into command function closures are "hydrated" before launching the kernel
- Buffers and host objects can (and should) now be captured by reference into CGFs; added deprecation warnings
- Accessors, side-effects and reductions may now be created from non-const buffers
- Deprecate allow_by_ref
- Introduce new mechanism to tie buffer and host object lifetimes to tasks (required as we no longer store captured copies inside the CGF)

Other changes:

- Fix a bug in test "horizons correctly deal with antidependencies", which relied on a fixed order of dependencies
- Change output of "command graph printing is unchanged" smoke test (again due to change in ordering)

PeterTh

Looks good to me aside from some minor missed documentation updates: docs/overview.md still has 2 mentions of "Prepass".

Also, performance in the overhead benchmark set is basically unchanged in overall metrics (less than 1‰ diff) which is good and expected.

psalz requested review from fknorr and PeterTh and removed request for fknorr April 13, 2023 12:53

github-actions bot reviewed Apr 13, 2023

PeterTh reviewed Apr 13, 2023

fknorr requested changes Apr 13, 2023

psalz force-pushed the remove-multipass branch 2 times, most recently from 62f59a9 to 11f522d Compare April 17, 2023 15:10

fknorr approved these changes Apr 25, 2023

facuMH mentioned this pull request May 3, 2023

Accessor boundary check #178

Merged

facuMH force-pushed the remove-multipass branch from f2e4d9c to ef428dc Compare May 3, 2023 13:11

fknorr reviewed May 12, 2023

psalz added 2 commits May 23, 2023 15:11

Remove DPC++ workaround for local_accessor not having default ctor

3ecd38c

This is now implemented by the minimum version of DPC++ we require.

clang-tidy: Temporarily disable readability-function-cognitive-complexity

17829cb

...until our CI setup supports overriding configurations on a per-folder basis (this check generates too many false positives for our unit tests).

psalz force-pushed the remove-multipass branch from ef428dc to da141e4 Compare May 23, 2023 13:16

github-actions bot reviewed May 23, 2023

psalz added 2 commits May 23, 2023 18:21

psalz force-pushed the remove-multipass branch from a172676 to 21870bb Compare May 23, 2023 16:21

PeterTh reviewed May 24, 2023

psalz added 4 commits May 24, 2023 16:09

Update documentation after multi-pass removal

0bab71f

Avoid unnecessary copies of kernel functors

57d690d

Clarify & unify buffer range/offset terminology

d4d08d0

Use unified terminology in accessors and closure hydrator to clarify which ranges and offsets refer to backing buffers, and which ones refer to virtual buffer coordinates.

Update microbenchmark results for multipass removal

37cf095

psalz force-pushed the remove-multipass branch from 21870bb to 37cf095 Compare May 24, 2023 14:09

psalz merged commit 00b12cf into master May 24, 2023

psalz deleted the remove-multipass branch May 24, 2023 14:57
