Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Declarative side effects #68

Merged
merged 11 commits into from
Feb 17, 2022
Merged

Declarative side effects #68

merged 11 commits into from
Feb 17, 2022

Conversation

fknorr
Copy link
Contributor

@fknorr fknorr commented Dec 23, 2021

Currently, state that is not managed by Celerity must be modified through reference-captures in host tasks. This is opaque to the runtime and does not allow us to detect dependencies on global state between tasks and commands. SYCL proposes handler::depends_on for this purpose, mostly motivated by USM. In Celerity however, we want to avoid control-flow dependencies if possible and focus on data flow.

New Public APIs

This PR introduces new declarative APIs for host-task side-effects.

  • Creating a celerity::experimental::host_object<T> wraps and manages an arbitrary value which should be tracked by the runtime. This has roughly the semantics of a buffer. Specializations exists for host_object<T&> which tracks a user-owned value and host_object<void> which tracks access to global state.
  • Capturing a celerity::experimental::side_effect in a host-task kernel declares that the task will affect the host object, and that a dependency exists to preceding and following tasks affecting the same host object. A side_effect has roughly the semantics of an accessor.

This eliminates some use cases for reference-captures and properly solves the ordering problem in the wave_sim example that is being worked around in #66.

So far, side effects always generate true-dependencies to the last task that affected the same host object. We are aware that this might be overly restrictive for uses that do not care about access ordering or are fully thread-safe. A backwards-compatible API for specifying constraints (similar to celerity::access_mode or std::memory_order) will be investigated in a future RFC.

Internal additions

  • The runtime now holds a host_object_manager that manages the host_object_ids and knows when the last host-object goes out of scope to (potentially) initiate runtime shutdown.
  • celerity::handler tracks a side_effect_map similar to the buffer_access_map and passes it to the task on creation.
  • task_manager and graph_generator track the last task / command affecting each host object to properly generate dependencies.
  • Tests are added covering dependency generation, proper ordering, and correct behavior around horizons.

For an example usage, see the wave_sim example.

Resolved Questions

  • Do we need to handle host-object dependencies explicitly on horizon generation? (=> Yes, now explicitly handled in the spirit of Correctly apply horizons to collective-group dependencies #69)
  • Do we need init-task / init-command dependencies? (=> This might become a concern if we generate anti-dependencies after introducing more than one side_effect_order).
  • Should the buffer-equivalent hold the object by-value or by-reference? By-value is lifetime-safe and can restrict access, but by-reference might be convenient when porting to Celerity (=> we want both, as well as a void specialization for side effects on globals)
  • Can we consider the final design (after the RFC period) future-proof, or should we stage it through the experimental:: namespace? (=> experimental namespace for now)
  • Would a shared vs exclusive dichotomy be better than the current classification into read and write side effects (we were re-using access_mode in the first design iteration)? We use these to control concurrency, but if a mutating kernel does so atomically, we do not need to serialize it. Classifying it as a read would be counter-intuitive. Also, will exclusive vs. shared also change dependency generation in any way? (=> Will be solved in a future RFC)
  • The name host_object is not very descriptive, although it highlights the connection to host_task. The name needs to clarify that, unlike a buffer, this type is neither replicated nor distributed. (=> Good for now, though we might change the name before stabilization).
  • Currently, we've implemented a default-sequential order on side effects. Is that a good idea? (=> Yes, similar to memory_order::seq_cst being the default for atomics, this should only cause inefficiency, never incorrectness).

@fknorr fknorr force-pushed the side-effects branch 4 times, most recently from d0f80e9 to bc757a4 Compare December 27, 2021 14:03
@fknorr
Copy link
Contributor Author

fknorr commented Dec 27, 2021

From today's Celerity meeting:

  • The operator() access protection on side_effect breaks with what the API pattern of accessors, so we will remove it
  • We need to think about access modes further. exclusive vs shared might be better than read vs write to communicate that we want to constrain concurrency rather than declare data movement. Some operations are not easily classified into read/write patterns, and manual synchronization and atomic operations may allow concurrency between writing tasks.
  • The final name of host_object should express better that this type, unlike buffers, is neither distributed nor replicated
  • We want host_object<T&> to naturally provide side effects on user-managed variables and host_object<void> to represent side effects on globals
  • For now, the new features should reside in the experimental:: namespace

@psalz
Copy link
Member

psalz commented Dec 27, 2021

One more thing from the discussion: If we do end up using the existing deduction tags, we should probably opt for the *_host_task variants for consistency.

@fknorr fknorr force-pushed the side-effects branch 2 times, most recently from 84c8c95 to c20f377 Compare December 27, 2021 21:59
@fknorr
Copy link
Contributor Author

fknorr commented Jan 11, 2022

From yesterday's Celerity meeting:

  • This PR will proceed without a public API for different side effect access modes / concurrency and reordering constraints. We want to have this at some point, however it's not clear yet what the precise model should be. Side effects will however always default to the strictest mode, i.e. full serialization in submission order relative to other tasks affecting the same host object. This makes the API forward-compatible when we add "side effect order" deduction tags (think of std::atomic and std::memory_order).
  • Although we feel host_object is not optimally named, we will go forward with it for the lack of a better alternative. host_object might in the future have an equivalence in host-local, non-distributed buffers (i.e. thinly wrapped SYCL buffers for passing data between kernels locally per-node). If that ever becomes reality, both types should share a naming scheme.
  • host_object<void> will be necessary to serialize global side effects (like calls to printf), however host_object<T&> is more controversial. It might be irreplaceable while porting existing applications to Celerity, but it will be another footgun like celerity::allow_by_ref. We will keep it for now, but it might not ever be stabilized.

@fknorr fknorr force-pushed the side-effects branch 6 times, most recently from cce282c to 702d31f Compare January 12, 2022 09:46
@fknorr fknorr changed the title RFC: Declarative side effects Declarative side effects Jan 12, 2022
@fknorr fknorr requested a review from facuMH January 12, 2022 10:05
@fknorr fknorr marked this pull request as ready for review January 12, 2022 10:05
@fknorr
Copy link
Contributor Author

fknorr commented Jan 12, 2022

This has now left the RFC phase and is ready for review.

include/handler.h Show resolved Hide resolved
include/host_object.h Outdated Show resolved Hide resolved
include/host_object.h Show resolved Hide resolved
include/task.h Outdated Show resolved Hide resolved
include/types.h Show resolved Hide resolved
src/graph_generator.cc Show resolved Hide resolved
@fknorr fknorr force-pushed the side-effects branch 3 times, most recently from 1760326 to f4ed645 Compare January 19, 2022 08:25
Copy link
Contributor

@PeterTh PeterTh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good to me. The introduction of task_geometry also cleans that part up a bit.

include/task.h Outdated Show resolved Hide resolved
Copy link
Contributor

@facuMH facuMH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

@BlackMark29A BlackMark29A left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from some minor bikeshedding, this looks good to me.

examples/wave_sim/wave_sim.cc Show resolved Hide resolved
examples/wave_sim/wave_sim.cc Outdated Show resolved Hide resolved
test/graph_generation_tests.cc Show resolved Hide resolved
test/task_graph_tests.cc Show resolved Hide resolved
Copy link
Member

@psalz psalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

include/host_object.h Show resolved Hide resolved
include/host_object.h Outdated Show resolved Hide resolved
include/host_object.h Show resolved Hide resolved
test/graph_generation_tests.cc Show resolved Hide resolved
include/task.h Show resolved Hide resolved
@fknorr fknorr merged commit 7a5326a into master Feb 17, 2022
@fknorr fknorr deleted the side-effects branch February 17, 2022 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants