Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fences as a replacement for Captures #151

Merged
merged 13 commits into from
Mar 28, 2023
Merged

Fences as a replacement for Captures #151

merged 13 commits into from
Mar 28, 2023

Conversation

fknorr
Copy link
Contributor

@fknorr fknorr commented Nov 7, 2022

As discussed in #94, the current proposal attaching data extraction facilities to epochs is suboptimal.

  • Extracting data on epochs prevents any concurrency between capture data transfers and other tasks
  • Generating data transfers for epochs violates the command-id ordering property around epochs as it is implemented currently
  • The distinction between captures on slow_full_sync and drain is subtle and confusing

This PR implements fences in the form of the user-facing celerity::experimental::fence(q, captures...) and the new fence task- and command types. Fences synchronize between executor and main threads for data extraction, but will allow independent tasks that have been submitted before to execute while the application is waiting on completion of the fence operation. To avoid serialization around multiple fences, a tuple of captures can be specified in the fence() call that will return a tuple of values.

A user can fence three different structures:

  • A buffer, returning an experimental::buffer_snapshot which is just a std::vector with metadata
  • An experimental::buffer_subrange, which combines a buffer with a subrange to extract. Likewise returns a buffer_snapshot.
  • A host_object, whose interior needs to be copyable.

Example

int main() {
    distr_queue q;
    buffer<int> buf_1;
    //...
    experimental::buffer_snapshot<int, 1> snap = experimental::fence(q,
        buffer_subrange(buf_1, subrange({1}, {1}});
    std::vector<int> data = std::move(snap.into_data());
}
int main() {
    distr_queue q;
    buffer<int> buf_1;
    buffer<float> buf_2;
    experimental::host_object<std::string> obj;
    //...
    auto [ints, floats, string] = experimental::fence(q, std::tuple(buf_1, buf_2, obj));
}

DAGs for fencing a buffer subrange

buffers

DAGs for fencing a host object

host-objects

@fknorr
Copy link
Contributor Author

fknorr commented Dec 7, 2022

From live discussion with @psalz: fence() should return a std::future, and fence_job should fulfill the corresponding promise. That way

  • fences can be submitted early and their values even be carried across subsequent buffer writes
  • we rid ourselves of the explicit (and manually implemented) barrier between executor and main threads
  • we don't need the interface for multiple captures, which was introduced to avoid the scheduler -> executor latency between multiple fences

This can be implemented with std::shared_future and collective host tasks today, but we will want to make it more efficient by avoiding the host_queue submission latency.

Eventually we want to allow futures in range mappers and other places to push back wait-points as far as possible.

@fknorr
Copy link
Contributor Author

fknorr commented Dec 14, 2022

I have updated this PR to use asynchronous fences through futures as discussed above.

The new interface is as follows:

namespace celerity::experimental {

template <typename DataT, int Dims>
class buffer_snapshot {
    ...
};

template <typename T>
std::future<T> fence(distr_queue&, const host_object<T>& obj);

template <typename DataT, int Dims>
std::future<buffer_snapshot<DataT, Dims>>
fence(distr_queue&, const buffer<DataT, Dims>& buf, const subrange<Dims>& sr);

template <typename DataT, int Dims>
std::future<buffer_snapshot<DataT, Dims>>
fence(distr_queue& q, const buffer<DataT, Dims>& buf);

}

The DPC++ CI failure is not related to these changes.

@fknorr fknorr marked this pull request as ready for review December 14, 2022 13:14
@fknorr fknorr changed the title RFC: Fences as a replacement for Captures Fences as a replacement for Captures Dec 14, 2022
Copy link
Member

@psalz psalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this feels like a good abstraction. I've added a few minor notes!

examples/distr_io/distr_io.cc Show resolved Hide resolved
examples/matmul/matmul.cc Show resolved Hide resolved
examples/matmul/matmul.cc Show resolved Hide resolved
examples/matmul/matmul.cc Show resolved Hide resolved
src/graph_generator.cc Show resolved Hide resolved
include/fence.h Outdated Show resolved Hide resolved
include/fence.h Outdated Show resolved Hide resolved
test/graph_generation_tests.cc Outdated Show resolved Hide resolved
test/task_graph_tests.cc Outdated Show resolved Hide resolved
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

examples/distr_io/distr_io.cc Outdated Show resolved Hide resolved
examples/matmul/matmul.cc Show resolved Hide resolved
include/fence.h Show resolved Hide resolved
include/fence.h Outdated Show resolved Hide resolved
include/fence.h Outdated Show resolved Hide resolved
src/worker_job.cc Show resolved Hide resolved
test/graph_generation_tests.cc Show resolved Hide resolved
test/graph_generation_tests.cc Show resolved Hide resolved
test/runtime_tests.cc Outdated Show resolved Hide resolved
test/system/distr_tests.cc Outdated Show resolved Hide resolved
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

include/task.h Show resolved Hide resolved
examples/matmul/matmul.cc Outdated Show resolved Hide resolved
include/command.h Show resolved Hide resolved
include/fence.h Outdated Show resolved Hide resolved
include/ranges.h Show resolved Hide resolved
include/task.h Outdated Show resolved Hide resolved
include/task.h Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants