Explicitly manage buffer / host object lifetimes in graph generation #246

Merged: fknorr merged 4 commits into master from explicit-lifetime-mgmt on Feb 20, 2024

Contributor

@fknorr fknorr commented Feb 9, 2024

This is a back-port from IDAG development and a second attempt at #216.

task_manager and distributed_graph_generator already maintain state for each buffer (registered via add_buffer()) and host object (implicit on creation), but keep that state around indefinitely, even after the buffer or host object in question is destroyed. Also, neither graph generator has access to buffer debug names, so neither can include that information in error reports such as uninitialized-read detection (there is a TODO in the code about this).

This PR adds explicit member functions for tracking the creation and destruction of buffers (create_buffer, destroy_buffer) and host objects (create_host_object, destroy_host_object) to task_manager, distributed_graph_generator and scheduler, and by necessity also to runtime, which now receives these requests directly instead of indirectly through buffer_manager::buffer_lifetime_callback.
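For readers unfamiliar with the code base, the lifetime tracking can be pictured roughly as below. This is a minimal sketch, not the actual Celerity implementation: the class name `graph_generator_sketch`, the id aliases, the contents of `buffer_state` and the counting helpers are assumptions made for illustration. Only the four member function names come from this PR, and their real signatures may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-ins for the real id types.
using buffer_id = std::size_t;
using host_object_id = std::size_t;

// Placeholder for whatever per-buffer tracking data a generator keeps.
struct buffer_state {
    std::size_t last_writer_task = 0; // hypothetical tracking datum
};

class graph_generator_sketch {
  public:
    // Called exactly once when a buffer is created.
    void create_buffer(const buffer_id bid) {
        const bool inserted = m_buffers.emplace(bid, buffer_state{}).second;
        assert(inserted && "buffer id registered twice");
        (void)inserted; // silence unused-variable warnings in release builds
    }

    // Called exactly once when the buffer is destroyed; frees its state.
    void destroy_buffer(const buffer_id bid) {
        const auto erased = m_buffers.erase(bid);
        assert(erased == 1 && "destroying an unknown buffer");
        (void)erased;
    }

    // Host objects are tracked the same way, here without extra state.
    void create_host_object(const host_object_id hoid) {
        const bool inserted = m_host_objects.insert(hoid).second;
        assert(inserted && "host object id registered twice");
        (void)inserted;
    }

    void destroy_host_object(const host_object_id hoid) {
        const auto erased = m_host_objects.erase(hoid);
        assert(erased == 1 && "destroying an unknown host object");
        (void)erased;
    }

    // Helpers for inspecting the sketch; not part of the PR.
    std::size_t tracked_buffer_count() const { return m_buffers.size(); }
    std::size_t tracked_host_object_count() const { return m_host_objects.size(); }

  private:
    std::unordered_map<buffer_id, buffer_state> m_buffers;
    std::unordered_set<host_object_id> m_host_objects;
};
```

The point of the explicit destroy calls is that tracked state shrinks again once an object goes away, instead of accumulating for the lifetime of the application.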

It also removes the recorder -> buffer_manager dependency by replicating the buffer name (like all other metadata) in both graph generators. This foreshadows the eventual removal of buffer_manager post-IDAG.
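To illustrate why replicating the name helps, here is a small sketch of a generator that stores the debug name at creation time and uses it when building a diagnostic. Everything in it is an assumption for illustration: the class `name_aware_generator_sketch`, the `describe_buffer` helper, the two-argument `create_buffer` and the message wording are hypothetical and not taken from the Celerity sources.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

using buffer_id = std::size_t; // hypothetical stand-in for the real id type

class name_aware_generator_sketch {
  public:
    // The debug name is handed over once, at creation, and copied locally.
    void create_buffer(const buffer_id bid, std::string debug_name) {
        const bool inserted = m_debug_names.emplace(bid, std::move(debug_name)).second;
        assert(inserted && "buffer id registered twice");
        (void)inserted; // silence unused-variable warnings in release builds
    }

    void destroy_buffer(const buffer_id bid) { m_debug_names.erase(bid); }

    // Label for diagnostics: the id, plus the name if one is known.
    std::string describe_buffer(const buffer_id bid) const {
        std::string label = "B" + std::to_string(bid);
        const auto it = m_debug_names.find(bid);
        if(it != m_debug_names.end() && !it->second.empty()) {
            label += " \"" + it->second + "\"";
        }
        return label;
    }

    // Hypothetical message text; the real error report will differ.
    std::string uninitialized_read_message(const buffer_id bid) const {
        return "uninitialized read from buffer " + describe_buffer(bid);
    }

  private:
    std::unordered_map<buffer_id, std::string> m_debug_names;
};
```

Because the name lives next to the rest of the per-buffer metadata, neither the generator nor a recorder has to query buffer_manager to label a buffer.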

I purposefully changed the recording API so that its user is responsible for constructing records and the recorder simply aggregates them. This is now possible because the recorders no longer need to know about task_manager and buffer_manager. In the future, it will also allow us to add debug info to records that is not present in the "real" DAG (see the upcoming instruction graph generator for how this might look).
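The "caller builds the record, recorder only aggregates" pattern can be sketched as follows. This is an illustration under assumptions, not the API from this PR: `task_record_sketch`, its fields and `task_recorder_sketch` are made-up names. Only the idea of an `operator<<` on recorders is taken from the review discussion.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type. The caller fills in every field, including
// buffer names, so the recorder needs no access to any manager class.
struct task_record_sketch {
    std::size_t task_id;
    std::string debug_name;
    std::vector<std::string> accessed_buffer_names;
};

class task_recorder_sketch {
  public:
    // The recorder does nothing but store what it is given.
    void record(task_record_sketch rec) { m_records.push_back(std::move(rec)); }

    // Streaming syntax for appending records; returns the recorder so
    // that several records can be chained in one expression.
    friend task_recorder_sketch& operator<<(task_recorder_sketch& recorder, task_record_sketch rec) {
        recorder.record(std::move(rec));
        return recorder;
    }

    const std::vector<task_record_sketch>& records() const { return m_records; }

  private:
    std::vector<task_record_sketch> m_records;
};
```

A caller would then write something like `recorder << task_record_sketch{1, "init", {"A"}};`. Since the record is a plain value assembled by the caller, it can carry extra debug fields that the graph itself never stores.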

@fknorr fknorr requested review from psalz, PeterTh and GagaLP February 9, 2024 16:09
@fknorr fknorr self-assigned this Feb 9, 2024

github-actions bot commented Feb 9, 2024

Check-perf-impact results: (4c65f1399a47e0eb1340f63004745b17)

❓ No new benchmark data submitted. ❓
Please re-run the microbenchmarks and include the results if your commit could potentially affect performance.

@fknorr fknorr force-pushed the explicit-lifetime-mgmt branch 2 times, most recently from 7f025ed to c23899d Compare February 9, 2024 16:27

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

include/host_object.h (resolved)
include/recorders.h (outdated, resolved)
@fknorr fknorr force-pushed the explicit-lifetime-mgmt branch from c23899d to 33d4e65 Compare February 9, 2024 19:55
@coveralls
coveralls commented Feb 9, 2024

Pull Request Test Coverage Report for Build 7976008559

Details

  • 205 of 206 (99.51%) changed or added relevant lines in 15 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.1%) to 94.007%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
src/recorders.cc | 24 | 25 | 96.0%

Totals (Coverage Status):

  • Change from base Build 7970610878: 0.1%
  • Covered Lines: 5046
  • Relevant Lines: 5239

💛 - Coveralls

Member

@psalz psalz left a comment

Nice, so this also solves the issue of having different names for a single buffer throughout the course of an application! I like the parity between distributed_graph_generator and task_manager. Also the operator<< on recorders looks neat!

include/distributed_graph_generator.h (resolved)
include/runtime.h (resolved)
include/runtime.h (resolved)
Check-perf-impact results: (f7ef1830fe7f830eee417b0b5a37c2ef)

❓ No new benchmark data submitted. ❓
Please re-run the microbenchmarks and include the results if your commit could potentially affect performance.

@fknorr
Contributor Author

fknorr commented Feb 13, 2024

I've gone ahead and renamed every buffer_state variable to buffer in the DGGen for consistency.

@fknorr fknorr added this to the 0.6.0 milestone Feb 19, 2024
include/recorders.h (outdated, resolved)
src/distributed_graph_generator.cc (outdated, resolved)
src/distributed_graph_generator.cc (outdated, resolved)
task_manager and distributed_graph_generator already maintain state for each
buffer and host object, but keep it around indefinitely even after the
buffer or host object in question is destroyed. Also, neither has
access to buffer debug names and thus can't include that information in
error reports (such as uninitialized-read detection).

This commit adds explicit methods for tracking the creation and
destruction of objects to task_manager, distributed_graph_generator,
scheduler (and now by necessity, runtime, which receives these requests
directly instead of via the buffer_lifetime_callback).

This also removes the recorder -> buffer_manager dependency by
replicating the buffer name (like all other metadata) in both graph
generators. This foreshadows the eventual removal of buffer_manager with
the merge of instruction graph scheduling.
@fknorr fknorr force-pushed the explicit-lifetime-mgmt branch from 2b675b1 to b4ef1c0 Compare February 20, 2024 13:01

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

include/recorders.h (resolved)
@fknorr fknorr force-pushed the explicit-lifetime-mgmt branch from b4ef1c0 to b0717e7 Compare February 20, 2024 14:00
Contributor

@PeterTh PeterTh left a comment

LGTM.

Check-perf-impact results: (19e9c0e63e2eff8d602cc7a81b622c19)

✔️ No significant performance change in the microbenchmark set. You are good to go!

Relative execution time per category: (mean of relative medians)

  • command-graph : 1.00x
  • graph-nodes : 0.99x
  • grid : 1.00x
  • scheduler : 0.99x
  • system : 1.00x
  • task-graph : 0.99x

@fknorr fknorr force-pushed the explicit-lifetime-mgmt branch from 6098277 to 234e017 Compare February 20, 2024 15:34
@fknorr fknorr merged commit b0458fc into master Feb 20, 2024
25 of 29 checks passed
@fknorr fknorr deleted the explicit-lifetime-mgmt branch February 20, 2024 16:22