Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect uninitialized reads and overlapping writes #224

Merged
merged 4 commits into from
Dec 5, 2023

Conversation

fknorr
Copy link
Contributor

@fknorr fknorr commented Nov 2, 2023

During IDAG development, asserting that instructions do not read from uninitialized memory turned out to be an invaluable tool for finding bugs in dependency tracking and data migration. This PR is a back-port of that detection logic, extended to also find overlapping writes between chunks on CDAG generation.

By default, task_manager and distributed_graph_generator throw when detecting an uninitialized read or overlapping writes in order to catch internal errors and report tests with bogus access patterns. In the runtime, results from these checks are reported through warning (uninitialized read) and error (overlapping writes) logs instead - both for backwards compatibillity, and because uninitialized reads are expected when submitting kernels with read_write access in a loop (see also celerity/meta#74).

All tests with incorrect access patterns were fixed, and the reporting feature is tested both with the TDAG / CDAG test context infrastructure as well as the actual runtime.

@fknorr fknorr requested review from psalz and PeterTh November 2, 2023 13:01
@fknorr fknorr self-assigned this Nov 2, 2023
@fknorr fknorr added this to the 0.5.0 milestone Nov 2, 2023
Copy link
Contributor

@PeterTh PeterTh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great change which should help prevent a common source of issues/UB. Some thoughts in the comments.

src/runtime.cc Outdated Show resolved Hide resolved
test/print_graph_tests.cc Outdated Show resolved Hide resolved
test/print_graph_tests.cc Outdated Show resolved Hide resolved
@fknorr fknorr force-pushed the access-pattern-diagnostics branch 2 times, most recently from 1bc771c to 7cd1d97 Compare November 2, 2023 16:25
fknorr added a commit that referenced this pull request Nov 2, 2023
Copy link

github-actions bot commented Nov 2, 2023

no optimizations

Check-perf-impact results: (1d1f40270fc25a329bb16053c8fa15f2)

⚠️ Significant slowdown (>1.25x) in some microbenchmark results: building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / expanding tree topology, generating large command graphs for N nodes - 16 / wave_sim topology, benchmark independent task pattern with N tasks - 100 / task generation
🚀 Significant speedup (<0.80x) in some microbenchmark results: normalizing a fully mergeable tiling of boxes - 1 / large, embedded in 3d

Relative execution time per category: (mean of relative medians)

  • command-graph : 1.17x ⚠️
  • graph-nodes : 0.99x
  • grid : 1.00x
  • scheduler : 1.12x ⚠️
  • system : 1.10x
  • task-graph : 1.01x

fknorr added a commit that referenced this pull request Nov 3, 2023
Copy link

github-actions bot commented Nov 3, 2023

distributed overlapping-write detection

Check-perf-impact results: (7e3c8bfa480dfad2a4f08c4fabe32cc2)

⚠️ Significant slowdown (>1.25x) in some microbenchmark results: normalizing randomized box sets - 3d / small - native, building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / expanding tree topology, benchmark independent task pattern with N tasks - 100 / task generation
🚀 Significant speedup (<0.80x) in some microbenchmark results: benchmark stencil pattern with N time steps - 50 / iterations

Relative execution time per category: (mean of relative medians)

  • command-graph : 1.09x
  • graph-nodes : 0.99x
  • grid : 1.02x
  • scheduler : 1.09x
  • system : 1.08x
  • task-graph : 1.02x

fknorr added a commit that referenced this pull request Nov 3, 2023
Copy link

github-actions bot commented Nov 3, 2023

distributed overlapping-write detection + bounding-box uninitialized-read detection

Check-perf-impact results: (385bf6fbfe375253106eaa9f9b53a88e)

⚠️ Significant slowdown (>1.25x) in some microbenchmark results: benchmark independent task pattern with N tasks - 100 / task generation
🚀 Significant speedup (<0.80x) in some microbenchmark results: benchmark stencil pattern with N time steps - 1000 / iterations

Relative execution time per category: (mean of relative medians)

  • command-graph : 1.10x
  • graph-nodes : 0.98x
  • grid : 1.00x
  • scheduler : 1.06x
  • system : 1.05x
  • task-graph : 0.98x

@fknorr
Copy link
Contributor Author

fknorr commented Nov 3, 2023

I managed to optimize this a bit further, but CDAG generation still takes a 10% performance hit.

Copy link
Member

@psalz psalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good change, thank you! A few minor notes:

include/utils.h Outdated Show resolved Hide resolved
src/distributed_graph_generator.cc Outdated Show resolved Hide resolved
test/graph_generation_tests.cc Show resolved Hide resolved
test/runtime_tests.cc Outdated Show resolved Hide resolved
test/task_graph_tests.cc Outdated Show resolved Hide resolved
test/graph_gen_transfer_tests.cc Show resolved Hide resolved
@fknorr fknorr force-pushed the access-pattern-diagnostics branch from f43978e to 04ccd4d Compare November 8, 2023 11:19
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

test/dag_benchmarks.cc Outdated Show resolved Hide resolved
test/dag_benchmarks.cc Outdated Show resolved Hide resolved
Copy link
Contributor Author

@fknorr fknorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have addressed most concerns, but realized some potential inconsistencies in other features.

CMakeLists.txt Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
test/dag_benchmarks.cc Outdated Show resolved Hide resolved
fknorr added a commit that referenced this pull request Nov 9, 2023
@celerity celerity deleted a comment from github-actions bot Nov 9, 2023
@fknorr fknorr force-pushed the access-pattern-diagnostics branch from 2df606b to a6a49f6 Compare November 13, 2023 16:04
fknorr added a commit that referenced this pull request Nov 14, 2023
Copy link

github-actions bot commented Nov 14, 2023

disabling diagnostics by default in release builds

Check-perf-impact results: (9c7d95e3b2850796c2d810616c0f1aff)

⚠️ Significant slowdown (>1.25x) in some microbenchmark results: building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / expanding tree topology
🚀 Significant speedup (<0.80x) in some microbenchmark results: normalizing a fully mergeable tiling of boxes - 1 / large, embedded in 3d

Relative execution time per category: (mean of relative medians)

  • command-graph : 1.02x
  • graph-nodes : 1.01x
  • grid : 1.00x
  • scheduler : 1.02x
  • system : 1.00x
  • task-graph : 1.00x

@celerity celerity deleted a comment from github-actions bot Nov 14, 2023
@celerity celerity deleted a comment from github-actions bot Nov 14, 2023
@celerity celerity deleted a comment from github-actions bot Nov 14, 2023
fknorr added a commit that referenced this pull request Nov 16, 2023
@fknorr fknorr force-pushed the access-pattern-diagnostics branch from 5318e68 to 8d13007 Compare November 16, 2023 21:51
fknorr added a commit that referenced this pull request Nov 18, 2023
Copy link

github-actions bot commented Nov 18, 2023

do not maintain initialized-region when diagnostics are disabled

Check-perf-impact results: (a1f01e5fe178a68d42a7063603612487)

⚠️ Significant slowdown (>1.25x) in some microbenchmark results: building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / expanding tree topology
🚀 Significant speedup (<0.80x) in some microbenchmark results: building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / chain topology

Relative execution time per category: (mean of relative medians)

  • command-graph : 1.01x
  • graph-nodes : 0.99x
  • grid : 1.01x
  • scheduler : 1.02x
  • system : 0.99x
  • task-graph : 1.00x

@fknorr
Copy link
Contributor Author

fknorr commented Nov 18, 2023

I have turned the task-manager / graph-generator policy settings into constructor arguments; that way I can skip updating the initialized-region of a buffer when diagnostics are disabled (previously this region had to be maintained in anticipation of a change in polices, which of course never happened). I've also added a bit more documentation.

@fknorr fknorr requested review from PeterTh and psalz November 18, 2023 12:36
fknorr added a commit to fknorr/celerity-runtime that referenced this pull request Nov 18, 2023
task_manager and distributed_graph_generator by default throw when
detecting an uninitialized read or overlapping writes in order to catch
invalid test code or internal bugs. In `runtime`, these conditions are
reported through warning / error logs instead.
Copy link
Contributor

@PeterTh PeterTh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, just some smaller remarks.

include/task.h Show resolved Hide resolved
include/utils.h Outdated Show resolved Hide resolved
src/distributed_graph_generator.cc Show resolved Hide resolved
test/print_graph_tests.cc Show resolved Hide resolved
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/distributed_graph_generator.cc Outdated Show resolved Hide resolved
@fknorr fknorr force-pushed the access-pattern-diagnostics branch from d9c3a3b to 1df889a Compare November 29, 2023 13:22
Copy link

Check-perf-impact results: (3b34e58e3c100f4c3541a1ed59580f72)

❓ No new benchmark data submitted. ❓
Please re-run the microbenchmarks and include the results if your commit could potentially affect performance.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

test/utils_tests.cc Show resolved Hide resolved
fknorr added a commit that referenced this pull request Nov 29, 2023
@fknorr fknorr force-pushed the access-pattern-diagnostics branch from 1df889a to a785278 Compare November 29, 2023 15:49
Copy link

Check-perf-impact results: (54ff34045c40bf12cbc2ca95f475792b)

⚠️ Significant slowdown (>1.25x) in some microbenchmark results: building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / expanding tree topology
🚀 Significant speedup (<0.80x) in some microbenchmark results: building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / chain topology, building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / contracting tree topology

Relative execution time per category: (mean of relative medians)

  • command-graph : 1.00x
  • graph-nodes : 0.98x
  • grid : 1.01x
  • scheduler : 1.01x
  • system : 0.98x
  • task-graph : 1.01x

@fknorr
Copy link
Contributor Author

fknorr commented Nov 29, 2023

I have used this opportunity to refactor how task debug names are generated from class KernelName types and printed in debug outputs. Unnamed kernels now do not show "unnamed_kernel" as their name anymore and the task type is visualized as well.

There is now a new step_builder::name() function to conveniently set task debug names from a distributed_graph_generator_test_context.

fknorr added a commit that referenced this pull request Nov 29, 2023
@fknorr fknorr force-pushed the access-pattern-diagnostics branch 3 times, most recently from 3f4a28d to b7b6812 Compare December 4, 2023 13:42
Copy link
Member

@psalz psalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Muy bien!

* Stringify the default kernel name to "" instead of "unnamed_kernel"
* Kernel names from `<class KernelName>` are simplified before being
  inserted into `struct task`
* Move Type name demangling to utils.cc
* Add print_task_debug_label to show task type and optional label in
  debug messages
task_manager and distributed_graph_generator by default throw when
detecting an uninitialized read or overlapping writes in order to catch
invalid test code or internal bugs. In `runtime`, these conditions are
reported through warning / error logs instead.
@fknorr fknorr force-pushed the access-pattern-diagnostics branch from b7b6812 to 4b73a23 Compare December 4, 2023 15:53
@fknorr fknorr merged commit f12ddf7 into master Dec 5, 2023
28 checks passed
fknorr added a commit that referenced this pull request Dec 5, 2023
@fknorr fknorr deleted the access-pattern-diagnostics branch January 25, 2024 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants