Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unity device mapping algorithm #1459

Draft
wants to merge 29 commits into
base: repo-refactor
Choose a base branch
from
Draft

Conversation

wmdi
Copy link
Collaborator

@wmdi wmdi commented Aug 7, 2024

Description of changes:
This PR recovers the device mapping algorithm (the DP-based algorithm) used in Unity.

Related Issues:

Linked Issues:

  • Issue #

Issues closed by this PR:

  • Closes #

This change is Reviewable

@wmdi wmdi requested a review from lockshaw August 7, 2024 20:06
Copy link

codecov bot commented Aug 7, 2024

Codecov Report

Attention: Patch coverage is 90.56604% with 65 lines in your changes missing coverage. Please review.

Project coverage is 69.84%. Comparing base (2b4106f) to head (89ed108).

Files with missing lines Patch % Lines
lib/compiler/src/unity_algorithm.cc 0.00% 33 Missing ⚠️
lib/compiler/src/graph_optimize_state.cc 53.19% 22 Missing ⚠️
...ler/machine_mapping/get_optimal_machine_mapping.cc 96.91% 5 Missing ⚠️
...compiler/machine_mapping/split_sp_decomposition.cc 78.26% 5 Missing ⚠️
Additional details and impacted files
@@                Coverage Diff                @@
##           repo-refactor    #1459      +/-   ##
=================================================
+ Coverage          68.49%   69.84%   +1.34%     
=================================================
  Files                612      624      +12     
  Lines              18120    18708     +588     
  Branches             519      541      +22     
=================================================
+ Hits               12412    13067     +655     
+ Misses              5708     5641      -67     
Flag Coverage Δ
unittests 69.84% <90.56%> (+1.34%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
lib/compiler/include/compiler/cost_estimator.h 100.00% <ø> (ø)
...b/compiler/include/compiler/graph_optimize_state.h 100.00% <100.00%> (ø)
...e/compiler/machine_mapping/machine_mapping_cache.h 100.00% <100.00%> (ø)
.../machine_mapping/get_allowed_machine_views_list.cc 100.00% <100.00%> (ø)
...er/src/compiler/machine_mapping/machine_mapping.cc 100.00% <100.00%> (ø)
.../compiler/machine_mapping/machine_mapping_cache.cc 100.00% <100.00%> (ø)
...compiler/machine_mapping/machine_mapping_result.cc 100.00% <100.00%> (ø)
...compiler/machine_mapping/cost_estimator_for_test.h 100.00% <100.00%> (ø)
...ler/machine_mapping/get_optimal_machine_mapping.cc 100.00% <100.00%> (ø)
...st/src/compiler/machine_mapping/machine_mapping.cc 100.00% <100.00%> (ø)
... and 7 more

... and 7 files with indirect coverage changes

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add compiler-tests to CI and proj?

Also, overall this code could use a bit of simplifying and readability-improving. For a lot of the cases where you're unwrapping types consider just defining equivalent functions for those types, which would help simplify things. Overall, more small functions with better names would probably help over the current large functions which can be a bit hard to actually read through

Reviewed 7 of 15 files at r1, 8 of 8 files at r2, all commit messages.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @wmdi)


lib/compiler/include/compiler/graph_optimize_result.struct.toml line 7 at r2 (raw file):

includes = [ 
  "compiler/machine_mapping.h"
]

Suggestion:

includes = [
  "compiler/machine_mapping.dtg.h",
  "pcg/parallel_computation_graph/parallel_computation_graph.dtg.h",
]

lib/compiler/include/compiler/graph_optimize_state.h line 13 at r2 (raw file):

  float runtime;

  bool operator==(GraphOptimizeState const &other) const;

GraphOptimizeState::operator== should probably have tests


lib/compiler/include/compiler/unity_algorithm.h line 13 at r2 (raw file):

namespace FlexFlow {

struct OptimizerConfig {

Move over to dtgen so it's printable, equality-checkable, hashable, etc.?


lib/compiler/src/graph_optimize_state.cc line 6 at r2 (raw file):

bool GraphOptimizeState::operator==(GraphOptimizeState const &other) const {
  auto layers1 = topological_ordering(graph_optimize_result.pcg);

I'm not sure topological_ordering is the best thing to rely on here. This seems like a graph-level comparison function, so it might be better to figure out what the semantics are and then it can get added to graph so you only have to use it here?


lib/compiler/src/graph_optimize_state.cc line 30 at r2 (raw file):

  return !(*this == other);
}

Where is operator<? I think it was declared?


lib/compiler/src/machine_mapping.cc line 159 at r2 (raw file):

        cached_subgraph_costs(cached_subgraph_costs) {}

  ParallelComputationGraph pcg;

Why move the PCG from OptimalCostFunctor to MachineMappingSearcher?


lib/compiler/src/machine_mapping.cc line 166 at r2 (raw file):

  OptimalCostCache &cached_subgraph_costs;

  struct OptimalCostFunctor {

Why do we have this separate OptimalCostFunctor?


lib/compiler/src/machine_mapping.cc line 219 at r2 (raw file):

    std::unordered_set<DataflowOutput> split_outputs;
    for (auto const &[value, _] :

Might be simpler with transform? Also the assert isn't really necessary, get will already throw if you perform an invalid access


lib/compiler/src/machine_mapping.cc line 287 at r2 (raw file):

          optimal_result,
          OptimalCostResult::parallel_combine(
              std::visit(OptimalCostFunctor(

Why not just pull out a separate function that does std::visit(OptimalCostFunctor(...)) rather than repeat it each time?


lib/compiler/src/machine_mapping.cc line 320 at r2 (raw file):

           enumerate_machine_views({node}, resource)) {
        MachineMapping mv_map{{{node, node_machine_views.at(node)}}};
        std::unordered_map<OpenDataflowValue, MachineView> machine_views =

Might be cleaner using merge_maps and generate_map?


lib/compiler/src/machine_mapping.cc line 334 at r2 (raw file):

  std::vector<std::unordered_map<Node, MachineView>>
      enumerate_machine_views(std::unordered_set<Node> const &nodes,

Not entirely clear what these do? Can you rename them to something a bit more descriptive?


lib/compiler/test/src/test_optimal_cost.cc line 9 at r2 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("optimal_cost_0") {

Suggestion:

  TEST_CASE("optimal_cost does not crash on minimal inputs") {

lib/substitutions/include/substitutions/sub_parallel_computation_graph.h line 25 at r2 (raw file):

SubParallelComputationGraph
    sub_pcg_from_partial_pcg(ParallelComputationGraph const &,

Is this just get_subgraph for PCGs?


lib/substitutions/src/substitutions/sub_parallel_computation_graph.cc line 61 at r2 (raw file):

                             std::unordered_set<Node> const &nodes) {
  auto as_open = view_as_labelled_open_dataflow_graph(pcg.raw_graph);
  OpenDataflowSubgraphResult subgraph_result = get_subgraph(as_open, nodes);

Probably better to just add get_subgraph for LabelledOpenDataflowGraph and then use it here

Copy link
Collaborator Author

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @lockshaw)


lib/compiler/include/compiler/graph_optimize_result.struct.toml line 7 at r2 (raw file):

includes = [ 
  "compiler/machine_mapping.h"
]

Done.


lib/compiler/include/compiler/graph_optimize_state.h line 13 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

GraphOptimizeState::operator== should probably have tests

Done.


lib/compiler/include/compiler/unity_algorithm.h line 13 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Move over to dtgen so it's printable, equality-checkable, hashable, etc.?

Done.


lib/compiler/src/graph_optimize_state.cc line 6 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I'm not sure topological_ordering is the best thing to rely on here. This seems like a graph-level comparison function, so it might be better to figure out what the semantics are and then it can get added to graph so you only have to use it here?

I think I require homomorphism assuming inputs are ordered.


lib/compiler/src/machine_mapping.cc line 159 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why move the PCG from OptimalCostFunctor to MachineMappingSearcher?

This is to avoid transferring variables that are not changed during DP, i.e., they are 'constants' to the DP, instead of states.


lib/compiler/src/machine_mapping.cc line 166 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why do we have this separate OptimalCostFunctor?

To separate DP states with DP constants (environments).


lib/compiler/src/machine_mapping.cc line 219 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Might be simpler with transform? Also the assert isn't really necessary, get will already throw if you perform an invalid access

I do not find methods to get the set of left/right values of a bidict


lib/compiler/src/machine_mapping.cc line 287 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why not just pull out a separate function that does std::visit(OptimalCostFunctor(...)) rather than repeat it each time?

Done.


lib/compiler/src/machine_mapping.cc line 320 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Might be cleaner using merge_maps and generate_map?

We do not have the utils required for bidict


lib/compiler/src/machine_mapping.cc line 334 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Not entirely clear what these do? Can you rename them to something a bit more descriptive?

Done.


lib/compiler/test/src/test_optimal_cost.cc line 9 at r2 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("optimal_cost_0") {

Done.


lib/substitutions/include/substitutions/sub_parallel_computation_graph.h line 25 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Is this just get_subgraph for PCGs?

Yes

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 9 files at r3, all commit messages.
Reviewable status: 15 of 17 files reviewed, 9 unresolved discussions (waiting on @wmdi)


lib/compiler/include/compiler/optimal_cost_state.struct.toml line 20 at r3 (raw file):

  "utils/hash/unordered_map.h",
  "utils/fmt/unordered_map.h", 
]

Suggestion:

includes = [
  "utils/graph/serial_parallel/serial_parallel_decomposition.dtg.h",
  "pcg/machine_specification.dtg.h",
  "pcg/machine_view.dtg.h",
  "utils/graph/open_dataflow_graph/open_dataflow_edge.dtg.h",
  "utils/graph/open_dataflow_graph/open_dataflow_value.dtg.h",
]

src_includes = [
  "utils/hash/unordered_map.h",
  "utils/fmt/unordered_map.h",
]

lib/compiler/src/graph_optimize_state.cc line 6 at r2 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

I think I require homomorphism assuming inputs are ordered.

Using a hack like this for now is fine, but eventually it would be good to move over to a more principled algorithm implemented in utils/graph that doesn't make correctness tradeoffs and more explicitly uses the properties of DataflowGraph that you're relying on for efficiency. Until that's implemented let's add a comment in the code that clarifies that this is a hack and links to an issue with the plan for moving this over to something more correct


lib/compiler/src/machine_mapping.cc line 159 at r2 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

This is to avoid transferring variables that are not changed during DP, i.e., they are 'constants' to the DP, instead of states.

As discussed in the Wednesday meeting, it should be possible to remove OptimalCostFunctor, especially as we're now on C++17 and can use overload

There seem to be two potential designs (either (1) making optimal_cost a method on MachineMappingSearcher, or (2) making MachineMappingSearcher just a wrapper for the global state (probably renamed to something like DPContext) and passing that around between top-level functions as a mutable reference, i.e., essentially doing the passing of this manually)


lib/compiler/src/machine_mapping.cc line 219 at r2 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

I do not find methods to get the set of left/right values of a bidict

See keys in utils/containers/keys.h


lib/compiler/src/machine_mapping.cc line 320 at r2 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

We do not have the utils required for bidict

Not exactly sure the details of simplifying this, but making it more clear what this is doing would be nice (maybe using functions from utils/containers or pulling out some pieces into separate named lambdas)


lib/compiler/test/src/test_graph_optimize_state.cc line 8 at r3 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("graph_optimize_state:equality") {

Suggestion:

  TEST_CASE("GraphOptimizeState::operator==") {

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 9 files at r3.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @wmdi)


lib/compiler/src/graph_optimize_state.cc line 57 at r3 (raw file):

namespace std {

size_t hash<::FlexFlow::GraphOptimizeState>::operator()(

See https://reviewable.io/reviews/flexflow/FlexFlow/1459#-O3kIAym5Yc4ECMISkBW. Eventually it might be good to use a proper graph hash like https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.graph_hashing.weisfeiler_lehman_graph_hash.html#networkx.algorithms.graph_hashing.weisfeiler_lehman_graph_hash


lib/compiler/test/src/test_graph_optimize_state.cc line 44 at r3 (raw file):

                                                  "dense1");

    ParallelComputationGraph pcg = builder.pcg;

Suggestion:

    ParallelComputationGraph pcg1 = [&] {
      ParallelComputationGraphBuilder builder;
      
      
      ParallelTensorShape input_shape =
        ParallelTensorShape{ParallelTensorDims{
                                FFOrdered<ShardParallelDim>{
                                    ShardParallelDim{32, 2},
                                    ShardParallelDim{16, 1},
                                },
                                ReplicaParallelDimSet{
                                    SumDegree{1},
                                    DiscardCopyDegree{1},
                                },
                            },
                            DataType::FLOAT};

      parallel_tensor_guid_t input0 =
        builder.create_input_tensor(input_shape);
      parallel_tensor_guid_t dense0 = builder.dense(input0,
                                                  8);

      parallel_tensor_guid_t dense1 = builder.dense(dense0,
                                                  4);
      return builder.pcg
    }();

lib/compiler/test/src/test_graph_optimize_state.cc line 67 at r3 (raw file):

                                                   "dense0");

    ParallelComputationGraph pcg_ = builder.pcg;

Suggestion:

    ParallelComputationGraph pcg2 = [&] {
      ParallelComputationGraphBuilder builder;  
      
      parallel_tensor_guid_t input0 =
        builder.create_input_tensor(input_shape);
      parallel_tensor_guid_t dense0 = builder.dense(input0,
                                                   8);
                                                   
      return builder.pcg;                          
    }();

lib/compiler/test/src/test_graph_optimize_state.cc line 69 at r3 (raw file):

    ParallelComputationGraph pcg_ = builder.pcg;

    CHECK(GraphOptimizeState(GraphOptimizeResult(pcg, empty_machine_mapping), 0) !=

This doesn't really sufficiently test GraphOptimizeState::operator==--from this I'm not sure (1) what role the machine mappings and the cost play in equality, and (2) whether any graphs compare equal at all, or whether all of them just return not equal. A few more test/sub-cases are needed


lib/compiler/test/src/test_graph_optimize_state.cc line 70 at r3 (raw file):

    CHECK(GraphOptimizeState(GraphOptimizeResult(pcg, empty_machine_mapping), 0) !=
          GraphOptimizeState(GraphOptimizeResult(pcg_, empty_machine_mapping), 0));

Suggestion:

    GraphOptimizeState state1 = GraphOptimizeState{
      GraphOptimizeResult{pcg, empty_machine_mapping},
      0,  
    };
    
    GraphOptimzieState state2 = GraphOptimizeState{
      GraphOptimizeResult{pcg, empty_machine_mapping},
      0,
    };
    
    CHECK(state1 != state2);

lib/substitutions/include/substitutions/sub_parallel_computation_graph.h line 25 at r2 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

Yes

In that case let's rename it to get_pcg_subgraph

Copy link
Collaborator Author

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 41 files reviewed, 13 unresolved discussions (waiting on @lockshaw)


lib/compiler/src/graph_optimize_state.cc line 6 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Using a hack like this for now is fine, but eventually it would be good to move over to a more principled algorithm implemented in utils/graph that doesn't make correctness tradeoffs and more explicitly uses the properties of DataflowGraph that you're relying on for efficiency. Until that's implemented let's add a comment in the code that clarifies that this is a hack and links to an issue with the plan for moving this over to something more correct

Done.


lib/compiler/src/machine_mapping.cc line 159 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

As discussed in the Wednesday meeting, it should be possible to remove OptimalCostFunctor, especially as we're now on C++17 and can use overload

There seem to be two potential designs (either (1) making optimal_cost a method on MachineMappingSearcher, or (2) making MachineMappingSearcher just a wrapper for the global state (probably renamed to something like DPContext) and passing that around between top-level functions as a mutable reference, i.e., essentially doing the passing of this manually)

Done.


lib/compiler/src/machine_mapping.cc line 219 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

See keys in utils/containers/keys.h

Done.


lib/compiler/src/machine_mapping.cc line 320 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Not exactly sure the details of simplifying this, but making it more clear what this is doing would be nice (maybe using functions from utils/containers or pulling out some pieces into separate named lambdas)

Done.


lib/compiler/test/src/test_graph_optimize_state.cc line 8 at r3 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("graph_optimize_state:equality") {

Done.


lib/compiler/test/src/test_graph_optimize_state.cc line 44 at r3 (raw file):

                                                  "dense1");

    ParallelComputationGraph pcg = builder.pcg;

Done.


lib/compiler/test/src/test_graph_optimize_state.cc line 67 at r3 (raw file):

                                                   "dense0");

    ParallelComputationGraph pcg_ = builder.pcg;

Done.


lib/substitutions/include/substitutions/sub_parallel_computation_graph.h line 25 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

In that case let's rename it to get_pcg_subgraph

Done.


lib/compiler/include/compiler/optimal_cost_state.struct.toml line 20 at r3 (raw file):

  "utils/hash/unordered_map.h",
  "utils/fmt/unordered_map.h", 
]

Done.

Copy link
Collaborator Author

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 41 files reviewed, 13 unresolved discussions (waiting on @lockshaw)


lib/compiler/src/graph_optimize_state.cc line 57 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

See https://reviewable.io/reviews/flexflow/FlexFlow/1459#-O3kIAym5Yc4ECMISkBW. Eventually it might be good to use a proper graph hash like https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.graph_hashing.weisfeiler_lehman_graph_hash.html#networkx.algorithms.graph_hashing.weisfeiler_lehman_graph_hash

Done.


lib/compiler/test/src/test_graph_optimize_state.cc line 69 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

This doesn't really sufficiently test GraphOptimizeState::operator==--from this I'm not sure (1) what role the machine mappings and the cost play in equality, and (2) whether any graphs compare equal at all, or whether all of them just return not equal. A few more test/sub-cases are needed

This will be fixed in another PR.


lib/compiler/test/src/test_graph_optimize_state.cc line 70 at r3 (raw file):

    CHECK(GraphOptimizeState(GraphOptimizeResult(pcg, empty_machine_mapping), 0) !=
          GraphOptimizeState(GraphOptimizeResult(pcg_, empty_machine_mapping), 0));

This will be fixed in another PR.


lib/substitutions/src/substitutions/sub_parallel_computation_graph.cc line 61 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Probably better to just add get_subgraph for LabelledOpenDataflowGraph and then use it here

Done.

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 27 of 37 files at r4, 6 of 7 files at r5, 1 of 2 files at r6, all commit messages.
Reviewable status: 38 of 41 files reviewed, 31 unresolved discussions (waiting on @wmdi)


lib/compiler/include/compiler/machine_mapping/get_optimal_machine_mapping.h line 6 at r6 (raw file):

#include "machine_mapping.h"
#include "machine_mapping_cache.h"
#include "machine_mapping_context.dtg.h"

Suggestion:

#include "compiler/machine_mapping/machine_mapping.h"
#include "compiler/machine_mapping/machine_mapping_cache.h"
#include "compiler/machine_mapping/machine_mapping_context.dtg.h"

lib/compiler/include/compiler/machine_mapping/get_optimal_machine_mapping.h line 16 at r6 (raw file):

    ParallelComputationGraph const &pcg,
    std::function<std::unordered_set<MachineView>(
        ParallelLayerAttrs const &, MachineSpecification const &)> const

This isn't going to allow access to the input and output tensor shapes of the operator, which are probably necessary?

Code quote:

        ParallelLayerAttrs const &, M

lib/compiler/include/compiler/machine_mapping/get_optimal_machine_mapping.h line 23 at r6 (raw file):

MachineMappingResult
    get_optimal_machine_mapping_internal(MachineMappingContext &context,

What is this overload for? Why does it not take a SerialParallelDecomposition?


lib/compiler/include/compiler/machine_mapping/machine_mapping.h line 4 at r6 (raw file):

#define _FLEXFLOW_COMPILER_MACHINE_MAPPING_H

#include "machine_mapping.dtg.h"

Suggestion:

#include "compiler/machine_mapping/machine_mapping.dtg.h"

lib/compiler/include/compiler/machine_mapping/machine_mapping.h line 8 at r6 (raw file):

namespace FlexFlow {

MachineMapping combine(MachineMapping const &, MachineMapping const &);

Suggestion:

MachineMapping combine_disjoint_mappings(MachineMapping const &, MachineMapping const &);

lib/compiler/include/compiler/machine_mapping/machine_mapping_cache.h line 10 at r6 (raw file):

namespace FlexFlow {

class MachineMappingCache {

Does it make sense to have a separate class/object for this, or this not just the std::unordered_map API?


lib/compiler/include/compiler/machine_mapping/machine_mapping_context.struct.toml line 7 at r5 (raw file):

includes = [
  "machine_mapping.dtg.h",

Suggestion:

  "compiler/machine_mapping/machine_mapping.dtg.h",

lib/compiler/include/compiler/machine_mapping/machine_mapping_context.struct.toml line 29 at r6 (raw file):

[[fields]]
name = "cached_subgraph_results"
type = "::FlexFlow::MachineMappingCache"

What about moving MachineMappingCache out of this class so MachineMappingContext can be passed as const & and MachineMappingCache alone gets passed as mutable &?


lib/compiler/include/compiler/machine_mapping/machine_mapping_result.h line 14 at r6 (raw file):

MachineMappingResult get_infinity_machine_mapping_result();

void minimize_runtime(MachineMappingResult &m1, MachineMappingResult const &m2);

Why this vs just calling min or something?

Code quote:

void minimize_runtime(MachineMappingResult &m1, MachineMappingResult const &m2);

lib/compiler/include/compiler/machine_mapping/machine_mapping_result.struct.toml line 9 at r5 (raw file):

includes = [
  "machine_mapping.dtg.h",

Suggestion:

  "compiler/machine_mapping/machine_mapping.dtg.h",

lib/compiler/src/unity_algorithm.cc line 39 at r6 (raw file):

  MachineMappingCache cached_subgraph_costs;
  DeduplicatedPriorityQueue<GraphOptimizeState> candidates;

Would it be cleaner here to, rather than having a custom GraphOptimizeState type whose hash and operator== ignore the runtime cost, instead have this be a DeduplicatedPriorityQueue<SomeWrapperAroundAPCG, CustomComparator> where the CustomComparator is something like below, where MyCustomType is analagous to SomeWrapperAroundAPCG, and I suppose instead of a std::unordered_map<MyCustomType, float> you'd have a MachineMappingCache

Code snippet:

#include <map>
#include <queue>
#include <unordered_map>
#include <stdlib.h>
#include <iostream>

int guid_ctr = 0; 

struct MyCustomType {
  MyCustomType()
    : guid(guid_ctr++) {}

  bool operator==(MyCustomType const &other) const {
    return this->guid == other.guid;
  }

  bool operator!=(MyCustomType const &other) const {
    return this->guid != other.guid;
  }

  int guid;
};

namespace std {

template <>
struct hash<MyCustomType> {
  size_t operator()(MyCustomType const &x) const {
    return x.guid;
  }
};

}

float get_cost(MyCustomType const &) {
  return rand();
}

int main() {
  std::unordered_map<MyCustomType, float> costs;

  auto cached_get_cost = [&](MyCustomType const &x) -> float {
    if (costs.find(x) == costs.end()) {
      costs.insert({x, get_cost(x)});
    }

    return costs.at(x);
  };

  auto my_custom_comparator = [&](MyCustomType const &lhs, MyCustomType const &rhs) -> bool {
    return cached_get_cost(lhs) < cached_get_cost(rhs);
  };

  std::priority_queue<MyCustomType, std::vector<MyCustomType>, decltype(my_custom_comparator)> q{my_custom_comparator};

  for (int i = 0; i < 10; i++) {
    q.push(MyCustomType{});
  }

  while (!q.empty()) {
    MyCustomType curr = q.top();
    q.pop();

    std::cout << cached_get_cost(curr) << std::endl;
  }
}

lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 81 at r6 (raw file):

  return enumeration;
}

Move to a separate allowed_machine_mappings file, this file is already getting rather complicated/large

Code quote:

std::vector<std::unordered_map<Node, MachineView>>
    allowed_machine_mappings(MachineMappingContext const &context,
                             std::unordered_set<Node> const &nodes,
                             MachineSpecification const &resource) {
  if (nodes.empty()) {
    return {{}};
  }
  Node node = get_first(nodes);
  std::vector<std::unordered_map<Node, MachineView>> partial_enumeration =
      allowed_machine_mappings(context, set_minus(nodes, {node}), resource);
  std::unordered_set<MachineView> allowed_machine_views_for_node =
      context.allowed_machine_views(context.pcg.raw_graph.at(node), resource);
  std::vector<std::unordered_map<Node, MachineView>> enumeration;
  for (MachineView const &mv : allowed_machine_views_for_node) {
    for (std::unordered_map<Node, MachineView> const &partial :
         partial_enumeration) {
      enumeration.push_back(merge_maps(
          partial, std::unordered_map<Node, MachineView>{{node, mv}}));
    }
  }
  return enumeration;
}

std::vector<std::unordered_map<DataflowOutput, MachineView>>
    allowed_machine_mappings(MachineMappingContext const &context,
                             std::unordered_set<DataflowOutput> const &values,
                             MachineSpecification const &resource) {
  std::unordered_set<Node> nodes;
  for (DataflowOutput const &v : values) {
    nodes.insert(v.node);
  }

  std::vector<std::unordered_map<Node, MachineView>> node_enumeration =
      allowed_machine_mappings(context, nodes, resource);
  std::vector<std::unordered_map<DataflowOutput, MachineView>> enumeration;

  for (std::unordered_map<Node, MachineView> _node_enumeration :
       node_enumeration) {
    std::unordered_map<DataflowOutput, MachineView> _emumeration;
    for (DataflowOutput const &v : values) {
      _emumeration.emplace(v, _node_enumeration.at(v.node));
    }
    enumeration.push_back(_emumeration);
  }

  return enumeration;
}

lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 187 at r6 (raw file):

  GraphSplit graph_split = get_graph_split(decompn1, decompn2);

  OpenDataflowSubgraphResult subgraph_res1 = get_subgraph(

Is there really a point to the subgraphs at this point, or can we just rely on the SP decomposition for this information? I guess it's that the graph cost is cached, and if you cached it for the SP decomposition then you wouldn't be agnostic to different locations of the same graph?


lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 192 at r6 (raw file):

      sub_pcg_from_full_pcg(context.pcg).raw_graph, graph_split.second);

  std::unordered_set<DataflowOutput> split_outputs = transform(

FYI you can use parallel_tensor_guid_t for this once #1471 is merged

Code quote:

DataflowOutput

lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 199 at r6 (raw file):

           &split_machine_views :
       allowed_machine_mappings(context, split_outputs, resource)) {
    std::unordered_map<OpenDataflowValue, MachineView> fixed_machine_views1 =

FYI you can use open_parallel_tensor_guid_t for this once #1471 is merged

Code quote:

OpenDataflowValue

lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 206 at r6 (raw file):

                      get_open_dataflow_values(subgraph_res2.graph));

    for (auto const &[split_value, split_input] :

Suggestion:

    for (auto const &[full_graph_value, subgraph_input] :

lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 283 at r6 (raw file):

    assert(contains(
        context.allowed_machine_views(context.pcg.raw_graph.at(node), resource),
        fixed_machine_views.at(any_output)));

Simplify

Code quote:

    assert(contains(
        context.allowed_machine_views(context.pcg.raw_graph.at(node), resource),
        fixed_machine_views.at(any_output)));

lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 302 at r6 (raw file):

                                   return OpenDataflowValue(o);
                                 }),
                       [&](OpenDataflowValue const &o) { return mv; });

Simplify, probably separate into two expressions

Code quote:

      std::unordered_map<OpenDataflowValue, MachineView> output_mv_map =
          generate_map(transform(get_outputs(context.pcg.raw_graph, node),
                                 [](DataflowOutput const &o) {
                                   return OpenDataflowValue(o);
                                 }),
                       [&](OpenDataflowValue const &o) { return mv; });

lib/compiler/src/compiler/machine_mapping/machine_mapping_cache.cc line 10 at r6 (raw file):

  if (contains_key(cache, state)) {
    MachineMappingResult result = cache.at(state);
    return std::make_optional(result);

Suggestion:

    return result;

lib/compiler/test/src/compiler/machine_mapping/test_get_optimal_machine_mapping.cc line 100 at r6 (raw file):

    }();

    ParallelComputationGraph pcg_non_sp = [&] {

These should be in the SUBCASEs as they're not shared between them


lib/compiler/test/src/compiler/machine_mapping/test_get_optimal_machine_mapping.cc line 146 at r6 (raw file):

      CHECK(bool(result.runtime > 0));
      // TODO(@Mengdi Wu): fill it with actual cost
      // CHECK(result.runtime == xx);

Rather than running it, getting a cost, and then just hardcoding that into the file, I'd focus on generating testcases (such as restricting the allowed machine views to only a few, controlling the cost function, etc.) where you can actually set analytically derived values


lib/compiler/test/src/compiler/machine_mapping/test_get_optimal_machine_mapping.cc line 171 at r6 (raw file):

    }

    SUBCASE("PCG is not sp-izable due to multiple inputs") {

You'd want separate testcases for this too that actually check the graph manipulation, not just get_optimal_machine_mapping-level tests


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping.cc line 0 at r6 (raw file):
Rename machine_mapping.cc


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping.cc line 18 at r6 (raw file):

    MachineMapping result = combine(machine_mapping_0, machine_mapping_1);
    CHECK(result == combined);
  }

Suggestion:

  TEST_CASE("combine(MachineMapping, MachineMapping)") {
    MachineView machine_view_0 = make_1d_machine_view(gpu_id_t(0), gpu_id_t(1));
    MachineView machine_view_1 = make_1d_machine_view(gpu_id_t(0), gpu_id_t(2));
    
    MachineMapping machine_mapping_0 = MachineMapping{{
      {Node(0), machine_view_0},
    }};
    MachineMapping machine_mapping_1 = MachineMapping{{
      {Node(1), machine_view_1},
    }};
    
    MachineMapping correct = MachineMapping{{
        {Node(0), machine_view_0}, 
        {Node(1), machine_view_1},
    }};
    
    MachineMapping result = combine(machine_mapping_0, machine_mapping_1);
    
    CHECK(result == correct);
  }

lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping.cc line 20 at r6 (raw file):

  }

  TEST_CASE("nodes_are_disjoint") {

Suggestion:

  TEST_CASE("nodes_are_disjoint(MachineMapping, MachineMapping)") {

lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping.cc line 28 at r6 (raw file):

        {{Node(0), machine_view_0}, {Node(1), machine_view_1}});
    CHECK(nodes_are_disjoint(machine_mapping_0, machine_mapping_1));
    CHECK_FALSE(nodes_are_disjoint(machine_mapping_0, combined));

Suggestion:

    MachineMapping machine_mapping_0({{Node(0), machine_view_0}});
    
    SUBCASE("nodes are disjoint") {
      MachineMapping machine_mapping_1({{Node(1), machine_view_1}});
      
      bool correct = true;
      bool result = nodes_are_disjoint(machine_mapping_0, machine_mapping_1);
      
      CHECK(result == correct);
    }
    
    SUBCASE("nodes are not disjoint") {
      MachineMapping machine_mapping_1(
        {{Node(0), machine_view_0}, {Node(1), machine_view_1}});
        
      bool correct = false;
      bool result = nodes_are_disjoint(machine_mapping_0, machine_mapping_1);
      
      CHECK(result == correct);
    }

lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping_cache.cc line 11 at r6 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("machine_mapping_cache") {

Suggestion:

  TEST_CASE("MachineMappingCache") {

lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping_result.cc line 77 at r6 (raw file):

    CHECK(s1.runtime < inf.runtime);
    CHECK(s2.runtime < inf.runtime);
  }

Delete

Code quote:

  TEST_CASE("get_infinity_machine_mapping_result") {
    MachineView machine_view_0 = make_1d_machine_view(gpu_id_t(0), gpu_id_t(1));
    MachineView machine_view_1 = make_1d_machine_view(gpu_id_t(0), gpu_id_t(2));
    MachineMapping machine_mapping_empty(
        std::unordered_map<Node, MachineView>{});
    MachineMapping machine_mapping_0({{Node(0), machine_view_0}});
    MachineMapping machine_mapping_1({{Node(1), machine_view_1}});
    MachineMapping combined(
        {{Node(0), machine_view_0}, {Node(1), machine_view_1}});
    MachineMappingResult s0(0, machine_mapping_empty);
    MachineMappingResult s1(1, machine_mapping_0);
    MachineMappingResult s2(2, machine_mapping_1);

    MachineMappingResult inf = get_infinity_machine_mapping_result();
    CHECK(s0.runtime < inf.runtime);
    CHECK(s1.runtime < inf.runtime);
    CHECK(s2.runtime < inf.runtime);
  }

lib/substitutions/include/substitutions/sub_parallel_computation_graph.h line 25 at r6 (raw file):

SubParallelComputationGraph get_pcg_subgraph(ParallelComputationGraph const &,
                                             std::unordered_set<Node> const &);

Suggestion:

                                             std::unordered_set<parallel_layer_guid_t> const &);

lib/compiler/include/compiler/machine_mapping/machine_mapping.struct.toml line 17 at r6 (raw file):

  "utils/hash/unordered_map.h",
  "utils/fmt/unordered_map.h", 
]

Suggestion:

includes = [
  "utils/graph/node/node.dtg.h",
  "pcg/machine_view.dtg.h",
]  
 
src_includes = [   
  "utils/hash/unordered_map.h",
  "utils/fmt/unordered_map.h",
]

lib/compiler/include/compiler/machine_mapping/machine_mapping.struct.toml line 21 at r6 (raw file):

[[fields]]
name = "machine_views"
type = "std::unordered_map<::FlexFlow::Node, ::FlexFlow::MachineView>"

Suggestion:

type = "std::unordered_map<::FlexFlow::parallel_layer_guid_t, ::FlexFlow::MachineView>"

Copy link
Collaborator Author

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 38 of 41 files reviewed, 31 unresolved discussions (waiting on @lockshaw)


lib/compiler/include/compiler/machine_mapping/get_optimal_machine_mapping.h line 6 at r6 (raw file):

#include "machine_mapping.h"
#include "machine_mapping_cache.h"
#include "machine_mapping_context.dtg.h"

Done.


lib/compiler/include/compiler/machine_mapping/get_optimal_machine_mapping.h line 16 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

This isn't going to allow access to the input and output tensor shapes of the operator, which are probably necessary?

I see. Will fix it when merging the machine view PR.


lib/compiler/include/compiler/machine_mapping/get_optimal_machine_mapping.h line 23 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

What is this overload for? Why does it not take a SerialParallelDecomposition?

This is to further split the logics into multiple pieces. get_optimal_machine_mapping wraps global variables as MachineMappingContext and calls this function, and this function performs sp decomposition (and solves the un-sp-izable issue in a later version) and calls other overloads.


lib/compiler/include/compiler/machine_mapping/machine_mapping.h line 4 at r6 (raw file):

#define _FLEXFLOW_COMPILER_MACHINE_MAPPING_H

#include "machine_mapping.dtg.h"

Done.


lib/compiler/include/compiler/machine_mapping/machine_mapping.h line 8 at r6 (raw file):

namespace FlexFlow {

MachineMapping combine(MachineMapping const &, MachineMapping const &);

Done.


lib/compiler/include/compiler/machine_mapping/machine_mapping_cache.h line 10 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Does it make sense to have a separate class/object for this, or this not just the std::unordered_map API?

I think it makes sense to wrap up the save and load logic as they handle some existence cases.


lib/compiler/include/compiler/machine_mapping/machine_mapping_context.struct.toml line 7 at r5 (raw file):

includes = [
  "machine_mapping.dtg.h",

Done.


lib/compiler/include/compiler/machine_mapping/machine_mapping_result.h line 14 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why this vs just calling min or something?

This function only compares MachineMappingResults according to the runtime. It does not make sense to have an operator< for MachineMappingResult that only compares runtime, which is required by min.


lib/compiler/include/compiler/machine_mapping/machine_mapping_result.struct.toml line 9 at r5 (raw file):

includes = [
  "machine_mapping.dtg.h",

Done.


lib/compiler/src/compiler/machine_mapping/machine_mapping_cache.cc line 10 at r6 (raw file):

  if (contains_key(cache, state)) {
    MachineMappingResult result = cache.at(state);
    return std::make_optional(result);

Done.


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping.cc line 18 at r6 (raw file):

    MachineMapping result = combine(machine_mapping_0, machine_mapping_1);
    CHECK(result == combined);
  }

Done.


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping.cc line 20 at r6 (raw file):

  }

  TEST_CASE("nodes_are_disjoint") {

Done.


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping.cc line 28 at r6 (raw file):

        {{Node(0), machine_view_0}, {Node(1), machine_view_1}});
    CHECK(nodes_are_disjoint(machine_mapping_0, machine_mapping_1));
    CHECK_FALSE(nodes_are_disjoint(machine_mapping_0, combined));

Done.


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping.cc line at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Rename machine_mapping.cc

What does this mean?


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping_cache.cc line 11 at r6 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("machine_mapping_cache") {

Done.


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping_result.cc line 77 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Delete

Done.


lib/compiler/include/compiler/machine_mapping/machine_mapping.struct.toml line 17 at r6 (raw file):

  "utils/hash/unordered_map.h",
  "utils/fmt/unordered_map.h", 
]

Done.

Copy link
Collaborator Author

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 38 of 41 files reviewed, 31 unresolved discussions (waiting on @lockshaw)


lib/compiler/include/compiler/machine_mapping/machine_mapping_context.struct.toml line 29 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

What about moving MachineMappingCache out of this class so MachineMappingContext can be passed as const & and MachineMappingCache alone gets passed as mutable &?

I think MachineMappingCache should be part of the context, and it is fine to have a mutable context.


lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 81 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Move to a separate allowed_machine_mappings file, this file is already getting rather complicated/large

Done.

Copy link
Collaborator Author

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 38 of 41 files reviewed, 31 unresolved discussions (waiting on @lockshaw)


lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 206 at r6 (raw file):

                      get_open_dataflow_values(subgraph_res2.graph));

    for (auto const &[split_value, split_input] :

Done.


lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 283 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Simplify

Done.


lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 302 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Simplify, probably separate into two expressions

Done.


lib/compiler/test/src/compiler/machine_mapping/test_get_optimal_machine_mapping.cc line 100 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

These should be in the SUBCASEs as they're not shared between them

Done.


lib/compiler/test/src/compiler/machine_mapping/test_get_optimal_machine_mapping.cc line 146 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Rather than running it, getting a cost, and then just hardcoding that into the file, I'd focus on generating testcases (such as restricting the allowed machine views to only a few, controlling the cost function, etc.) where you can actually set analytically derived values

In terms of actual cost, I mean the cost model we have for a pcg. Even for analytical results, we need a precise cost model.


lib/compiler/test/src/compiler/machine_mapping/test_get_optimal_machine_mapping.cc line 171 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

You'd want separate testcases for this too that actually check the graph manipulation, not just get_optimal_machine_mapping-level tests

Done.


lib/substitutions/include/substitutions/sub_parallel_computation_graph.h line 25 at r6 (raw file):

SubParallelComputationGraph get_pcg_subgraph(ParallelComputationGraph const &,
                                             std::unordered_set<Node> const &);

Why?

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 37 files at r4, 7 of 17 files at r7, all commit messages.
Reviewable status: 33 of 43 files reviewed, 24 unresolved discussions (waiting on @wmdi)


lib/compiler/include/compiler/machine_mapping/allowed_machine_mappings.h line 12 at r7 (raw file):

    allowed_machine_mappings(MachineMappingContext const &context,
                             std::unordered_set<Node> const &nodes,
                             MachineSpecification const &resource);

Suggestion:

std::vector<std::unordered_map<parallel_layer_guid_t, MachineView>>
    allowed_machine_mappings(MachineMappingContext const &context,
                             std::unordered_set<parallel_layer_guid_t> const &nodes,
                             MachineSpecification const &resource);

lib/compiler/include/compiler/machine_mapping/allowed_machine_mappings.h line 16 at r7 (raw file):

std::vector<std::unordered_map<DataflowOutput, MachineView>>
    allowed_machine_mappings(MachineMappingContext const &context,
                             std::unordered_set<DataflowOutput> const &values,

Suggestion:

                            std::unordered_set<parallel_tensor_guid_t> const &values,

lib/compiler/test/src/test_graph_optimize_state.cc line 70 at r3 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

This will be fixed in another PR.

Isn't this a trivial fix?


lib/compiler/test/src/compiler/machine_mapping/test_get_optimal_machine_mapping.cc line 146 at r6 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

In terms of actual cost, I mean the cost model we have for a pcg. Even for analytical results, we need a precise cost model.

In a way yes, but you control the cost model: you can add a custom implementation of CostEstimator that returns whatever numbers you think would be useful for testing. We absolutely do want to (at some point) test the whole compiler+costestimator+machineviews+everything stack, but even if all of that was present we would also want to have tests that just that compiler is correct in isolation, which is what most of these tests should be--provide an arbitrary tiny set of allowed machine views (say, two of them), create a tiny PCG (with, say, 2 operators in parallel), and write a CostEstimator subclass that gives you back some chosen constants (1 for each of the possible node-machineview assignments). Then solve for what the resulting value of get_optimal_machine_mappings should be, run the function, and check that the returned value is the same as what you solved analytically.

At this point there shouldn't be anything stopping you from testing pretty much the whole Unity DP in isolation, and once everything is merged we'll add a couple end-to-end tests just to make sure there weren't any unexpected issues at the boundaries between components


lib/compiler/test/src/compiler/machine_mapping/test_get_optimal_machine_mapping.cc line 12 at r7 (raw file):

    auto allowed_machine_views1 = [&](ParallelLayerAttrs const &,
                                      MachineSpecification const &) {
      // TODO(@Mengdi Wu): Replace it with actual allowed machine views when

Ideally don't do this--these tests are supposed to confirm that get_optimal_machine_mapping is correct, not that get_allowed_machine_views is correct. If you call get_allowed_machine_views in this test and the test fails, you don't know whether it is because get_optimal_machine_mapping is wrong or if it s because get_allowed_machine_views is wrong


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping.cc line at r6 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

What does this mean?

Rename from test_machine_mapping.cc to machine_mapping.cc--the current convention is to not have the test_ prefix, as they're already under the test directory


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping.cc line 25 at r7 (raw file):

        combine_disjoint_mappings(machine_mapping_0, machine_mapping_1);
    CHECK(result == correct);
  }

Add a SUBCASE to test the failure case

Suggestion:

  TEST_CASE("combine_disjoint_mappings(MachineMapping, MachineMappping)") {
    MachineView machine_view_0 = make_1d_machine_view(gpu_id_t(0), gpu_id_t(1));
    MachineView machine_view_1 = make_1d_machine_view(gpu_id_t(0), gpu_id_t(2));
    
    SUBCASE("input machine mappings are disjoint") {
      MachineMapping machine_mapping_0 = MachineMapping({
          {Node(0), machine_view_0},
      });
      MachineMapping machine_mapping_1 = MachineMapping({
          {Node(1), machine_view_1},
      });
      MachineMapping correct = MachineMapping({
          {Node(0), machine_view_0},
          {Node(1), machine_view_1},
      });
      MachineMapping result =
          combine_disjoint_mappings(machine_mapping_0, machine_mapping_1);
      CHECK(result == correct);
    }
    
    SUBCASE("input machine mappings are not disjoint") {
      ...
    }
  }

lib/substitutions/include/substitutions/sub_parallel_computation_graph.h line 25 at r6 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

Why?

parallel_layer_guid_t is a wrapper for Node that we use for representing Nodes in a ParallelComputationGraph. It primarily improves readability and helps prevent accidentally passing the wrong node to things

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 17 files at r7.
Reviewable status: all files reviewed, 28 unresolved discussions (waiting on @wmdi)


lib/compiler/include/compiler/machine_mapping/allowed_machine_mappings.h line 10 at r7 (raw file):

std::vector<std::unordered_map<Node, MachineView>>
    allowed_machine_mappings(MachineMappingContext const &context,

The type signatures of these methods are a bit odd? At least it's not 100% clear for me what they're doing based on the names and input types, especially since there are two overloads? Can you either clarify the names a bit and/or add doxygen docstrings giving a brief description of what they do?


lib/compiler/include/compiler/machine_mapping/get_optimal_machine_mapping.h line 23 at r6 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

This is to further split the logics into multiple pieces. get_optimal_machine_mapping wraps global variables as MachineMappingContext and calls this function, and this function performs sp decomposition (and solves the un-sp-izable issue in a later version) and calls other overloads.

I think it's probably better to think of the SP decomposition as coming from outside of the optimal machine mapping algorithm, as with Pietro's stuff it at least seems more like creating an SP-ization of a given PCG is a separate step than just part of optimal machine mapping algorithm, but this isn't a bit deal and can always get refactored later. That said, it seems a bit odd to have this function, which conceptually seems to do entirely different things from the other get_optimal_machine_mapping_internal overloads, have the same name as them, so if you do prefer to use this structure it should at least be renamed to something clearer--ideally I should be able to make a pretty accurate educated guess of what a function does from its name and arguments, and I don't think that's the case here.


lib/compiler/include/compiler/machine_mapping/machine_mapping_context.struct.toml line 29 at r6 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

I think MachineMappingCache should be part of the context, and it is fine to have a mutable context.

Is there a significant downside to moving it out? It would make the code a bit more readable as it would make it more obvious what gets mutated and what doesn't


lib/compiler/src/compiler/machine_mapping/allowed_machine_mappings.cc line 9 at r7 (raw file):

namespace FlexFlow {

std::vector<std::unordered_map<Node, MachineView>>

Might be cleaner to define a machine_mapping.struct.toml wrapper type around std::unordered_map<parallel_layer_guid_t, MachineView> so this turns into the more readable std::vector<MachineMapping>? Also why is this a std::vector and not a std::unordered_set/std::set? Should there be duplicates or does order matter?


lib/compiler/src/compiler/machine_mapping/allowed_machine_mappings.cc line 14 at r7 (raw file):

                             MachineSpecification const &resource) {
  if (nodes.empty()) {
    return {{}};

Why not an empty vector as a return value? Does an empty machine mapping really have a meaning?


lib/compiler/src/compiler/machine_mapping/allowed_machine_mappings.cc line 29 at r7 (raw file):

    }
  }
  return enumeration;

Suggestion:

  Node curr_node = get_first(nodes);
  std::unordered_set<Node> other_nodes = set_minus(nodes, {node});
  
  std::vector<std::unordered_map<Node, MachineView>> other_node_mappings_from_recursion =
      allowed_machine_mappings(context, other_nodes, resource);
      
  std::unordered_set<MachineView> allowed_machine_views_for_curr_node =
      context.allowed_machine_views(context.pcg.raw_graph.at(node), resource);
      
  std::vector<std::unordered_map<Node, MachineView>> result;
  for (MachineView const &for_curr_node : allowed_machine_views_for_curr_node) {
    for (std::unordered_map<Node, MachineView> const &for_other_nodes :
         other_node_mappings_from_recursion) {
           
      MachineMapping merged = merge_disjoint_machine_mappings(
        MachineView{{
          {curr_node, for_cudd_node},
        }},
        for_other_nodes,
      });
      
      enumeration.push_back(merged);
    }
  }
  return result;

lib/compiler/src/compiler/machine_mapping/allowed_machine_mappings.cc line 39 at r7 (raw file):

  for (DataflowOutput const &v : values) {
    nodes.insert(v.node);
  }

Suggestion:

  std::unordered_set<Node> nodes transform(values, [](DataflowOutput const &v) { return v.node; });

lib/compiler/src/compiler/machine_mapping/allowed_machine_mappings.cc line 47 at r7 (raw file):

  for (std::unordered_map<Node, MachineView> _node_enumeration :
       node_enumeration) {
    std::unordered_map<DataflowOutput, MachineView> _emumeration;

Don't use leading underscore, if you need to distinguish it from enumeration it's probably better to come up with a clearer name as enumeration is a bit unclear (what is being enumerated? does this have to do with enumerate from utils/containers/enumerate.h?)

Code quote:

_emumeration;

lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 257 at r7 (raw file):

      std::unordered_map<OpenDataflowValue, MachineView> output_mv_map =
          generate_map(outputs_of_node,
                       [&](OpenDataflowValue const &o) { return mv; });

Suggestion:

                     [&](OpenDataflowValue const &) { return mv; });

lib/compiler/test/src/test_graph_optimize_state.cc line 69 at r3 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

This will be fixed in another PR.

In that case can you create an issue for that and link it here?


lib/compiler/test/src/compiler/machine_mapping/test_get_optimal_machine_mapping.cc line 171 at r6 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

Done.

I just see a comment added? Are we planning to add the multiple-input handling in this PR or in a separate one?


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping_cache.cc line 12 at r7 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("MachineMappingCache") {
    ParallelComputationGraph pcg = [&] {

Why create a PCG here? It looks like the tests don't actually need it--they just need an SPDecomposition object?


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping_cache.cc line 58 at r7 (raw file):

    MachineSpecification machine_spec(1, 1, 1, 1, 1);
    MachineMappingState state0(subgraph0, machine_spec, {});

Prefer curly-brace initialization for consistency with the rest of the FF codebase

Suggestion:

    MachineMappingState state0 = MachineMappingState{
      subgraph0, machine_spec, {}};

lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping_cache.cc line 72 at r7 (raw file):

    cache.save(state0, result0);
    CHECK(cache.load(state0).value() == result0);

Break into SUBCASEs with titles that explain what you're testing--right now it's just a sea of checks which makes this test rather difficult to understand


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping_cache.cc line 81 at r7 (raw file):

    CHECK(cache.load(state2) == std::nullopt);

    cache.save(state2, result2);

Include failure cases--what happens if I save a key twice?


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping_result.cc line 21 at r7 (raw file):

    MachineMappingResult s2(2, machine_mapping_1);

    MachineMappingResult result0 = sequential_combine(s0, s1);

Each of these checks/function evaluations should have a SUBCASE with a title explaining what it's testing/why it's there. Right now it's just a bunch of checks, which makes it difficult to understand quickly what high-level properties you're trying to verify with each. If you can't come up with a concise property that's being verified, the test might need some restructuring/redesigning

(and same with the parallel_combine testcase below, and the minimize_runtime testcase below that)

For an example of what I mean, see https://github.com/flexflow/FlexFlow/blob/add-candle-uno-pcg/lib/op-attrs/test/src/op-attrs/ops/concat.cc#L175-L312--each of the failure cases and success cases has a separate SUBCASE with a title indicating what is being checked, or for a simpler example, https://github.com/flexflow/FlexFlow/blob/add-candle-uno-pcg/lib/utils/test/src/utils/containers/try_merge_nondisjoint_unordered_maps.cc#L9-L75


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping_result.cc line 23 at r7 (raw file):

    MachineMappingResult result0 = sequential_combine(s0, s1);
    CHECK(result0.runtime == 1);
    CHECK(result0.machine_mapping == machine_mapping_0);

Suggestion:

    MachineMappingResult result0 = sequential_combine(s0, s1);
    MachineMappingResult correct = MachineMappingResult{machine_mapping_0, 1};

Copy link
Collaborator Author

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 28 unresolved discussions (waiting on @lockshaw)


lib/compiler/include/compiler/machine_mapping/allowed_machine_mappings.h line 12 at r7 (raw file):

    allowed_machine_mappings(MachineMappingContext const &context,
                             std::unordered_set<Node> const &nodes,
                             MachineSpecification const &resource);

Done.


lib/compiler/include/compiler/machine_mapping/allowed_machine_mappings.h line 16 at r7 (raw file):

std::vector<std::unordered_map<DataflowOutput, MachineView>>
    allowed_machine_mappings(MachineMappingContext const &context,
                             std::unordered_set<DataflowOutput> const &values,

Done.


lib/compiler/include/compiler/machine_mapping/get_optimal_machine_mapping.h line 23 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I think it's probably better to think of the SP decomposition as coming from outside of the optimal machine mapping algorithm, as with Pietro's stuff it at least seems more like creating an SP-ization of a given PCG is a separate step than just part of optimal machine mapping algorithm, but this isn't a bit deal and can always get refactored later. That said, it seems a bit odd to have this function, which conceptually seems to do entirely different things from the other get_optimal_machine_mapping_internal overloads, have the same name as them, so if you do prefer to use this structure it should at least be renamed to something clearer--ideally I should be able to make a pretty accurate educated guess of what a function does from its name and arguments, and I don't think that's the case here.

I see. But I don't think it is a problem to have this signature. First, it is only used internally, so it has little effect on other parts. And it actually make the internal implementation more convenient. Second, it actually has the similar functionality as other functions that have the same name but just different format of the inputs. We should name functions according to their functionality instead of the detailed implementation.


lib/compiler/include/compiler/machine_mapping/machine_mapping_context.struct.toml line 29 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Is there a significant downside to moving it out? It would make the code a bit more readable as it would make it more obvious what gets mutated and what doesn't

I feel like we should put global variables in MachineMappingContext as possible and only explicitly pass the variables that belong to DP states. I think it is more clear this way.


lib/compiler/src/unity_algorithm.cc line 39 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Would it be cleaner here to, rather than having a custom GraphOptimizeState type whose hash and operator== ignore the runtime cost, instead have this be a DeduplicatedPriorityQueue<SomeWrapperAroundAPCG, CustomComparator> where the CustomComparator is something like below, where MyCustomType is analagous to SomeWrapperAroundAPCG, and I suppose instead of a std::unordered_map<MyCustomType, float> you'd have a MachineMappingCache

Sounds a good idea and we will leave all the MCMC search part to another PR.


lib/compiler/src/compiler/machine_mapping/allowed_machine_mappings.cc line 9 at r7 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Might be cleaner to define a machine_mapping.struct.toml wrapper type around std::unordered_map<parallel_layer_guid_t, MachineView> so this turns into the more readable std::vector<MachineMapping>? Also why is this a std::vector and not a std::unordered_set/std::set? Should there be duplicates or does order matter?

It is a vector because no deduplication is required here.


lib/compiler/src/compiler/machine_mapping/allowed_machine_mappings.cc line 14 at r7 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why not an empty vector as a return value? Does an empty machine mapping really have a meaning?

It means the only machine mapping for an empty set of input nodes.


lib/compiler/src/compiler/machine_mapping/allowed_machine_mappings.cc line 29 at r7 (raw file):

    }
  }
  return enumeration;

Done.


lib/compiler/src/compiler/machine_mapping/allowed_machine_mappings.cc line 39 at r7 (raw file):

  for (DataflowOutput const &v : values) {
    nodes.insert(v.node);
  }

I don't think we should use transform since it is not a one-to-one mapping but a many-to-one mapping. We should not expect transform to handle many-to-one mapping not matter whether it actually does.


lib/compiler/src/compiler/machine_mapping/allowed_machine_mappings.cc line 47 at r7 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Don't use leading underscore, if you need to distinguish it from enumeration it's probably better to come up with a clearer name as enumeration is a bit unclear (what is being enumerated? does this have to do with enumerate from utils/containers/enumerate.h?)

Done.


lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 187 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Is there really a point to the subgraphs at this point, or can we just rely on the SP decomposition for this information? I guess it's that the graph cost is cached, and if you cached it for the SP decomposition then you wouldn't be agnostic to different locations of the same graph?

Done.


lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 257 at r7 (raw file):

      std::unordered_map<OpenDataflowValue, MachineView> output_mv_map =
          generate_map(outputs_of_node,
                       [&](OpenDataflowValue const &o) { return mv; });

Done.


lib/compiler/test/src/compiler/machine_mapping/test_get_optimal_machine_mapping.cc line 146 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

In a way yes, but you control the cost model: you can add a custom implementation of CostEstimator that returns whatever numbers you think would be useful for testing. We absolutely do want to (at some point) test the whole compiler+costestimator+machineviews+everything stack, but even if all of that was present we would also want to have tests that just that compiler is correct in isolation, which is what most of these tests should be--provide an arbitrary tiny set of allowed machine views (say, two of them), create a tiny PCG (with, say, 2 operators in parallel), and write a CostEstimator subclass that gives you back some chosen constants (1 for each of the possible node-machineview assignments). Then solve for what the resulting value of get_optimal_machine_mappings should be, run the function, and check that the returned value is the same as what you solved analytically.

At this point there shouldn't be anything stopping you from testing pretty much the whole Unity DP in isolation, and once everything is merged we'll add a couple end-to-end tests just to make sure there weren't any unexpected issues at the boundaries between components

Done.


lib/compiler/test/src/compiler/machine_mapping/test_get_optimal_machine_mapping.cc line 171 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I just see a comment added? Are we planning to add the multiple-input handling in this PR or in a separate one?

In a separate one.


lib/compiler/test/src/compiler/machine_mapping/test_get_optimal_machine_mapping.cc line 12 at r7 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Ideally don't do this--these tests are supposed to confirm that get_optimal_machine_mapping is correct, not that get_allowed_machine_views is correct. If you call get_allowed_machine_views in this test and the test fails, you don't know whether it is because get_optimal_machine_mapping is wrong or if it s because get_allowed_machine_views is wrong

Done.


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping.cc line at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Rename from test_machine_mapping.cc to machine_mapping.cc--the current convention is to not have the test_ prefix, as they're already under the test directory

Done.


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping.cc line 25 at r7 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Add a SUBCASE to test the failure case

Done.


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping_cache.cc line 12 at r7 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why create a PCG here? It looks like the tests don't actually need it--they just need an SPDecomposition object?

Yes, this PCG is only used to create the SPD


lib/substitutions/include/substitutions/sub_parallel_computation_graph.h line 25 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

parallel_layer_guid_t is a wrapper for Node that we use for representing Nodes in a ParallelComputationGraph. It primarily improves readability and helps prevent accidentally passing the wrong node to things

Done.


lib/compiler/include/compiler/machine_mapping/machine_mapping.struct.toml line 21 at r6 (raw file):

[[fields]]
name = "machine_views"
type = "std::unordered_map<::FlexFlow::Node, ::FlexFlow::MachineView>"

Done.

Copy link
Collaborator Author

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 24 of 45 files reviewed, 28 unresolved discussions (waiting on @lockshaw)


lib/compiler/include/compiler/machine_mapping/allowed_machine_mappings.h line 10 at r7 (raw file):

Previously, lockshaw (Colin Unger) wrote…

The type signatures of these methods are a bit odd? At least it's not 100% clear for me what they're doing based on the names and input types, especially since there are two overloads? Can you either clarify the names a bit and/or add doxygen docstrings giving a brief description of what they do?

Done.


lib/compiler/test/src/test_graph_optimize_state.cc line 69 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

In that case can you create an issue for that and link it here?

I think we will have another PR to handle the entire MCMC search part, which will clearly include the tests files.


lib/compiler/test/src/test_graph_optimize_state.cc line 70 at r3 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Isn't this a trivial fix?

I think we will have another PR to handle the entire MCMC search part, which will clearly include the tests files. Also it is very likely we will modify GraphOptimizeReulst (as you have suggested), so I prefer to put off all the MCMC tests after that.

Copy link
Collaborator Author

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 24 of 45 files reviewed, 28 unresolved discussions (waiting on @lockshaw)


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping_cache.cc line 58 at r7 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Prefer curly-brace initialization for consistency with the rest of the FF codebase

Why? Since we are moving towards dtgen style code, we should use direct initialization, right? Also, it is safer than curly-brace initialization.

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 21 of 21 files at r8, all commit messages.
Reviewable status: all files reviewed, 32 unresolved discussions (waiting on @wmdi)


lib/compiler/include/compiler/machine_mapping/get_allowed_machine_views_list.h line 11 at r8 (raw file):

std::vector<std::unordered_map<parallel_layer_guid_t, MachineView>>
    get_allowed_machine_views_list(

Isn't this just a std::vector<MachineMapping>? If so, why not make that change to the type signature and change the function name to get_allowed_machine_mappings (or get_allowed_machine_mappings_for_layers)?

Code quote:

std::vector<std::unordered_map<parallel_layer_guid_t, MachineView>>
    get_allowed_machine_views_list(

lib/compiler/include/compiler/machine_mapping/get_allowed_machine_views_list.h line 17 at r8 (raw file):

std::vector<std::unordered_map<parallel_tensor_guid_t, MachineView>>
    get_allowed_src_machine_views_list(

I'm not sure I totally get the semantics of this function. Is it intended to be used to get the machine views of the tensors crossing a split? In that case maybe the return type should be different? This operation just doesn't currently seem to make that much sense to me by itself, though maybe I just need a better explanation of what this is intended to do.

Code quote:

    get_allowed_src_machine_views_list

lib/compiler/include/compiler/machine_mapping/get_optimal_machine_mapping.h line 23 at r6 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

I see. But I don't think it is a problem to have this signature. First, it is only used internally, so it has little effect on other parts. And it actually make the internal implementation more convenient. Second, it actually has the similar functionality as other functions that have the same name but just different format of the inputs. We should name functions according to their functionality instead of the detailed implementation.

Fair enough, let's keep this for now and reevaluate if it still makes sense once approximate SP stuff is actually in place


lib/compiler/include/compiler/machine_mapping/machine_mapping_context.struct.toml line 29 at r6 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

I feel like we should put global variables in MachineMappingContext as possible and only explicitly pass the variables that belong to DP states. I think it is more clear this way.

I agree up to the point of grouping mutable and non-mutable global variables into the same type as then the mutation properties become unclear (how do I know that your function isn't also mutating the pcg?), especially since only one of the members ever mutable. This is one case where I have a strong preference that we split MachineMappingCache out so that the function signatures remain clear about what is safe from mutation


lib/compiler/src/unity_algorithm.cc line 39 at r6 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

Sounds a good idea and we will leave all the MCMC search part to another PR.

Sounds good, tracked in #1501


lib/compiler/src/compiler/machine_mapping/get_allowed_machine_views_list.cc line 44 at r8 (raw file):

    }
  }
  return result;

Might be more readable as something like this? We'd need to add a transform_tuples function or something like it that takes in a container of std::tuple<A, B, ...> and a lambda matching std::function<T(A const &, B const &, ...)> and combines the destructuring and transform operations, but I'm sure I or @Marsella8 would be happy to add this to containers as I can think of quite a few places where it would be useful. Not a high priority and the current code is fine as is, just something I was thinking of

Suggestion:

  return transform_tuples(as_vector(get_all_permutations(allowed_machine_view_for_curr_layer, other_machine_view_from_recursion)),
                          [](MachineView const &for_curr_node, MachineMapping const &for_other_layers) {
                             return add_view_for_layer_to_mapping(for_other_layers, curr_node, for_curr_node);
                          });

lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 63 at r8 (raw file):

        &machine_views) {
  // TODO: Replace it with the actual implementation.
  auto get_input_shapes = [&](parallel_layer_guid_t) {

Migrate to using #1493 once it's merged


lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 73 at r8 (raw file):

  };

  assert(contains_key(machine_views, get_layer_outputs(pcg, layer)[0]));

FYI at already does this check so the separate assert isn't really necessary


lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 83 at r8 (raw file):

                              layer_machine_view);
  float communication_cost = 0;
  for (parallel_tensor_guid_t const &input : get_layer_inputs(pcg, layer)) {

iirc the old Unity implementation took into account the outgoing tensor communication costs rather than the incoming tensor communication costs. As incoming/outgoing is not symmetric in DataflowGraph (a tensor can only be produced once but can be consumed many times), does this lead to any changes compared to the original unity implementation?

Also, would it be clearn/simpler to instead calculate the tensor communication cost in the split rather than in the base case?


lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 84 at r8 (raw file):

  float communication_cost = 0;
  for (parallel_tensor_guid_t const &input : get_layer_inputs(pcg, layer)) {
    assert(contains_key(machine_views, input));

Might be more readable to use transform and maximum from containers.h? Up to you 🙂


lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 167 at r8 (raw file):

  auto [decompn1, decompn2] = split_sp_decomposition(serial);

  GraphSplit graph_split = get_graph_split(decompn1, decompn2);

If you want you can add a wrapper PCGGraphSplit that contains parallel_layer_guid_ts rather than Node, but this is up to you--not really a right answer either way

Code quote:

  GraphSplit graph_split = get_graph_split(decompn1, decompn2);

lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 169 at r8 (raw file):

  GraphSplit graph_split = get_graph_split(decompn1, decompn2);

  auto is_subgraph_input = [&](std::unordered_set<Node> const &subgraph_nodes,

Suggestion:

std::unordered_set<parallel_layer_guid_t> const &subgraph_nodes

lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 175 at r8 (raw file):

  std::unordered_set<parallel_tensor_guid_t> all_edges1 =
      set_union(transform(graph_split.first, [&](Node const &node) {

Technically this should more all_tensors1, and it might be cleaner to add a get_tensors_produced_by_subgraph(ParallelComputationGraph const &, std::unordered_set<parallel_layer_guid_t> const &)


lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 179 at r8 (raw file):

            get_layer_outputs(context.pcg, parallel_layer_guid_t(node)));
      }));
  std::unordered_set<parallel_tensor_guid_t> all_edges2 =

Technically this should more all_tensors2, and it might be cleaner to add a get_tensors_produced_by_subgraph(ParallelComputationGraph const &, std::unordered_set<parallel_layer_guid_t> const &)


lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 184 at r8 (raw file):

            get_layer_inputs(context.pcg, parallel_layer_guid_t(node)));
      }));
  std::unordered_set<parallel_tensor_guid_t> split_edges =

Add a get_edges_between_subgraphs(ParallelComputationGraph const &, GraphSplit const &) and any necessary supporting functions in utils/graph


lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 229 at r8 (raw file):

  std::unordered_set<parallel_tensor_guid_t> all_edges1 =
      set_union(transform(graph_split.first, [&](Node const &node) {

https://reviewable.io/reviews/flexflow/FlexFlow/1459#-O6bXnLVC5uD0a0y007y


lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 234 at r8 (raw file):

      }));
  std::unordered_set<parallel_tensor_guid_t> all_edges2 =
      set_union(transform(graph_split.second, [&](Node const &node) {

https://reviewable.io/reviews/flexflow/FlexFlow/1459#-O6bXnLVC5uD0a0y007y


lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 272 at r8 (raw file):

  parallel_layer_guid_t layer = parallel_layer_guid_t(node);
  std::unordered_set<parallel_tensor_guid_t> machine_views_not_fixed =

Is it possible to end up in a state where machine views aren't fixed? If all edges across the split are fixed every time we do a split, and we do splits until we're down to a single node, won't we end up with all used tensors at least being fixed?


lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 290 at r8 (raw file):

    MachineMapping machine_mapping =
        MachineMapping{std::unordered_map<parallel_layer_guid_t, MachineView>{
            {layer,

Might be nice to have a make_singleton_machine_mapping helper function in machine_mapping.h?


lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 291 at r8 (raw file):

        MachineMapping{std::unordered_map<parallel_layer_guid_t, MachineView>{
            {layer,
             full_machine_views.at(get_layer_outputs(context.pcg, layer)[0])},

Suggestion:

    MachineMapping machine_mapping =
        MachineMapping{{
            {layer,
             full_machine_views.at(get_layer_outputs(context.pcg, layer)[0])},

lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 291 at r8 (raw file):

        MachineMapping{std::unordered_map<parallel_layer_guid_t, MachineView>{
            {layer,
             full_machine_views.at(get_layer_outputs(context.pcg, layer)[0])},

Uh what's going on here? This makes me think we don't have the right helper functions set up, as these types of complex manipulations shouldn't really be necessary if we have the right function signatures and abstractions in place. Seems like we might need a function that takes in a std::unordered_map<parallel_tensor_guid_t, MachineView> and gives you back a MachineMapping while checking that all of the MachineViews for the parallel_tensor_guid_ts for each parallel_layer_guid_t have the same MachineView assigned, and if not throw an error? Or something like that? Conceptually I (mostly*) get the need to go from mapping parallel_tensor_guid_ts to mapping parallel_layer_guid_ts , but that feels like a rather nuanced operation (what if the parallel_tensor_guid_ts don't agree?) and so it would be better to pull that out into a separate function that can be tested and robust rather than inlining it every place we need it which makes it easy to miss/forget checks

  • I'm not actually sure we need this--if we have the full PCG in all or our splits, rather than fixing MachineViews for parallel_tensor_guid_ts, why not instead fix MachineViews for their corresponding source parallel_layer_guid_ts? Conceptually what we do is at every split we concretize a part of our solution, and our solution is a MachineMapping, so it would seem to make sense that at every split we concretize mappings for parallel_layer_guid_ts?

Code quote:

             full_machine_views.at(get_layer_outputs(context.pcg, layer)[0])},

lib/compiler/src/compiler/machine_mapping/allowed_machine_mappings.cc line 9 at r7 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

It is a vector because no deduplication is required here.

Sure, but as a consumer of this function I can't be guaranteed that there are no duplicates whereas if you return a std::unordered_set then anyone who is using this function is guaranteed this property


lib/compiler/src/compiler/machine_mapping/allowed_machine_mappings.cc line 14 at r7 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

It means the only machine mapping for an empty set of input nodes.

Fair enough, let's just make sure there's a @note in the doxygen docstring and a testcase checking this edge case behavior


lib/compiler/src/compiler/machine_mapping/allowed_machine_mappings.cc line 39 at r7 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

I don't think we should use transform since it is not a one-to-one mapping but a many-to-one mapping. We should not expect transform to handle many-to-one mapping not matter whether it actually does.

transform is already pretty broadly used in FlexFlow for this case and even has explicit support for std::unordered_sets, and haskell's Set type has the same semantics with regards to map (https://hackage.haskell.org/package/containers-0.7/docs/Data-Set.html#v:map), so I wouldn't worry about trying to maintain this property of transform--if you want a function that is guaranteed to have this property a new name/function may be necessary


lib/substitutions/src/substitutions/sub_parallel_computation_graph.cc line 58 at r4 (raw file):

SubParallelComputationGraph
    get_pcg_subgraph(ParallelComputationGraph const &pcg,

Why remove this? It was fine as long as nodes was changed to be a std::unordered_set<parallel_layer_guid_t>


lib/compiler/test/src/test_graph_optimize_state.cc line 69 at r3 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

I think we will have another PR to handle the entire MCMC search part, which will clearly include the tests files.

Sounds good, tracked in #1501


lib/compiler/test/src/test_graph_optimize_state.cc line 70 at r3 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

I think we will have another PR to handle the entire MCMC search part, which will clearly include the tests files. Also it is very likely we will modify GraphOptimizeReulst (as you have suggested), so I prefer to put off all the MCMC tests after that.

Sounds good, tracked in #1501


lib/compiler/test/src/compiler/machine_mapping/cost_estimator_for_test.h line 14 at r8 (raw file):

                             std::vector<ParallelTensorAttrs> const &outputs,
                             MachineView const &mv) const override {
    return 1;

These functions will probably need to change per-test-case in order to actually test all the different properties you want to, so it might be better to have the cost estimator for testing take in a map or a lambda so that it can be configured per-test-case


lib/compiler/test/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 13 at r8 (raw file):

                                      MachineSpecification const &) {
      return std::unordered_set<MachineView>{
          make_1d_machine_view(gpu_id_t(1), gpu_id_t(2))};

We'll want some testcase that actually check that the different machine views are compared, right? Right now it seems that there's only ever one choice of machine view, which makes complete sense to test some properties, but there should also be some tests where there are, say, two possible machine views


lib/compiler/test/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 19 at r8 (raw file):

    MachineMappingCache cached_results1;

    SUBCASE("simple PCG") {

Ideally focus more with these subcase names on what is being tested--for example, for this testcase it appears that you're trying to check something about the cost function, as there is only one choice for the machine view? So would this be something like "serial node costs are added" or something like that? But doesn't this require parallel operations as well? Right now all the test cases feel a bit arbitrary and not like they're actually checking properties of the underlying function (for example, if all of these already involve both serial and parallel splits, why are there then a bunch of tests that also check more serial and parallel splits? Also, you can for the most part assume that serial-parallel decomposition works--it's being tested pretty thoroughly somewhere else, so having a bunch of test cases here for that part is likely redundant and leads to actually useful testcases such as: choosing between multiple machine views, how the costs are added up, the different ways (a|b, a->b, b ->a) parallel splits are handled in terms of cost and the corresponding restrictions on machine resources for each of those cases, how network transfer cost between nodes gets factored in, etc.)

In short, right now it feels like I have 5 testcases for the same property while a bunch of properties are missed, rather than (what it should be which is) 1 testcase for each of the 5 different properties that we want to validate. See https://reviewable.io/reviews/flexflow/FlexFlow/1459#-O696S9TBPtXH3khrM-t for more explanation


lib/compiler/test/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 106 at r8 (raw file):

                                      machine_spec1,
                                      cached_results1);
      CHECK(result.runtime == 13);

Right now all the checks are on result.runtime but nothing appears to check that the right machine mapping is returned?


lib/compiler/test/src/compiler/machine_mapping/machine_mapping.cc line 13 at r8 (raw file):

    MachineView machine_view_1 = make_1d_machine_view(gpu_id_t(0), gpu_id_t(2));
    MachineMapping machine_mapping_0 = MachineMapping({
        {parallel_layer_guid_t(Node(0)), machine_view_0},

FYI prefer bracket initialization where possible for consistency with the rest of the codebase (yes I know these are doing the same thing, but the fact that there are two equivalent syntaxes makes the codebase harder for people newer to C++ to get started with)

Suggestion:

parallel_layer_guid_t{Node{0}}

lib/compiler/test/src/compiler/machine_mapping/machine_mapping_cache.cc line 85 at r8 (raw file):

    CHECK(cache.load(state0).value() == result0);
    CHECK(cache.load(state1).value() == result1);
    CHECK(!cache.load(state2));

Try to avoid using implicit bool conversion of std::optional, I much prefer an explicit == std::nullopt check


lib/compiler/test/src/compiler/machine_mapping/test_get_optimal_machine_mapping.cc line 171 at r6 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

In a separate one.

Sounds good, tracked in #1500


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping.cc line 25 at r7 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

Done.

Not seeing it added for the combine_disjoint_mappings test case (but thanks for adding it to nodes_are_disjoint, it was needed there too


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping_cache.cc line 12 at r7 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

Yes, this PCG is only used to create the SPD

In that case why not just construct a SPD object directly?


lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping_cache.cc line 57 at r7 (raw file):

        split_sp_decomposition(subgraph0.get<SerialSplit>());

    MachineSpecification machine_spec(1, 1, 1, 1, 1);

Suggestion:

    MachineSpecification machine_spec{
      /*num_nodes=*/1,
      /*num_cpus_per_node=*/1, 
      /*num_gpus_per_node=*/1,
      /*inter_node_bandwidth=*/1,
      /*intra_node_bandwidth=*/1,
    };

lib/compiler/test/src/compiler/machine_mapping/test_machine_mapping_cache.cc line 58 at r7 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

Why? Since we are moving towards dtgen style code, we should use direct initialization, right? Also, it is safer than curly-brace initialization.

Currently I've leaned toward curly-brace initialization because (1) it prevents implicit conversion of arguments, and (2) it allows trailing commas. As much as it counts, https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#es23-prefer-the--initializer-syntax agrees. However, there certainly are still edge cases with std::initializer_list in with brace initialization so I don't entirely disagree, and so for the most part I think we stick to this convention to keep this consistent across the codebase (if everything sometimes uses () vs {} you never know if there's a reason or not, whereas sticking to one of the two makes it clear when there's a reason to use the other)

The preference for T t = T{...} vs T t{...} is less important and is generally because I find that the assignment better conveys the concept of what I'm trying to express esp when dealing with dtgen-style trivial named-tuple types (I think it picked this up from a style guide somewhere, but I can't seem to find where it was from now) and so I generally make the comment with consistency-with-the-rest-of-the-codebase in mind, but it's not a big deal either way and I'd be totally open to having my mind changed here

Overall both of these comments are usually made not so much because there's a real reason for one over the other, but because having a bunch of mixing of styles leads to me getting a bunch of questions from new people about why something is one way or another and why there are weird divergences between different parts of the codebase. Who knows, maybe it's just a lost battle and there's no way to stop C++ from being super confusing to new people 🤷

Marsella8
Marsella8 previously approved these changes Sep 13, 2024
Copy link

@Marsella8 Marsella8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 33 unresolved discussions (waiting on @wmdi)


lib/compiler/include/compiler/graph_optimize_state.h line 1 at r8 (raw file):

#ifndef _FLEXFLOW_COMPILER_MCMC_STATE_H

Why not dtgen?


lib/compiler/src/compiler/machine_mapping/get_optimal_machine_mapping.cc line 140 at r8 (raw file):

  MachineMappingResult result = decompn.visit<MachineMappingResult>(
      overload{[&](SerialSplit const &serial) {

i think this can be subbed by decompn.visit<MachineMappingResult>([](auto const &x) { return get_optimal_machine_mapping_internal(x); });)

Alternatively, probably cleaner to add another overload to get_optimal_machine_mapping_internal which delegates the call to the other overloads.

@lockshaw lockshaw mentioned this pull request Sep 18, 2024
20 tasks
@lockshaw lockshaw self-assigned this Sep 26, 2024
@lockshaw lockshaw added the repo-refactor Topics related to the repo and search refactors label Sep 26, 2024
@lockshaw lockshaw marked this pull request as draft September 26, 2024 04:56
Copy link

@Marsella8 Marsella8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r9, 106 of 150 files at r10, 117 of 145 files at r11, 51 of 60 files at r12, 23 of 23 files at r13, all commit messages.
Reviewable status: all files reviewed, 52 unresolved discussions (waiting on @wmdi)


lib/compiler/include/compiler/machine_mapping/include_unconstrained.struct.toml line 2 at r11 (raw file):

namespace = "FlexFlow"
name = "IncludeUnconstrained"

maybe enum instead? (just an idea)


lib/compiler/include/compiler/machine_mapping/machine_mapping_result.h line 9 at r12 (raw file):

namespace FlexFlow {

[[nodiscard]] MachineMappingResult infeasible_machine_mapping_result();

any strong reason for [[nodiscard]]?


lib/compiler/include/compiler/machine_mapping/machine_mapping_problem_tree/mm_problem_tree_parallel_split_label.struct.toml line 2 at r11 (raw file):

namespace = "FlexFlow"
name = "MMProblemTreeParallelSplitLabel"

why is it empty?


lib/compiler/src/compiler/machine_mapping/get_machine_resource_splits.cc line 27 at r12 (raw file):

    result.insert(std::make_pair(sub_resource2, sub_resource1));
  }

note that we are only partitioning across GPUs here.

Currently, I consider how MachineSpecification fuses GPU and CPU nodes to be a bit awkard (e.g. it doesn't even have a separate intra and inter node bandwitdh per parameters), I think it should instead have as attributes num_nodes, num_devices_per_node, device_type, inter_node_bandwidth, and intra_node_bandwidth, since this way we can ignore the CPU stuff a lot more and only use it when needed.

Also with the i*=2 in this part of the code feels a bit arbitrary?


lib/compiler/src/compiler/machine_mapping/get_tensor_set_movement_across_split.cc line 4 at r13 (raw file):

#include "compiler/machine_mapping/abstracted_tensor_set_movement/abstracted_tensor_set_movement.h"
#include "compiler/machine_mapping/abstracted_tensor_set_movement/get_abstracted_tensor_set_movement_across_split.h"
#include "compiler/machine_mapping/transitive_reduced_pcg.h"

some of this imports seem superfluous


lib/compiler/src/compiler/machine_mapping/machine_mapping_result.cc line 26 at r13 (raw file):

  for (MachineMappingResult const &candidate : candidates) {
    result = minimize_runtime(result, candidate);

optionally foldl, but it's honestly pretty clear as it is


lib/compiler/src/compiler/machine_mapping/machine_mapping_result.cc line 51 at r13 (raw file):

  ParallelLayerGuidObliviousMachineMapping mapping = [&] {
    if (parallel_split_transformation.has_value()

I think this should be if (parallel_split_transformation.has_value()) then evaluate whether it should be LthenR or RthenL, here if the split_transformation has no value it calls it with LthenR


lib/compiler/src/compiler/machine_mapping/machine_mapping_result.cc line 97 at r13 (raw file):

MachineMappingResult minimize_runtime(MachineMappingResult const &maybe_m1,
                                      MachineMappingResult const &maybe_m2) {
  FeasibleMachineMappingResult m1 = ({

This pattern to get the feasible results seems to appear fairly often here, maybe pacakge it into a separate function?


lib/compiler/src/compiler/machine_mapping/machine_mapping_result.cc line 111 at r13 (raw file):

  });

  if (m2.runtime < m1.runtime) {

ternary optionally


lib/compiler/test/src/compiler/machine_mapping/machine_mapping_problem_tree/get_machine_mapping_problem_tree.cc line 63 at r13 (raw file):

      UnmappedOpCostEstimateKey input_key = make_input_key(input_shape);

      PCGBinarySPDecomposition sp_decomposition = \

remove

Code quote:

\

lib/local-execution/src/ops/element_binary.h line 1 at r13 (raw file):

#ifndef _FLEXFLOW_ELEMENT_BINARY_H

change for all src/ops/ to also include the path? not a big deal

Code quote:

_FLEXFLOW_ELEMENT_BINARY_H

lib/local-execution/src/ops/layer_norm.h line 1 at r13 (raw file):

#ifndef _FLEXFLOW_RUNTIME_SRC_OPS_LAYER_NORM_H

LOCAL_EXECUTION

Code quote:

RUNTIME

lib/utils/include/utils/full_binary_tree/find_paths_to_leaf.h line 16 at r13 (raw file):

template <typename ParentLabel, typename LeafLabel>
std::unordered_set<BinaryTreePath> find_paths_to_leaf(FullBinaryTree<ParentLabel, LeafLabel> const &tree,

super clean implementation


lib/utils/include/utils/graph/digraph/algorithms/get_edges_from_subgraph_to_subgraph.h line 8 at r13 (raw file):

std::unordered_set<DirectedEdge> get_edges_from_subgraph_to_subgraph(DiGraphView const &,
                                                                     std::unordered_set<Node> const &,

add variable names for clarity


lib/utils/include/utils/graph/series_parallel/binary_sp_decomposition_tree/generic_binary_sp_decomposition_tree/make.h line 7 at r13 (raw file):

namespace FlexFlow {

maybe a couple of using SomeShorterName = GenericBinarySPDecompositionTree<SeriesLabel, ParallelLabel, LeafLabel> and same for the others?
I know it's not really standard for the codebase, but I think in this specific case it could help readability quite a bit.


lib/utils/include/utils/graph/series_parallel/binary_sp_decomposition_tree/leaf_only_binary_sp_decomposition_tree/leaf_only_binary_parallel_split_label.struct.toml line 2 at r13 (raw file):

namespace = "FlexFlow"
name = "LeafOnlyBinaryParallelSplitLabel"

why empty?


lib/utils/include/utils/graph/series_parallel/binary_sp_decomposition_tree/leaf_only_binary_sp_decomposition_tree/leaf_only_binary_series_split_label.struct.toml line 11 at r13 (raw file):

  "rapidcheck",
]

why empty?


lib/utils/src/utils/graph/dataflow_graph/algorithms/transitive_reduced_dataflow_graph/transitive_reduced_dataflow_graph.cc line 7 at r13 (raw file):

TransitiveReducedDataflowGraphView get_dataflow_graph_transitive_reduction(DataflowGraphView const &g) {
  DiGraphView as_digraph = g;

use an explicit to_digraph to cast it (just so this won't break in case we decide to modify the coercion rules)


lib/utils/test/src/utils/containers/unordered_map_from_pairs.cc line 42 at r13 (raw file):

        {1, "b"},
      };

I think we should throw an error in this case


lib/compiler/src/compiler/machine_mapping/get_allowed_machine_views_list.cc line 44 at r8 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Might be more readable as something like this? We'd need to add a transform_tuples function or something like it that takes in a container of std::tuple<A, B, ...> and a lambda matching std::function<T(A const &, B const &, ...)> and combines the destructuring and transform operations, but I'm sure I or @Marsella8 would be happy to add this to containers as I can think of quite a few places where it would be useful. Not a high priority and the current code is fine as is, just something I was thinking of

yeah seems like a good function to add, let me know if it is needed


lib/compiler/src/compiler/machine_mapping/machine_mapping_constraints.cc line 75 at r13 (raw file):

    } else {
      if (current_machine_view.value() != machine_view) {
        throw mk_runtime_error(fmt::format("with_additional_layer_machine_views received machine view assignment for layer {} "

with_additional_constraints

Code quote:

with_additional_layer_machine_views

Copy link
Collaborator Author

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 15 files at r1, 1 of 9 files at r3, 9 of 37 files at r4, 1 of 17 files at r7, 4 of 21 files at r8, 1 of 3 files at r9, 103 of 150 files at r10, 84 of 145 files at r11, 40 of 60 files at r12, 18 of 23 files at r13, 83 of 83 files at r14, all commit messages.
Reviewable status: all files reviewed, 52 unresolved discussions (waiting on @lockshaw)


lib/compiler/include/compiler/machine_mapping/machine_mapping_context.struct.toml line 29 at r6 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I agree up to the point of grouping mutable and non-mutable global variables into the same type as then the mutation properties become unclear (how do I know that your function isn't also mutating the pcg?), especially since only one of the members ever mutable. This is one case where I have a strong preference that we split MachineMappingCache out so that the function signatures remain clear about what is safe from mutation

Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
repo-refactor Topics related to the repo and search refactors
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants