-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sp-ization Algorithm #1423
base: repo-refactor
Are you sure you want to change the base?
Sp-ization Algorithm #1423
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## repo-refactor #1423 +/- ##
=================================================
+ Coverage 61.44% 65.36% +3.91%
=================================================
Files 479 495 +16
Lines 13055 14431 +1376
Branches 410 446 +36
=================================================
+ Hits 8022 9433 +1411
+ Misses 5033 4998 -35
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 5 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @Marsella8)
lib/utils/include/utils/graph/simple_sp_ization.h
line 10 at r2 (raw file):
namespace FlexFlow { DiGraph simple_sp_ization(DiGraph g);
Suggestion:
DiGraph simple_sp_ization(DiGraphView const &g);
lib/utils/src/graph/simple_sp_ization.cc
line 12 at r2 (raw file):
namespace FlexFlow { DiGraph simple_sp_ization(DiGraph g) {
Maybe instead just output the serial-parallel decomposition of the resulting graph as produced by the existing serial-parallel algorithm? It seems like what you really want for this is a serial-parallel structure, not actually a graph, and this also solves the problem of what to do with these random barrier nodes that otherwise get inserted.
lib/utils/src/graph/simple_sp_ization.cc
line 14 at r2 (raw file):
DiGraph simple_sp_ization(DiGraph g) { std::unordered_set<Node> sources = get_sources(g);
Break this into two functions: one that finds the distances of all of the nodes from one the sources, and one that actually inserts the barrier nodes
lib/utils/src/graph/simple_sp_ization.cc
line 44 at r2 (raw file):
if (layers.count(layer + 1)) { // Skip if last layer std::vector<Node> successors = add_nodes(sp, layers.at(layer + 1).size()); Node barrier_node = add_nodes(sp, 1)[0];
Suggestion:
Node barrier_node = sp.add_node();
lib/utils/src/graph/simple_sp_ization.cc
line 45 at r2 (raw file):
std::vector<Node> successors = add_nodes(sp, layers.at(layer + 1).size()); Node barrier_node = add_nodes(sp, 1)[0]; for (auto const &p : predecessors) {
Suggestion:
for (Node const &p : predecessors) {
lib/utils/src/graph/simple_sp_ization.cc
line 51 at r2 (raw file):
sp.add_edge({barrier_node, s}); } predecessors = successors;
Try to not re-assign variables as much as possible, it makes the code much less readable when I have to track all the state changes through the loop. Here you could just have
std::vector<Node> predecessors = add_nodes(sp, layers.at(current_layer));
std::vector<Node> successors = add_nodes(sp, layers.at(current_layer + 1));
At the top of the loop head, which avoids all this reassignment. Even better, make layers
a vector and then iterate through pairs of elements (and if a function that produces a vector of pairs of elements is not already in utils/containers
, add it) (though this may all be obviated by the comment at the top of this function)
lib/utils/test/src/test_simple_sp_ization.cc
line 0 at r2 (raw file):
Add this file to the list of tests in the utils/test/CMakeLists.txt
lib/utils/test/src/test_simple_sp_ization.cc
line 13 at r2 (raw file):
TEST_CASE("Simple SP-ization: sanity checks") { /* graph TD;
Ideally prefer graphviz/dot syntax over mermaid (we just use mermaid in readmes because github auto-renders it in those cases). Here, the equivalent graphviz would be
digraph {
n1 -> n2;
n1 -> n3;
n2 -> n4;
n2 -> n5;
n3 -> n4;
n3 -> n5;
n4 -> n6;
n5 -> n6;
}
Also, isn't this graph serial-parallel as long as you treat all-to-all connections as sequential splits (which we do)?
lib/utils/test/src/test_simple_sp_ization.cc
line 50 at r2 (raw file):
std::unordered_set<DirectedEdge> sp_edges = get_edges(sp); CHECK(sp_edges.size() == 3 + 4 + 3); // 3 for first layer, 4 for second, ...
Add a check that the resulting graph is sp-decomposable?
lib/utils/test/src/test_simple_sp_ization.cc
line 53 at r2 (raw file):
} TEST_CASE("Simple SP-ization: integrity checks") {
Why two separate test cases here for the same input graph? Probably better to just move the checks into the previous test case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For both get_longest_path_lengths and the simple_sp_ization the tests are failing due to the directed graph flipping the edges issue (I've checked and constructing the graphs in reverse makes the tests pass).
Reviewable status: 0 of 8 files reviewed, 10 unresolved discussions (waiting on @lockshaw)
lib/utils/src/graph/simple_sp_ization.cc
line 12 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Maybe instead just output the serial-parallel decomposition of the resulting graph as produced by the existing serial-parallel algorithm? It seems like what you really want for this is a serial-parallel structure, not actually a graph, and this also solves the problem of what to do with these random barrier nodes that otherwise get inserted.
Yes, sorry for that. Currently returns a SerialParallelDecomposition (which is a Serial Node that contains a vector of Parallel Nodes, each representing a layer).
lib/utils/src/graph/simple_sp_ization.cc
line 14 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Break this into two functions: one that finds the distances of all of the nodes from one the sources, and one that actually inserts the barrier nodes
Done.
lib/utils/src/graph/simple_sp_ization.cc
line 44 at r2 (raw file):
if (layers.count(layer + 1)) { // Skip if last layer std::vector<Node> successors = add_nodes(sp, layers.at(layer + 1).size()); Node barrier_node = add_nodes(sp, 1)[0];
Done.
lib/utils/src/graph/simple_sp_ization.cc
line 45 at r2 (raw file):
std::vector<Node> successors = add_nodes(sp, layers.at(layer + 1).size()); Node barrier_node = add_nodes(sp, 1)[0]; for (auto const &p : predecessors) {
Done.
lib/utils/src/graph/simple_sp_ization.cc
line 51 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Try to not re-assign variables as much as possible, it makes the code much less readable when I have to track all the state changes through the loop. Here you could just have
std::vector<Node> predecessors = add_nodes(sp, layers.at(current_layer)); std::vector<Node> successors = add_nodes(sp, layers.at(current_layer + 1));
At the top of the loop head, which avoids all this reassignment. Even better, make
layers
a vector and then iterate through pairs of elements (and if a function that produces a vector of pairs of elements is not already inutils/containers
, add it) (though this may all be obviated by the comment at the top of this function)
Done.
lib/utils/test/src/test_simple_sp_ization.cc
line 13 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Ideally prefer graphviz/dot syntax over mermaid (we just use mermaid in readmes because github auto-renders it in those cases). Here, the equivalent graphviz would be
digraph { n1 -> n2; n1 -> n3; n2 -> n4; n2 -> n5; n3 -> n4; n3 -> n5; n4 -> n6; n5 -> n6; }
Also, isn't this graph serial-parallel as long as you treat all-to-all connections as sequential splits (which we do)?
Changed the graph to non-sp
lib/utils/test/src/test_simple_sp_ization.cc
line 50 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Add a check that the resulting graph is sp-decomposable?
Done.
lib/utils/test/src/test_simple_sp_ization.cc
line 53 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why two separate test cases here for the same input graph? Probably better to just move the checks into the previous test case?
Done.
lib/utils/test/src/test_simple_sp_ization.cc
line at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Add this file to the list of tests in the
utils/test/CMakeLists.txt
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 8 files reviewed, 10 unresolved discussions (waiting on @lockshaw)
lib/utils/include/utils/graph/simple_sp_ization.h
line 10 at r2 (raw file):
namespace FlexFlow { DiGraph simple_sp_ization(DiGraph g);
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update to include #1449 and then re-request review
Reviewed 1 of 7 files at r8.
Reviewable status: 1 of 12 files reviewed, 22 unresolved discussions (waiting on @Marsella8)
lib/utils/src/graph/sp_ization.cc
line 15 at r5 (raw file):
Previously, Marsella8 wrote…
changed to
is_2_terminal_sp_compliant
How about is_two_terminal_dag
, since none of these properties actually have to do with SP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Reviewable status: 1 of 54 files reviewed, 22 unresolved discussions (waiting on @lockshaw)
lib/utils/src/graph/sp_ization.cc
line 15 at r5 (raw file):
Previously, lockshaw (Colin Unger) wrote…
How about
is_two_terminal_dag
, since none of these properties actually have to do with SP
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 52 of 53 files at r11, 2 of 2 files at r13, all commit messages.
Reviewable status: all files reviewed, 58 unresolved discussions (waiting on @Marsella8)
lib/utils/include/utils/containers.h
line 26 at r11 (raw file):
#include <map> #include <numeric> #include <set>
Why?
lib/utils/include/utils/containers/as_unordered_set.h
line 9 at r11 (raw file):
template <typename C, typename T = typename C::value_type> std::unordered_set<T> as_unordered_set(C const &c) {
Use unordered_set_of
(yes the naming is a bit irregular, but let's not try to fix that in this PR)
lib/utils/include/utils/containers/invert_map.h
line 19 at r11 (raw file):
} return m_inv; }
Unless there's a need for the extra polymorphism it's probably simpler to just have it defined for unordered_map
Suggestion:
template <typename K, typename V>
auto invert_map(std::unordered_map<K, V> const &m) {
std::unordered_map<V, std::unordered_set<K>> result;
for (auto const &[key, value] : m) {
result[value].insert(key);
}
return result;
}
lib/utils/include/utils/graph/digraph/algorithms/get_longest_path_lengths_from_root.h
line 9 at r11 (raw file):
namespace FlexFlow { std::unordered_map<Node, float> get_weighted_longest_path_lengths_from_root(
Maybe add a quick doxygen comment to explain what these functions do? Up to you, only do it if you can come up with a good prose explanation/docstring
lib/utils/include/utils/graph/digraph/algorithms/get_topological_ordering.h
line 11 at r11 (raw file):
std::vector<Node> get_topological_ordering_from_starting_node(DiGraphView const &g,
Move to its own header file
lib/utils/include/utils/graph/serial_parallel/graph_generation.h
line 19 at r11 (raw file):
DataflowGraphView const &g2); std::unordered_map<Node, Node> parallel_extend(DiGraph &g,
Why were these added?
lib/utils/include/utils/graph/serial_parallel/parallel_composition.h
line 8 at r11 (raw file):
namespace FlexFlow { SerialParallelDecomposition parallel_composition(
Probably can just be part of serial_parallel_decomposition.h
as it feels more like a primitive operation
lib/utils/include/utils/graph/serial_parallel/serial_composition.h
line 8 at r11 (raw file):
namespace FlexFlow { SerialParallelDecomposition serial_composition(
Probably should just be part of serial_parallel_decomposition.h
as this feels like a rather primitive operation
lib/utils/include/utils/graph/serial_parallel/serial_parallel_decomposition.variant.toml
line 8 at r11 (raw file):
"fmt", ] explicit_constructors = false
Use explicit_constructors
unless you have a strong reason--it's a bit more verbose but the code is usually a lot clearer and less bug prone
lib/utils/include/utils/graph/serial_parallel/serial_parallel_isempty.h
line 8 at r11 (raw file):
namespace FlexFlow { bool isempty(SerialParallelDecomposition const &sp);
Should probably be moved to serial_parallel_decomposition.h
, this feels like a rather primitive operation
Suggestion:
bool is_empty(SerialParallelDecomposition const &sp);
lib/utils/include/utils/graph/serial_parallel/serial_parallel_metrics.h
line 11 at r11 (raw file):
std::unordered_map<Node, size_t> node_frequency_counter(SerialParallelDecomposition const &sp);
Might be more clear if named get_node_num_occurences_map
, or get_node_frequency_map
?
lib/utils/include/utils/graph/serial_parallel/serial_parallel_metrics.h
line 13 at r11 (raw file):
node_frequency_counter(SerialParallelDecomposition const &sp); size_t num_nodes(SerialParallelDecomposition const &sp);
Move to serial_parallel_decomposition.h
lib/utils/include/utils/graph/serial_parallel/serial_parallel_normalize.h
line 0 at r11 (raw file):
Rename to normalize_sp_decomposition.h
lib/utils/include/utils/graph/serial_parallel/serial_parallel_normalize.h
line 13 at r11 (raw file):
* @details This function performs the following semantic substitutions: * - Deletes every empty SerialSplit and ParallelSplit item: * S(P(S()), Node(1), Node(2)) -> S(Node(1), Node(2))
Suggestion:
* - Deletes every empty SerialSplit and ParallelSplit item, e.g.,
* <tt>S(P(S()), Node(1), Node(2)) -> S(Node(1), Node(2))</tt>
lib/utils/include/utils/graph/serial_parallel/serial_parallel_normalize.h
line 16 at r11 (raw file):
* * - Replaces SerialSplit and ParallelSplit of size 1 with their content: * S(S(Node(1)), P(Node(2))) -> S(Node(1), Node(2))
Suggestion:
* - Replaces SerialSplit and ParallelSplit of size 1 with their content, e.g.,
* <tt>S(S(Node(1)), P(Node(2))) -> S(Node(1), Node(2))</tt>)
lib/utils/include/utils/graph/serial_parallel/serial_parallel_normalize.h
line 19 at r11 (raw file):
* */ SerialParallelDecomposition normalize(SerialParallelDecomposition const &sp);
Suggestion:
normalize_sp_decomposition
lib/utils/include/utils/graph/serial_parallel/serial_parallel_splits.h
line 15 at r11 (raw file):
struct SerialSplit { public: SerialSplit() = delete;
I'm not a big fan of default-constructibility, why should this have a default constructor?
lib/utils/include/utils/graph/serial_parallel/sp_ization/critical_path_preserving_sp_ization.h
line 10 at r11 (raw file):
namespace FlexFlow { SerialParallelDecomposition
A couple brief doxygen comments would be rather nice here if you can come up with some
lib/utils/src/utils/graph/digraph/algorithms/get_longest_path_lengths_from_root.cc
line 53 at r11 (raw file):
} std::unordered_map<Node, float> get_weighted_longest_path_lengths_from_root(
Probably better to just merge the checked
and unchecked
versions here, unless you have a strong counterargument
lib/utils/src/utils/graph/digraph/algorithms/is_acyclic.cc
line 33 at r13 (raw file):
} bool is_acyclic(DiGraphView const &g) {
This should either be simplified or additional explanation added. Why are there three states? What invariants are expected to hold? Collapsing dfs_is_acyclic
down to a lambda could also help. Also a better/clearer name for dfs_is_acylic
would likely help.
lib/utils/src/utils/graph/serial_parallel/parallel_composition.cc
line 8 at r13 (raw file):
SerialParallelDecomposition parallel_composition( std::unordered_set<SerialParallelDecomposition> const &sp_compositions) {
What if sp_compositions
is empty?
lib/utils/src/utils/graph/serial_parallel/parallel_composition.cc
line 14 at r13 (raw file):
ParallelSplit composition({}); for (SerialParallelDecomposition const &sp_comp : sp_compositions) { if (sp_comp.has<ParallelSplit>()) {
Might be cleaner with overload
?
lib/utils/src/utils/graph/serial_parallel/serial_composition.cc
line 8 at r13 (raw file):
SerialParallelDecomposition serial_composition( std::vector<SerialParallelDecomposition> const &sp_compositions) {
What if sp_compositions
is empty?
lib/utils/src/utils/graph/serial_parallel/serial_composition.cc
line 14 at r13 (raw file):
SerialSplit composition{}; for (SerialParallelDecomposition const &sp_comp : sp_compositions) { if (sp_comp.has<SerialSplit>()) {
Might be cleaner with overload
lib/utils/src/utils/graph/serial_parallel/serial_parallel_isempty.cc
line 9 at r13 (raw file):
namespace FlexFlow { bool isempty(SerialParallelDecomposition const &sp) {
Likely simpler to just add overloads for SerialSplit
, ParallelSplit
, etc. and then you can also avoid all the widen
ing
Also probably cleaner with overload
lib/utils/src/utils/graph/serial_parallel/serial_parallel_metrics.cc
line 14 at r13 (raw file):
std::unordered_map<Node, size_t> node_frequency_counter(SerialParallelDecomposition const &sp) {
This function needs simplifying
lib/utils/src/utils/graph/serial_parallel/serial_parallel_metrics.cc
line 67 at r13 (raw file):
float critical_path_cost(SerialParallelDecomposition const &sp, std::unordered_map<Node, float> const &cost_map) { if (sp.has<Node>()) {
overload
and visit
would probably clean up this code
lib/utils/src/utils/graph/serial_parallel/serial_parallel_metrics.cc
line 81 at r13 (raw file):
[&](std::variant<SerialSplit, Node> const &child) { return critical_path_cost( widen<SerialParallelDecomposition>(child),
Avoid the widen calls, preferably add critical_path_cost
overloads for ParallelSplit
and SerialSplit
Code quote:
widen<SerialParallelDecomposition>(
lib/utils/src/utils/graph/serial_parallel/serial_parallel_normalize.cc
line 27 at r13 (raw file):
} SerialParallelDecomposition normalize(SerialParallelDecomposition const &sp) {
Why is this function so complicated? Is there not a cleaner way of doing this? Is it due to the alternating Serial-Parallel invariant? At the very least it seems like you could transform sp
back into a binary decomposition tree and then use get_final_ast
to re-flatten it which I think should handle normalization?
lib/utils/src/utils/graph/serial_parallel/sp_ization/critical_path_preserving_sp_ization.cc
line 0 at r13 (raw file):
A bunch of these functions could use some simplifying, in addition to ideally some docstrings as their names are rather technical (e.g., critical_path_preserving_sp_ization_unchecked_with_coalescing
is a bit hard to understand unless you're Pietro 😛)
lib/utils/src/utils/graph/serial_parallel/sp_ization/work_preserving_sp_ization.cc
line 0 at r13 (raw file):
Try to simplify these functions--right now they're unnecessarily hard to read.
Also, while I appreciate the effort to use functions from utils/containers
, they're not always the most readable version of the code--often they are, but this does need to be determined on a case-by-case basis
lib/utils/src/utils/graph/serial_parallel/sp_ization/work_preserving_sp_ization.cc
line 33 at r13 (raw file):
namespace FlexFlow { std::vector<std::unordered_set<Node>> naive_layer_split(DiGraphView const &g) {
Try not to overload layer
--maybe tier
or stratum
or something derived from the timeline analogy?
Code quote:
std::vector<std::unordered_set<Node>> naive_layer_split(DiGraphView const &g) {
lib/utils/src/utils/graph/serial_parallel/sp_ization/work_preserving_sp_ization.cc
line 33 at r13 (raw file):
namespace FlexFlow { std::vector<std::unordered_set<Node>> naive_layer_split(DiGraphView const &g) {
get_strata_assuming_unit_cost
?
Code quote:
naive_layer_split(
lib/utils/src/utils/graph/serial_parallel/sp_ization/work_preserving_sp_ization.cc
line 43 at r13 (raw file):
} return layer_to_nodes; }
Suggestion:
std::vector<std::unordered_set<Node>> naive_layer_split(DiGraphView const &g) {
std::vector<std::unordered_set<Node>> result;
for (auto const &[node, depth] : get_longest_path_lengths_from_root(g)) {
result[depth].insert(node);
}
return result;
}
lib/utils/src/utils/graph/serial_parallel/sp_ization/work_preserving_sp_ization.cc
line 46 at r13 (raw file):
static SerialParallelDecomposition naive_layer_merge(std::vector<std::unordered_set<Node>> layer_to_node) {
Usually we use a_to_b
for maps, it's a bit weird to have it for a vector
?
Code quote:
layer_to_node
lib/utils/src/utils/graph/serial_parallel/sp_ization/work_preserving_sp_ization.cc
line 55 at r13 (raw file):
sp.children.push_back(layer); } return normalize(SerialParallelDecomposition(sp));
Suggestion:
return serial_composition(transform(layer_to_node,
[](std::unordered_set<Node> const &stratum_nodes) {
return ParallelSplit{stratum_nodes};
});
lib/utils/test/CMakeLists.txt
line 6 at r13 (raw file):
SRC_PATTERNS src/utils/*.cc # src/*.cc
Suggestion:
src/utils/*.cc
lib/utils/test/src/test_algorithms.cc
line 248 at r13 (raw file):
} TEST_CASE("get_longest_path_lengths_from_root - linear graph") {
Move these tests of get_longest_path_lengths_from_root
to lib/utils/test/src/utils/graph/digraph/algorithms/get_longest_path_lengths_from_root.cc
lib/utils/test/src/utils/graph/digraph/algorithms/is_acyclic.cc
line 26 at r11 (raw file):
CHECK(result == correct); }
Apply to all subcases below
Suggestion:
TEST_CASE("is_acyclic") {
DiGraph g = DiGraph::create<AdjacencyDiGraph>();
SUBCASE("empty graph") {
bool correct = true;
bool result = is_acyclic(g);
CHECK(result == correct);
}
SUBCASE("is_acyclic - single node") {
add_nodes(g, 1);
bool correct = true;
bool result = is_acyclic(g);
CHECK(result == correct);
}
}
lib/utils/test/src/utils/graph/serial_parallel/serial_parallel_isempty.cc
line 8 at r11 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("isempty function") {
Suggestion:
TEST_CASE("is_empty(SerialParallelDecomposition)") {
lib/utils/test/src/utils/graph/serial_parallel/serial_parallel_normalize.cc
line 8 at r11 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("normalize function") {
Suggestion:
TEST_CASE("normalize_sp_decomposition") {
lib/utils/test/src/utils/graph/serial_parallel/serial_parallel_normalize.cc
line 9 at r11 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("normalize function") { Node n1{1};
Personally I prefer this syntax, but this isn't a huge deal, just would help with style consistency
Suggestion:
Node n1 = Node{1};
lib/utils/test/src/utils/graph/serial_parallel/serial_parallel_normalize.cc
line 16 at r11 (raw file):
SerialParallelDecomposition sp{n1}; SerialParallelDecomposition expected = sp; CHECK(expected == normalize(sp));
Apply this pattern to all of the SUBCASE
s below as well
Suggestion:
SerialParallelDecomposition input = {n1};
SerialParallelDecomposition result = normalize(sp);
SerialParallelDecomposition correct = input;
CHECK(result == correct);
lib/utils/test/src/utils/graph/serial_parallel/serial_parallel_splits.cc
line 12 at r13 (raw file):
// The following definitions are equivalent ParallelSplit p1{n[0],
Prefer bounds-checked accesses
Suggestion:
ParallelSplit p1{n.at(0),
lib/utils/test/src/utils/graph/serial_parallel/serial_parallel_splits.cc
line 21 at r13 (raw file):
SerialSplit{n[1], n[2]}, SerialSplit{n[3], ParallelSplit{n[0], n[5], n[4]}}}; ParallelSplit p4{SerialSplit{n[3], ParallelSplit{n[5], n[4], n[0]}},
Rather than using a bunch of complicated test cases each time, try to have small/minimal test cases for each of the properties, and then if you want add a big end-to-end test case at the end. For example, it might be better to have a test case checking that ParallelSplit{n.at(0), n.at(1)} == ParallelSplit{n.at(1), n.at(0)}
instead all of these big splits where it's difficult to tell what's getting tested
lib/utils/test/src/utils/graph/serial_parallel/serial_parallel_splits.cc
line 53 at r13 (raw file):
} TEST_CASE("ParallelSplit::operator== - nested SP") {
How are these nested test cases any different than the test case above?
Code quote:
- nested SP")
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/critical_path_preserving_sp_ization.cc
line 13 at r13 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("critical_path_preserving_sp_ization") {
Add some simple test cases in addition to composite ones
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/critical_path_preserving_sp_ization.cc
line 20 at r13 (raw file):
add_edges(g, { DirectedEdge{n[0], n[1]},
Prefer bounds-checked accessors
Suggestion:
n.at(0),
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/critical_path_preserving_sp_ization.cc
line 33 at r13 (raw file):
{n[0], 3}, {n[1], 2}, {n[2], 1}, {n[3], 1}, {n[4], 1}, {n[5], 5}}; CHECK(work_cost(g, cost_map) == 13);
Why are there two work_cost
checks?
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/critical_path_preserving_sp_ization.cc
line 49 at r13 (raw file):
CHECK(expected == result); } SUBCASE("work cost") {
Try to avoid testing a bunch of different functions all at once--it seems like an easy way to get additional coverage with less work, but the test cases become rather hard to read as the cases aren't chosen for that function, they're just big giant things intended to test everything all at once. For example, work_cost
doesn't take in a function, it takes in an sp-decomposition--but here it's all tied up with a graph. Focus on testing one property of one function at a time
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/critical_path_preserving_sp_ization.cc
line 92 at r13 (raw file):
} SUBCASE("work cost") { float expected = 16;
Suggestion:
float correct = 16;
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/work_preserving_sp_ization.cc
line 15 at r13 (raw file):
TEST_CASE("work_preserving_") { SUBCASE("Sample Graph #1") {
See comments about having smaller test cases and testing one property of one function at a time--right now I can't tell what these test cases are actually trying to check
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/benchmarking/distributions.h
line 4 at r13 (raw file):
#define _FLEXFLOW_DISTRIBUTIONS_H #include "utils/graph/node/graph.h"
Suggestion:
#include "utils/graph/node/node.dtg.h"
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/benchmarking/distributions.h
line 29 at r13 (raw file):
struct Bernoulli { float p; Bernoulli(float p = 0.5) : p(p) {}
Move implementations to .cc
file
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/benchmarking/sample_graphs.h
line 29 at r13 (raw file):
std::vector<DirectedEdge> edges = { DirectedEdge(inputs[0], sep[1]), DirectedEdge(inputs[0], id[1]),
Prefer bounds-checked variants
Suggestion:
inputs.at(0)
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/benchmarking/sample_graphs.h
line 243 at r13 (raw file):
std::vector<DirectedEdge> edges = { DirectedEdge(n[0], n[1]), DirectedEdge(n[0], n[2]),
Prefer bracket initialization for consistency
Suggestion:
DirectedEdge{n[0], n[1]}
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/benchmarking/sample_graphs.h
line 264 at r13 (raw file):
// From the TASO paper, pg 57 DiGraph g = DiGraph::create<AdjacencyDiGraph>(); Node root = get_only(add_nodes(g, 1));
Suggestion:
Node root = g.add_node();
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/benchmarking/sample_graphs.h
line 270 at r13 (raw file):
std::vector<Node> avg = add_nodes(g, 3); std::vector<Node> add = add_nodes(g, 5); Node concat = get_only(add_nodes(g, 1));
Suggestion:
Node concat = g.add_node();
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/benchmarking/sample_graphs.h
line 289 at r13 (raw file):
add_edges(g, edges); for (auto const &a : add) {
Suggestion:
for (Node const &a : add) {
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/benchmarking/sample_graphs.h
line 306 at r13 (raw file):
} } auto sinks = get_sinks(g);
Prefer actual type names, i.e., std::unordered_set<Node>
Code quote:
auto sinks = get_sinks(g);
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/benchmarking/sp_ization_benchmarking.cc
line 1 at r13 (raw file):
// #include "distributions.h"
What's happening with all of this commented-out code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 16 of 68 files reviewed, 58 unresolved discussions (waiting on @lockshaw)
lib/utils/include/utils/containers/invert_map.h
line 19 at r11 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Unless there's a need for the extra polymorphism it's probably simpler to just have it defined for
unordered_map
changed
lib/utils/include/utils/graph/digraph/algorithms/get_longest_path_lengths_from_root.h
line 9 at r11 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Maybe add a quick doxygen comment to explain what these functions do? Up to you, only do it if you can come up with a good prose explanation/docstring
done
lib/utils/include/utils/graph/serial_parallel/graph_generation.h
line 19 at r11 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why were these added?
Needed them for building the sample graphs for the benchmarking
lib/utils/include/utils/graph/serial_parallel/serial_parallel_metrics.h
line 11 at r11 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Might be more clear if named
get_node_num_occurences_map
, orget_node_frequency_map
?
changed to get_node_frequency_map.
lib/utils/include/utils/graph/serial_parallel/serial_parallel_metrics.h
line 13 at r11 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Move to
serial_parallel_decomposition.h
Done. for get_nodes
is it relevant that duplicate nodes are not considered? I use duplicate nodes fairly extensively within Sp-ization to track which nodes are duplicate of each other
lib/utils/include/utils/graph/serial_parallel/serial_parallel_splits.h
line 15 at r11 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I'm not a big fan of default-constructibility, why should this have a default constructor?
Makes it less annoying to deal with unordered_map
s (see https://stackoverflow.com/questions/695645/why-does-the-c-map-type-argument-require-an-empty-constructor-when-using?noredirect=1&lq=1), prefer the []
syntax over things like emplace
and find
. Semantically, I'd say for SerialSplit and ParallelSplit it makes sense.
lib/utils/include/utils/graph/serial_parallel/sp_ization/critical_path_preserving_sp_ization.h
line 10 at r11 (raw file):
Previously, lockshaw (Colin Unger) wrote…
A couple brief doxygen comments would be rather nice here if you can come up with some
I've added probably too much information and not enough at the same time, let me know how deep I should go into explaining it (general principle behind it, just what it outputs, how it differs from the others, do I need to explain what SPization is etc...)
lib/utils/src/utils/graph/digraph/algorithms/get_longest_path_lengths_from_root.cc
line 53 at r11 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Probably better to just merge the
checked
andunchecked
versions here, unless you have a strong counterargument
done.
lib/utils/src/utils/graph/digraph/algorithms/is_acyclic.cc
line 33 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
This should either be simplified or additional explanation added. Why are there three states? What invariants are expected to hold? Collapsing
dfs_is_acyclic
down to a lambda could also help. Also a better/clearer name fordfs_is_acylic
would likely help.
cleaned up and added some comments / correction. I can also change the implementation to be simpler but with quadratic runtime (current one is something like O(E+V) )
lib/utils/src/utils/graph/serial_parallel/serial_parallel_metrics.cc
line 14 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
This function needs simplifying
broken up with overloads
lib/utils/src/utils/graph/serial_parallel/serial_parallel_metrics.cc
line 67 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
overload
andvisit
would probably clean up this code
cleanup by overloading, let me know if you want me to do it with directly withoverload
.
lib/utils/src/utils/graph/serial_parallel/serial_parallel_metrics.cc
line 81 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Avoid the widen calls, preferably add
critical_path_cost
overloads forParallelSplit
andSerialSplit
not sure I can avoid widen
without making the code far more verbose (messy because we are pulling fomr the children, which are variants of Node
and Parallel/SerialSplit
lib/utils/src/utils/graph/serial_parallel/sp_ization/critical_path_preserving_sp_ization.cc
line at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
A bunch of these functions could use some simplifying, in addition to ideally some docstrings as their names are rather technical (e.g.,
critical_path_preserving_sp_ization_unchecked_with_coalescing
is a bit hard to understand unless you're Pietro 😛)
added docstrings / comments, changed variable names and simplified the code a bit, let me know if it need further changes.
lib/utils/src/utils/graph/serial_parallel/sp_ization/work_preserving_sp_ization.cc
line 33 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Try not to overload
layer
--maybetier
orstratum
or something derived from the timeline analogy?
I'll use stratum from now on to refer to all the work preserving syncing stuff (prefer it over tier
, which gives me a hierarchical impression, and makes sense since we are stratifying things).
lib/utils/src/utils/graph/serial_parallel/sp_ization/work_preserving_sp_ization.cc
line 33 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
get_strata_assuming_unit_cost
?
Done.
lib/utils/src/utils/graph/serial_parallel/sp_ization/work_preserving_sp_ization.cc
line 43 at r13 (raw file):
} return layer_to_nodes; }
done (not sure what I was thinking)
lib/utils/src/utils/graph/serial_parallel/sp_ization/work_preserving_sp_ization.cc
line 46 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Usually we use
a_to_b
for maps, it's a bit weird to have it for avector
?
changed
lib/utils/src/utils/graph/serial_parallel/sp_ization/work_preserving_sp_ization.cc
line 55 at r13 (raw file):
sp.children.push_back(layer); } return normalize(SerialParallelDecomposition(sp));
Done (added extra initializers to serial_parallel_splits.cc
to avoid ugly casting patterns)
lib/utils/src/utils/graph/serial_parallel/sp_ization/work_preserving_sp_ization.cc
line at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Try to simplify these functions--right now they're unnecessarily hard to read.
Also, while I appreciate the effort to use functions from
utils/containers
, they're not always the most readable version of the code--often they are, but this does need to be determined on a case-by-case basis
simplified / made things clearer, should be more understandable
lib/utils/test/src/test_algorithms.cc
line 248 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Move these tests of
get_longest_path_lengths_from_root
tolib/utils/test/src/utils/graph/digraph/algorithms/get_longest_path_lengths_from_root.cc
Done.
lib/utils/test/src/utils/graph/digraph/algorithms/is_acyclic.cc
line 26 at r11 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Apply to all subcases below
done. putCHECK(is_acyclic(g))
instead of result = ...; expected=...
felt unnecessarily verbose
lib/utils/test/src/utils/graph/serial_parallel/serial_parallel_splits.cc
line 12 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Prefer bounds-checked accesses
Done.
lib/utils/test/src/utils/graph/serial_parallel/serial_parallel_splits.cc
line 21 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Rather than using a bunch of complicated test cases each time, try to have small/minimal test cases for each of the properties, and then if you want add a big end-to-end test case at the end. For example, it might be better to have a test case checking that
ParallelSplit{n.at(0), n.at(1)} == ParallelSplit{n.at(1), n.at(0)}
instead all of these big splits where it's difficult to tell what's getting tested
changed the test cases (they were overly complicated and unnecessary)
lib/utils/test/src/utils/graph/serial_parallel/serial_parallel_splits.cc
line 53 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
How are these nested test cases any different than the test case above?
Done.
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/critical_path_preserving_sp_ization.cc
line 20 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Prefer bounds-checked accessors
Done.
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/critical_path_preserving_sp_ization.cc
line 33 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why are there two
work_cost
checks?
I check work_cost
, critical_path_cost
both for the original graph (to show what the metrics are for the graph) and for the sp decomposition. For the work invariant algorithms, the work_cost
remains the same, while the critical_path_cost
(generally) increases. Viceversa for the critical path cost invariant algorithm.
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/critical_path_preserving_sp_ization.cc
line 92 at r13 (raw file):
} SUBCASE("work cost") { float expected = 16;
Done.
lib/utils/include/utils/containers.h
line 26 at r11 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why?
now removed
lib/utils/include/utils/containers/as_unordered_set.h
line 9 at r11 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Use
unordered_set_of
(yes the naming is a bit irregular, but let's not try to fix that in this PR)
removed as_unordered_set
lib/utils/include/utils/graph/digraph/algorithms/get_topological_ordering.h
line 11 at r11 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Move to its own header file
Done.
lib/utils/include/utils/graph/serial_parallel/parallel_composition.h
line 8 at r11 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Probably can just be part of
serial_parallel_decomposition.h
as it feels more like a primitive operation
Done.
lib/utils/include/utils/graph/serial_parallel/serial_composition.h
line 8 at r11 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Probably should just be part of
serial_parallel_decomposition.h
as this feels like a rather primitive operation
Done.
lib/utils/include/utils/graph/serial_parallel/serial_parallel_decomposition.variant.toml
line 8 at r11 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Use
explicit_constructors
unless you have a strong reason--it's a bit more verbose but the code is usually a lot clearer and less bug prone
removed
lib/utils/include/utils/graph/serial_parallel/serial_parallel_isempty.h
line 8 at r11 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Should probably be moved to
serial_parallel_decomposition.h
, this feels like a rather primitive operation
Done.
lib/utils/src/utils/graph/serial_parallel/parallel_composition.cc
line 8 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What if
sp_compositions
is empty?
Returns an empty ParallelSplit
, wrapped around aSerialParallelDecomposition
(can make it more explicit if you want to).
lib/utils/src/utils/graph/serial_parallel/parallel_composition.cc
line 14 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Might be cleaner with
overload
?
cleaned it up, not sure overloading would make it cleaner.
lib/utils/src/utils/graph/serial_parallel/serial_composition.cc
line 8 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What if
sp_compositions
is empty?
Returns an empty SerialSplit
, wrapped around aSerialParallelDecomposition
(can make it more explicit if you want to).
lib/utils/src/utils/graph/serial_parallel/serial_composition.cc
line 14 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Might be cleaner with
overload
cleaned it up, not sure overloading would make it cleaner.
lib/utils/src/utils/graph/serial_parallel/serial_parallel_isempty.cc
line 9 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Likely simpler to just add overloads for
SerialSplit
,ParallelSplit
, etc. and then you can also avoid all thewiden
ingAlso probably cleaner with
overload
Changed, now cleaner (couldn't avoid the widening since the children are variant<Node, Serial/Parallel>
).
lib/utils/src/utils/graph/serial_parallel/serial_parallel_normalize.cc
line 27 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why is this function so complicated? Is there not a cleaner way of doing this? Is it due to the alternating Serial-Parallel invariant? At the very least it seems like you could transform
sp
back into a binary decomposition tree and then useget_final_ast
to re-flatten it which I think should handle normalization?
I've simplified it a fair bit, let me know if this is sufficient.
template <typename SplitType>
SerialParallelDecomposition normalize_sp_decomposition(SplitType const &split) {
auto normalized_children = transform(filter_empty(split.children), [](auto const &child) {
return normalize_sp_decomposition(widen<SerialParallelDecomposition>(child));
});
if (normalized_children.size() == 1) {
return get_only(normalized_children);
}
if (std::is_same_v<SplitType, SerialSplit>) {
return serial_composition(as_vector(normalized_children));
} else if (std::is_same_v<SplitType, ParallelSplit>) {
return parallel_composition(unordered_set_of(normalized_children));
}
}
We can also add this in place of overloads for SerialSplit
, ParallelSplit
, though it's a bit ugly imo.
lib/utils/test/CMakeLists.txt
line 6 at r13 (raw file):
SRC_PATTERNS src/utils/*.cc # src/*.cc
What do you wanna do with these tests? Low priority, I can make a PR down the line to sort them out.
lib/utils/test/src/utils/graph/serial_parallel/serial_parallel_isempty.cc
line 8 at r11 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("isempty function") {
Done.
lib/utils/test/src/utils/graph/serial_parallel/serial_parallel_normalize.cc
line 8 at r11 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("normalize function") {
Done.
lib/utils/test/src/utils/graph/serial_parallel/serial_parallel_normalize.cc
line 16 at r11 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Apply this pattern to all of the
SUBCASE
s below as well
Done.
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/benchmarking/sp_ization_benchmarking.cc
line 1 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What's happening with all of this commented-out code?
moved the benchmarking stuff to bin/
.
lib/utils/include/utils/graph/serial_parallel/serial_parallel_normalize.h
line 13 at r11 (raw file):
* @details This function performs the following semantic substitutions: * - Deletes every empty SerialSplit and ParallelSplit item: * S(P(S()), Node(1), Node(2)) -> S(Node(1), Node(2))
Done.
lib/utils/include/utils/graph/serial_parallel/serial_parallel_normalize.h
line 16 at r11 (raw file):
* * - Replaces SerialSplit and ParallelSplit of size 1 with their content: * S(S(Node(1)), P(Node(2))) -> S(Node(1), Node(2))
Done.
lib/utils/include/utils/graph/serial_parallel/serial_parallel_normalize.h
line 19 at r11 (raw file):
* */ SerialParallelDecomposition normalize(SerialParallelDecomposition const &sp);
Done.
lib/utils/include/utils/graph/serial_parallel/serial_parallel_normalize.h
line at r11 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Rename to
normalize_sp_decomposition.h
Done.
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/benchmarking/distributions.h
line 4 at r13 (raw file):
#define _FLEXFLOW_DISTRIBUTIONS_H #include "utils/graph/node/graph.h"
Done.
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/benchmarking/distributions.h
line 29 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Move implementations to
.cc
file
Done.
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/benchmarking/sample_graphs.h
line 29 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Prefer bounds-checked variants
Done.
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/benchmarking/sample_graphs.h
line 243 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Prefer bracket initialization for consistency
any reason to prefer {}
over ()
other than avoiding narrowing?
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/benchmarking/sample_graphs.h
line 264 at r13 (raw file):
// From the TASO paper, pg 57 DiGraph g = DiGraph::create<AdjacencyDiGraph>(); Node root = get_only(add_nodes(g, 1));
Done.
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/benchmarking/sample_graphs.h
line 270 at r13 (raw file):
std::vector<Node> avg = add_nodes(g, 3); std::vector<Node> add = add_nodes(g, 5); Node concat = get_only(add_nodes(g, 1));
Done.
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/benchmarking/sample_graphs.h
line 289 at r13 (raw file):
add_edges(g, edges); for (auto const &a : add) {
Done.
lib/utils/test/src/utils/graph/serial_parallel/sp_ization/benchmarking/sample_graphs.h
line 306 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Prefer actual type names, i.e.,
std::unordered_set<Node>
Done.
…m SerialParallelDecomposition
Description of changes:
Added basic sp-ization algorithm and associated tests
Related Issues:
Linked Issues:
Issues closed by this PR:
This change is