-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NNVM] Enhance operator fusion for more element wise patterns #1548
Conversation
3020877
to
494b3b9
Compare
please resync the changes against the current master as in #1394. |
ee45b8f
to
371d2eb
Compare
@tqchen done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments. The topi changes looks fine to me, maybe we can bring the topi changes in first
nnvm/src/compiler/graph_fuse.cc
Outdated
const auto& group_ids = children_group_ids[nid]; | ||
if (group_ids.size() <= 1) continue; | ||
const auto child_group_id = group_ids[0]; | ||
// fuse this node with children if all children belong to the same group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a stronger rule, currently, simply merges all of the children may not be good for future operations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add a check to see if all children are elem wise ops.
But the only case I fuse a parent node with children is when the parent has multiple children, and all children nodes are already fused by the existing algorithm. I think this only occurs when all children are elem wise ops.
Do you have a counter example that fails my logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is fine for now, but we still want to be careful and explicit here, in case there are future updates to the code that brings the rule
nnvm/src/compiler/graph_fuse.cc
Outdated
@@ -161,6 +161,44 @@ nnvm::Graph GraphFusePartition(nnvm::Graph g) { | |||
} | |||
} | |||
} | |||
|
|||
if (opt_level >= 1) { | |||
std::vector<std::vector<uint32_t>> children_group_ids(idx.num_nodes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to document more on the algorithm here. So far I am not certain about the entire logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I explained my logic in the description of this PR, is there something not clear?
But I agree that code alone is not enough to understand what I am trying to do here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, can you also add comment block about the logic being used in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, will do.
371d2eb
to
7b12ec1
Compare
@tqchen Check for op pattern and doc added. |
nnvm/src/compiler/graph_fuse.cc
Outdated
propagate the new group id to a grand parent and upward | ||
*/ | ||
if (opt_level >= 1) { | ||
std::vector<std::vector<uint32_t>> children_group_ids(idx.num_nodes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-> > > to avoid problem in old compilers
@srkreddy1238 @kevinthesun @zhiics can you also take a look at this PR? |
Please let me know if there is a counter example that refutes this PR !! |
To be clear, I think it is a great PR to introduce the logic, since this logic is important, we want to make sure that there is more than one reviewer in the community who can read it and appreciate what is going on:) Thanks for adding the detailed comments |
_, params2 = utils.create_workload(sym2, 1, dshape[1:], seed=0) | ||
output1 = build_and_run(sym1, params1, data, oshape, target, ctx, opt_level=2) | ||
output2 = build_and_run(sym2, params2, data, oshape, target, ctx, opt_level=0) | ||
np.testing.assert_allclose(output1, output2, rtol=1e-5, atol=1e-5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should check graph.index.num_nodes
here to make sure correct fusion happens when opt_level=2, like the above test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I couldn't find a way to check the number of nodes other than eyeballing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* Any number of child nodes from the top node | ||
* The path from the top node to bottom node can contain any number of element wise ops. | ||
|
||
The only restriction is that in-between nodes cannot have more than one child. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen if in-between nodes have parents not belonging to this block, such as skip-connection block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My rule only applies when all in-between children nodes and the bottom node are fused by existing fusion algorithm (see my figure above). If some in-between node has more than one parent, existing fusion algorithm cannot fuse this node with other in-between nodes.
For example, if you add another parent node to the right in-between node 'ReLU' in my figure, nothing can be fused. Hope this makes sense.
Just to be sure, I check that in-between node has only one parent, here.
So for example, my rule does nothing to Resnet.
I have a comment in your topi PR #1556 |
b6b35f7
to
e0dff61
Compare
Thanks @masahi @merrymercy @kevinthesun This is merged! |
Relevant discussion is here.
@tqchen @srkreddy1238 @PariksheetPinjari909 @merrymercy please review.
This PR enhances NNVM operator fusion to support certain element wise patterns, like the one below. This pattern occurs when exponential linear unit is used, for example (see the link above for more info).
This PR enables to fuse all operators above into a single operator. The followings patterns are supported.
The restrictions:
NNVM can already fuse operators in red below. I added a simple post processing that
TOPI schedules for all backend are updated to prevent scheduling the same op more than once. This is required because the same op can be reached from more than one route during output-to-input traversal.
A test cast added at nnvm/tests/python/compiler/test_op_fusion.py.