Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NNVM] Enhance operator fusion for more element wise patterns #1548

Merged
merged 11 commits into from
Aug 8, 2018

Conversation

masahi
Copy link
Member

@masahi masahi commented Aug 3, 2018

Relevant discussion is here.

@tqchen @srkreddy1238 @PariksheetPinjari909 @merrymercy please review.

This PR enhances NNVM operator fusion to support certain element wise patterns, like the one below. This pattern occurs when exponential linear unit is used, for example (see the link above for more info).

abf7821fe5d8169fee908f4891279189ff60ea9b

This PR enables to fuse all operators above into a single operator. The followings patterns are supported.

  • Any number of child nodes from the top node
  • The path from the top node to bottom node can contain any number of element wise ops.

The restrictions:

  • All branches must meet at the bottom node.
  • 'In-between' nodes cannot have more than one child.

NNVM can already fuse operators in red below. I added a simple post processing that

  • Check if all children nodes are fused into a single op by the existing fusion algorithm
  • Fuse the parent node to children nodes. and update its group id to be the children's group id
  • If the parent node originally belongs to another group (for example, conv + batch norm), propagate the new group id to grand parent and upward

image

TOPI schedules for all backend are updated to prevent scheduling the same op more than once. This is required because the same op can be reached from more than one route during output-to-input traversal.

A test cast added at nnvm/tests/python/compiler/test_op_fusion.py.

@masahi masahi force-pushed the improve-op-fusion branch 3 times, most recently from 3020877 to 494b3b9 Compare August 4, 2018 00:30
@tqchen
Copy link
Member

tqchen commented Aug 4, 2018

please resync the changes against the current master as in #1394.

@masahi masahi force-pushed the improve-op-fusion branch 2 times, most recently from ee45b8f to 371d2eb Compare August 4, 2018 17:56
@masahi
Copy link
Member Author

masahi commented Aug 4, 2018

@tqchen done

Copy link
Member

@tqchen tqchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments. The topi changes looks fine to me, maybe we can bring the topi changes in first

const auto& group_ids = children_group_ids[nid];
if (group_ids.size() <= 1) continue;
const auto child_group_id = group_ids[0];
// fuse this node with children if all children belong to the same group
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a stronger rule, currently, simply merges all of the children may not be good for future operations

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a check to see if all children are elem wise ops.

But the only case I fuse a parent node with children is when the parent has multiple children, and all children nodes are already fused by the existing algorithm. I think this only occurs when all children are elem wise ops.

Do you have a counter example that fails my logic?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine for now, but we still want to be careful and explicit here, in case there are future updates to the code that brings the rule

@@ -161,6 +161,44 @@ nnvm::Graph GraphFusePartition(nnvm::Graph g) {
}
}
}

if (opt_level >= 1) {
std::vector<std::vector<uint32_t>> children_group_ids(idx.num_nodes());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to document more on the algorithm here. So far I am not certain about the entire logic

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explained my logic in the description of this PR, is there something not clear?
But I agree that code alone is not enough to understand what I am trying to do here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, can you also add comment block about the logic being used in here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, will do.

@masahi
Copy link
Member Author

masahi commented Aug 4, 2018

@tqchen TOPI update is split to #1556. After that PR is merged, I'll rebase this PR.

@masahi masahi force-pushed the improve-op-fusion branch from 371d2eb to 7b12ec1 Compare August 4, 2018 23:19
@masahi
Copy link
Member Author

masahi commented Aug 5, 2018

@tqchen Check for op pattern and doc added.

propagate the new group id to a grand parent and upward
*/
if (opt_level >= 1) {
std::vector<std::vector<uint32_t>> children_group_ids(idx.num_nodes());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> > > to avoid problem in old compilers

@tqchen
Copy link
Member

tqchen commented Aug 5, 2018

@srkreddy1238 @kevinthesun @zhiics can you also take a look at this PR?

@masahi
Copy link
Member Author

masahi commented Aug 5, 2018

Please let me know if there is a counter example that refutes this PR !!

@tqchen
Copy link
Member

tqchen commented Aug 5, 2018

To be clear, I think it is a great PR to introduce the logic, since this logic is important, we want to make sure that there is more than one reviewer in the community who can read it and appreciate what is going on:) Thanks for adding the detailed comments

_, params2 = utils.create_workload(sym2, 1, dshape[1:], seed=0)
output1 = build_and_run(sym1, params1, data, oshape, target, ctx, opt_level=2)
output2 = build_and_run(sym2, params2, data, oshape, target, ctx, opt_level=0)
np.testing.assert_allclose(output1, output2, rtol=1e-5, atol=1e-5)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check graph.index.num_nodes here to make sure correct fusion happens when opt_level=2, like the above test cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I couldn't find a way to check the number of nodes other than eyeballing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* Any number of child nodes from the top node
* The path from the top node to bottom node can contain any number of element wise ops.

The only restriction is that in-between nodes cannot have more than one child.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if in-between nodes have parents not belonging to this block, such as skip-connection block?

Copy link
Member Author

@masahi masahi Aug 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rule only applies when all in-between children nodes and the bottom node are fused by existing fusion algorithm (see my figure above). If some in-between node has more than one parent, existing fusion algorithm cannot fuse this node with other in-between nodes.

For example, if you add another parent node to the right in-between node 'ReLU' in my figure, nothing can be fused. Hope this makes sense.

Just to be sure, I check that in-between node has only one parent, here.

So for example, my rule does nothing to Resnet.

@merrymercy
Copy link
Member

I have a comment in your topi PR #1556

@masahi masahi force-pushed the improve-op-fusion branch from b6b35f7 to e0dff61 Compare August 8, 2018 01:57
@tqchen tqchen merged commit 1ed28ae into apache:master Aug 8, 2018
@tqchen
Copy link
Member

tqchen commented Aug 8, 2018

Thanks @masahi @merrymercy @kevinthesun This is merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants