This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Pointwise fusion for GPU #15167
Merged
Changes from 61 commits

Commits (112)
9653b67 (ptrendx) Beginning of RTC of pointwise ops
0e1774f (ptrendx) Code generation from the given JSON
8bf2945 (Caenorst) add initial simple_partition_pass and use it for pointwise fusion
5cbb50d (Caenorst) fix the fusion, use a symbol.Copy() at the beginning of binding funct…
fcf23c7 (Caenorst) Fixes
892c18f (ptrendx) Adding support for attribute inference for backward nodes when fusing
0a342a0 (Caenorst) keep proper input ordering for fused Op
07de800 (Caenorst) instantiate the indexed_graph before starting the subgraph replacemen…
975e8a6 (ptrendx) Fuse backward
6d9c0bf (Caenorst) fix ordering of subgraph node inputs using subgraph topological order…
384fbb0 (Caenorst) excluse forward node fusion during the fusion of the nodes in the bac…
b9506ff (ptrendx) Dealing with fused backward nodes inferattr
f30fbbb (Caenorst) use subgraph.indexed_graph() instead of main for _FusedOpHelper nodes…
1a2e30d (ptrendx) Adding support for other reqs in codegen
15fbed5 (ptrendx) Fix
506b126 (ptrendx) Cleaning
cf88753 (ptrendx) Change the TVM submodule
b861af9 (ptrendx) More cleaning
d001b5d (ptrendx) Making linter happy
48f1b94 (ptrendx) Do fusion only if default context is GPU
37d4bbf (ptrendx) Fixes for tests
616b932 (ptrendx) Fix the TVM commit
56303c8 (ptrendx) Fix lint
00e61cf (ptrendx) Guard fusion with MXNET_USE_CUDA
204ab30 (ptrendx) Fix
0e89f8c (ptrendx) Fix clang-tidy
73a2a5c (ptrendx) Add erf and erfinv backward
4d0f1c9 (ptrendx) Gluon support for fusion
3dddad7 (ptrendx) Cleaning
5067fa6 (ptrendx) Cleaning and allow shape/type change in FusedOp
b27a369 (ptrendx) Fixing Gluon bugs
f18847c (ptrendx) Fixing after rebase
9a05327 (ptrendx) Fixing race condition and guarding against races when using NVRTC
309f9a7 (ptrendx) Cleaning and renaming FusedOp to _FusedOp
9617b03 (ptrendx) Going easy on Windows compiler
d730027 (ptrendx) Merge branch 'upstream' into pr_fusion
de9027b (ptrendx) Disable fusion on Windows for now
3d2d715 (ptrendx) Refactor InferAttr and InferShapeAttr
5221677 (nvchai) Added slice and half2 support to FusedOp
f3e4f7a (nvchai) Fix lint errors
84822e1 (nvchai) Added multiple types support for vector loading/storing
2896258 (Caenorst) add slice fusion when it's at the beginning of subgraphs
eb0151c (nvchai) Removed constant ndim assumption in fused op
935342f (nvchai) Fix memory alignment issue in slice for FusedOp
ffa6c63 (nvchai) Fixes
803fd2a (nvchai) Fix lint errors
3ed3aef (ptrendx) Do not include cuda_fp16.h
84c2df5 (ptrendx) Refactor fused op op lists
1d94365 (ptrendx) Make linter happy
844cb9f (ptrendx) Changes from review
204b127 (ptrendx) Fixes after rebase
56eb99d (nvchai) Expand FusedOp support for slice
e31b586 (ptrendx) Fix for fp16 _zeros and _ones
c611b56 (ptrendx) Fix
d0d0fcf (ptrendx) Moving aux functions to unnamed namespace and detail namespace -> fusion
39e309f (ptrendx) Merge branch 'upstream' into pr_fusion
7f12eac (ptrendx) Disabling fusion if it alters topological order of inputs
654a358 (ptrendx) Print code only when env variable is set
32b690a (ptrendx) Fix
39bfcf6 (ptrendx) Fix lint and 2 tests that specify the same names for multiple inputs
b109a38 (ptrendx) Fixes from review and disabling fusion of slice with non-default step
f1a14fd (ptrendx) Add amp_cast to fusion, fixes
a72b980 (ptrendx) Add amp_multicast and its backward to the list of support ops
e4e674e (ptrendx) Apply wording suggestions from code review
5766481 (ptrendx) Apply wording suggestions from code review
62513e6 (ptrendx) Make clearer comment
dd651d3 (ptrendx) Adding punctuation and capitalization to \brief descriptions
7974888 (ptrendx) Fix
2aa8950 (ptrendx) Fix
a96e778 (ptrendx) Add backward_cast to fusion
9ea5464 (ptrendx) Adding unittests for fusion. Fix for erfinv_grad
6c3a75a (ptrendx) Adding slice ops and add_n to tests
6d0eaf3 (ptrendx) Fixes from review
70735f2 (ptrendx) Setting inplace option
9049086 (ptrendx) Fix lint
6f56a8b (ptrendx) Storing double in half
171c24f (ptrendx) Retrigger CI
26b19ed (ptrendx) Slight relaxing of the relative tolerance in the test
551c3b7 (ptrendx) Merge branch 'upstream' into pr_fusion
912e831 (ptrendx) Move the env variable check to the end
052576e (ptrendx) Fix a race condition between InferShape and scheduled Forward
0e1918f (DickJC123) Fix flakey test_fusion test involving fp32 erfinv op.
1bbdba6 (ptrendx) Merge branch 'upstream' into pr_fusion
7e1df6a (ptrendx) Fix from review
7a92738 (nvchai) Added broadcast_like and slice_like to fused op
a1dee58 (nvchai) Minor fix and cleanup
36201fe (nvchai) Added negative axis support in slice_axis, temporarily disabled fusio…
c077e97 (nvchai) Added axes support to slice_like
3f0bfb4 (nvchai) Added axis support to broadcast_like
1e20339 (nvchai) Add fast_load_slice function to fused op code
13b3076 (nvchai) Added runtime switch for choosing fast and slow slice kernel
e5649e1 (ptrendx) Fix lint and warning
868bcf6 (ptrendx) Going easy on Windows compiler (again)
1608d6a (ptrendx) Fix slice_like
037a5de (ptrendx) Debug broadcast_like fusion
e501bc9 (ptrendx) Fix lint
e0ca7d0 (ptrendx) Fix lint
8d3dc77 (ptrendx) Trigger CI
786b071 (ptrendx) Get rid of the initializer list
0720f66 (ptrendx) Fix backward calls with different gradient type
da8bfe3 (Caenorst) avoid cycle when adding node specific for inputs of subgraph for poin…
ed03595 (ptrendx) Fix lint
69facdc (ptrendx) Add namespace to the fusion implementations
a5ee989 (ptrendx) Merge branch 'upstream' into pr_fusion
e26770b (ptrendx) Set launch bounds on the fused kernel
80e36ba (ptrendx) Fix NumPy tests
36e5ce8 (ptrendx) Test showcasing an issue fixed in PR #16553
f77fe5b (MoisesHer) Cast scalarts to FP32 and perform (a*1.0/b) instead of (a/b)
fdf710e (ptrendx) Merge branch 'upstream' into pr_fusion
76aa154 (Caenorst) Fix a bug in cycle detection for inputs only op in pointwise fusion
929b8e9 (ptrendx) Merge branch 'upstream' into pr_fusion
3d1b5af (ptrendx) Add comments to simple_partition_pass.h file
Submodule tvm updated from 21935d to 88163e
exec_utils.cc (new file, +79 lines):

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

/*!
 * \file exec_utils.cc
 * \brief implementation of executor util functions
 */

#include "exec_utils.h"
#include <unordered_set>
#include <unordered_map>
#include <string>

namespace mxnet {
namespace common {

void CopyGraph(nnvm::Graph *dst, const nnvm::Graph &src, bool copy_variables) {
  using nnvm::Node;
  using nnvm::NodePtr;
  using nnvm::NodeEntry;
  std::unordered_map<Node*, NodePtr> old_new;
  // use DFSVisit to copy all the nodes
  DFSVisit(src.outputs, [&old_new, copy_variables](const NodePtr& node) {
    NodePtr np;
    if (copy_variables || !node->is_variable()) {
      np = Node::Create();
      np->attrs = node->attrs;
    } else {
      np = node;
    }
    old_new[node.get()] = std::move(np);
  });
  // connect nodes of new graph
  for (const auto &kv : old_new) {
    for (const NodeEntry& e : kv.first->inputs) {
      Node *ptr = e.node.get();
      kv.second->inputs.emplace_back(NodeEntry{old_new[ptr], e.index, e.version});
    }
    for (const NodePtr& p : kv.first->control_deps) {
      kv.second->control_deps.emplace_back(old_new[p.get()]);
    }
  }
  // set the head
  for (const NodeEntry &e : src.outputs) {
    (*dst).outputs.emplace_back(NodeEntry{old_new[e.node.get()], e.index, e.version});
  }
}

bool CheckForInputNameDuplicates(const nnvm::IndexedGraph &idx) {
  std::unordered_set<std::string> names;
  for (const auto& nid : idx.input_nodes()) {
    const std::string &name = idx[nid].source->attrs.name;
    if (names.count(name)) {
      LOG(WARNING) << "Variable name " << name << " is used more than once!";
      return false;
    }
    names.insert(name);
  }
  return true;
}

}  // namespace common
}  // namespace mxnet
As I suggested on dev@, could we align this variable with MXNET_SUBGRAPH_BACKEND to make usage easier for the user? Currently, that env variable controls CPU operator fusion.
I thought about it, but I'm not sure. This fusion does not really use the subgraph API because of its limitations (for example Gluon support, and separate forward and backward fusion), and it would not work with the "get_optimized_symbol" API, which using that env variable would imply.
It makes sense.
@aaronmarkham do you have any suggestions?
Some background; @ptrendx, please correct me if anything is inaccurate. Currently, two environment variables deliver similar functionality: MXNET_SUBGRAPH_BACKEND=MKLDNN controls CPU fusion, and MXNET_USE_FUSION controls GPU fusion. I suggest aligning on MXNET_SUBGRAPH_BACKEND={MKLDNN, CUDNN}, because that variable is already widely used for operator fusion, although the name SUBGRAPH does not match the technical details of the GPU implementation.

Is there also automatic fusion for CPU ops? @PatricZhao
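For context, the two mechanisms discussed here are both toggled from the environment. A minimal sketch of how a user would flip them (the variable names come from this thread; everything else, including the echoed labels, is illustrative):

```shell
# Toggle the GPU pointwise fusion pass from this PR.
export MXNET_USE_FUSION=1   # set to 0 to rule fusion out when debugging
echo "gpu fusion: $MXNET_USE_FUSION"

# CPU operator fusion, by contrast, goes through the subgraph backend.
export MXNET_SUBGRAPH_BACKEND=MKLDNN
echo "cpu backend: $MXNET_SUBGRAPH_BACKEND"
```

Because both are plain environment variables, they take effect per process, which is convenient for A/B-comparing fused and unfused runs of the same script.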
@ptrendx what's the plan for supporting a fused CONV-BN-RELU pass? Will that also go through the subgraph API?