Separate fusion from compilation/lowering as a standalone pass #1540
Comments
+1, in the sense that fusing is target independent while lowering is not. Separating them gives the flexibility to add target-specific implementations after fusing.
This is a reasonable decision to make. We can roll out something in this direction around the next release cycle.
We (@tqchen and I) are planning to ship the first version of the new high-level IR we have been designing in the next release. I think it would be good to design the new fusion machinery with this in mind, and possibly implement it on top of the new IR. We have been talking informally about porting fusion to the IR. My view is that we ideally want most passes (including fusion) to be functions on graphs/programs. This means we can choose the ordering according to different criteria, and apply AutoTVM-style techniques to explore pass orderings and fusion settings. My observation from talking with users is that fusion decisions have a huge effect on the end-to-end performance of the system, and are something we probably want fine-grained control over when tuning programs for specific platforms. It would be great to be able to apply fusion multiple times. I could imagine applying one round of fusion, then a set of platform-specific graph transformations, and then trying fusion again.
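To make the "passes as functions on graphs" idea concrete, here is a minimal sketch. It is not TVM code: the graph is a toy linear chain of op names, and `fuse_elemwise`, `drop_identity`, `run_pipeline`, and `best_ordering` are hypothetical names invented for illustration. It only shows that once every pass maps a graph to a graph, orderings can be composed freely, compared by a cost function, and fusion can be run more than once.

```python
from itertools import permutations
from typing import Callable, Sequence, Tuple

Graph = Tuple[str, ...]          # toy stand-in: a linear chain of op names
Pass = Callable[[Graph], Graph]  # every pass maps a graph to a graph

ELEMWISE = {"add", "relu", "exp"}  # assumed set of elementwise ops


def fuse_elemwise(graph: Graph) -> Graph:
    """Merge each run of adjacent elementwise ops into one fused node."""
    out, run = [], []

    def flush() -> None:
        # Emit the pending run, fused only if it holds more than one op.
        if len(run) > 1:
            out.append("fused(" + "+".join(run) + ")")
        else:
            out.extend(run)
        run.clear()

    for op in graph:
        if op in ELEMWISE:
            run.append(op)
        else:
            flush()
            out.append(op)
    flush()
    return tuple(out)


def drop_identity(graph: Graph) -> Graph:
    """Stand-in for a platform-specific transformation: remove no-op nodes."""
    return tuple(op for op in graph if op != "identity")


def run_pipeline(graph: Graph, passes: Sequence[Pass]) -> Graph:
    """Apply the passes left to right."""
    for p in passes:
        graph = p(graph)
    return graph


def best_ordering(graph: Graph, passes: Sequence[Pass], cost=len):
    """Brute-force search over pass orderings; cost defaults to node count."""
    return min(permutations(passes),
               key=lambda order: cost(run_pipeline(graph, order)))


g = ("conv2d", "add", "identity", "relu")
fuse_first = run_pipeline(g, [fuse_elemwise, drop_identity])
fuse_twice = run_pipeline(g, [fuse_elemwise, drop_identity, fuse_elemwise])
```

In this toy, fusing first leaves `add` and `relu` separated by the `identity` node, while running fusion again after the cleanup pass merges them. A real search would replace the node-count cost with measured performance.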
Interesting discussion. I have a PR at #1548 which adds another stage of fusion. Since I didn't modify the existing fusion algorithm at all, it could be implemented as a separate function/pass on the graph, which would be cleaner. If someone later comes up with another fusion rule, we don't want to add another big block of code after the existing one. My fusion rule assumes that the existing fusion algorithm has already been applied, so it introduces a dependency between fusion rules. I am also interested in applying fusion multiple times. My fusion rule has the restriction that child nodes branching from a single parent node cannot themselves have multiple child nodes (i.e. branch). I think I can get around this with some kind of iterative, or progressive, fusion strategy, where the first round fuses the child nodes at depth 2, the second round fuses the child nodes at depth 1 together with the group of nodes fused by the first round, and so on.
@jroesch I totally agree that most passes should be implemented as functions, just like optimization passes in compilers such as LLVM. The ordering of different graph-level and tensor-level optimizations may also have a notable performance impact, similar to the phase-ordering problem in compilers.
+1 for separating fusion from compilation and enabling multiple passes of fusion.
The PR has been merged, so closing for now.
Currently, TVM implements fusion in two steps: GraphFusePartition and GraphFuseCompile. The second pass actually performs fusion and lowering together.
Would it be more appropriate to make fusion a separate pass whose input and output are both graphs? Compilation/lowering would then be decoupled into its own pass. This way, other graph-level optimizations could still be applied after fusion.
Furthermore, we could probably make fusion more extensible, i.e. operators could be fused according to a given set of rules. We could even be more aggressive and apply fusion multiple times, using one rule each time.
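The proposal above could look roughly like the following sketch. This is an illustration under stated assumptions, not the existing TVM/NNVM implementation: the graph is simplified to a linear chain, and `Node`, the pattern tags, the two rules, `fuse`, `fuse_multi`, and `lower` are all hypothetical names. The point is that `fuse` takes a graph and returns a graph, each round applies exactly one rule, and lowering runs only afterwards as its own step.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Node:
    ops: Tuple[str, ...]  # original operators held by this (possibly fused) node
    pattern: str          # hypothetical tag: "complex", "elemwise", or "opaque"


Graph = List[Node]                   # toy graph: a linear chain of nodes
Rule = Callable[[Node, Node], bool]  # may `consumer` be fused into `producer`?


def elemwise_into_elemwise(producer: Node, consumer: Node) -> bool:
    return producer.pattern == "elemwise" and consumer.pattern == "elemwise"


def elemwise_after_complex(producer: Node, consumer: Node) -> bool:
    return producer.pattern == "complex" and consumer.pattern == "elemwise"


def fuse(graph: Graph, rule: Rule) -> Graph:
    """One fusion round: graph in, graph out, driven by a single rule."""
    out: Graph = []
    for node in graph:
        if out and rule(out[-1], node):
            prev = out.pop()
            # The fused node keeps the producer's pattern (an assumption).
            out.append(Node(prev.ops + node.ops, prev.pattern))
        else:
            out.append(node)
    return out


def fuse_multi(graph: Graph, rules: Sequence[Rule]) -> Graph:
    """Apply fusion several times, one rule per round."""
    for rule in rules:
        graph = fuse(graph, rule)
    return graph


def lower(graph: Graph) -> List[str]:
    """Separate lowering step; here it only names one kernel per node."""
    return ["kernel_" + "_".join(node.ops) for node in graph]


graph = [
    Node(("conv2d",), "complex"),
    Node(("bias_add",), "elemwise"),
    Node(("relu",), "elemwise"),
    Node(("flatten",), "opaque"),
    Node(("exp",), "elemwise"),
]
fused = fuse_multi(graph, [elemwise_into_elemwise, elemwise_after_complex])
kernels = lower(fused)
```

Because the output of `fuse_multi` is still a graph, other graph-level passes could be inserted between the fusion rounds or before `lower`, and a new rule is just one more function in the list.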