Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AOT] Enable A-Normal Form in the AOT executor #11091

Merged
merged 4 commits into from
May 7, 2022

Conversation

lhutton1
Copy link
Contributor

@lhutton1 lhutton1 commented Apr 21, 2022

The sequence of calls produced by the AOT executor codegen is arbitrary, especially in the presence of 'branchy' networks. This makes it difficult to analyze memory usage for each call. By running the ToANormalForm pass to insert a series of let bindings before the lowering and codegen stages, we can establish an ordering for the evaluation of the external calls, thus allowing reliable analysis of memory usage.

Additionally contains a couple of error message improvements and comment corrections.

cc @manupa-arm @Mousius @areusch @mbs-octoml @ashutosh-arm @ekalda @NicolaLancellotti

@github-actions github-actions bot requested review from Mousius, areusch and manupak April 21, 2022 16:07
@mbs-octoml
Copy link
Contributor

mbs-octoml commented Apr 21, 2022

In an ideal world we'd convert to ANF early on. In addition to making eval order explicit, it would also mean we could retire MixedModeVisitor since the deep recursion will be in lets which are trivial to deal with. However we can't do that without interrupting all uses of DFPatterns since the matcher does not 'see through' vars. You thankfully stay well out of that mess, so I'd support this.

That said the visit order is deterministic, right? Or pehaps we're accidentally depending on arbitrary undordered_set iteration order somewhere in the visitor machinery?

@lhutton1
Copy link
Contributor Author

Thanks for taking a look @mbs-octoml, yeah our plan was to put our foot-in-the-door at this stage as it doesn't require too much modification and then enable ANF at an earlier stage in the future if need be.

I believe your thinking that the visit order is deterministic is correct, so then enabling ANF would just make the evaluation order explicit, like you said. Making the lowering pipeline to adhere to an explicit ordering at this stage can be helpful for passes which rely on an explicit ordering such as liveness analysis, rather than solely relying on the implementation of the visitor to provide such guarantees. @manupa-arm may be able to comment further here :)

Copy link
Contributor

@mbs-octoml mbs-octoml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but please do see what you can do with my suggestion in the storage allocator.

@MarisaKirisame
Copy link
Contributor

Not much comment about the code, as I didnt know this part of the codebase well, but -
as the author of the ToANF pass very happy to see it replacing the graph mode :)
making recursive tuple possible is also great - it is a feature needed for reverse mode ad which is needed for training.

What do you guys want to do with analyzing memory for example? What branchy network do you guys have in mind? I was working on saving memory for deep learning and think it is very relevant to Dynamic Tensor Rematerialization - it can work even for full inference, but most 'straight-forward' network dont really have saving, only on network with a more complex structure.

@manupak
Copy link
Contributor

manupak commented Apr 26, 2022

@MarisaKirisame, as @lhutton1 mentioned before, we are trying to bring explicit ordering to middle-end of the stack for which we can rely on to make certain decisions in the backend passes. I suppose we dont have any special model in mind, but generally looking out for ones that would be generated from NAS's and common models such as Inception models. In future, I suppose it opens up possibilities to explore different variations of sequentialization given the topology of model.

Copy link
Contributor

@manupak manupak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lhutton1 ! looking good.

Just a question/suggestion to see if can get rid of custom let-chain unrolling..

Expr true_expr = expr;

// Don't get storage for let nodes.
while (const auto* let_node = true_expr.as<LetNode>()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't we achieve the same effect by using something from here ? :

/*!
* \brief Visit a let expression after the let-bound value and body have been visited.
* Default implementation is a no-op.
*/
virtual void PostVisitLet_(const LetNode* let_node) { /* no-op */
}
/*!
* \brief Visit the first let in a chain of let expressions after it has been visited.
* Default implementation is a no-op.
*/
virtual void PostVisitLetBlock_(const LetNode* let_node) {}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I can't see a way to make use of these here, perhaps I've missed something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking instead of :

void PreVisitLetBinding_(const Var& var, const Expr& value) final {
    LOG(FATAL) << "let is not supported.";
    VisitExpr(value);
    StorageInfo si = GetStorage(value);
    storage_device_map_[var] = si;
  }

if we do PostVisitLet, wont we have the same effect (i.e. the body is visited)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objection from me if you want to just do the loop directly as I find that much clearer. In my travels I found switching to pre- and post-let-binding visitor overloads made the most sense when you find you need to accumulate intermediate rewritten bindings on a local stack in order to reconstruct the final let chain. ExpandANormalForm also achieves the same effect all be it by relying on the implicit memoization to capture the intermediate results (which hurts my head every time I come across it!)

Copy link
Contributor Author

@lhutton1 lhutton1 May 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manupa-arm and I discussed this offline and it turns out we can just avoid let's being used in GetStorage

lhutton1 added 3 commits May 5, 2022 16:13
The sequence of calls produced by the AOT executor codegen is arbitrary,
especially in the presence of 'branchy' networks. This makes it
difficult to analyze memory usage for each call. By running the
ToANormalForm pass to insert a series of let bindings before the
lowering and codegen stages, we can establish an ordering for the
evaluation of the external calls, thus allowing reliable analysis of
memory usage.

Change-Id: Ic320b68cde83c96b228a8d1d2829a0e8ac7b768f
Change-Id: Id40b70f67a3e37f75b8331aa89f1819072e4d48e
Change-Id: I8de2bd19c7c17057e2bc89f6a68595780c2e9433
Change-Id: I74c080e2a09e84a75400db5c3395d508697d5d0f
Copy link
Contributor

@manupak manupak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@manupak
Copy link
Contributor

manupak commented May 7, 2022

Thanks @lhutton1 @mbs-octoml @ashutosh-arm !

This is merged now!

@manupak manupak merged commit bc7f45e into apache:main May 7, 2022
@lhutton1 lhutton1 deleted the anf-in-aot branch May 9, 2022 21:42
shtinsa pushed a commit to Deelvin/tvm that referenced this pull request May 17, 2022
* [AOT] Enable A-Normal Form in the AOT executor

The sequence of calls produced by the AOT executor codegen is arbitrary,
especially in the presence of 'branchy' networks. This makes it
difficult to analyze memory usage for each call. By running the
ToANormalForm pass to insert a series of let bindings before the
lowering and codegen stages, we can establish an ordering for the
evaluation of the external calls, thus allowing reliable analysis of
memory usage.

Change-Id: Ic320b68cde83c96b228a8d1d2829a0e8ac7b768f

* Maintain GetStorage(var) == GetStorage(value) invariant for lets

Change-Id: Id40b70f67a3e37f75b8331aa89f1819072e4d48e

* Add check to ensure ANF runs in AOT

Change-Id: I8de2bd19c7c17057e2bc89f6a68595780c2e9433

* Avoid let block traversal and don't visit var in let visitation

Change-Id: I74c080e2a09e84a75400db5c3395d508697d5d0f
SebastianBoblest pushed a commit to SebastianBoblest/tvm that referenced this pull request May 27, 2022
* [AOT] Enable A-Normal Form in the AOT executor

The sequence of calls produced by the AOT executor codegen is arbitrary,
especially in the presence of 'branchy' networks. This makes it
difficult to analyze memory usage for each call. By running the
ToANormalForm pass to insert a series of let bindings before the
lowering and codegen stages, we can establish an ordering for the
evaluation of the external calls, thus allowing reliable analysis of
memory usage.

Change-Id: Ic320b68cde83c96b228a8d1d2829a0e8ac7b768f

* Maintain GetStorage(var) == GetStorage(value) invariant for lets

Change-Id: Id40b70f67a3e37f75b8331aa89f1819072e4d48e

* Add check to ensure ANF runs in AOT

Change-Id: I8de2bd19c7c17057e2bc89f6a68595780c2e9433

* Avoid let block traversal and don't visit var in let visitation

Change-Id: I74c080e2a09e84a75400db5c3395d508697d5d0f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants