-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AOT] Enable A-Normal Form in the AOT executor #11091
Conversation
In an ideal world we'd convert to ANF early on. In addition to making eval order explicit, it would also mean we could retire MixedModeVisitor since the deep recursion will be in lets which are trivial to deal with. However we can't do that without interrupting all uses of DFPatterns since the matcher does not 'see through' vars. You thankfully stay well out of that mess, so I'd support this. That said the visit order is deterministic, right? Or pehaps we're accidentally depending on arbitrary undordered_set iteration order somewhere in the visitor machinery? |
Thanks for taking a look @mbs-octoml, yeah our plan was to put our foot-in-the-door at this stage as it doesn't require too much modification and then enable ANF at an earlier stage in the future if need be. I believe your thinking that the visit order is deterministic is correct, so then enabling ANF would just make the evaluation order explicit, like you said. Making the lowering pipeline to adhere to an explicit ordering at this stage can be helpful for passes which rely on an explicit ordering such as liveness analysis, rather than solely relying on the implementation of the visitor to provide such guarantees. @manupa-arm may be able to comment further here :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but please do see what you can do with my suggestion in the storage allocator.
Not much comment about the code, as I didnt know this part of the codebase well, but - What do you guys want to do with analyzing memory for example? What branchy network do you guys have in mind? I was working on saving memory for deep learning and think it is very relevant to Dynamic Tensor Rematerialization - it can work even for full inference, but most 'straight-forward' network dont really have saving, only on network with a more complex structure. |
@MarisaKirisame, as @lhutton1 mentioned before, we are trying to bring explicit ordering to middle-end of the stack for which we can rely on to make certain decisions in the backend passes. I suppose we dont have any special model in mind, but generally looking out for ones that would be generated from NAS's and common models such as Inception models. In future, I suppose it opens up possibilities to explore different variations of sequentialization given the topology of model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @lhutton1 ! looking good.
Just a question/suggestion to see if can get rid of custom let-chain unrolling..
Expr true_expr = expr; | ||
|
||
// Don't get storage for let nodes. | ||
while (const auto* let_node = true_expr.as<LetNode>()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't we achieve the same effect by using something from here ? :
tvm/src/relay/transforms/device_aware_visitors.h
Lines 247 to 258 in f0854ae
/*! | |
* \brief Visit a let expression after the let-bound value and body have been visited. | |
* Default implementation is a no-op. | |
*/ | |
virtual void PostVisitLet_(const LetNode* let_node) { /* no-op */ | |
} | |
/*! | |
* \brief Visit the first let in a chain of let expressions after it has been visited. | |
* Default implementation is a no-op. | |
*/ | |
virtual void PostVisitLetBlock_(const LetNode* let_node) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I can't see a way to make use of these here, perhaps I've missed something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking instead of :
void PreVisitLetBinding_(const Var& var, const Expr& value) final {
LOG(FATAL) << "let is not supported.";
VisitExpr(value);
StorageInfo si = GetStorage(value);
storage_device_map_[var] = si;
}
if we do PostVisitLet, wont we have the same effect (i.e. the body is visited)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No objection from me if you want to just do the loop directly as I find that much clearer. In my travels I found switching to pre- and post-let-binding visitor overloads made the most sense when you find you need to accumulate intermediate rewritten bindings on a local stack in order to reconstruct the final let chain. ExpandANormalForm also achieves the same effect all be it by relying on the implicit memoization to capture the intermediate results (which hurts my head every time I come across it!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@manupa-arm and I discussed this offline and it turns out we can just avoid let's being used in GetStorage
The sequence of calls produced by the AOT executor codegen is arbitrary, especially in the presence of 'branchy' networks. This makes it difficult to analyze memory usage for each call. By running the ToANormalForm pass to insert a series of let bindings before the lowering and codegen stages, we can establish an ordering for the evaluation of the external calls, thus allowing reliable analysis of memory usage. Change-Id: Ic320b68cde83c96b228a8d1d2829a0e8ac7b768f
Change-Id: Id40b70f67a3e37f75b8331aa89f1819072e4d48e
Change-Id: I8de2bd19c7c17057e2bc89f6a68595780c2e9433
Change-Id: I74c080e2a09e84a75400db5c3395d508697d5d0f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @lhutton1 @mbs-octoml @ashutosh-arm ! This is merged now! |
* [AOT] Enable A-Normal Form in the AOT executor The sequence of calls produced by the AOT executor codegen is arbitrary, especially in the presence of 'branchy' networks. This makes it difficult to analyze memory usage for each call. By running the ToANormalForm pass to insert a series of let bindings before the lowering and codegen stages, we can establish an ordering for the evaluation of the external calls, thus allowing reliable analysis of memory usage. Change-Id: Ic320b68cde83c96b228a8d1d2829a0e8ac7b768f * Maintain GetStorage(var) == GetStorage(value) invariant for lets Change-Id: Id40b70f67a3e37f75b8331aa89f1819072e4d48e * Add check to ensure ANF runs in AOT Change-Id: I8de2bd19c7c17057e2bc89f6a68595780c2e9433 * Avoid let block traversal and don't visit var in let visitation Change-Id: I74c080e2a09e84a75400db5c3395d508697d5d0f
* [AOT] Enable A-Normal Form in the AOT executor The sequence of calls produced by the AOT executor codegen is arbitrary, especially in the presence of 'branchy' networks. This makes it difficult to analyze memory usage for each call. By running the ToANormalForm pass to insert a series of let bindings before the lowering and codegen stages, we can establish an ordering for the evaluation of the external calls, thus allowing reliable analysis of memory usage. Change-Id: Ic320b68cde83c96b228a8d1d2829a0e8ac7b768f * Maintain GetStorage(var) == GetStorage(value) invariant for lets Change-Id: Id40b70f67a3e37f75b8331aa89f1819072e4d48e * Add check to ensure ANF runs in AOT Change-Id: I8de2bd19c7c17057e2bc89f6a68595780c2e9433 * Avoid let block traversal and don't visit var in let visitation Change-Id: I74c080e2a09e84a75400db5c3395d508697d5d0f
The sequence of calls produced by the AOT executor codegen is arbitrary, especially in the presence of 'branchy' networks. This makes it difficult to analyze memory usage for each call. By running the ToANormalForm pass to insert a series of let bindings before the lowering and codegen stages, we can establish an ordering for the evaluation of the external calls, thus allowing reliable analysis of memory usage.
Additionally contains a couple of error message improvements and comment corrections.
cc @manupa-arm @Mousius @areusch @mbs-octoml @ashutosh-arm @ekalda @NicolaLancellotti