-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Opt] Global variable optimizations #857
Conversation
What is this for? taichi/taichi/transforms/lower_access.cpp Line 155 in df1c2fa
|
In vectorized cases we still have to load the vector lanes one by one. Then we need this line to merge the lanes into a vector. |
It seems to me that there are 3 classes of statements referring to a global variable: Which classes are loading global variables, and which classes are modifying them? |
We need to do the optimization before Then we only have to treat |
So are there only |
Yes |
Is Maybe I should |
please assert for now |
# Conflicts: # taichi/ir/ir.h
I found that in mpm99.py, |
for (int i = 0; i < (int)ptr1->indices.size(); i++) { | ||
if (!irpass::analysis::same_statements(ptr1->indices[i], | ||
ptr2->indices[i])) { | ||
if (ptr1->indices[i]->is<ConstStmt>() && | ||
ptr2->indices[i]->is<ConstStmt>()) { | ||
// different constants | ||
return false; | ||
} | ||
} | ||
} | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How to use value_diff
to make this function analyze better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do that in a future PR. value_diff
has not been touched for at least 6 months...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I won't have time to go through every detail in variable_optimization.cpp
before the group meeting, but I trust your impl. Please fix the nits.
@@ -18,6 +18,41 @@ IRBuilder ¤t_ast_builder() { | |||
return context->builder(); | |||
} | |||
|
|||
bool maybe_same_address(Stmt *var1, Stmt *var2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe_same_global_address
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I included the Alloca
case (i.e. this function can check if two allocas are the same by directly checking if var1 == var2
) although it is not used in that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If both Stmt*
are allocas, they have the same address iff var1 == var2
.
If only one of them is an alloca, they can never share the same address.
If both of them are global temps, they have the same address iff they have the same offset. (Is checking ret_type unnecessary?)
If only one of them is a global temp, they can never share the same address.
If both of them are global ptrs (or GetChStmt
s), we can check by SNode::id
.
In other cases (probably after lower_access
), we don't know if the two statements share the same address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is checking ret_type unnecessary?
A well-formed IR should not have cases with different ret_type but it's good to be safe here.
The logic here sounds good. Maybe we should copy-paste these as comments to the code :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that offset
is a sufficient condition of "might be the same address".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. In case you are interested, there's the LLVM's design of alias analysis: https://llvm.org/docs/AliasAnalysis.html
for (int i = 0; i < (int)ptr1->indices.size(); i++) { | ||
if (!irpass::analysis::same_statements(ptr1->indices[i], | ||
ptr2->indices[i])) { | ||
if (ptr1->indices[i]->is<ConstStmt>() && | ||
ptr2->indices[i]->is<ConstStmt>()) { | ||
// different constants | ||
return false; | ||
} | ||
} | ||
} | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do that in a future PR. value_diff
has not been touched for at least 6 months...
ptr2->indices[i]->is<ConstStmt>()) { | ||
// different constants | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused here: shouldn't we just return false here whenever two statements are not the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same_statements
is checking if two statements are guaranteed to be the same.
For here, however, we want to check if two statements are possible to be the same. For example, arg[0]
and arg[1]
can be the same while same_statements
tell us that they're different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I understand. Thanks. Let's briefly document: return true when two statements might be the same address; false when two statements cannot be the same address.
@@ -117,12 +117,16 @@ void StateMachine::store(Stmt *store_stmt) { | |||
} | |||
|
|||
void StateMachine::load(Stmt *load_stmt) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's briefly document the behavior when load_stmt == nullptr
.
@@ -1162,7 +1162,6 @@ void full_simplify(IRNode *root, const CompileConfig &config, Kernel *kernel) { | |||
alg_simp(root, config); | |||
die(root); | |||
whole_kernel_cse(root); | |||
variable_optimization(root); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, avoid doing some variable optimizations leads to final optimization on some tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, now you need something like https://ieeexplore.ieee.org/document/1611550
May I merge this into master now? |
Yes, please. I trust your impl. |
Oh, I forgot to use the |
Related issue = #656
Related PR = #859
Also dive into container statements to eliminate identical global loads.
Benchmark:
Before this PR:
After this PR:
After this PR, before 17fec6a:
After this PR, before fc14b95:
After this PR, before 5837637:
mpm99.py: 2900 -> 2875 statements. Before fc14b95, it's 2898 statements.
[Click here for the format server]