Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Opt] Global variable optimizations #857

Merged
merged 15 commits into from
Apr 28, 2020

Conversation

xumingkuan
Copy link
Contributor

@xumingkuan xumingkuan commented Apr 23, 2020

Related issue = #656
Related PR = #859

Also dive into container statements to eliminate identical global loads.

Benchmark:
Before this PR:
benchmark20200427
After this PR:
benchmark20200427_5

After this PR, before 17fec6a:
benchmark20200427_4
After this PR, before fc14b95:
benchmark20200427_3_with_other_variable_opt
After this PR, before 5837637:
benchmark20200427_2

mpm99.py: 2900 -> 2875 statements. Before fc14b95, it's 2898 statements.

[Click here for the format server]

@xumingkuan
Copy link
Contributor Author

xumingkuan commented Apr 23, 2020

What is this for?

auto merge = Stmt::make<ElementShuffleStmt>(lanes, true);

@yuanming-hu
Copy link
Member

What is this for?

auto merge = Stmt::make<ElementShuffleStmt>(lanes, true);

In vectorized cases we still have to load the vector lanes one by one. Then we need this line to merge the lanes into a vector.

@xumingkuan
Copy link
Contributor Author

It seems to me that there are 3 classes of statements referring to a global variable: GlobalTemporaryStmt, GlobalPtrStmt, GetChStmt. Is this correct?

Which classes are loading global variables, and which classes are modifying them?

@yuanming-hu
Copy link
Member

We need to do the optimization before lower_access, where pointers still have forms like x[i, j, k]. This will make the aliasing analysis much easier.

Then we only have to treat GlobalTempraryStmt and GlobalPtrStmt, since GetChStmt will only be generated after lower_access.

@xumingkuan
Copy link
Contributor Author

So are there only GlobalLoadStmt, GlobalStoreStmt, AtomicOpStmt that load or store GlobalTemporaryStmt, GlobalPtrStmt?

@yuanming-hu
Copy link
Member

Yes

@xumingkuan
Copy link
Contributor Author

Is GlobalPtrStmt accessing several SNodes with the same indices?

Maybe I should TI_ASSERT(width() == 1)...

@yuanming-hu
Copy link
Member

please assert for now

@xumingkuan
Copy link
Contributor Author

I found that in mpm99.py, lower_access is actually emitting many statements that seems optimizable, but we can't do the optimization without aliasing analysis...

Comment on lines +41 to +51
for (int i = 0; i < (int)ptr1->indices.size(); i++) {
if (!irpass::analysis::same_statements(ptr1->indices[i],
ptr2->indices[i])) {
if (ptr1->indices[i]->is<ConstStmt>() &&
ptr2->indices[i]->is<ConstStmt>()) {
// different constants
return false;
}
}
}
return true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to use value_diff to make this function analyze better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do that in a future PR. value_diff has not been touched for at least 6 months...

@xumingkuan xumingkuan requested a review from yuanming-hu April 28, 2020 02:12
@xumingkuan xumingkuan marked this pull request as ready for review April 28, 2020 02:29
Copy link
Member

@yuanming-hu yuanming-hu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I won't have time to go through every detail in variable_optimization.cpp before the group meeting, but I trust your impl. Please fix the nits.

@@ -18,6 +18,41 @@ IRBuilder &current_ast_builder() {
return context->builder();
}

bool maybe_same_address(Stmt *var1, Stmt *var2) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe_same_global_address?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included the Alloca case (i.e. this function can check if two allocas are the same by directly checking if var1 == var2) although it is not used in that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If both Stmt* are allocas, they have the same address iff var1 == var2.

If only one of them is an alloca, they can never share the same address.

If both of them are global temps, they have the same address iff they have the same offset. (Is checking ret_type unnecessary?)

If only one of them is a global temp, they can never share the same address.

If both of them are global ptrs (or GetChStmts), we can check by SNode::id.

In other cases (probably after lower_access), we don't know if the two statements share the same address.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is checking ret_type unnecessary?

A well-formed IR should not have cases with different ret_type but it's good to be safe here.

The logic here sounds good. Maybe we should copy-paste these as comments to the code :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that offset is a sufficient condition of "might be the same address".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. In case you are interested, there's the LLVM's design of alias analysis: https://llvm.org/docs/AliasAnalysis.html

taichi/ir/ir.cpp Show resolved Hide resolved
Comment on lines +41 to +51
for (int i = 0; i < (int)ptr1->indices.size(); i++) {
if (!irpass::analysis::same_statements(ptr1->indices[i],
ptr2->indices[i])) {
if (ptr1->indices[i]->is<ConstStmt>() &&
ptr2->indices[i]->is<ConstStmt>()) {
// different constants
return false;
}
}
}
return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do that in a future PR. value_diff has not been touched for at least 6 months...

ptr2->indices[i]->is<ConstStmt>()) {
// different constants
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused here: shouldn't we just return false here whenever two statements are not the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same_statements is checking if two statements are guaranteed to be the same.

For here, however, we want to check if two statements are possible to be the same. For example, arg[0] and arg[1] can be the same while same_statements tell us that they're different.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I understand. Thanks. Let's briefly document: return true when two statements might be the same address; false when two statements cannot be the same address.

@@ -117,12 +117,16 @@ void StateMachine::store(Stmt *store_stmt) {
}

void StateMachine::load(Stmt *load_stmt) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's briefly document the behavior when load_stmt == nullptr.

@@ -1162,7 +1162,6 @@ void full_simplify(IRNode *root, const CompileConfig &config, Kernel *kernel) {
alg_simp(root, config);
die(root);
whole_kernel_cse(root);
variable_optimization(root);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, avoid doing some variable optimizations leads to final optimization on some tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, now you need something like https://ieeexplore.ieee.org/document/1611550

@xumingkuan
Copy link
Contributor Author

May I merge this into master now?

@xumingkuan xumingkuan requested a review from yuanming-hu April 28, 2020 03:57
@yuanming-hu
Copy link
Member

Yes, please. I trust your impl.

@xumingkuan xumingkuan mentioned this pull request Apr 28, 2020
18 tasks
@xumingkuan xumingkuan merged commit c3168f9 into taichi-dev:master Apr 28, 2020
@xumingkuan
Copy link
Contributor Author

xumingkuan commented Apr 28, 2020

Oh, I forgot to use the advanced_optimization variable for variable_optimization...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants