Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Opt] Global variable optimizations #857

Merged
merged 15 commits into from
Apr 28, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
61 changes: 61 additions & 0 deletions taichi/ir/ir.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,67 @@ IRBuilder &current_ast_builder() {
return context->builder();
}

bool maybe_same_address(Stmt *var1, Stmt *var2) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe_same_global_address?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included the Alloca case (i.e. this function can check if two allocas are the same by directly checking if var1 == var2) although it is not used in that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If both Stmt* are allocas, they have the same address iff var1 == var2.

If only one of them is an alloca, they can never share the same address.

If both of them are global temps, they have the same address iff they have the same offset. (Is checking ret_type unnecessary?)

If only one of them is a global temp, they can never share the same address.

If both of them are global ptrs (or GetChStmts), we can check by SNode::id.

In other cases (probably after lower_access), we don't know if the two statements share the same address.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is checking ret_type unnecessary?

A well-formed IR should not have cases with different ret_type but it's good to be safe here.

The logic here sounds good. Maybe we should copy-paste these as comments to the code :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that offset is a sufficient condition of "might be the same address".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. In case you are interested, there's the LLVM's design of alias analysis: https://llvm.org/docs/AliasAnalysis.html

// Return true when two statements might be the same address;
// false when two statements cannot be the same address.

// If both stmts are allocas, they have the same address iff var1 == var2.
// If only one of them is an alloca, they can never share the same address.
if (var1 == var2)
return true;
if (var1->is<AllocaStmt>() || var2->is<AllocaStmt>())
yuanming-hu marked this conversation as resolved.
Show resolved Hide resolved
return false;

// If both statements are global temps, they have the same address iff they
// have the same offset. If only one of them is a global temp, they can never
// share the same address.
if (var1->is<GlobalTemporaryStmt>() || var2->is<GlobalTemporaryStmt>()) {
if (!var1->is<GlobalTemporaryStmt>() || !var2->is<GlobalTemporaryStmt>())
return false;
return var1->as<GlobalTemporaryStmt>()->offset ==
var2->as<GlobalTemporaryStmt>()->offset;
}

// If both statements are GlobalPtrStmts or GetChStmts, we can check by
// SNode::id.
TI_ASSERT(var1->width() == 1);
TI_ASSERT(var2->width() == 1);
auto get_snode_id = [](Stmt *s) {
if (auto ptr = s->cast<GlobalPtrStmt>())
return ptr->snodes[0]->id;
else if (auto get_child = s->cast<GetChStmt>())
return get_child->output_snode->id;
else
return -1;
};
int snode1 = get_snode_id(var1);
int snode2 = get_snode_id(var2);
if (snode1 != -1 && snode2 != -1 && snode1 != snode2)
return false;

// GlobalPtrStmts with guaranteed different indices cannot share the same
// address.
if (var1->is<GlobalPtrStmt>() && var2->is<GlobalPtrStmt>()) {
auto ptr1 = var1->as<GlobalPtrStmt>();
auto ptr2 = var2->as<GlobalPtrStmt>();
for (int i = 0; i < (int)ptr1->indices.size(); i++) {
if (!irpass::analysis::same_statements(ptr1->indices[i],
ptr2->indices[i])) {
if (ptr1->indices[i]->is<ConstStmt>() &&
ptr2->indices[i]->is<ConstStmt>()) {
// different constants
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused here: shouldn't we just return false here whenever two statements are not the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same_statements is checking if two statements are guaranteed to be the same.

For here, however, we want to check if two statements are possible to be the same. For example, arg[0] and arg[1] can be the same while same_statements tell us that they're different.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I understand. Thanks. Let's briefly document: return true when two statements might be the same address; false when two statements cannot be the same address.

}
}
return true;
Comment on lines +64 to +74
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to use value_diff to make this function analyze better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do that in a future PR. value_diff has not been touched for at least 6 months...

}

// In other cases (probably after lower_access), we don't know if the two
// statements share the same address.
return true;
}

std::string VectorType::pointer_suffix() const {
if (is_pointer()) {
return "*";
Expand Down
4 changes: 3 additions & 1 deletion taichi/ir/ir.h
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,7 @@ void die(IRNode *root);
void simplify(IRNode *root, Kernel *kernel = nullptr);
void alg_simp(IRNode *root, const CompileConfig &config);
void whole_kernel_cse(IRNode *root);
void variable_optimization(IRNode *root);
void variable_optimization(IRNode *root, bool after_lower_access);
void full_simplify(IRNode *root,
const CompileConfig &config,
Kernel *kernel = nullptr);
Expand Down Expand Up @@ -144,6 +144,8 @@ void verify(IRNode *root);

IRBuilder &current_ast_builder();

bool maybe_same_address(Stmt *var1, Stmt *var2);

struct VectorType {
private:
bool _is_pointer;
Expand Down
18 changes: 14 additions & 4 deletions taichi/ir/state_machine.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ TLANG_NAMESPACE_BEGIN

std::unique_ptr<std::unordered_set<AtomicOpStmt *>> StateMachine::used_atomics;

StateMachine::StateMachine(Stmt *var)
StateMachine::StateMachine(Stmt *var, bool zero_initialized)
: var(var),
stored(never),
stored_in_this_if_or_loop(never),
Expand All @@ -16,8 +16,8 @@ StateMachine::StateMachine(Stmt *var)
last_atomic(nullptr),
last_atomic_eliminable(false),
maybe_loaded_before_first_definite_store_in_this_if_or_loop(false) {
TI_ASSERT(var->is<AllocaStmt>() || var->is<GlobalTemporaryStmt>() ||
var->is<GlobalPtrStmt>());
if (!zero_initialized)
stored = stored_in_this_if_or_loop = maybe;
}

bool StateMachine::same_data(Stmt *store_stmt1, Stmt *store_stmt2) {
Expand Down Expand Up @@ -117,12 +117,18 @@ void StateMachine::store(Stmt *store_stmt) {
}

void StateMachine::load(Stmt *load_stmt) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's briefly document the behavior when load_stmt == nullptr.

TI_ASSERT(load_stmt->is<LocalLoadStmt>() || load_stmt->is<GlobalLoadStmt>());
// The load_stmt == nullptr case is only for an offloaded range_for loading
// global temps via begin_offset and end_offset.
if (load_stmt)
TI_ASSERT(load_stmt->is<LocalLoadStmt>() ||
load_stmt->is<GlobalLoadStmt>());
if (stored_in_this_if_or_loop != definitely)
maybe_loaded_before_first_definite_store_in_this_if_or_loop = true;
loaded = loaded_in_this_if_or_loop = definitely;
last_store_eliminable = false;
last_atomic_eliminable = false;
if (!load_stmt)
return;

if (stored == never) {
auto zero = load_stmt->insert_after_me(Stmt::make<ConstStmt>(
Expand Down Expand Up @@ -423,4 +429,8 @@ void StateMachine::finalize() {
}
}

Stmt *StateMachine::get_var() const {
return var;
}

TLANG_NAMESPACE_END
6 changes: 4 additions & 2 deletions taichi/ir/state_machine.h
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ class StateMachine {
StateMachine() {
TI_ERROR("StateMachine constructor invoked with no parameters.")
}
explicit StateMachine(Stmt *var);
explicit StateMachine(Stmt *var, bool zero_initialized);

// This must be called before using StateMachine to eliminate AtomicOpStmts.
static void rebuild_atomics_usage(IRNode *root);
Expand All @@ -54,7 +54,7 @@ class StateMachine {

void atomic_op(AtomicOpStmt *stmt);
void store(Stmt *store_stmt);
void load(Stmt *load_stmt);
void load(Stmt *load_stmt = nullptr);

void continue_or_break();

Expand All @@ -71,6 +71,8 @@ class StateMachine {
void merge_from_loop(const StateMachine &loop);

void finalize();

Stmt *get_var() const;
};

TLANG_NAMESPACE_END
5 changes: 5 additions & 0 deletions taichi/transforms/compile_to_offloads.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,10 @@ void compile_to_offloads(IRNode *ir,
irpass::analysis::verify(ir);
}

irpass::variable_optimization(ir, false);
print("Store forwarded");
irpass::analysis::verify(ir);

if (lower_global_access) {
irpass::lower_access(ir, true);
print("Access lowered");
Expand Down Expand Up @@ -101,6 +105,7 @@ void compile_to_offloads(IRNode *ir,
irpass::demote_atomics(ir);
print("Atomics demoted");

irpass::variable_optimization(ir, true);
irpass::full_simplify(ir, config);
print("Simplified III");

Expand Down
34 changes: 25 additions & 9 deletions taichi/transforms/simplify.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -494,13 +494,30 @@ class BasicBlockSimplify : public IRVisitor {
// no store to the var?
bool has_store = false;
for (int j = i + 1; j < current_stmt_id; j++) {
if (block->statements[j]
->is_container_statement()) { // no if, while, etc..
has_store = true;
break;
if (!advanced_optimization) {
if (block->statements[j]
->is_container_statement()) { // no if, while, etc..
has_store = true;
break;
}
if (block->statements[j]->is<GlobalStoreStmt>()) {
has_store = true;
}
continue;
}
if (block->statements[j]->is<GlobalStoreStmt>()) {
if (!irpass::analysis::gather_statements(
block->statements[j].get(),
[&](Stmt *s) {
if (auto store = s->cast<GlobalStoreStmt>())
return maybe_same_address(store->ptr, stmt->ptr);
else if (auto atomic = s->cast<AtomicOpStmt>())
return maybe_same_address(atomic->dest, stmt->ptr);
else
return false;
})
.empty()) {
has_store = true;
break;
}
}
if (!has_store) {
Expand Down Expand Up @@ -1141,12 +1158,11 @@ void simplify(IRNode *root, Kernel *kernel) {

void full_simplify(IRNode *root, const CompileConfig &config, Kernel *kernel) {
constant_fold(root);
if (advanced_optimization)
if (advanced_optimization) {
alg_simp(root, config);
if (advanced_optimization)
die(root);
whole_kernel_cse(root);
if (advanced_optimization)
variable_optimization(root);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, avoid doing some variable optimizations leads to final optimization on some tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, now you need something like https://ieeexplore.ieee.org/document/1611550

}
die(root);
simplify(root, kernel);
die(root);
Expand Down
Loading