Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[async] Partial SFG node GC (keep latest state writers/readers) #1915

Merged
merged 25 commits into from
Oct 4, 2020

Conversation

xumingkuan
Copy link
Contributor

@xumingkuan xumingkuan commented Oct 1, 2020

Related issue = #742

This PR keeps the latest state writers/readers in the SFG when all nodes are extracted to execute.

Most functions of the SFG need to be modified. This PR modifies

  • fuse()
  • optimize_dead_store()
  • optimize_listgen()
  • topo_sort_nodes()
  • verify()

This PR doesn't modify

  • print()
  • dump_dot()

These functions don't need to be modified:

  • demote_activation()

This PR also tried to modify fuse() so that only tasks that are "near" in the topological order (i.e., position in nodes_ differ by < 512) are fused. This improves the complexity of a single invocation of fuse() from O(nm/64) to O(min(n, 512*4)*m/64 + nm/512)...

Benchmark: async_mgpcg.py, time of fuse() on kun: 0.746s (0.355s unaccounted) -> 1.334s (0.800s rebuild graph, 0.559s insert task)
(2.760s at commit c691248, 1.387s at commit da2a0f2 (0.920s rebuild graph), 1.243s at commit bc6f17f (0.704s rebuild graph))

[Click here for the format server]


@xumingkuan
Copy link
Contributor Author

Do we need bool executed in the SFGNode?

@yuanming-hu
Copy link
Member

Do we need bool executed in the SFGNode?

Sounds like a good idea!

@xumingkuan xumingkuan marked this pull request as ready for review October 2, 2020 08:53
Comment on lines 851 to 972
for (int i = 1; i < (int)nodes_.size(); i++) {
for (int i = first_pending_task_index_; i < (int)nodes_.size(); i++) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to make use of get_pending_tasks(), but I make the changeset minimal here to avoid conflicts with #1907.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, i merged #1907 and it has created mrege conflict now.. But is it possible to use get_pending_tasks() now?

@xumingkuan xumingkuan changed the title [async] Partial SFG node GC (keep latest state writers/readers) [async] Partial SFG node GC and improve fusion speed Oct 2, 2020
@xumingkuan
Copy link
Contributor Author

async_mgpcg.py fails in optimize_listgen now. Will debug later.

@codecov
Copy link

codecov bot commented Oct 2, 2020

Codecov Report

Merging #1915 into master will increase coverage by 0.02%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1915      +/-   ##
==========================================
+ Coverage   43.83%   43.86%   +0.02%     
==========================================
  Files          45       45              
  Lines        6194     6199       +5     
  Branches     1100     1101       +1     
==========================================
+ Hits         2715     2719       +4     
- Misses       3310     3311       +1     
  Partials      169      169              
Impacted Files Coverage Δ
python/taichi/lang/impl.py 66.84% <0.00%> (+0.17%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a1c6d7c...2b1b903. Read the comment docs.

@xumingkuan xumingkuan changed the title [async] Partial SFG node GC and improve fusion speed [async] Partial SFG node GC (keep latest state writers/readers) Oct 2, 2020
@xumingkuan xumingkuan requested review from yuanming-hu and k-ye October 2, 2020 15:56
Copy link
Member

@k-ye k-ye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, most parts LGTM!

I have one suggestion. It took me some amount of time to wrap my head around how first_pending_task_index_ and pending_node_ids are maintained. IMHO, we only need to re-id pending node IDs when first_pending_task_index_ has been modified:

  • rebuild_graph(). Because we could remove empty tasks, first_pending_task_index_ might shrink. (* if first_pending_task_index_ doesn't change, does that imply we don't even have to re-id?)
  • mark_pending_tasks_as_executed(). After this, all the nodes_ should be in the executed state. (We don't have to do another re-id pass if we just do pending_node_id = -1 or mark_execution() as L112's for-loop goes.)

For other places that currently updates pending_node_id, i.e. get_pending_tasks(begin, end) and topo_sort_nodes(), I think we should instead verifies/asserts on these IDs. If they don't meet the verification, that could very well indicate a bug. WDYT?

taichi/program/state_flow_graph.cpp Outdated Show resolved Hide resolved
taichi/program/state_flow_graph.cpp Outdated Show resolved Hide resolved
taichi/program/state_flow_graph.cpp Show resolved Hide resolved
taichi/program/state_flow_graph.h Outdated Show resolved Hide resolved
@xumingkuan xumingkuan requested a review from k-ye October 3, 2020 17:28
Copy link
Member

@k-ye k-ye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, LGTM!

taichi/program/state_flow_graph.cpp Show resolved Hide resolved
@@ -783,17 +916,16 @@ void StateFlowGraph::delete_nodes(

nodes_ = std::move(new_nodes_);
reid_nodes();
reid_pending_nodes();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, can we assert that all the nodes in indices_to_delete are pending?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can refactor indices_to_delete such that the pending_node_id instead of node_id is passed into the unordered_map?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, that should be better :) I saw that you have explicitly converted a pending index into a "global" one when populating indices_to_delete..

@xumingkuan
Copy link
Contributor Author

Note: async_mgpcg.py doesn't need to delete any nodes in optimize_dead_store.

@yuanming-hu yuanming-hu merged commit 16e6bc3 into taichi-dev:master Oct 4, 2020
@yuanming-hu yuanming-hu mentioned this pull request Oct 7, 2020
@xumingkuan xumingkuan deleted the sfg-gc branch October 12, 2020 08:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants