-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[async] Partial SFG node GC (keep latest state writers/readers) #1915
Conversation
Do we need |
Sounds like a good idea! |
…ng_node_id = -1 for other nodes
taichi/program/state_flow_graph.cpp
Outdated
for (int i = 1; i < (int)nodes_.size(); i++) { | ||
for (int i = first_pending_task_index_; i < (int)nodes_.size(); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to make use of get_pending_tasks()
, but I make the changeset minimal here to avoid conflicts with #1907.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, i merged #1907 and it has created mrege conflict now.. But is it possible to use get_pending_tasks()
now?
async_mgpcg.py fails in optimize_listgen now. Will debug later. |
Codecov Report
@@ Coverage Diff @@
## master #1915 +/- ##
==========================================
+ Coverage 43.83% 43.86% +0.02%
==========================================
Files 45 45
Lines 6194 6199 +5
Branches 1100 1101 +1
==========================================
+ Hits 2715 2719 +4
- Misses 3310 3311 +1
Partials 169 169
Continue to review full report at Codecov.
|
# Conflicts: # taichi/program/state_flow_graph.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, most parts LGTM!
I have one suggestion. It took me some amount of time to wrap my head around how first_pending_task_index_
and pending_node_id
s are maintained. IMHO, we only need to re-id pending node IDs when first_pending_task_index_
has been modified:
rebuild_graph()
. Because we could remove empty tasks,first_pending_task_index_
might shrink. (* iffirst_pending_task_index_
doesn't change, does that imply we don't even have to re-id?)mark_pending_tasks_as_executed()
. After this, all thenodes_
should be in the executed state. (We don't have to do another re-id pass if we just dopending_node_id = -1
ormark_execution()
as L112's for-loop goes.)
For other places that currently updates pending_node_id
, i.e. get_pending_tasks(begin, end)
and topo_sort_nodes()
, I think we should instead verifies/asserts on these IDs. If they don't meet the verification, that could very well indicate a bug. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, LGTM!
@@ -783,17 +916,16 @@ void StateFlowGraph::delete_nodes( | |||
|
|||
nodes_ = std::move(new_nodes_); | |||
reid_nodes(); | |||
reid_pending_nodes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, can we assert that all the nodes in indices_to_delete
are pending?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can refactor indices_to_delete
such that the pending_node_id
instead of node_id
is passed into the unordered_map
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, that should be better :) I saw that you have explicitly converted a pending index into a "global" one when populating indices_to_delete
..
Note: async_mgpcg.py doesn't need to delete any nodes in |
Related issue = #742
This PR keeps the latest state writers/readers in the SFG when all nodes are extracted to execute.
Most functions of the SFG need to be modified. This PR modifies
This PR doesn't modify
These functions don't need to be modified:
This PR also tried to modify
fuse()
so that only tasks that are "near" in the topological order (i.e., position innodes_
differ by< 512
) are fused. This improves the complexity of a single invocation offuse()
from O(nm/64) to O(min(n, 512*4)*m/64 + nm/512)...Benchmark: async_mgpcg.py, time of
fuse()
on kun: 0.746s (0.355s unaccounted) -> 1.334s (0.800s rebuild graph, 0.559s insert task)(2.760s at commit c691248, 1.387s at commit da2a0f2 (0.920s rebuild graph), 1.243s at commit bc6f17f (0.704s rebuild graph))
[Click here for the format server]