This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

model speedup problem using pytorch when multi-output passed across model #2756

Closed

LovPe opened this issue Jul 31, 2020 · 17 comments

@LovPe

LovPe commented Jul 31, 2020

Hi, thanks for the amazing work.
I use NNI speedup on a model with 2 submodels, where the output of the first is passed to the second:
modelA --> modelB
modelA has 2 outputs that are passed to modelB as inputs, like this:
modelA.output1 --> convOP --> modelB.output1
modelA.output2 --> convOP --> modelB.output2
The problem is that after building the graph using TorchModuleGraph in _graph_utils.py, the inputs of the convOP nodes cannot be correctly resolved to modelA, because there is a prim::TupleConstruct between modelA and modelB and TorchModuleGraph cannot traverse this op.
I wonder if there is any solution for this problem. Looking forward to your reply. Thanks a lot.
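For concreteness, a minimal sketch of the structure being described (module names and channel sizes are illustrative, not taken from the original model):

import torch
from torch import nn


class ModelA(nn.Module):
    def __init__(self):
        super(ModelA, self).__init__()
        self.conv_a = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.conv_b = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        # Returning two tensors together inserts a prim::TupleConstruct
        # into the traced graph.
        return self.conv_a(x), self.conv_b(x)


class ModelB(nn.Module):
    def __init__(self):
        super(ModelB, self).__init__()
        self.conv1 = nn.Conv2d(8, 16, kernel_size=3, padding=1)  # the convOP on output1
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)  # the convOP on output2

    def forward(self, inputs):
        out1, out2 = inputs  # prim::TupleUnpack in the traced graph
        return self.conv1(out1), self.conv2(out2)


class Wrapper(nn.Module):
    def __init__(self):
        super(Wrapper, self).__init__()
        self.model_a = ModelA()
        self.model_b = ModelB()

    def forward(self, x):
        return self.model_b(self.model_a(x))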

@LovPe LovPe changed the title model speedup problem when multi-output passed across model model speedup problem using pytorch when multi-output passed across model Aug 1, 2020
@scarlett2018 scarlett2018 added ModelSpeedup user raised question Further information is requested labels Aug 3, 2020
@QuanluZhang
Contributor

Hi @LovPe, this issue has been fixed by #2609 and will be included in the next release.

@LovPe
Author

LovPe commented Aug 4, 2020

Thanks for the reply. I tried the new version of TorchModuleGraph and applied the unpack_manually() method before speeding up the model,
but I found a problem when there are 2 successive pack&unpack pairs, for example:

A ─1─> (pack1 ─> unpack1) ─2─> (pack2 ─> unpack2) ─3─> B

where 1, 2, 3 are edges. After unpacking, the result is:

    ┌──1──> (pack1 ─> unpack1)
A ──┼──2──> (pack2 ─> unpack2)
    └──3──> B

I think the input carried on edge 1 should also end up pointing to node B. Might this be a bug?

@zheng-ningxin
Contributor

zheng-ningxin commented Aug 4, 2020

Hi~ @LovPe Could you please show the code snippet of the connecting part of the two models? I'll build a similar example and see if we can handle this scenario. Thanks!

@LovPe
Author

LovPe commented Aug 4, 2020

@zheng-ningxin Here is the test code:

from torch import nn
import torch
from nni._graph_utils import TorchModuleGraph


class CBR(nn.Module):
    def __init__(self, i, o):
        super(CBR, self).__init__()
        self.conv1 = nn.Conv2d(i, o, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(o)
        self.act1 = nn.ReLU()

    def forward(self, x):
        return self.act1(self.bn1(self.conv1(x)))


class A(nn.Module):
    def __init__(self):
        super(A, self).__init__()
        self.conv1 = CBR(3, 6)
        self.conv2 = CBR(6, 8)
        self.conv3 = CBR(6, 12)

    def forward(self, x):
        x1 = self.conv1(x)
        x2 = self.conv2(x1)
        x3 = self.conv3(x1)
        return [x2, x3]


class B1(nn.Module):
    def __init__(self):
        super(B1, self).__init__()
        self.conv1 = CBR(12, 32)
        self.conv2 = CBR(32, 32)
        self.conv3 = CBR(32, 32)

    def forward(self, x):
        ret = list()
        x = self.conv1(x)
        ret.append(x)
        x = self.conv2(x)
        ret.append(x)
        x = self.conv3(x)
        ret.append(x)
        return ret


class B(nn.Module):
    def __init__(self):
        super(B, self).__init__()
        self.b = B1()

    def forward(self, x):
        return self.b(x[-1])


class C(nn.Module):
    def __init__(self):
        super(C, self).__init__()
        self.conv1 = CBR(8, 32)
        self.conv2 = CBR(12, 32)
        self.conv3 = CBR(32, 32)
        self.conv4 = CBR(32, 32)
        self.conv5 = CBR(32, 32)

    def forward(self, x):
        out = list()
        out.append(self.conv1(x[0]))
        out.append(self.conv2(x[1]))
        out.append(self.conv3(x[2]))
        out.append(self.conv4(x[3]))
        out.append(self.conv5(x[4]))
        return out


class TestMod(nn.Module):
    def __init__(self):
        super(TestMod, self).__init__()
        self.a = A()
        self.b = B()
        self.c = C()

    def forward(self, x):
        x_a = self.a(x)
        x_b = self.b(x_a)
        x_a.extend(x_b)
        xc = self.c(x_a)
        return xc


if __name__ == "__main__":
    dummy_input = torch.randn([1, 3, 28, 28])
    m_test = TestMod()
    graph = TorchModuleGraph(m_test, dummy_input)
    for ii in graph.name_to_node:
        print("node {}, successor: {}".format(ii, graph.find_successors(ii)))
    print("=" * 100)
    graph.unpack_manually()
    for ii in graph.name_to_node:
        print("node {}, successor: {}".format(ii, graph.find_successors(ii)))

After running the code, you get output like this:
(screenshot: the printed node/successor lists; left is before unpack_manually(), right is after)
I think the output of unpack_manually() should not contain a TupleUnpack node unless it is a final node, since that is what unpack_manually() is meant to do. Can this be solved? Thanks!

@zheng-ningxin
Contributor

Hi~ @LovPe Currently, NNI cannot handle multiple successive pack&unpack pairs. We will support this as soon as possible. Thanks for the feedback~

In addition, so that other users who see this issue can understand the network structure more conveniently, I drew the network topology:
(image: topology of the test model)

@LovPe
Author

LovPe commented Aug 5, 2020

@zheng-ningxin Thanks for the quick reply~
The graph you drew is the version after unpack_manually(); maybe the graph before unpack_manually() would also be useful?
By the way, how did you draw that graph? Can you share the tool? Thanks a lot~

@zheng-ningxin
Contributor

@LovPe Sure~ Here is the visualization tool:

import graphviz
import torch
import torchvision
from nni._graph_utils import TorchModuleGraph

M = {}
nodecount = 0
visited = set()


def traverse(mg, curnode, lastnode, graph):
    global nodecount
    print('Visiting {} , from {}'.format(curnode, lastnode))
    if curnode in visited:
        if lastnode is not None:
            graph.edge(M[lastnode], M[curnode])
        return
    visited.add(curnode)
    M[curnode] = str(nodecount)
    nodecount += 1
    render_cfg = {'shape': 'ellipse', 'style': 'solid'}
    nodestring = mg.name_to_node[curnode].name + \
        '\n'+mg.name_to_node[curnode].op_type
    graph.node(M[curnode], nodestring, **render_cfg)
    if lastnode is not None:
        graph.edge(M[lastnode], M[curnode])
    nexts = mg.find_successors(curnode)

    for _next in nexts:
        traverse(mg, _next, curnode, graph)


def visualize(modulegraph, savepath):
    graph = graphviz.Digraph(format='jpg')

    # Start a traversal from every input of the module graph.
    # (Use the `modulegraph` parameter, not the global `mg`, so the
    # function also works when imported from another script.)
    for name, nodeio in modulegraph.nodes_py.nodes_io.items():
        if nodeio.input_or_output == 'input':
            for node in modulegraph.input_to_node[name]:
                traverse(modulegraph, node.name, None, graph)

    graph.render(savepath)


if __name__ == '__main__':
    modelname = 'resnet18'
    model = getattr(torchvision.models, modelname)
    net = model()

    net.cuda()
    data = torch.ones(1, 3, 224, 224).cuda()

    mg = TorchModuleGraph(net, data)
    mg.unpack_manually()
    visualize(mg, './test_resnet18')
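As a usage note, the same helper can be pointed at the TestMod from the comment above; a sketch, assuming both snippets live in one script and run on CPU:

# Hypothetical usage with the TestMod defined earlier in this thread.
# Note the helper keeps its bookkeeping (M, visited, nodecount) in
# module-level globals, so draw one graph per process.
m_test = TestMod()
data = torch.randn(1, 3, 28, 28)
mg = TorchModuleGraph(m_test, data)
mg.unpack_manually()
visualize(mg, './test_testmod')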

@LovPe
Author

LovPe commented Aug 5, 2020

Thanks a lot!
Here is the graph before the unpack_manually() method:
(image: graph before unpack_manually())

@zheng-ningxin
Contributor

zheng-ningxin commented Aug 6, 2020

@LovPe Hi~ I have submitted a PR (#2768) to support graphs that have multiple successive tuple-unpack operations. Now the graph looks like this:
(image: graph after the fix)
You can try your model again after this PR is merged. Thanks~

@LovPe
Author

LovPe commented Aug 6, 2020

Hi @zheng-ningxin, thanks for the quick reply. That change solves the earlier case, but there is still an issue for me 🤣
Demo code:

from torch import nn
import torch
from nni._graph_utils import TorchModuleGraph


class CBR(nn.Module):
    def __init__(self, i, o):
        super(CBR, self).__init__()
        self.conv1 = nn.Conv2d(i, o, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(o)
        self.act1 = nn.ReLU()

    def forward(self, x):
        return self.act1(self.bn1(self.conv1(x)))


class A(nn.Module):
    def __init__(self):
        super(A, self).__init__()
        self.conv1 = CBR(3, 6)
        self.conv2 = CBR(6, 8)
        self.conv3 = CBR(6, 12)

    def forward(self, x):
        x1 = self.conv1(x)
        x2 = self.conv2(x1)
        x3 = self.conv3(x1)
        return [x2, x3]


class B1(nn.Module):
    def __init__(self):
        super(B1, self).__init__()
        self.conv1 = CBR(12, 32)
        self.conv2 = CBR(32, 32)
        self.conv3 = CBR(32, 32)

    def forward(self, x):
        ret = list()
        x = self.conv1(x)
        ret.append(x)
        x = self.conv2(x)
        ret.append(x)
        x = self.conv3(x)
        ret.append(x)
        return ret


class B(nn.Module):
    def __init__(self):
        super(B, self).__init__()
        self.b = B1()

    def forward(self, x):
        return self.b(x[-1])


class C(nn.Module):
    def __init__(self):
        super(C, self).__init__()
        self.conv1_1 = CBR(8, 32)
        self.conv1_2 = CBR(8, 32)
        self.conv2_1 = CBR(12, 32)
        self.conv2_2 = CBR(12, 32)
        self.conv3_1 = CBR(32, 32)
        self.conv3_2 = CBR(32, 32)
        self.conv4_1 = CBR(32, 32)
        self.conv4_2 = CBR(32, 32)
        self.conv5_1 = CBR(32, 32)
        self.conv5_2 = CBR(32, 32)

    def forward(self, x):
        out_a = list()
        out_b = list()
        out_a.append(self.conv1_1(x[0]))
        out_a.append(self.conv2_1(x[1]))
        out_a.append(self.conv3_1(x[2]))
        out_a.append(self.conv4_1(x[3]))
        out_a.append(self.conv5_1(x[4]))
        out_b.append(self.conv1_2(x[0]))
        out_b.append(self.conv2_2(x[1]))
        out_b.append(self.conv3_2(x[2]))
        out_b.append(self.conv4_2(x[3]))
        out_b.append(self.conv5_2(x[4]))
        return out_a, out_b


class TestMod(nn.Module):
    def __init__(self):
        super(TestMod, self).__init__()
        self.a = A()
        self.b = B()
        self.c = C()

    def forward(self, x):
        x_a = self.a(x)
        x_b = self.b(x_a)
        x_a.extend(x_b)
        xc = self.c(x_a)
        return xc


if __name__ == "__main__":
    # visualize() is the helper script shared earlier in this thread
    from experiment.vis_nni_graph import visualize
    dummy_input = torch.randn([1, 3, 28, 28])
    m_test = TestMod()
    graph = TorchModuleGraph(m_test, dummy_input)
    for ii in graph.name_to_node:
        print("node {}, successor: {}".format(ii, graph.find_successors(ii)))
    visualize(graph, "./test_before")
    print("=" * 100)
    graph.unpack_manually()
    for ii in graph.name_to_node:
        print("node {}, successor: {}".format(ii, graph.find_successors(ii)))
    visualize(graph, "./test_after")

This assertion is triggered:

File "/home/server_gpu/anaconda3/envs/py36_ch/lib/python3.6/site-packages/nni/_graph_utils.py", line 566, in unpack_manually
assert len(node.inputs) == len(list(last_cpp.inputs()))

The graph before unpacking looks like this:
(image: graph before unpacking)

Maybe the code should look like this (lines 584-588 in _graph_utils.py): 🤔

                        if _debug_output in self.input_to_node:
                            for following_node in self.input_to_node[_debug_output]:
                                following_node.inputs.remove(_debug_output)
                                # Do not guard with the `if`, since the same tensor
                                # may appear twice in a pack&unpack pair:
                                # if _debug_input not in following_node.inputs:
                                following_node.inputs.append(_debug_input)
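For illustration, a hypothetical module in which the same tensor is packed twice, which is the case the commented-out membership check would mishandle (it would append the shared input only once):

import torch
from torch import nn


class DupPack(nn.Module):
    def __init__(self):
        super(DupPack, self).__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=1)

    def forward(self, x):
        y = self.conv(x)
        # The same tensor appears twice in the returned tuple, so the traced
        # pack&unpack pair carries a duplicated input.
        return y, y


class Consumer(nn.Module):
    def __init__(self):
        super(Consumer, self).__init__()
        self.inner = DupPack()

    def forward(self, x):
        a, b = self.inner(x)  # both names resolve to the same traced tensor
        return a + b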

@zheng-ningxin
Contributor

@LovPe Sorry for the late reply. I have updated the PR (#2768); please try it again, thanks~
It turns out that the assertion was triggered by an error elsewhere, and I have fixed it.

The unpacked topology currently looks like:
(image: unpacked topology after the updated fix)

@LovPe
Author

LovPe commented Aug 11, 2020

Thanks for the help. The new implementation is much cleaner and works well for me~
I will close this issue after the PR is merged.

@LovPe LovPe closed this as completed Aug 14, 2020
@tigert1998
Contributor

tigert1998 commented Dec 9, 2020

It seems that speeding up a model with list/tuple pack/unpack (via nni.compression.pytorch.speedup.ModelSpeedup) is still not supported as of the current master: list/tuple unpack is not handled in infer_shape.

RuntimeError: Has not supported infering output shape from input shape for module/function: `prim::TupleUnpack`, .prim::TupleUnpack.220

@QuanluZhang
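For context, a minimal sketch of the flow that hits this error; the pruner choice, config, and file paths are illustrative and follow the NNI compression API of that era, not a verified reproduction:

import torch
from torch import nn
from nni.algorithms.compression.pytorch.pruning import L1FilterPruner
from nni.compression.pytorch.speedup import ModelSpeedup


class Backbone(nn.Module):
    def __init__(self):
        super(Backbone, self).__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(8, 8, kernel_size=3, padding=1)

    def forward(self, x):
        y = self.conv1(x)
        return y, self.conv2(y)  # tuple return -> prim::TupleConstruct


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.backbone = Backbone()
        self.head = nn.Conv2d(16, 8, kernel_size=3, padding=1)

    def forward(self, x):
        a, b = self.backbone(x)  # prim::TupleUnpack
        return self.head(torch.cat([a, b], dim=1))


model = Net()
dummy_input = torch.randn(1, 3, 32, 32)

# Prune and export the masks.
pruner = L1FilterPruner(model, [{'sparsity': 0.5, 'op_types': ['Conv2d']}])
pruner.compress()
pruner.export_model(model_path='pruned.pth', mask_path='mask.pth')

# Speeding up a fresh copy of the model with the exported masks is where
# infer_shape raises on prim::TupleUnpack.
ModelSpeedup(Net(), dummy_input, 'mask.pth').speedup_model()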

@QuanluZhang
Contributor

@tigert1998 We only support prim::TupleUnpack for resolving mask conflicts for now; it is not yet supported in the main speedup process. Thanks for reporting your requirement, we will support it soon.

@xuezu29

xuezu29 commented Apr 6, 2021

> @tigert1998 We only support prim::TupleUnpack for resolving mask conflicts for now; it is not yet supported in the main speedup process. Thanks for reporting your requirement, we will support it soon.

Looking forward to support for prim::TupleUnpack in the speedup process. Very thankful!

@scarlett2018
Member

> @tigert1998 We only support prim::TupleUnpack for resolving mask conflicts for now; it is not yet supported in the main speedup process. Thanks for reporting your requirement, we will support it soon.

@QuanluZhang @zheng-ningxin - ping to check the latest status of this fix.

@scarlett2018 scarlett2018 added support tobe-overdued and removed question Further information is requested labels Jun 9, 2021
@zheng-ningxin
Contributor

zheng-ningxin commented Jun 10, 2021

@scarlett2018 This is the same issue as this one; please refer to it for more details.

@xuezu29 I have fixed this in the refactored speedup. Please give it a try once the PR is merged, or you can clone the corresponding branch and build NNI manually. Please let me know if there are more scenarios to support. Thanks.
