This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

model speedup problem using pytorch when multi-output passed across model #2756

Closed

LovPe opened this issue Jul 31, 2020 · 17 comments

@LovPe

LovPe commented Jul 31, 2020

Hi, thanks for the amazing work.
I use NNI speedup on a model with 2 submodels, where the output of the first is passed to the second:
modelA --> modelB
modelA has 2 outputs that are passed to modelB as inputs, like this:
modelA.output1 --> convOP --> modelB.output1
modelA.output2 --> convOP --> modelB.output2
The problem is that after building the graph using TorchModuleGraph in _graph_utils.py, the inputs of the convOP nodes cannot be correctly resolved to modelA, because there is a prim::TupleConstruct between modelA and modelB and TorchModuleGraph cannot traverse this op.
I wonder if there is any solution for this problem. Looking forward to your reply. Thanks a lot.
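For concreteness, a minimal sketch of the structure being described (module names and channel sizes are illustrative, not taken from the original model):

import torch
from torch import nn


class ModelA(nn.Module):
    def __init__(self):
        super(ModelA, self).__init__()
        self.conv_a = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.conv_b = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        # Returning two tensors together inserts a prim::TupleConstruct
        # into the traced graph.
        return self.conv_a(x), self.conv_b(x)


class ModelB(nn.Module):
    def __init__(self):
        super(ModelB, self).__init__()
        self.conv1 = nn.Conv2d(8, 16, kernel_size=3, padding=1)  # the convOP on output1
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)  # the convOP on output2

    def forward(self, inputs):
        out1, out2 = inputs  # prim::TupleUnpack in the traced graph
        return self.conv1(out1), self.conv2(out2)


class Wrapper(nn.Module):
    def __init__(self):
        super(Wrapper, self).__init__()
        self.model_a = ModelA()
        self.model_b = ModelB()

    def forward(self, x):
        return self.model_b(self.model_a(x))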

@LovPe LovPe changed the title model speedup problem when multi-output passed across model model speedup problem using pytorch when multi-output passed across model Aug 1, 2020
@scarlett2018 scarlett2018 added ModelSpeedup user raised question Further information is requested labels Aug 3, 2020
@QuanluZhang
Contributor

Hi @LovPe, this issue has been fixed by #2609 and will be included in the next release.

@LovPe
Author

LovPe commented Aug 4, 2020

Thanks for the reply. I tried the new version of TorchModuleGraph and applied the unpack_manually() method before speeding up the model,
but I found a problem when there are 2 successive pack&unpack pairs, for example:

A ─1─> (pack1 ─> unpack1) ─2─> (pack2 ─> unpack2) ─3─> B

where 1, 2, 3 are edges. After unpacking, the result is:

    ┌──1──> (pack1 ─> unpack1)
A ──┼──2──> (pack2 ─> unpack2)
    └──3──> B

I think the input carried on edge 1 should also end up pointing to node B. Might this be a bug?

@zheng-ningxin
Contributor

zheng-ningxin commented Aug 4, 2020

Hi~ @LovPe Could you please show the code snippet of the connecting part of the two models? I'll build a similar example and see if we can handle this scenario. Thanks!

@LovPe
Author

LovPe commented Aug 4, 2020

@zheng-ningxin Here is the test code:

from torch import nn
import torch
from nni._graph_utils import TorchModuleGraph


class CBR(nn.Module):
    def __init__(self, i, o):
        super(CBR, self).__init__()
        self.conv1 = nn.Conv2d(i, o, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(o)
        self.act1 = nn.ReLU()

    def forward(self, x):
        return self.act1(self.bn1(self.conv1(x)))


class A(nn.Module):
    def __init__(self):
        super(A, self).__init__()
        self.conv1 = CBR(3, 6)
        self.conv2 = CBR(6, 8)
        self.conv3 = CBR(6, 12)

    def forward(self, x):
        x1 = self.conv1(x)
        x2 = self.conv2(x1)
        x3 = self.conv3(x1)
        return [x2, x3]


class B1(nn.Module):
    def __init__(self):
        super(B1, self).__init__()
        self.conv1 = CBR(12, 32)
        self.conv2 = CBR(32, 32)
        self.conv3 = CBR(32, 32)

    def forward(self, x):
        ret = list()
        x = self.conv1(x)
        ret.append(x)
        x = self.conv2(x)
        ret.append(x)
        x = self.conv3(x)
        ret.append(x)
        return ret


class B(nn.Module):
    def __init__(self):
        super(B, self).__init__()
        self.b = B1()

    def forward(self, x):
        return self.b(x[-1])


class C(nn.Module):
    def __init__(self):
        super(C, self).__init__()
        self.conv1 = CBR(8, 32)
        self.conv2 = CBR(12, 32)
        self.conv3 = CBR(32, 32)
        self.conv4 = CBR(32, 32)
        self.conv5 = CBR(32, 32)

    def forward(self, x):
        out = list()
        out.append(self.conv1(x[0]))
        out.append(self.conv2(x[1]))
        out.append(self.conv3(x[2]))
        out.append(self.conv4(x[3]))
        out.append(self.conv5(x[4]))
        return out


class TestMod(nn.Module):
    def __init__(self):
        super(TestMod, self).__init__()
        self.a = A()
        self.b = B()
        self.c = C()

    def forward(self, x):
        x_a = self.a(x)
        x_b = self.b(x_a)
        x_a.extend(x_b)
        xc = self.c(x_a)
        return xc


if __name__ == "__main__":
    dummy_input = torch.randn([1, 3, 28, 28])
    m_test = TestMod()
    graph = TorchModuleGraph(m_test, dummy_input)
    for ii in graph.name_to_node:
        print("node {}, successor: {}".format(ii, graph.find_successors(ii)))
    print("=" * 100)
    graph.unpack_manually()
    for ii in graph.name_to_node:
        print("node {}, successor: {}".format(ii, graph.find_successors(ii)))

After running the code, you get output like this:
(screenshot: the printed node/successor lists; left is before unpack_manually(), right is after)
I think the output of unpack_manually() should not contain a TupleUnpack node unless it is a final node, since that is what unpack_manually() is meant to do. Can this be solved? Thanks!

@zheng-ningxin
Contributor

Hi~ @LovPe Currently, NNI cannot handle multiple successive pack&unpack pairs. We will support this as soon as possible. Thanks for the feedback~

In addition, so that other users who see this issue can understand the network structure more conveniently, I drew the network topology:
(image: topology of the test model)

@LovPe
Author

LovPe commented Aug 5, 2020

@zheng-ningxin Thanks for the quick reply~
The graph you drew is the version after unpack_manually(); maybe the graph before unpack_manually() would also be useful?
By the way, how did you draw that graph? Can you share the tool? Thanks a lot~

@zheng-ningxin
Contributor

@LovPe Sure~ Here is the visualization tool:

import graphviz
import torch
import torchvision
from nni._graph_utils import TorchModuleGraph

M = {}
nodecount = 0
visited = set()


def traverse(mg, curnode, lastnode, graph):
    global nodecount
    print('Visiting {} , from {}'.format(curnode, lastnode))
    if curnode in visited:
        if lastnode is not None:
            graph.edge(M[lastnode], M[curnode])
        return
    visited.add(curnode)
    M[curnode] = str(nodecount)
    nodecount += 1
    render_cfg = {'shape': 'ellipse', 'style': 'solid'}
    nodestring = mg.name_to_node[curnode].name + \
        '\n'+mg.name_to_node[curnode].op_type
    graph.node(M[curnode], nodestring, **render_cfg)
    if lastnode is not None:
        graph.edge(M[lastnode], M[curnode])
    nexts = mg.find_successors(curnode)

    for _next in nexts:
        traverse(mg, _next, curnode, graph)


def visualize(modulegraph, savepath):
    graph = graphviz.Digraph(format='jpg')

    # Start a traversal from every input of the module graph.
    # (Use the `modulegraph` parameter, not the global `mg`, so the
    # function also works when imported from another script.)
    for name, nodeio in modulegraph.nodes_py.nodes_io.items():
        if nodeio.input_or_output == 'input':
            for node in modulegraph.input_to_node[name]:
                traverse(modulegraph, node.name, None, graph)

    graph.render(savepath)


if __name__ == '__main__':
    modelname = 'resnet18'
    model = getattr(torchvision.models, modelname)
    net = model()

    net.cuda()
    data = torch.ones(1, 3, 224, 224).cuda()

    mg = TorchModuleGraph(net, data)
    mg.unpack_manually()
    visualize(mg, './test_resnet18')
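As a usage note, the same helper can be pointed at the TestMod from the comment above; a sketch, assuming both snippets live in one script and run on CPU:

# Hypothetical usage with the TestMod defined earlier in this thread.
# Note the helper keeps its bookkeeping (M, visited, nodecount) in
# module-level globals, so draw one graph per process.
m_test = TestMod()
data = torch.randn(1, 3, 28, 28)
mg = TorchModuleGraph(m_test, data)
mg.unpack_manually()
visualize(mg, './test_testmod')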

@LovPe
Author

LovPe commented Aug 5, 2020

Thanks a lot!
Here is the graph before the unpack_manually() method:
(image: graph before unpack_manually())

@zheng-ningxin
Contributor

zheng-ningxin commented Aug 6, 2020

@LovPe Hi~ I have submitted a PR (#2768) to support graphs that have multiple successive tuple-unpack operations. Now the graph looks like this:
(image: graph after the fix)
You can try your model again after this PR is merged. Thanks~

@LovPe
Author

LovPe commented Aug 6, 2020

Hi @zheng-ningxin, thanks for the quick reply. That change solves the earlier case, but there is still an issue for me 🤣
Demo code:

from torch import nn
import torch
from nni._graph_utils import TorchModuleGraph


class CBR(nn.Module):
    def __init__(self, i, o):
        super(CBR, self).__init__()
        self.conv1 = nn.Conv2d(i, o, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(o)
        self.act1 = nn.ReLU()

    def forward(self, x):
        return self.act1(self.bn1(self.conv1(x)))


class A(nn.Module):
    def __init__(self):
        super(A, self).__init__()
        self.conv1 = CBR(3, 6)
        self.conv2 = CBR(6, 8)
        self.conv3 = CBR(6, 12)

    def forward(self, x):
        x1 = self.conv1(x)
        x2 = self.conv2(x1)
        x3 = self.conv3(x1)
        return [x2, x3]


class B1(nn.Module):
    def __init__(self):
        super(B1, self).__init__()
        self.conv1 = CBR(12, 32)
        self.conv2 = CBR(32, 32)
        self.conv3 = CBR(32, 32)

    def forward(self, x):
        ret = list()
        x = self.conv1(x)
        ret.append(x)
        x = self.conv2(x)
        ret.append(x)
        x = self.conv3(x)
        ret.append(x)
        return ret


class B(nn.Module):
    def __init__(self):
        super(B, self).__init__()
        self.b = B1()

    def forward(self, x):
        return self.b(x[-1])


class C(nn.Module):
    def __init__(self):
        super(C, self).__init__()
        self.conv1_1 = CBR(8, 32)
        self.conv1_2 = CBR(8, 32)
        self.conv2_1 = CBR(12, 32)
        self.conv2_2 = CBR(12, 32)
        self.conv3_1 = CBR(32, 32)
        self.conv3_2 = CBR(32, 32)
        self.conv4_1 = CBR(32, 32)
        self.conv4_2 = CBR(32, 32)
        self.conv5_1 = CBR(32, 32)
        self.conv5_2 = CBR(32, 32)

    def forward(self, x):
        out_a = list()
        out_b = list()
        out_a.append(self.conv1_1(x[0]))
        out_a.append(self.conv2_1(x[1]))
        out_a.append(self.conv3_1(x[2]))
        out_a.append(self.conv4_1(x[3]))
        out_a.append(self.conv5_1(x[4]))
        out_b.append(self.conv1_2(x[0]))
        out_b.append(self.conv2_2(x[1]))
        out_b.append(self.conv3_2(x[2]))
        out_b.append(self.conv4_2(x[3]))
        out_b.append(self.conv5_2(x[4]))
        return out_a, out_b


class TestMod(nn.Module):
    def __init__(self):
        super(TestMod, self).__init__()
        self.a = A()
        self.b = B()
        self.c = C()

    def forward(self, x):
        x_a = self.a(x)
        x_b = self.b(x_a)
        x_a.extend(x_b)
        xc = self.c(x_a)
        return xc


if __name__ == "__main__":
    # visualize() is the helper script shared earlier in this thread
    from experiment.vis_nni_graph import visualize
    dummy_input = torch.randn([1, 3, 28, 28])
    m_test = TestMod()
    graph = TorchModuleGraph(m_test, dummy_input)
    for ii in graph.name_to_node:
        print("node {}, successor: {}".format(ii, graph.find_successors(ii)))
    visualize(graph, "./test_before")
    print("=" * 100)
    graph.unpack_manually()
    for ii in graph.name_to_node:
        print("node {}, successor: {}".format(ii, graph.find_successors(ii)))
    visualize(graph, "./test_after")

This assertion is triggered:

File "/home/server_gpu/anaconda3/envs/py36_ch/lib/python3.6/site-packages/nni/_graph_utils.py", line 566, in unpack_manually
assert len(node.inputs) == len(list(last_cpp.inputs()))

The graph before unpacking looks like this:
(image: graph before unpacking)

Maybe the code should look like this (lines 584-588 in _graph_utils.py): 🤔

                        if _debug_output in self.input_to_node:
                            for following_node in self.input_to_node[_debug_output]:
                                following_node.inputs.remove(_debug_output)
                                # Do not guard with the `if`, since the same tensor
                                # may appear twice in a pack&unpack pair:
                                # if _debug_input not in following_node.inputs:
                                following_node.inputs.append(_debug_input)
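For illustration, a hypothetical module in which the same tensor is packed twice, which is the case the commented-out membership check would mishandle (it would append the shared input only once):

import torch
from torch import nn


class DupPack(nn.Module):
    def __init__(self):
        super(DupPack, self).__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=1)

    def forward(self, x):
        y = self.conv(x)
        # The same tensor appears twice in the returned tuple, so the traced
        # pack&unpack pair carries a duplicated input.
        return y, y


class Consumer(nn.Module):
    def __init__(self):
        super(Consumer, self).__init__()
        self.inner = DupPack()

    def forward(self, x):
        a, b = self.inner(x)  # both names resolve to the same traced tensor
        return a + b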

@zheng-ningxin
Contributor

@LovPe Sorry for the late reply. I have updated the PR (#2768); please try it again, thanks~
It turns out that the assertion was triggered by an error elsewhere, and I have fixed it.

The unpacked topology currently looks like:
(image: unpacked topology after the updated fix)

@LovPe
Author

LovPe commented Aug 11, 2020

Thanks for the help. The new implementation is much cleaner and works well for me~
I will close this issue after the PR is merged.

@LovPe LovPe closed this as completed Aug 14, 2020
@tigert1998
Contributor

tigert1998 commented Dec 9, 2020

It seems that speeding up a model with list/tuple pack/unpack (via nni.compression.pytorch.speedup.ModelSpeedup) is still not supported as of the current master: list/tuple unpack is not handled in infer_shape.

RuntimeError: Has not supported infering output shape from input shape for module/function: `prim::TupleUnpack`, .prim::TupleUnpack.220

@QuanluZhang
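For context, a minimal sketch of the flow that hits this error; the pruner choice, config, and file paths are illustrative and follow the NNI compression API of that era, not a verified reproduction:

import torch
from torch import nn
from nni.algorithms.compression.pytorch.pruning import L1FilterPruner
from nni.compression.pytorch.speedup import ModelSpeedup


class Backbone(nn.Module):
    def __init__(self):
        super(Backbone, self).__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(8, 8, kernel_size=3, padding=1)

    def forward(self, x):
        y = self.conv1(x)
        return y, self.conv2(y)  # tuple return -> prim::TupleConstruct


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.backbone = Backbone()
        self.head = nn.Conv2d(16, 8, kernel_size=3, padding=1)

    def forward(self, x):
        a, b = self.backbone(x)  # prim::TupleUnpack
        return self.head(torch.cat([a, b], dim=1))


model = Net()
dummy_input = torch.randn(1, 3, 32, 32)

# Prune and export the masks.
pruner = L1FilterPruner(model, [{'sparsity': 0.5, 'op_types': ['Conv2d']}])
pruner.compress()
pruner.export_model(model_path='pruned.pth', mask_path='mask.pth')

# Speeding up a fresh copy of the model with the exported masks is where
# infer_shape raises on prim::TupleUnpack.
ModelSpeedup(Net(), dummy_input, 'mask.pth').speedup_model()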

@QuanluZhang
Contributor

@tigert1998 We only support prim::TupleUnpack for resolving mask conflicts for now; it is not yet supported in the main speedup process. Thanks for reporting your requirement, we will support it soon.

@xuezu29

xuezu29 commented Apr 6, 2021

> @tigert1998 We only support prim::TupleUnpack for resolving mask conflicts for now; it is not yet supported in the main speedup process. Thanks for reporting your requirement, we will support it soon.

Looking forward to support for prim::TupleUnpack in the speedup process. Very thankful!

@scarlett2018
Member

> @tigert1998 We only support prim::TupleUnpack for resolving mask conflicts for now; it is not yet supported in the main speedup process. Thanks for reporting your requirement, we will support it soon.

@QuanluZhang @zheng-ningxin - ping to check the latest status of this fix.

@scarlett2018 scarlett2018 added support tobe-overdued and removed question Further information is requested labels Jun 9, 2021
@zheng-ningxin
Contributor

zheng-ningxin commented Jun 10, 2021

@scarlett2018 This is the same issue as this one; please refer to it for more details.

@xuezu29 I have fixed this in the refactored speedup. Please give it a try once the PR is merged, or you can clone the corresponding branch and build NNI manually. Please let me know if there are more scenarios to support. Thanks.
