[Backend] Chainer support #720

Closed · wants to merge 23 commits
Conversation

BarclayII (Collaborator) commented Jul 19, 2019

Description

Chainer support as requested in #668. It turned out to be a lot more effort than I expected.

In particular, zero-copy between Chainer variables and DGL tensors does not work well. Maybe I missed the correct way to implement this, but here are my issues:

  • Chainer CPU variables directly wrap numpy ndarrays, and I could not make zero-copying between numpy and DGL tensors work.
  • As a result, I had to resort to old-style Chainer custom layers (a rough sketch follows below). This is not a big deal: the primary benefit of new-style layers is second-order gradients, which we can live without for now since we fuse the gradient operators anyway.
  • Another consequence is that I could not get shared memory to work: DGL frees the node features and edge features in shared memory right after zerocopy_from_dgl_ndarray.

The issues are commented in python/dgl/backend/chainer/tensor.py.
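
For readers unfamiliar with the distinction, here is a minimal sketch of the old-style chainer.Function pattern the backend falls back to. This is a toy op, not the PR's actual code: forward and backward operate directly on numpy/cupy ndarrays rather than Variables, which is why second-order gradients are unavailable.

```python
import numpy as np
import chainer
from chainer.backend import get_array_module


class ScaleByTwo(chainer.Function):
    """Toy old-style op: y = 2 * x (illustrative only)."""

    def forward(self, inputs):
        # Old-style Functions receive a tuple of raw ndarrays, not Variables.
        x, = inputs
        xp = get_array_module(x)  # numpy or cupy, depending on the input
        return xp.multiply(x, 2),

    def backward(self, inputs, grad_outputs):
        # Gradients are likewise computed on raw ndarrays, so the resulting
        # graph cannot be differentiated a second time (no double backprop).
        gy, = grad_outputs
        xp = get_array_module(gy)
        return xp.multiply(gy, 2),


x = chainer.Variable(np.ones((3,), dtype=np.float32))
y = ScaleByTwo()(x)
y.grad = np.ones_like(y.array)
y.backward()
print(x.grad)  # [2. 2. 2.]
```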

Other changes include (lots of) rewrites in unit tests, replacing tensor operators with backend functions and numpy functions.

@cmpute I may include one or two examples for Chainer in the (potentially far) future, but it would be greatly appreciated if you could try it out after this PR is merged and maybe contribute an example.

@yzh119 I suspect that this PR would be merged earlier than your pooling PR, so please ping me if you need any help to implement the Chainer equivalents.

Minor change

Added an optional return_edge_ids argument to the DGLGraph.adjacency_matrix_scipy() method, defaulting to True. If False, all nonzero entries are 1 instead of edge IDs.

The reason is that when edge IDs are returned, the entry for edge 0 is stored as 0, which can be confusing in a sparse matrix.
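
A quick usage sketch of the new flag (illustrative only; the construction calls below assume the DGLGraph API of that time):

```python
import dgl

g = dgl.DGLGraph()
g.add_nodes(3)
g.add_edges([0, 1, 2], [1, 2, 0])

# Default behavior: nonzero entries store edge IDs, so the entry for
# edge 0 is stored as 0 and can look like a missing edge.
adj_with_ids = g.adjacency_matrix_scipy()

# New flag: every existing edge is marked with 1 instead.
adj_with_ones = g.adjacency_matrix_scipy(return_edge_ids=False)
```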

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [Model], [Doc], [Feature])
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change,
    or have been fixed to be compatible with this change
  • Related issue is referenced in this PR
  • New CI image is pushed

yzh119 (Member) commented Jul 19, 2019

I understand, btw you are crazy...

cmpute commented Jul 19, 2019

Thanks for your wonderful work! I will see whether I can test it.

@@ -866,8 +866,8 @@ def zerocopy_from_dgl_ndarray(input):
# kernels (see kernel.py), and plug into tensor framework using custom op
# extensions.

def binary_reduce(reducer, binary_op, graph, lhs, rhs, lhs_data, rhs_data,
out_size, lhs_map, rhs_map, out_map):
def binary_reduce(reducer, op, G, A_target, B_target, A, B,

Member:
I think the argument names are the same as the C++ backend; should we refactor the C++ impls too?

Collaborator (author):
It could be another separate PR I think... Lots of stuff to change there.

return grad.reshape(shape)


def sync():

Member:
It looks like an operation unique to Chainer; should we incorporate this?

Collaborator (author):
You mean MXNet?
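
For context, a hedged guess at what such a backend-level sync could look like for Chainer with CuPy; this is not necessarily what the PR implements. MXNet exposes a similar barrier via mx.nd.waitall().

```python
def sync():
    """Block until all pending GPU work has finished (illustrative sketch)."""
    try:
        import cupy
        cupy.cuda.Stream.null.synchronize()
    except ImportError:
        # CPU-only Chainer runs on numpy, which is synchronous already.
        pass
```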

import numpy as np
from chainer.backend import get_array_module as _get_array_module
try:
import cupy

Member:
Would this affect the requirements of DGL?

Collaborator (author):
Yes; Chainer can be installed with or without CuPy.
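
In other words, CuPy stays an optional dependency. A sketch of the usual guard pattern (the HAS_CUPY flag and asnumpy helper names are illustrative, not DGL's actual code):

```python
import numpy as np

try:
    import cupy
    HAS_CUPY = True
except ImportError:
    cupy = None
    HAS_CUPY = False


def asnumpy(x):
    # Illustrative helper: move a cupy array back to host memory if needed.
    if HAS_CUPY and isinstance(x, cupy.ndarray):
        return cupy.asnumpy(x)
    return np.asarray(x)
```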

@@ -3045,7 +3045,7 @@ def edge_subgraph(self, edges, preserve_nodes=False):
sgi = self._graph.edge_subgraph(induced_edges, preserve_nodes=preserve_nodes)
return subgraph.DGLSubGraph(self, sgi)

def adjacency_matrix_scipy(self, transpose=False, fmt='csr'):
def adjacency_matrix_scipy(self, transpose=False, fmt='csr', return_edge_ids=True):

Member:
I agree, and I think it should default to False.

Collaborator (author):
Do we have existing code that depends on this behavior?

Member:
My transformer depends on this behavior, and I will refactor the code.
Some unit tests may depend on this function, but I guess they expect it to return all 1s.

@@ -562,7 +562,7 @@ def edge_subgraph(self, e, preserve_nodes=False):
return SubgraphIndex(gidx, self, induced_nodes, e)

@utils.cached_member(cache='_cache', prefix='scipy_adj')
def adjacency_matrix_scipy(self, transpose, fmt):
def adjacency_matrix_scipy(self, transpose, fmt, return_edge_ids=True):

Member:
The same as in graph.py.

tests/backend/chainer/__init__.py (resolved review comment)
@@ -124,6 +127,8 @@ def _pfc(x):
v = F.tensor([3, 4, 5])
assert _pfc(g.edges[u, v].data['l']) == [1., 1., 1.]

@unittest.skipIf(dgl.env.get_backend() == 'chainer',
'Chainer does not support head gradients')

Member:
What do you mean by head gradients?

BarclayII (Collaborator, author) commented Jul 22, 2019:
tensor.backward(blah)
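
For readers following along: "head gradients" means seeding backward() with an explicit output gradient rather than starting from a scalar loss. A PyTorch sketch of the pattern the skipped tests exercise (values are illustrative):

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2
head_grad = torch.tensor([0.1, 1.0, 10.0])  # caller-supplied dL/dy
y.backward(head_grad)
print(x.grad)  # tensor([ 0.2000,  2.0000, 20.0000])
```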

@@ -69,7 +74,7 @@ def _test(red):
g.update_all(fn.copy_src(src='u', out='m'),
builtin[red](msg='m', out='r1'))
r1 = g.ndata['r1']
F.backward(r1.sum())
F.backward(F.reduce_sum(r1))

Member:
I suppose reduce_sum means taking the sum of all elements?

BarclayII (Collaborator, author) commented Jul 22, 2019:
Yes, and it's only available in the tests.

Member:
OK, then I will refactor my code for the pooling layers following the same naming style.
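
For reference, a hedged sketch of what such a per-backend test helper might look like for the Chainer test backend (the name and location are illustrative, not the PR's exact code):

```python
import chainer.functions as CF


def reduce_sum(x):
    # Sum over all elements of a Variable, yielding a 0-d Variable that
    # can then be passed to F.backward(...) in the tests.
    return CF.sum(x)
```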

@@ -98,7 +98,7 @@ def check_basic(g, nf):
for i in range(1, nf.num_layers):
in_deg = nf.layer_in_degree(i)
out_deg = nf.layer_out_degree(i - 1)
assert F.asnumpy(F.sum(in_deg, 0) == F.sum(out_deg, 0))
assert F.asnumpy(in_deg).sum(0) == F.asnumpy(out_deg).sum(0)

Member:
It seems these two impls are equivalent; why convert to numpy first?

Collaborator (author):
Chainer variables don't support things like a == b.

Member:
I see...
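
A small illustration of the point above (purely illustrative, not from the test suite): comparing the underlying arrays sidesteps the missing element-wise == on chainer.Variable.

```python
import numpy as np
import chainer

a = chainer.Variable(np.array([1.0, 2.0, 3.0], dtype=np.float32))
b = chainer.Variable(np.array([1.0, 2.0, 3.0], dtype=np.float32))

# `a == b` is not supported on Variables, so compare the raw ndarrays.
assert np.array_equal(a.array, b.array)
```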

BarclayII changed the title from [WIP][Backend] Chainer support to [Backend] Chainer support on Jul 22, 2019
# Currently I could not unify the NumPy-DLPack and CuPy-DLPack interaction
# logic, so I resorted to old-style Chainer functions.
# The new array backend ChainerX currently does not support DLPack.
class BinaryReduce(chainer.Function):

Member:
@ylfdq1118 I highly suggest we further separate the logic into framework-specific and framework-agnostic parts. For example, I feel what we need is an adaptor class backend.Function that unifies torch.autograd.Function, mxnet.autograd.Function, and chainer.Function. By doing so, we could move this BinaryReduce class outside of the backend; otherwise it's just too much copy-pasting.
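
A rough sketch of the adaptor idea (entirely hypothetical, not part of this PR): a shared interface that each backend would bridge to its native autograd Function, so kernels like BinaryReduce are written once.

```python
import abc


class Function(abc.ABC):
    """Hypothetical frame-agnostic autograd interface.

    Each backend would provide a thin subclass bridging this to
    torch.autograd.Function, mxnet.autograd.Function, or chainer.Function.
    """

    @abc.abstractmethod
    def forward(self, *inputs):
        """Compute outputs from raw backend arrays."""

    @abc.abstractmethod
    def backward(self, *grad_outputs):
        """Compute input gradients from output gradients."""
```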

jermainewang (Member):

  • Chainer CPU variables directly wrap around numpy NDArrays, and I could not make zerocopying between numpy and DGL tensors work.

I investigated some solutions. Although numpy has an FFI for converting from/to ctypes arrays, it does not handle memory ownership the way DLPack does. Thus, in order to support zero-copy between numpy and DLPack, we need to use numpy's C library (see the PyTorch code for usage). It adds an extra dependency during compilation, but it looks manageable. What do you think? Shall we take that route?
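
To make the ownership gap concrete, a small illustration (not DGL code): numpy's ctypes interop can build a zero-copy view over a raw pointer, but the view carries no deleter, so nothing ties its lifetime to the producer's buffer.

```python
import ctypes
import numpy as np

buf = (ctypes.c_float * 4)(1.0, 2.0, 3.0, 4.0)   # stand-in for a DGL-owned buffer
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_float))

view = np.ctypeslib.as_array(ptr, shape=(4,))     # zero-copy numpy view
view[0] = 42.0
assert buf[0] == 42.0                             # same memory

# Unlike a DLPack capsule, `view` holds no reference to (or deleter for) the
# producer's buffer, so the producer has no safe point to release it.
```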

  • Another consequence is that I could not get shared memory to work: DGL frees the node features and edge features in shared memory right after zerocopy_from_dgl_ndarray.

Could you be more specific about this issue? One side note: @zheng-da and I had a discussion before, and we both think that shared memory should be treated as one of the device types. We also raised an issue (dmlc/dlpack#41) in DLPack to collect feedback.

BarclayII (Collaborator, author) commented Jul 25, 2019

I investigated some solutions. Although numpy has an FFI for converting from/to ctypes arrays, it does not handle memory ownership the way DLPack does. Thus, in order to support zero-copy between numpy and DLPack, we need to use numpy's C library (see the PyTorch code for usage). It adds an extra dependency during compilation, but it looks manageable. What do you think? Shall we take that route?

Sounds fine to me. By asking "Shall we take that route?" do you mean you have other solutions available?

Could you be more specific about this issue? One side note: @zheng-da and I had a discussion before, and we both think that shared memory should be treated as one of the device types. We also raised an issue (dmlc/dlpack#41) in DLPack to collect feedback.

The issue is in _move_data_to_shared_mem_array in python/dgl/contrib/graph_store.py.
The function is supposed to return a framework-specific view of the DGL array in shared memory, meaning that the object it returns is, say, a PyTorch tensor whose contents reside in shared memory. However, since zero-copying from DGL arrays to numpy does not work yet, the resulting Chainer variable would have its memory independent of the DGL array in shared memory, so after returning, the DGL array would be recycled.

As for the device type, I think a better generalization would be having file as a device type, because technically we can store tensors on any storage as long as it supports random access (e.g. shared memory, a file on disk (usually impractical), or other devices that support direct (p)read(2) and (p)write(2)). Just my two cents though.

BarclayII (Collaborator, author):
Closing this since the Chainer team is migrating to PyTorch.

BarclayII closed this on Dec 5, 2019.