[Backend] Chainer support #720
Conversation
This reverts commit 236c5bc.
I understand, btw you are crazy...
Thanks for your wonderful work!.. I will see whether I can test it.
@@ -866,8 +866,8 @@ def zerocopy_from_dgl_ndarray(input):
# kernels (see kernel.py), and plug into tensor framework using custom op
# extensions.

def binary_reduce(reducer, binary_op, graph, lhs, rhs, lhs_data, rhs_data,
                  out_size, lhs_map, rhs_map, out_map):
def binary_reduce(reducer, op, G, A_target, B_target, A, B,
I think the argument names are the same as the C++ backend; should we refactor the C++ impls too?
It could be another separate PR I think... Lots of stuff to change there.
return grad.reshape(shape)

def sync():
It looks like an operation unique to Chainer; should we incorporate this?
You mean MXNet?
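For context, a backend-level sync() of this kind usually just blocks until all asynchronously scheduled device work has finished. A minimal hypothetical sketch for MXNet (which executes operators asynchronously), not the code under review:

import mxnet as mx

def sync():
    """Wait for all pending asynchronous operations to complete."""
    mx.nd.waitall()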
import numpy as np
from chainer.backend import get_array_module as _get_array_module
try:
    import cupy
Would this affect the requirements of DGL?
Yes; Chainer can be installed with or without CuPy.
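As an aside, a hypothetical sketch of the optional-dependency pattern being discussed, so that DGL itself does not require CuPy and the GPU path is only taken when Chainer was installed together with CuPy (the asnumpy helper is illustrative, not the PR's actual code):

try:
    import cupy
except ImportError:
    cupy = None

def asnumpy(arr):
    """Copy a (possibly CuPy-backed) array to host memory."""
    if cupy is not None and isinstance(arr, cupy.ndarray):
        return cupy.asnumpy(arr)
    return arr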
@@ -3045,7 +3045,7 @@ def edge_subgraph(self, edges, preserve_nodes=False):
sgi = self._graph.edge_subgraph(induced_edges, preserve_nodes=preserve_nodes)
return subgraph.DGLSubGraph(self, sgi)

def adjacency_matrix_scipy(self, transpose=False, fmt='csr'):
def adjacency_matrix_scipy(self, transpose=False, fmt='csr', return_edge_ids=True):
I agree, and I think it should default to False.
Do we have existing code that depends on this behavior?
My transformer depends on this behavior, and I would refactor the code.
Some unit tests may depend on this function, but I guess they expect it to return all 1s.
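To illustrate the confusion being discussed (a hypothetical example, not code from the PR): with edge IDs as matrix values, the stored entry for edge 0 is literally 0 even though the edge exists.

import numpy as np
import scipy.sparse as sp

# Tiny graph with edges 0->1 (edge ID 0) and 1->2 (edge ID 1).
src = np.array([0, 1])
dst = np.array([1, 2])

# Edge-ID style: the data array holds [0, 1], so one structural "nonzero"
# entry actually stores the value 0.
adj_ids = sp.csr_matrix((np.arange(2), (dst, src)), shape=(3, 3))

# All-ones style: every existing edge is stored as 1.
adj_ones = sp.csr_matrix((np.ones(2), (dst, src)), shape=(3, 3))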
@@ -562,7 +562,7 @@ def edge_subgraph(self, e, preserve_nodes=False):
return SubgraphIndex(gidx, self, induced_nodes, e)

@utils.cached_member(cache='_cache', prefix='scipy_adj')
def adjacency_matrix_scipy(self, transpose, fmt):
def adjacency_matrix_scipy(self, transpose, fmt, return_edge_ids=True):
The same as in graph.py.
@@ -124,6 +127,8 @@ def _pfc(x):
v = F.tensor([3, 4, 5])
assert _pfc(g.edges[u, v].data['l']) == [1., 1., 1.]

@unittest.skipIf(dgl.env.get_backend() == 'chainer',
                 'Chainer does not support head gradients')
What do you mean by head gradients?
tensor.backward(blah)
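In other words (a PyTorch illustration of what the skipped tests rely on; whether Chainer offers an equivalent is exactly the limitation noted in the skip message): backward() is called with an explicit gradient for the output, the "head" of the graph.

import torch

x = torch.ones(3, requires_grad=True)
y = 2 * x
head_grad = torch.tensor([0.1, 1.0, 10.0])  # gradient w.r.t. y, the "head"
y.backward(head_grad)                       # chain rule: x.grad = 2 * head_grad
print(x.grad)                               # tensor([ 0.2000,  2.0000, 20.0000])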
@@ -69,7 +74,7 @@ def _test(red):
g.update_all(fn.copy_src(src='u', out='m'),
             builtin[red](msg='m', out='r1'))
r1 = g.ndata['r1']
F.backward(r1.sum())
F.backward(F.reduce_sum(r1))
I suppose reduce_sum means taking the sum of all elements?
Yes, and it's only available in tests.
OK, then I would refactor my code for pooling layers following the same naming style.
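For reference, a hypothetical sketch of such a test-only helper for the Chainer backend (chainer.functions.sum reduces over all axes when no axis is given):

import chainer.functions as CF

def reduce_sum(x):
    """Return the sum of all elements of x as a 0-d variable."""
    return CF.sum(x)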
@@ -98,7 +98,7 @@ def check_basic(g, nf):
for i in range(1, nf.num_layers):
    in_deg = nf.layer_in_degree(i)
    out_deg = nf.layer_out_degree(i - 1)
    assert F.asnumpy(F.sum(in_deg, 0) == F.sum(out_deg, 0))
    assert F.asnumpy(in_deg).sum(0) == F.asnumpy(out_deg).sum(0)
It seems these two impls are equivalent; why convert to numpy first?
Chainer variables don't support things like a == b.
I see...
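A small hypothetical illustration of the point above: the comparison has to run on the underlying ndarrays, since the == operator is not available on the variables themselves.

import numpy as np
import chainer

a = chainer.Variable(np.array([1.0, 2.0], dtype=np.float32))
b = chainer.Variable(np.array([1.0, 2.0], dtype=np.float32))
# a == b  -> not supported on chainer.Variable, as noted above
assert (a.array == b.array).all()  # compare the wrapped ndarrays instead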
# Currently I could not unify the NumPy-DLPack and CuPy-DLPack interaction
# logic, so I resorted to old-style Chainer functions.
# The new array backend ChainerX currently does not support DLPack.
class BinaryReduce(chainer.Function):
@ylfdq1118 I highly suggest we further separate the logic into frame-specific and frame-agnostic parts. For example, I feel what we need is an adaptor class backend.Function which unifies torch.autograd.Function, mxnet.autograd.Function, and chainer.Function. By doing so, we could move this BinaryReduce class outside of the backend; otherwise it's just too much copy-pasting.
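A rough, hypothetical sketch of the adaptor idea (names are illustrative, not part of this PR): frame-agnostic kernels implement a plain forward/backward interface, and a thin per-framework wrapper exposes it as that framework's custom op.

class Function(object):
    """Backend-agnostic custom-op interface (hypothetical)."""

    def forward(self, *inputs):
        raise NotImplementedError

    def backward(self, *grad_outputs):
        raise NotImplementedError


def as_chainer_function(impl):
    """Wrap a backend-agnostic Function as a chainer.Function."""
    import chainer

    class _Wrapper(chainer.Function):
        def forward(self, inputs):
            # chainer.Function passes and expects tuples of arrays.
            return impl.forward(*inputs)

        def backward(self, inputs, grad_outputs):
            return impl.backward(*grad_outputs)

    return _Wrapper()

A torch.autograd.Function or mxnet.autograd.Function wrapper would follow the same shape, so something like BinaryReduce would only be written once.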
Investigated some solutions. Although NumPy has an FFI to convert from/to ctypes arrays, it does not handle memory ownership like DLPack does. Thus, in order to support zerocopy between NumPy and DLPack, we need to use NumPy's C library (see the PyTorch code for usage). It adds an extra dependency during compilation, but it looks manageable. What do you think? Shall we take that route?
Could you be more specific about this issue? One side note: @zheng-da and I had a discussion before, and we both think that shared memory should be treated as one of the device types. We also raised an issue (dmlc/dlpack#41) in DLPack to collect some feedback.
Sounds fine to me. By asking "Shall we take that route?", do you mean you have other solutions available?
The issue is in … As for the device type, I think a better generalization would be having "file" as a device type, because technically we can store tensors on any storage as long as it supports random access (e.g. shared memory, a file on disk (usually impractical), or other devices that support direct …
Closing this since the Chainer team is migrating to PyTorch.
Description
Chainer support as per the request in #668. Turns out to be a lot more effort than I expected.
In particular, zerocopy of Chainer variables to DGL tensors is not very good. Maybe I missed the correct way to implement this, but here are my issues:
- zerocopy_from_dgl_ndarray
The issues are commented in python/dgl/backend/chainer/tensor.py.
Other changes include (lots of) rewrites in unit tests, replacing tensor operators with backend functions and numpy functions.
@cmpute I may include one or two examples for Chainer in the (potentially far) future, but it would be greatly appreciated if you could try it out after this PR is merged and maybe contribute an example.
@yzh119 I suspect that this PR would be merged earlier than your pooling PR, so please ping me if you need any help to implement the Chainer equivalents.
Minor change
Added a return_edge_id optional argument to the DGLGraph.adjacency_matrix_scipy() method, which defaults to True. If False, then 1's are returned instead. The reason for this is that if edge IDs are returned, one of the nonzero entries will actually be 0, which may be confusing.
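For example, usage after this change might look like the following (hypothetical snippet using the 0.x-era DGLGraph construction API; the keyword is spelled return_edge_ids in the diff above):

import dgl

g = dgl.DGLGraph()
g.add_nodes(3)
g.add_edges([0, 1], [1, 2])

adj_ids = g.adjacency_matrix_scipy()                        # data holds edge IDs, including a 0
adj_ones = g.adjacency_matrix_scipy(return_edge_ids=False)  # data is all ones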
Checklist
Please feel free to remove inapplicable items for your PR.