[Relay] Add gradient operator tutorial docs #2751
Conversation
docs/dev/relay_add_op.rst
Outdated
Gradient Operators
------------------

Gradient operators are important for writing differentiable programs in
Specifically you should say that gradient operators are necessary for the AD algorithm; the AD algorithm can differentiate first-class language constructs, but because operators are opaque, it needs an explicit differentiation rule for them.
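For concreteness, an explicit rule of this kind looks roughly like the following sketch in Python (it assumes the `register_gradient` decorator this doc describes; `multiply` is used purely as an illustration, and the broadcasting detail handled by `collapse_sum_like` comes up later in this thread):

```python
from tvm.relay.op import register_gradient

# Illustrative sketch: an explicit rule telling the AD pass how to
# differentiate the opaque "multiply" operator. A rule like this already
# ships with TVM, so re-registering it on a stock build may conflict.
@register_gradient("multiply")
def multiply_grad(orig, grad):
    # orig is the original call multiply(x, y); grad is the adjoint of its
    # output. For f(x, y) = x * y, the adjoints are grad * y and grad * x
    # (broadcast handling omitted here; see collapse_sum_like below).
    x, y = orig.args
    return [grad * y, grad * x]
```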
Nits:
- Do we have any documentation of Relay's AD? It would be good to link to that.
- It would be good to say that the differentiation rule is opaque to Relay as well.
- I think the sentence in which you describe operators as "opaque" should also contain the clause clarifying that "opaque" means that Relay cannot look into the implementation.
docs/dev/relay_add_op.rst
Outdated
Before we further analyze this definition, first we should recall the partial
derivatives for multiplication. Given a function f(x, y) = x * y, we have
that df/dx = y and df/dy = x. The definition above looks similar to the math
Do you know any way to use the proper partial derivative notation in RST? ∂f/∂x? (I don't know if just using the unicode will work.)
Wasn't sure if we wanted unicode in there. I guess I could just add it and see if the docs break.
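For reference, the rule under discussion written out in LaTeX (Sphinx's `math` directive and `:math:` role accept essentially this syntax, which may be an alternative to raw Unicode):

```latex
f(x, y) = x \cdot y, \qquad
\frac{\partial f}{\partial x} = y, \qquad
\frac{\partial f}{\partial y} = x
```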
docs/dev/relay_add_op.rst
Outdated
We're not just interested in how to compute the gradient of this function.
We're interested in composing this gradient with other gradients, so we can
accumulate the gradient across an entire program. This is where the ``grad *
We may want a different choice of variable names, because `grad * x` could be misinterpreted as ∇ * x (div). Also, I think you should use :code: in front of the backticks.
Are you sure? I've looked at the source for some of the docs, and all they use are double backticks for code formatting.
Also, I'm not sure what you mean when you say "∇ * x (div)".
https://en.wikipedia.org/wiki/Divergence
I don't know if other docs use something different (some kind of styling), but I have stuck to using :code: for the Relay docs and found when I built them that backticks without it ended up italicized rather than in typewriter font.
docs/dev/relay_add_op.rst
Outdated
differentiating with respect to. We only need to do this for operators with
broadcasting behaviors.

TODO: Why do we only have ``collapse_sum_like`` on some of the gradient
Collapse is only needed when you broadcast, which most binary operations implicitly do.
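A rough way to see this with concrete shapes (a NumPy sketch for illustration only; `collapse_sum_like` is the Relay op that performs the analogous reduction):

```python
import numpy as np

# Adding a (3, 4) tensor and a (1, 4) tensor broadcasts the second operand,
# so the forward result, and therefore the incoming gradient, has shape (3, 4).
x = np.ones((3, 4))
y = np.ones((1, 4))
out = x + y                                   # shape (3, 4)
grad = np.ones_like(out)                      # adjoint of out, shape (3, 4)

# The gradient w.r.t. y must match y's shape, so the broadcast axis gets
# summed out; this is roughly what collapse_sum_like(grad, y) does in Relay.
grad_wrt_y = grad.sum(axis=0, keepdims=True)  # shape (1, 4), matches y
grad_wrt_x = grad                             # already matches x's shape
```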
Adding more reviewers!
Could you try building the docs to make sure the inline ∂ symbols work? Also to make sure that the statements in backticks show up correctly in typewriter font. I haven't tried it myself, but I'd be concerned about those.
docs/dev/relay_add_op.rst
Outdated
provided.

Adding a gradient operator is slightly different from adding a normal
operator, in that you only need to touch Python code. A collection of
It can be done in C++ too (but generally has not been done this way). Is there any example of a gradient registered in C++? As far as I know, it can be done using the operator registry APIs.
With a quick grep, I couldn't find any instances of the "FPrimalGradient" attr being set in C++, so I suspect there aren't any C++ gradient examples. I could still mention the procedure for adding one in C++ though.
Yeah, that's what I meant, just mention it. I don't think there are presently examples, but mention that it can be done and show how in case that proves to be convenient for others.
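A sketch of how that mention might read, assuming (as the grep above suggests) that the Python `register_gradient` helper simply stores the function under the operator's `FPrimalGradient` attribute, which is the same attribute a C++ registration would set through the operator registry:

```python
from tvm.relay.op import register_gradient

def log_grad(orig, grad):
    # d(log x)/dx = 1/x, scaled by the incoming adjoint grad.
    x = orig.args[0]
    return [grad / x]

# Functional form, equivalent to decorating log_grad with
# @register_gradient("log"). Illustrative only: if a rule for "log" is
# already registered in-tree, this call would conflict with it.
register_gradient("log", log_grad)
```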
docs/dev/relay_add_op.rst
Outdated
of the inputs, we use ``collapse_sum_like`` to take the contents of the
``grad * <var>`` terms and make the shape match the input we're
differentiating with respect to. We only need to do this for operators with
broadcasting behaviors.
I think you should have another example that does not have the complexity of `collapse_sum_like`, because this seems like a fairly confusing point (at least, I'm left confused). It might also benefit from some lower-level explanation (or from having some of the math written out more explicitly).
Good point. As far as I understand, it's to handle cases where you have a tensor `x` with shape `(4, 4)` and a tensor `y` with shape `(1, 4)`, and you add them. When you differentiate w.r.t. `x`, you want the gradient's shape to match `x`, and you want the same for `y`. I could be totally wrong though.
If one of the other reviewers can confirm that that's what's going on, then I can add that explanation to the docs. Otherwise, adding another example is a good idea too.
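If that reading is right, the rule for a broadcasting binary op would look roughly like this sketch (`add` is used purely as an example, and the exact import location of `collapse_sum_like` is an assumption here):

```python
from tvm import relay
from tvm.relay.op import register_gradient

# Illustrative sketch for a broadcasting binary op: grad arrives with the
# broadcast output shape, and collapse_sum_like sums it back down so each
# returned adjoint matches the shape of the corresponding input.
@register_gradient("add")  # example only; a rule for "add" may already exist
def add_grad(orig, grad):
    x, y = orig.args
    return [relay.collapse_sum_like(grad, x),
            relay.collapse_sum_like(grad, y)]
```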
It seems that building the docs (assuming I'm doing it correctly) requires a CUDA-enabled GPU for some reason, which my machine doesn't have. Not sure what to do about that.
Are you sure the build completely fails without CUDA enabled? I think some of the docs actually run TVM code and use it to generate example output, but those that don't do that should build. (Also, I'm pretty sure there's a LaTeX-like math mode in RST that some of the docs use, so perhaps look around for those to see a better way to encode mathematical expressions.)
They seem to be failing, because the build directory for the docs is empty when it finishes.
Have you tried a more targeted build that avoids the parts that require CUDA?
Didn't know about the targeted build option. It seems that the Unicode for partial derivatives shows up correctly, and you don't need :code: for the statements in backticks to render in typewriter font.
I'll add another gradient example when I get a chance. Please feel free to review the current state of the PR though.
static const Op& op = Op::Get("collapse_sum_like");
return CallNode::make(op, {e});
}
Should this definition be in a separate PR?
Ping @ZihengJiang @antinucleon
* Add gradient operator tutorial docs
* Incorporate Steven's and Ziheng's feedback
* Remove TODO about `collapse_sum_like`
* Add more examples
Another rough draft. Roast me.
I'm interested in the answer to the TODO at the end of the patch.
@jroesch @MarisaKirisame @slyubomirsky @ZihengJiang