[Relay] Add gradient operator tutorial docs #2751

Merged · 5 commits · Apr 12, 2019
docs/dev/relay_add_op.rst (104 additions, 0 deletions)
@@ -156,6 +156,110 @@ before producing the call node:
tup = Tuple(list(args))
return _make.concat(tup)

Gradient Operators
------------------

Gradient operators are important for writing differentiable programs in
Relay. While Relay's autodiff algorithm can differentiate first-class
language constructs, operators are opaque to it. Because Relay can't look
into an operator's implementation, an explicit differentiation rule must be
provided.

Both Python and C++ can be used to write gradient operators, but we focus our
examples on Python, as it is more commonly used.

Adding a Gradient in Python
~~~~~~~~~~~~~~~~~~~~~~~~~~~

A collection of Python gradient operators can be found in
``python/tvm/relay/op/_tensor_grad.py``. We will walk through two
representative examples: ``sigmoid`` and ``multiply``.

.. code:: python

@register_gradient("sigmoid")
def sigmoid_grad(orig, grad):
"""Returns [grad * sigmoid(x) * (1 - sigmoid(x))]."""
return [grad * orig * (ones_like(orig) - orig)]

The inputs here are the original operator call ``orig`` and the gradient
``grad`` flowing in from the rest of the program. What we return is a list
whose i'th element is the derivative of the operator with respect to its i'th
input. In general, the gradient will return a list with as many elements as
there are inputs to the base operator.

Before we further analyze this definition, first we should recall the
derivative of the sigmoid function: :math:`\frac{\partial \sigma}{\partial x}
= \sigma(x)(1 - \sigma(x))`. The definition above looks similar to the
mathematical definition, but there is one important addition, which we
describe below.
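
For reference, this identity follows directly from the definition of the
sigmoid:

.. math::

   \sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
   \frac{\partial \sigma}{\partial x} = \frac{e^{-x}}{(1 + e^{-x})^2}
   = \sigma(x)\,(1 - \sigma(x))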

The term ``orig * (ones_like(orig) - orig)`` directly matches the derivative,
because ``orig`` here stands for the sigmoid's output. However, we're not
just interested in how to compute the gradient of this one function; we want
to compose it with other gradients, so we can accumulate the gradient across
an entire program. This is where the ``grad`` term comes in: in the
expression ``grad * orig * (ones_like(orig) - orig)``, multiplying by
``grad`` specifies how to compose the derivative with the gradient
accumulated so far.
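
For example, if the sigmoid's output feeds into a larger program whose final
result is :math:`L`, then ``grad`` carries :math:`\partial L / \partial \sigma`,
and the chain rule gives

.. math::

   \frac{\partial L}{\partial x}
   = \frac{\partial L}{\partial \sigma} \cdot \frac{\partial \sigma}{\partial x}
   = \texttt{grad} \cdot \sigma(x)\,(1 - \sigma(x)),

which is exactly the single element that ``sigmoid_grad`` returns.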

Now, we consider ``multiply``, a slightly more interesting example:

.. code:: python

@register_gradient("multiply")
def multiply_grad(orig, grad):
"""Returns [grad * y, grad * x]"""
x, y = orig.args
return [collapse_sum_like(grad * y, x),
collapse_sum_like(grad * x, y)]

In this example, the returned list has two elements, because ``multiply`` is
a binary operator. Recall that if :math:`f(x, y) = xy`, the partial
derivatives are :math:`\frac{\partial f}{\partial x} = y` and
:math:`\frac{\partial f}{\partial y} = x`.

There is one step required for ``multiply`` that is not required for
``sigmoid``, because ``multiply`` has broadcasting semantics. Since the shape
of ``grad`` might not match the shape of the inputs, we use
``collapse_sum_like`` to sum each ``grad * <var>`` term down to the shape of
the input we're differentiating with respect to.
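
As a concrete illustration, suppose (hypothetically) that ``x`` has shape
``(3,)`` and ``y`` has shape ``(2, 3)``, so the product and ``grad`` both have
shape ``(2, 3)``. Summing over the broadcast axes is what
``collapse_sum_like`` accomplishes; in NumPy terms, the effect is roughly:

.. code:: python

   import numpy as np

   x = np.random.rand(3)        # first input, shape (3,)
   y = np.random.rand(2, 3)     # second input, shape (2, 3)
   grad = np.ones((2, 3))       # incoming gradient, shaped like x * y

   # grad * y has shape (2, 3), but the gradient w.r.t. x must have shape (3,),
   # so the broadcast axis is summed away.
   dx = (grad * y).sum(axis=0)  # shape (3,), matches x
   dy = grad * x                # already shape (2, 3), matches y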

Adding a Gradient in C++
~~~~~~~~~~~~~~~~~~~~~~~~

Adding a gradient in C++ is similar to adding one in Python, but the
interface for registering is slightly different.

First, make sure ``src/relay/pass/pattern_util.h`` is included. It provides
helper functions for creating nodes in the Relay AST. Then, define the
gradient in a similar fashion as in the Python example:

.. code:: c++

   tvm::Array<Expr> MultiplyGrad(const Expr& orig_call, const Expr& output_grad) {
     const Call& call = Downcast<Call>(orig_call);
     return { CollapseSumLike(Multiply(output_grad, call->args[1]), call->args[0]),
              CollapseSumLike(Multiply(output_grad, call->args[0]), call->args[1]) };
   }

Notice that in C++ we can't use the same operator overloading that we have in
Python, and we need to downcast, so the implementation is more verbose. Even
so, we can easily verify that this definition mirrors the earlier example in
Python.

Finally, instead of using a Python decorator, we register the gradient by
appending a ``set_attr`` call for ``"FPrimalGradient"`` to the end of the
base operator's registration.

.. code:: c++

RELAY_REGISTER_OP("multiply")
// ...
// Set other attributes
// ...
.set_attr<FPrimalGradient>("FPrimalGradient", MultiplyGrad);
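
To sanity-check a registered gradient (whether it was added in Python or
C++), one option is to run Relay's gradient pass over a small function that
uses the operator. The sketch below is illustrative only: it assumes the pass
is exposed as ``relay.transform.gradient`` and that modules are built with
``tvm.IRModule``; these entry points have moved between TVM releases, so
adjust them to your version.

.. code:: python

   import tvm
   from tvm import relay

   # A small function that uses the operator whose gradient we registered.
   x = relay.var("x", shape=(3,), dtype="float32")
   y = relay.var("y", shape=(2, 3), dtype="float32")
   fwd = relay.Function([x, y], x * y)

   mod = tvm.IRModule.from_expr(fwd)
   mod = relay.transform.InferType()(mod)

   # The gradient pass consults the registered "FPrimalGradient" attribute
   # (multiply_grad / MultiplyGrad above) to build the backward computation.
   bwd = relay.transform.gradient(mod["main"], mod)
   print(bwd)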

Summary
-------

src/relay/pass/pattern_util.h (5 additions, 0 deletions)
@@ -328,6 +328,11 @@ inline Expr OnesLike(Expr e) {
return CallNode::make(op, {e});
}

inline Expr CollapseSumLike(Expr data, Expr collapse_type) {
  static const Op& op = Op::Get("collapse_sum_like");
  return CallNode::make(op, {data, collapse_type});
}

Contributor (author) commented:
Should this definition be in a separate PR?

inline Expr Power(Expr lhs, Expr rhs) {
static const Op& op = Op::Get("power");
return CallNode::make(op, {lhs, rhs}, Attrs(), {});