scale gradient for backward pass #521

Closed
enpasos opened this issue Jan 13, 2021 · 6 comments · Fixed by #548

enpasos commented Jan 13, 2021

Question or maybe Enhancement

I'm missing a feature to scale the gradient for the backward pass (as used, e.g., in MuZero), something like
tensor * scale + stop_gradient(tensor) * (1 - scale)
I'm not sure whether the feature is actually missing or I'm simply not seeing the proper way to do it.

Workaround

I worked around it by adding an additional forward pass, keeping the relevant tensors as outputs and feeding them back into the training forward pass as "stop_gradient(tensor)" inputs. This works functionally, but comes at the cost of

  1. memory consumption on the training device (GPU memory is scarce for me)
  2. lower performance
  3. higher complexity
enpasos added the question label on Jan 13, 2021
roywei commented Jan 16, 2021

You can use block.getParameters() to get the parameters and getArray() to get the parameter value, then getGradient() to access the gradient. You can do an in-place update on the gradient value (e.g. grad.muli(scale)).

Not sure if this is what you want; if not, please provide some Python code in TF, PyTorch or MXNet so we can take a look.
Thanks!
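A minimal sketch of that suggestion, assuming DJL's Block/Parameter/NDArray API; the method name scaleAllGradients and the block/scale arguments are placeholders for illustration:

    import ai.djl.ndarray.NDArray;
    import ai.djl.nn.Block;
    import ai.djl.nn.Parameter;
    import ai.djl.util.Pair;

    // Sketch: scale all parameter gradients of a block in place.
    // Intended to be called after loss.backward(...) inside a GradientCollector scope.
    public static void scaleAllGradients(Block block, float scale) {
        for (Pair<String, Parameter> pair : block.getParameters()) {
            NDArray grad = pair.getValue().getArray().getGradient();
            grad.muli(scale); // in-place update of the gradient value
        }
    }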

enpasos commented Jan 16, 2021

Thank you very much for your reply.
I think the methods you mentioned are useful for some use cases.
For use cases where the node in question is passed through many times, however, I do not see a clever way in which those methods lead to a simple solution.

I would like to give some more information about the use case I am looking at:

MuZero use case: a Java implementation of MuZero based on DJL (with MXNet as the engine).

Need: The MuZero paper comes with Python pseudocode (see the supplementary data). The pseudocode uses this function

def scale_gradient(tensor: Any, scale):
    """Scales the gradient for the backward pass."""
    return tensor * scale + tf.stop_gradient(tensor) * (1 - scale)

to scale down the error backpropagation from the recurrently called dynamics function.

Support in the frameworks
In TensorFlow I see the function stop_gradient in the Python API.
As I am using MXNet, I searched for equivalent support in MXNet and found this.

enpasos commented Jan 17, 2021

I think I found the function in the MXNet Python API: BlockGrad

enpasos commented Jan 17, 2021

It would be great to have it in Java, too.

enpasos commented Jan 17, 2021

I'll test this

    // Calls the MXNet "stop_gradient" operator directly via the engine-specific
    // NDManager, since the operator is not exposed on the DJL NDArray API yet.
    public static NDArray stopGradient(NDArray in) {
        MxNDManager manager = (MxNDManager) in.getManager();
        MxOpParams params = new MxOpParams(); // no additional operator parameters needed
        return manager.invoke("stop_gradient", in, params);
    }
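With that helper, the scale_gradient function from the MuZero pseudocode above could be composed roughly like this (a sketch using standard DJL NDArray arithmetic; scaleGradient is just an illustrative name):

    // tensor * scale + stop_gradient(tensor) * (1 - scale)
    public static NDArray scaleGradient(NDArray tensor, float scale) {
        return tensor.mul(scale).add(stopGradient(tensor).mul(1 - scale));
    }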

enpasos commented Jan 18, 2021

The stopGradient works well for me: I could remove my workaround and thereby regained GPU memory ... enough to double the batch size.

As it is general functionality (used in MuZero, for example), it would be very useful to add it to the Java API as well, e.g. in the NDArray interface and its implementations.
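A hypothetical sketch of what such an addition to the NDArray interface could look like (method names and the default-method composition are assumptions for illustration):

    // Hypothetical additions to the NDArray interface.
    NDArray stopGradient();

    default NDArray scaleGradient(double scale) {
        // tensor * scale + stop_gradient(tensor) * (1 - scale)
        return mul(scale).add(stopGradient().mul(1 - scale));
    }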
