Allowing more flexible cost functions for optimizers #959

Merged
merged 49 commits into master from optimize_more_parameters on Jan 6, 2021

Conversation

albi3ro
Contributor

@albi3ro albi3ro commented Dec 9, 2020

As discussed in Issue #683

This allows cost functions to accept any combination of parameters and data. The optimizer will determine which arguments to update based on the requires_grad attribute.

This change will allow users to do something like:

import pennylane as qml
from pennylane import numpy as np

def f(x, y, z):
    return x + y + z

x = np.array([1], requires_grad=True)
y = np.array([1], requires_grad=False)
z = np.array([1], requires_grad=True)

opt = qml.GradientDescentOptimizer()
x_new, y_old, z_new = opt.step(f, x, y, z)
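
Keyword arguments are forwarded to the cost function as well. As a rough sketch of the intended usage (my own illustration rather than code from the PR, reusing the imports above):

def cost(x, data, c=1.0):
    return np.sum((x - c) ** 2) + np.sum(data)

data = np.array([0.1, 0.2], requires_grad=False)
x0 = np.array([0.5, 0.6], requires_grad=True)

opt = qml.GradientDescentOptimizer(0.1)
# only x0 is updated; data comes back unchanged and c is passed straight through
x_new, data_same = opt.step(cost, x0, data, c=2.0)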

To Do:

  • improve flow and readability in GradientDescent
  • Implement in the other optimizers
  • Tests
  • Changelog

@albi3ro albi3ro added the WIP 🚧 Work-in-progress label Dec 9, 2020
@albi3ro albi3ro removed the WIP 🚧 Work-in-progress label Dec 16, 2020
@albi3ro albi3ro changed the title from "[WIP] Allowing more flexible cost functions for optimizers" to "Allowing more flexible cost functions for optimizers" Dec 16, 2020
@codecov

codecov bot commented Dec 16, 2020

Codecov Report

Merging #959 (f7e9d67) into master (09a6239) will increase coverage by 0.01%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #959      +/-   ##
==========================================
+ Coverage   97.86%   97.88%   +0.01%     
==========================================
  Files         150      150              
  Lines       10876    10964      +88     
==========================================
+ Hits        10644    10732      +88     
  Misses        232      232              
Impacted Files Coverage Δ
pennylane/optimize/adagrad.py 100.00% <100.00%> (ø)
pennylane/optimize/adam.py 100.00% <100.00%> (ø)
pennylane/optimize/gradient_descent.py 100.00% <100.00%> (ø)
pennylane/optimize/momentum.py 100.00% <100.00%> (ø)
pennylane/optimize/nesterov_momentum.py 100.00% <100.00%> (ø)
pennylane/optimize/qng.py 100.00% <100.00%> (ø)
pennylane/optimize/rms_prop.py 100.00% <100.00%> (ø)
pennylane/optimize/rotoselect.py 95.65% <100.00%> (ø)
pennylane/optimize/rotosolve.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@albi3ro albi3ro added the review-ready 👌 label (PRs which are ready for review by someone from the core team) Dec 16, 2020
Member

@josh146 josh146 left a comment

Great work @albi3ro 🎉

I've left a few comments, mostly relating to code comments, rearranging of logic, docstrings, and tests, but I didn't spot anything major.

In general, I really like your solution here.

  • opt.step(cost, x, y, z, data) is a nice extension of the existing syntax (opt.step(cost, x)). The logic extension is also quite straightforward: we simply loop over all *args and update those that are trainable (see the sketch after this list). I think this is very intuitive, since a non-trainable parameter has a 'gradient' of 0, and you wouldn't expect the parameter to be updated anyway.

  • It is also backwards compatible with the old approach! Which is really nice. I've created a PR over in the QML repo (Test new optimizers qml#189) to run all the demonstrations using this branch, to see if there are any issues (the QML demo is a good real-world test case of user-facing changes). As I write this, 73% of all demos have passed, with no failures 🤞
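
To make the first point concrete, here is a minimal sketch of the update loop (my paraphrase of the pattern, assuming grad_fn returns one gradient per trainable argument; not the PR's actual implementation):

def sketch_step(grad_fn, stepsize, *args, **kwargs):
    grads = grad_fn(*args, **kwargs)  # one gradient per trainable argument
    new_args = list(args)
    trained_index = 0
    for index, arg in enumerate(args):
        if getattr(arg, "requires_grad", True):
            # apply a gradient-descent update only to trainable arguments
            new_args[index] = arg - stepsize * grads[trained_index]
            trained_index += 1
    # non-trainable arguments are returned unchanged
    return new_args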

I only have two questions regarding the custom grad_fn:

  • I'm curious why the tests passed before, since the test data did not previously conform to the same output shape as qml.grad.

  • I'm curious if there is a path to retaining support for custom grad_fns with scalar outputs. I don't think this is too important, since use of custom grad functions is rare. In the future, however, we might want to write the optimizers to support Torch/TF, so it might be something we re-consider down the line.

.github/CHANGELOG.md (outdated, resolved)
.github/CHANGELOG.md (outdated, resolved)
.github/CHANGELOG.md (outdated, resolved)
pennylane/optimize/gradient_descent.py (outdated, resolved)
pennylane/optimize/gradient_descent.py (outdated, resolved)
tests/test_optimize.py (resolved)
Comment on lines 675 to 688
@pytest.mark.parametrize(
"opt",
[
GradientDescentOptimizer(stepsize),
MomentumOptimizer(stepsize, momentum=gamma),
NesterovMomentumOptimizer(stepsize, momentum=gamma),
AdagradOptimizer(stepsize),
RMSPropOptimizer(stepsize, decay=gamma),
AdamOptimizer(stepsize, beta1=gamma, beta2=delta),
RotosolveOptimizer(),
],
)
class Test_over_opts:
def test_kwargs(self, opt, tol):
Member

I think there is an indentation issue here; this class/decorator should be at the module level (no indentation).

Actually, I'm a bit surprised this works! I didn't think pytest would find/test nested classes.

Contributor Author

So I should take it out of the TestOptimizer class?

Contributor

Think what @josh146 meant was to make the new class be another stand-alone class, just as TestOptimizer.

Member

Yep!

Contributor Author

will do.

if getattr(opt, "reset", None):
opt.reset()

assert x_new_one != pytest.approx(x_new_two, abs=tol)
Member

Nice test!

Perhaps we can be a bit more explicit though in checking that the cost function was called with the correct arguments. The mocker pytest fixture can help us do this:

def test_kwargs(self, opt, tol, mocker):
    """Test that the keywords get passed and alter the function"""

    class Cost:
        @staticmethod
        def kwargs_func(x, c=1.0):
            return (x - c) ** 2

    cost = Cost()
    spy = mocker.spy(cost, "kwargs_func")

    x_new_one = opt.step(cost.kwargs_func, 1.0, c=1.0)
    assert spy.call_args_list[-1][1]["c"] == 1.0

    x_new_two = opt.step(cost.kwargs_func, 1.0, c=2.0)
    assert spy.call_args_list[-1][1]["c"] == 2.0

    if getattr(opt, "reset", None):
        opt.reset()

    assert x_new_one != pytest.approx(x_new_two, abs=tol)

mocker.spy creates a new Spy object that we can use to check the arguments and keyword arguments the function was called with.

The reason I wrapped the cost function in a class is that I suddenly realised I don't know how to spy on a pure function 🤔 @antalszava @thisac do you guys have any idea?

Contributor

Hmm, not quite sure actually! It seems to be good when there's a module object to pass as the 1st argument, so e.g., mocker.spy(qml, "about") should be fine (see pytest-dev/pytest-mock#97 (comment)).
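
For reference, a minimal sketch of that pattern (my illustration, not a test from the PR; qml.about is just a convenient module-level function to spy on):

import pennylane as qml

def test_spy_on_module_function(mocker):
    spy = mocker.spy(qml, "about")  # wraps qml.about while leaving its behaviour intact
    qml.about()
    assert spy.call_count == 1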

Comment on lines 710 to 711
"""Test multiple arguments to function"""
x_new, y_new = opt.step(func, *args)
Member

Spy could also be used here to ensure that the function was called with the args you expect


assert len(args_new) == pytest.approx(2, abs=tol)
assert args_new[0] != pytest.approx(x, abs=tol)
assert args_new[1] == pytest.approx(data, abs=tol)
Member

Nice!

An additional test you could add is one like in the changelog example; a cost function with trainable, non-trainable, and keyword arguments.

albi3ro and others added 4 commits December 17, 2020 08:16
Co-authored-by: Josh Izaac <josh146@gmail.com>
Co-authored-by: Josh Izaac <josh146@gmail.com>
Co-authored-by: Josh Izaac <josh146@gmail.com>
Co-authored-by: Josh Izaac <josh146@gmail.com>
Contributor

@antalszava antalszava left a comment

Hi @albi3ro, a great change for improving how optimizers can work! 💯 Mostly picking up details here, so commenting for now. Left a couple of comments and questions for my own understanding.

pennylane/optimize/gradient_descent.py (outdated, resolved)
pennylane/optimize/gradient_descent.py (resolved)
pennylane/optimize/gradient_descent.py (outdated, resolved)
pennylane/optimize/gradient_descent.py (outdated, resolved)
pennylane/optimize/gradient_descent.py (outdated, resolved)
grad_uni_fns = [np.cos,
lambda x: np.exp(x / 10.) / 10.,
lambda x: 2 * x]
grad_uni_fns = [lambda x: (np.cos(x), ),
Contributor

Maybe it's better to make these more explicit by passing a single-element list to tuple(), since the (x,) syntax for creating single-element tuples could be easy to miss. It would just help readability for future reference.

Suggested change
grad_uni_fns = [lambda x: (np.cos(x), ),
grad_uni_fns = [lambda x: tuple([np.cos(x)]),

Contributor Author

good point. that does look a lot clearer

tests/test_optimize.py (outdated, resolved)
Comment on lines 675 to 688
@pytest.mark.parametrize(
"opt",
[
GradientDescentOptimizer(stepsize),
MomentumOptimizer(stepsize, momentum=gamma),
NesterovMomentumOptimizer(stepsize, momentum=gamma),
AdagradOptimizer(stepsize),
RMSPropOptimizer(stepsize, decay=gamma),
AdamOptimizer(stepsize, beta1=gamma, beta2=delta),
RotosolveOptimizer(),
],
)
class Test_over_opts:
def test_kwargs(self, opt, tol):
Contributor

Think what @josh146 meant was to make the new class be another stand-alone class, just as TestOptimizer.

.github/CHANGELOG.md (outdated, resolved)
.github/CHANGELOG.md (outdated, resolved)
Contributor

@thisac thisac left a comment

Great job @albi3ro! Looks really nice, and these changes are fantastic to have. 💯

I mainly have a few comments on code, conventions and docstrings, but overall it looks great!

.github/CHANGELOG.md (resolved)
Comment on lines 88 to 91
# Due to a bug in unflatten, input PennyLane tensors
# are being unwrapped. Here, we cast them back to PennyLane
# tensors. Long term, we should fix this bug in unflatten.
# https://github.com/PennyLaneAI/pennylane/issues/966
Contributor

👍

pennylane/optimize/adagrad.py (resolved)
Comment on lines 52 to 53
"""Update x with one step of the optimizer and return the corresponding objective
function value prior to the step.
Contributor

There are a few xs in some docstrings that should be updated to args with requires_grad=True.

Suggested change
"""Update x with one step of the optimizer and return the corresponding objective
function value prior to the step.
"""Update differentiable arguments with one step of the optimizer and return the corresponding objective
function value prior to the step.

pennylane/optimize/gradient_descent.py (resolved)
pennylane/optimize/adam.py (resolved)
@@ -47,86 +48,116 @@ def update_stepsize(self, stepsize):
"""
self._stepsize = stepsize

def step_and_cost(self, objective_fn, x, grad_fn=None):
def step_and_cost(self, objective_fn, *args, grad_fn=None, **kwargs):
"""Update x with one step of the optimizer and return the corresponding objective
Contributor

Suggested change
"""Update x with one step of the optimizer and return the corresponding objective
"""Update args with one step of the optimizer and return the corresponding objective

grad_fn (function): Optional gradient function of the
objective function with respect to the variables ``x``.
If ``None``, the gradient function is computed automatically.
Must match shape of autograd derivative.
Contributor

Might be nicer to clarify a bit better what shape means in this case. E.g. "will always be a tuple containing ...".
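
For example, a custom grad_fn under the new convention might look like this (a hedged sketch; the tuple-per-trainable-argument shape mirrors the updated grad_uni_fns in the tests):

import pennylane as qml
from pennylane import numpy as np

def cost(x):
    return x ** 2

def custom_grad(x):
    # per the discussion above, the optimizer now expects the same tuple
    # shape that qml.grad produces: one entry per trainable argument
    return (2 * x,)

opt = qml.GradientDescentOptimizer(0.1)
x_new = opt.step(cost, np.array(1.0, requires_grad=True), grad_fn=custom_grad)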

pennylane/optimize/gradient_descent.py (outdated, resolved)
Comment on lines 62 to 75
self.accumulation = [None] * len(args)

trained_index = 0
for index, arg in enumerate(args):
if getattr(arg, "requires_grad", True):
x_flat = _flatten(arg)
grad_flat = _flatten(grad[trained_index])
trained_index += 1

self._update_momentum(index, grad_flat)

x_new_flat = [e - a for a, e in zip(self.accumulation[index], x_flat)]

args_new[index] = unflatten(x_new_flat, arg)
Contributor

That seems mildly annoying. I agree that there's probably a better way of rearranging the optimizers making this a bit easier for future development.

@thisac
Contributor

thisac commented Dec 17, 2020

@albi3ro I think @antalszava and I reviewed this at almost exactly the same time. 😆 It seems we commented on the same things in quite a few places (so don't get too confused when reading our comments/suggestions).

@albi3ro albi3ro requested a review from josh146 December 22, 2020 20:22
@albi3ro albi3ro self-assigned this Dec 22, 2020
Member

@josh146 josh146 left a comment

Amazing work @albi3ro! 💯

This is ready to merge now from my side, I just have one or two questions about the tests.

pennylane/optimize/rotosolve.py (outdated, resolved)
# list and tuple unpacking is also supported
params = (x, y, data)
params = opt.step(cost, *params)
Member

Don't forget to add your name to contributors! (unless you have done that already, and I missed it)

assert cost == pytest.approx(func(x, data, y, c=0.5), abs=tol)

def test_steps_the_same(self, opt, opt_name, tol):
"""Tests optimizing single parameter same as with several at a time"""
Member

I'm not quite sure I follow this docstring. I find the 'same as with several' slightly hard to parse?

Member

Could be good to expand this docstring to make the purpose of this test slightly clearer

Contributor Author

From other tests, we know that the step for a single argument is correct. I want to make sure it has the same correct step regardless of position in the function call.

I will try to come up with a better way of explaining it concisely.
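
Roughly, the idea is something like the following sketch (my illustration of the intent, not the test itself):

from pennylane import numpy as np
import pennylane as qml

def f_single(x):
    return np.sin(x)

def f_multi(x, y):
    return np.sin(x) + 0.0 * y  # gradient w.r.t. x is the same as for f_single

x = np.array(0.5, requires_grad=True)
y = np.array(1.2, requires_grad=True)

opt = qml.GradientDescentOptimizer(0.1)
x_alone = opt.step(f_single, x)
x_multi, _ = opt.step(f_multi, x, y)
assert np.allclose(x_alone, x_multi)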

Comment on lines 738 to 739
def test_kwargs(self, opt, opt_name, tol):
"""Test that the keywords get passed and alter the function"""
Member

Did you end up getting mocker.spy to work, to ensure that the keyword argument gets passed to this function?

Member

@josh146 josh146 left a comment

Looks good @albi3ro! 💯

Happy for this to be merged in once the mocker import is removed!

@@ -19,6 +19,7 @@

import numpy as onp
import pytest
from pytest_mock import mocker
Member

No need to import mocker here; mocker is a fixture, so pytest will automatically perform 'dependency injection' so that Python can find it in required test functions. Not a very pythonic system!

Suggested change (remove this line)
from pytest_mock import mocker

Comment on lines +774 to +777
assert kwargs2 == {"c": 2.0}
assert kwargs3 == {"c": 3.0}

assert cost_three == pytest.approx(wrapper.func(x, c=3.0), abs=tol)
Member

💯

@albi3ro albi3ro merged commit 3807064 into master Jan 6, 2021
@albi3ro albi3ro deleted the optimize_more_parameters branch January 6, 2021 16:51
@josh146
Member

josh146 commented Jan 14, 2021

[ch1081]

forward = getattr(g, "forward", None)

return grad, forward

def apply_grad(self, grad, x):
r"""Update the variables x to take a single optimization step. Flattens and unflattens
def apply_grad(self, grad, args):
Contributor

Just some feedback from the hackathon based upon a submission that uses compute_grad and apply_grad: they had some confusion because they needed to switch from passing x=params to args=[params] and also (for compute_grad) they needed to explicitly pass kwargs={}.

Should we consider making these methods similar to step(*args, **kwargs)? Or, alternatively, just make them hidden methods.
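
For concreteness, the change they hit looks roughly like this (hedged: reconstructed from the description above rather than from the diff):

from pennylane import numpy as np
import pennylane as qml

def cost(x):
    return np.sum(x ** 2)

params = np.array([0.1, 0.2], requires_grad=True)
opt = qml.GradientDescentOptimizer(0.1)

# old (pre-PR) usage was roughly: opt.compute_grad(cost, x=params)
# new usage: positional args are passed as a sequence, and kwargs must be
# supplied explicitly, even when there are none
grad, forward = opt.compute_grad(cost, args=[params], kwargs={})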

Contributor

@trbromley I tend to agree, especially if this would be more of a user-facing method (not sure how much of an issue it would be otherwise). There was a brief discussion about this while the PR was under review. Perhaps just making them hidden by prefixing them with an underscore would be good here. 🤔 @albi3ro @josh146

Member

I'm somewhat ambivalent here. We definitely considered compute_grad and apply_grad 'private' methods when first designing the optimizers, as they were not intended to be used as the public optimizer API.

But it probably makes more sense to keep them public, and standardize the signatures, since the fact that users were finding and using them suggests they add value.

Contributor Author

Unless there's a larger demand from users for this function, it'd be inefficient to unpack and repack the values. It would add operations for no real purpose.

Member

I'm in favour of 'doing nothing' for now unless there is more feedback and demand.
