
Update the provided optimizers to support more complicated cost function signatures. #683

Closed
josh146 opened this issue Jun 16, 2020 · 18 comments
Labels
discussion 💬 Requiring extra discussion

Comments

@josh146
Member

josh146 commented Jun 16, 2020

Currently, the built-in PennyLane optimizers have the following restrictions:

  • They only accept an objective function with the signature objective_fn(params)

  • params must be a sequence (of arbitrary shape).

This is due to the autograd.grad() function requiring, at call time, the argument indices of the trainable parameters. It was originally decided to simply hardcode argnum=0 and restrict the cost function signature.

As a result, most examples and tutorials that use autograd with the built-in optimizers follow two patterns: (a) they require the trainable parameters to be a single NumPy array, and (b) if the cost function takes additional arguments, the cost function is 'wrapped' to ensure the signature is what the optimizer expects. This is sometimes done by defining a separate cost function, or simply by using an anonymous function:

params = opt.step(lambda params: circuit(data, params), params)

With the completion of the Variable refactor, however, we can now avoid this issue; the user can explicitly mark parameters as differentiable or not, and the built-in qml.grad() function uses this to determine the differentiable arguments dynamically at call time:

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def circuit(phi1, phi2, data):
    qml.RZ(data, wires=0)
    qml.RX(phi1, wires=0)
    qml.RY(phi2, wires=0)
    return qml.expval(qml.PauliZ(0))

phi1 = 0.54
phi2 = 0.12
data = np.array(0.1, requires_grad=False)

dcircuit = qml.grad(circuit)  # equivalent to autograd.grad(circuit, argnum=[0, 1])
print(dcircuit(phi1, phi2, data))

This allows us to rethink the optimizers, and increase their flexibility. For example, we could now allow the following:

# trainable_parameter_updates = opt.step(objective_fn, args)
phi1, phi2 = opt.step(circuit, [phi1, phi2, data])

This likely requires several changes:

  • Rather than assuming objective_fn(x), the optimizers should now assume objective_fn(*args).

  • Replace autograd.grad(objective_fn, argnum=0)(x) with qml.grad(objective_fn)(*args).

  • Modify apply_grad(grad, x) to apply_grad(grad, *args), and have it inspect requires_grad to determine which arguments are trainable.

We should also be able to preserve backwards compatibility, e.g., params = opt.step(circuit, params), simply by applying the previous logic if isinstance(args, qml.numpy.Tensor) and args.requires_grad is True.
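A minimal, self-contained sketch of how that dispatch could look. Here `Tensor`, `num_grad`, and `step` are hypothetical stand-ins for `qml.numpy.Tensor` and the optimizer's internals, using scalar parameters and a finite-difference gradient in place of autograd:

```python
class Tensor:
    """Hypothetical stand-in for qml.numpy.Tensor (sketch only)."""
    def __init__(self, value, requires_grad=True):
        self.value = float(value)
        self.requires_grad = requires_grad

def num_grad(fn, args, i, eps=1e-6):
    """Central-difference gradient of fn with respect to scalar argument i."""
    up = [a.value if isinstance(a, Tensor) else a for a in args]
    dn = list(up)
    up[i] += eps
    dn[i] -= eps
    return (fn(*up) - fn(*dn)) / (2 * eps)

def step(objective_fn, *args, stepsize=0.4):
    """Gradient-descent step that updates only trainable Tensor arguments."""
    new_args = []
    for i, a in enumerate(args):
        if isinstance(a, Tensor) and a.requires_grad:
            g = num_grad(objective_fn, args, i)
            new_args.append(Tensor(a.value - stepsize * g))
        else:
            # non-trainable arguments pass through unchanged
            new_args.append(a)
    # backwards compatibility: single argument in, single argument out
    return new_args[0] if len(args) == 1 else tuple(new_args)

# old-style call: one trainable parameter
theta = step(lambda x: x ** 2, Tensor(1.0))

# new-style call: data is a plain float and is left untouched
x_new, data = step(lambda x, d: (x - d) ** 2, Tensor(1.0), 0.0)
```

With stepsize 0.4 and gradient 2 at x = 1.0, both trainable parameters end up at roughly 0.2, while `data` is returned unchanged.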

@josh146 josh146 added the discussion 💬 Requiring extra discussion label Jun 16, 2020
@co9olguy
Member

co9olguy commented Jun 16, 2020

This is good to fix. The current implementation would "accidentally" update variables that the user has marked as requires_grad=False, if the user is not careful to remove these non-differentiable args (e.g., data) from the input args to the optimizer. So it's maybe not really a bug, but definitely a gotcha.

@josh146
Member Author

josh146 commented Jun 17, 2020

Current implementation would "accidentally" update variables which the user has marked as requires_grad=False

This shouldn't be the case; the variable refactor was implemented to catch this at a low level. For example, attempting to do the following on the master branch:

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def circuit(params):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=0)
    return qml.expval(qml.PauliZ(0))


params = np.array([0.11, 0.12], requires_grad=False)

# initialise the optimizer
opt = qml.GradientDescentOptimizer(stepsize=0.4)
params = opt.step(circuit, params)

raises the following exception:

  File "/home/josh/xanadu/pennylane/pennylane/numpy/tensor.py", line 137, in tensor_to_arraybox
    "{} is non-differentiable. Set the requires_grad attribute to True.".format(x)
pennylane.numpy.tensor.NonDifferentiableError: [0.11 0.12] is non-differentiable. Set the requires_grad attribute to True.

@co9olguy
Member

Nice 🙏

@Lucaman99
Contributor

Based on a discussion in PennyLaneAI/qml#103, it might also be nice to have a way to extract the value of the cost function from the optimizer.step call, so that displaying the cost at each step when optimizing a variational circuit doesn't double the number of required circuit executions.

@albi3ro
Contributor

albi3ro commented Dec 8, 2020

Allowing arbitrary cost functions might prove error-prone, especially with hard-to-diagnose errors. But allowing non-iterated arguments gives the cost function more flexibility. So we could restrict the cost function to the form

  f(params, *args, **kwargs)

for example circuit(params, data, some_flag=True). The optimizer will only update the variable in the first position.
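A sketch of what a gradient-descent `step` could look like under this restricted signature, modelled on scipy.optimize.minimize's convention. This is not the PennyLane implementation: it uses plain lists of floats and a finite-difference gradient, and only the first positional argument is updated:

```python
def step(objective_fn, params, *args, stepsize=0.1, eps=1e-6, **kwargs):
    """One gradient-descent step; only `params` (a list of floats) is updated.
    Extra positional args and keyword args flow through to the cost function."""
    new_params = list(params)
    for i in range(len(params)):
        shifted_up = list(params)
        shifted_dn = list(params)
        shifted_up[i] += eps
        shifted_dn[i] -= eps
        # central-difference estimate of the partial derivative
        g = (objective_fn(shifted_up, *args, **kwargs)
             - objective_fn(shifted_dn, *args, **kwargs)) / (2 * eps)
        new_params[i] -= stepsize * g
    return new_params

# usage: `data` and `some_flag` are passed through untouched
cost = lambda p, data, some_flag=False: sum((x - d) ** 2 for x, d in zip(p, data))
params = step(cost, [0.5, 1.5], [0.0, 1.0], some_flag=True)
```

Starting from [0.5, 1.5] with targets [0.0, 1.0], one step with stepsize 0.1 moves the parameters to approximately [0.4, 1.4].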

@josh146
Member Author

josh146 commented Dec 8, 2020

Allowing arbitrary cost functions might prove error-prone, especially with hard to diagnose errors.

I think this is a good call 👍 Another nice side effect is that it conforms to the scipy.optimize.minimize cost function constraints, so (a) there is precedent for this approach, and (b) it will be intuitive for users familiar with SciPy.

Also, if I recall correctly, almost all user feedback had to do with finding it hard to work out how to pass non-trainable keyword arguments to the cost function being trained -- I don't recall seeing questions about training multiple arguments.

albi3ro added a commit to albi3ro/pennylane that referenced this issue Dec 8, 2020
@mariaschuld
Contributor

mariaschuld commented Dec 9, 2020

Just to add a thought: would an alternative option be to provide a function that produces a cost function of the form the optimisers will recognise? The function would basically just produce the lambda function we have at the moment in our example.

The advantage would be flexibility: if an optimiser needed a different input, we would just adapt the function, not the standard "first argument of cost is the parameters". I am just saying this because it follows the functional style more closely and avoids the minefield of forcing signatures, which I have always regretted in the past...

But I am struggling to come up with a convincing code example, to be honest. Something like

cost = prepare_cost(original_cost, diffable_arguments=[0,2]) 

params = optimizer.step(cost, params)

where cost is a function whose first argument summarises all trainable inputs, and original_cost is to be optimised in its 0th and 2nd arguments?

Ok, that still feels very wrong.

Another idea: Could the optimiser not instead go through the parameters to figure out which ones are trainable?

@josh146
Member Author

josh146 commented Dec 9, 2020

Just to add a thought: Would an alternative option be to provide a function that produces a cost function of the form the optimisers will recognise? The function would basically just produce the lambda function we have at the moment in our example.

I think this is maybe too complicated; rather than attempting to provide utility functions to work around an unintuitive step restriction, we should rethink the step UX. Especially because the requirement that the cost function signature must be cost(params) is overly restrictive.

Also, wrapping/creating lambda functions on the fly is fraught, in my experience.

Ok, that still feels very wrong.

😆

Another idea: Could the optimiser not instead go through the parameters to figure out which ones are trainable?

Yep, this is the idea from the top comment. Perhaps the following could be possible?

def cost(phi1, data, phi2):
    ...

phi1 = np.array([0.1, 0.2], requires_grad=True)  # trainable
data = np.array([1, 2, 3], requires_grad=False)  # non-trainable
phi2 = np.array([[0.56, -0.6], [0.54, 0.123]], requires_grad=True)  # trainable

# phi1 and phi2 are updated, data is not since it is not trainable
phi1, data, phi2 = opt.step(circuit, [phi1, data, phi2])

Chatting with @albi3ro offline, the worry was that the above ^ approach could introduce too many edge cases we would have to cover. So the above suggestion (first param trainable, rest are not) was suggested due to how SciPy works:

def cost(phi1, data):
    ...

phi1 = np.array([0.1, 0.2], requires_grad=True) # trainable
data = np.array([1, 2, 3], requires_grad=False) # non-trainable

# phi1 is updated; data is non-trainable and passed as a keyword argument
phi1 = opt.step(circuit, phi1, data=data)

@mariaschuld
Contributor

mariaschuld commented Dec 9, 2020

Oh I see, sorry for only reading half of the discussion. I know it is annoying when people comment without having a better solution, but if there is any way to avoid introducing a restriction on the cost function's signature, I think that would be really great. Even if SciPy uses it, it feels wrong for an autodiff library. It introduced headaches in templates before, and is also a downside of the keras/torch layers.

Would it not be a much better approach to ask "what do we have to change down below so that designing the optimiser is not so darn difficult"? In a sense, safely tracking which parameters are differentiable should be a core feature (and I remember needing it for a Fourier spectrum extractor before)! No?

@josh146
Member Author

josh146 commented Dec 9, 2020

Would it not be a much better approach to ask "what do we have to change down below so that designing the optimiser is not so darn difficult"?

Yeah for sure!

Looking at it from that point of view, I would say that

# phi1 and phi2 are updated, data is not since it is not trainable
phi1, data, phi2 = opt.step(circuit, [phi1, data, phi2])

is maybe the best we could do? What do you think of this UI?

I am hesitant to have the output be a different shape to the input (e.g., have output phi1, phi2 and drop the data), since that will add a lot of complexity to the code.

avoid introducing a restriction on the cost function's signature I think that would be really great.

I suppose I see the other proposal more as a loosening of an existing restriction; currently the cost function must be strictly cost(params), whereas that proposal allows cost(params, *non_trainable_args) 😆

@albi3ro
Contributor

albi3ro commented Dec 9, 2020

I like

# phi1 and phi2 are updated, data is not since it is not trainable
phi1, data, phi2 = opt.step(circuit, [phi1, data, phi2])

This format fixes the worries I had about returning a different number of variables than were passed to the function. I worried that someone would momentarily forget which ones are differentiable and start assigning variables to the wrong names.

How useful would including non-trainable keywords be? For example,

def circuit(params, printing=False):
     if printing:
          print("I am iterating")

params = opt.step(circuit, params, printing=True)

@josh146
Member Author

josh146 commented Dec 9, 2020

Good point @albi3ro. I suppose in this case it would look like:

def circuit(params, printing=False):
     if printing:
          print("I am iterating")

params, _ = opt.step(circuit, [params, True])

(where the second element in the tuple would just be True?)

@albi3ro
Contributor

albi3ro commented Dec 9, 2020

Any named/keyword arguments passed in the form f(name=value), like printing, would be caught by **kwargs instead of *args. These are not positional; they get stored in a dictionary keyed by name, so their order in the call does not matter.

So useful as configuration flags, but not as variables.
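To illustrate the positional/keyword distinction being made here (the function and keyword names below are purely illustrative):

```python
def circuit(params, *args, **kwargs):
    """Illustrative only: positional args keep their order in `args`;
    keyword args land in the `kwargs` dict, keyed by name."""
    return params, args, kwargs

# 1.0 and 2.0 arrive positionally; printing and shots arrive by name
p, a, kw = circuit([0.1], 1.0, 2.0, printing=True, shots=100)
```

Here `a` is the ordered tuple `(1.0, 2.0)`, while `kw` is `{"printing": True, "shots": 100}` — useful as configuration flags, but with no positional identity the optimizer could rely on for training.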

albi3ro added a commit to albi3ro/pennylane that referenced this issue Dec 9, 2020
@albi3ro
Contributor

albi3ro commented Dec 9, 2020

Let's leave the RotoSelectOptimizer the way it is because the number of parameters and the number of generators have to match. While we could allow non-training parameters and classical parameters, I think that would just confuse the situation too much.

@albi3ro
Contributor

albi3ro commented Dec 11, 2020

For opt.step_and_cost(f, x, y), do we want to return

(x_new, y_new), cost = opt.step_and_cost(f, x, y)

or

x_new, y_new, cost = opt.step_and_cost(f, x, y)
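A toy illustration of how the two options differ at the call site (`step_and_cost` below is a placeholder with a dummy update rule, not the PennyLane implementation):

```python
def step_and_cost(f, *args):
    """Placeholder: returns (updated_args, cost) as a nested tuple."""
    cost = f(*args)
    new_args = tuple(a * 0.9 for a in args)  # dummy "update" for illustration
    return new_args, cost

# nested form: one extra set of parentheses when unpacking
(x_new, y_new), cost = step_and_cost(lambda x, y: x + y, 1.0, 2.0)

# the flat form would instead be unpacked as:
# x_new, y_new, cost = step_and_cost(f, x, y)
```

The nested form keeps the updated arguments grouped together, so the unpacking pattern does not change with the number of cost-function arguments.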

@josh146
Member Author

josh146 commented Dec 14, 2020

I would say definitely the former,

(x_new, y_new), cost = opt.step_and_cost(f, x, y)

@albi3ro
Contributor

albi3ro commented Dec 14, 2020

Are the arguments to the cost function restricted to being NumPy arrays?

If not, can we make that restriction from now on?

I currently have two ways out of a giant "NonDifferentiableError":

  • Restricting arguments to NumPy arrays
  • Specifying the non-trainable parameters by passing them through keywords instead of as arguments with requires_grad=False.

albi3ro added a commit that referenced this issue Jan 6, 2021
@albi3ro
Contributor

albi3ro commented Jan 6, 2021

This discussion should now be resolved with PR #959

@albi3ro albi3ro closed this as completed Jan 6, 2021