
Update the provided optimizers to support more complicated cost function signatures. #683

Closed
josh146 opened this issue Jun 16, 2020 · 18 comments
Labels
discussion 💬 Requiring extra discussion

Comments

@josh146
Member

josh146 commented Jun 16, 2020

Currently, the built-in PennyLane optimizers have the following restrictions:

  • They only accept an objective function with the signature objective_fn(params)

  • params must be a sequence (of arbitrary shape).

This is due to the autograd.grad() function requiring, at call time, the argument indices of the trainable parameters. It was originally decided to simply hardcode argnum=0 and restrict the cost function signature.

As a result, most examples and tutorials that use autograd with the built-in optimizers follow two patterns: (a) they require the trainable parameters to be a single NumPy array, and (b) if the cost function takes additional arguments, the cost function is 'wrapped' to ensure the signature is what the optimizer expects. This is sometimes done by defining a separate cost function, or simply by using an anonymous function:

params = opt.step(lambda params: circuit(data, params), params)

With the completion of the Variable refactor, however, we can now avoid this issue; the user can explicitly mark parameters as differentiable or not, and the built-in qml.grad() function uses this to determine the differentiable arguments dynamically at call time:

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def circuit(phi1, phi2, data):
    qml.RZ(data, wires=0)
    qml.RX(phi1, wires=0)
    qml.RY(phi2, wires=0)
    return qml.expval(qml.PauliZ(0))

phi1 = 0.54
phi2 = 0.12
data = np.array(0.1, requires_grad=False)

dcircuit = qml.grad(circuit)  # equivalent to autograd.grad(circuit, argnum=[0, 1])
print(dcircuit(phi1, phi2, data))

This allows us to rethink the optimizers, and increase their flexibility. For example, we could now allow the following:

# trainable_parameter_updates = opt.step(objective_fn, args)
phi1, phi2 = opt.step(circuit, [phi1, phi2, data])

This likely requires several changes:

  • Rather than assuming objective_fn(x), the optimizers should now assume objective_fn(*args).

  • Replace autograd.grad(objective_fn, argnum=0)(x) with qml.grad(objective_fn)(*args).

  • Modify apply_grad(grad, x) to apply_grad(grad, *args), and have it inspect requires_grad to determine which arguments are trainable.

We should also be able to preserve backwards compatibility, e.g., params = opt.step(circuit, params), simply by applying the previous logic if isinstance(args, qml.numpy.Tensor) and args.requires_grad is True.
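A minimal, self-contained sketch of how that dispatch could look. Here `Tensor`, `num_grad`, and `step` are hypothetical stand-ins for `qml.numpy.Tensor` and the optimizer's internals, using scalar parameters and a finite-difference gradient in place of autograd:

```python
class Tensor:
    """Hypothetical stand-in for qml.numpy.Tensor (sketch only)."""
    def __init__(self, value, requires_grad=True):
        self.value = float(value)
        self.requires_grad = requires_grad

def num_grad(fn, args, i, eps=1e-6):
    """Central-difference gradient of fn with respect to scalar argument i."""
    up = [a.value if isinstance(a, Tensor) else a for a in args]
    dn = list(up)
    up[i] += eps
    dn[i] -= eps
    return (fn(*up) - fn(*dn)) / (2 * eps)

def step(objective_fn, *args, stepsize=0.4):
    """Gradient-descent step that updates only trainable Tensor arguments."""
    new_args = []
    for i, a in enumerate(args):
        if isinstance(a, Tensor) and a.requires_grad:
            g = num_grad(objective_fn, args, i)
            new_args.append(Tensor(a.value - stepsize * g))
        else:
            # non-trainable arguments pass through unchanged
            new_args.append(a)
    # backwards compatibility: single argument in, single argument out
    return new_args[0] if len(args) == 1 else tuple(new_args)

# old-style call: one trainable parameter
theta = step(lambda x: x ** 2, Tensor(1.0))

# new-style call: data is a plain float and is left untouched
x_new, data = step(lambda x, d: (x - d) ** 2, Tensor(1.0), 0.0)
```

With stepsize 0.4 and gradient 2 at x = 1.0, both trainable parameters end up at roughly 0.2, while `data` is returned unchanged.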

@josh146 josh146 added the discussion 💬 Requiring extra discussion label Jun 16, 2020
@co9olguy
Member

co9olguy commented Jun 16, 2020

This is good to fix. The current implementation would "accidentally" update variables that the user has marked as requires_grad=False, if the user is not careful to remove these non-differentiable args (e.g., data) from the input args to the optimizer. So it's maybe not really a bug, but definitely a gotcha.

@josh146
Member Author

josh146 commented Jun 17, 2020

Current implementation would "accidentally" update variables which the user has marked as requires_grad=False

This shouldn't be the case; the variable refactor was implemented to catch this at a low level. For example, attempting to do the following on the master branch:

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def circuit(params):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=0)
    return qml.expval(qml.PauliZ(0))


params = np.array([0.11, 0.12], requires_grad=False)

# initialise the optimizer
opt = qml.GradientDescentOptimizer(stepsize=0.4)
params = opt.step(circuit, params)

raises the following exception:

  File "/home/josh/xanadu/pennylane/pennylane/numpy/tensor.py", line 137, in tensor_to_arraybox
    "{} is non-differentiable. Set the requires_grad attribute to True.".format(x)
pennylane.numpy.tensor.NonDifferentiableError: [0.11 0.12] is non-differentiable. Set the requires_grad attribute to True.

@co9olguy
Member

Nice 🙏

@Lucaman99
Contributor

Based on a discussion in PennyLaneAI/qml#103, it might also be nice to have a way to extract the value of the cost function from the optimizer.step call, so that displaying the cost at each step when optimizing a variational circuit doesn't double the number of required circuit executions.

@albi3ro
Contributor

albi3ro commented Dec 8, 2020

Allowing arbitrary cost functions might prove error-prone, especially with hard-to-diagnose errors. But allowing non-iterated arguments gives the cost function more flexibility. So we could restrict the cost function to the form

  f(params, *args, **kwargs)

for example circuit(params, data, some_flag=True). The optimizer will only update the variable in the first position.
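A sketch of what a gradient-descent `step` could look like under this restricted signature, modelled on scipy.optimize.minimize's convention. This is not the PennyLane implementation: it uses plain lists of floats and a finite-difference gradient, and only the first positional argument is updated:

```python
def step(objective_fn, params, *args, stepsize=0.1, eps=1e-6, **kwargs):
    """One gradient-descent step; only `params` (a list of floats) is updated.
    Extra positional args and keyword args flow through to the cost function."""
    new_params = list(params)
    for i in range(len(params)):
        shifted_up = list(params)
        shifted_dn = list(params)
        shifted_up[i] += eps
        shifted_dn[i] -= eps
        # central-difference estimate of the partial derivative
        g = (objective_fn(shifted_up, *args, **kwargs)
             - objective_fn(shifted_dn, *args, **kwargs)) / (2 * eps)
        new_params[i] -= stepsize * g
    return new_params

# usage: `data` and `some_flag` are passed through untouched
cost = lambda p, data, some_flag=False: sum((x - d) ** 2 for x, d in zip(p, data))
params = step(cost, [0.5, 1.5], [0.0, 1.0], some_flag=True)
```

Starting from [0.5, 1.5] with targets [0.0, 1.0], one step with stepsize 0.1 moves the parameters to approximately [0.4, 1.4].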

@josh146
Member Author

josh146 commented Dec 8, 2020

Allowing arbitrary cost functions might prove error-prone, especially with hard to diagnose errors.

I think this is a good call 👍 Another nice side effect is that it conforms to the scipy.optimize.minimize cost function constraints, so (a) there is precedent for this approach, and (b) it will be intuitive for users familiar with SciPy.

Also, if I recall correctly, almost all user feedback had to do with finding it hard to work out how to pass non-trainable keyword arguments to the cost function being trained -- I don't recall seeing questions about training multiple arguments.

albi3ro added a commit to albi3ro/pennylane that referenced this issue Dec 8, 2020
@mariaschuld
Contributor

mariaschuld commented Dec 9, 2020

Just to add a thought: would an alternative option be to provide a function that produces a cost function of the form the optimisers will recognise? The function would basically just produce the lambda function we have at the moment in our example.

The advantage would be flexibility: if an optimiser needed a different input, we would just adapt the function, not the standard "first argument of cost is the parameters". I am just saying this because it follows the functional style more closely and avoids the minefield of forcing signatures, which I have always regretted in the past...

But I am struggling to come up with a convincing code example, to be honest. Something like

cost = prepare_cost(original_cost, diffable_arguments=[0,2]) 

params = optimizer.step(cost, params)

where cost is a function whose first argument summarises all trainable inputs, and original_cost is to be optimised in its 0th and 2nd arguments?

Ok, that still feels very wrong.

Another idea: Could the optimiser not instead go through the parameters to figure out which ones are trainable?

@josh146
Member Author

josh146 commented Dec 9, 2020

Just to add a thought: Would an alternative option be to provide a function that produces a cost function of the form the optimisers will recognise? The function would basically just produce the lambda function we have at the moment in our example.

I think this is maybe too complicated; rather than attempting to provide utility functions to work around an unintuitive step restriction, we should rethink the step UX. Especially because the requirement that the cost function signature must be cost(params) is overly restrictive.

Also, wrapping/creating lambda functions on the fly is fraught, in my experience.

Ok, that still feels very wrong.

😆

Another idea: Could the optimiser not instead go through the parameters to figure out which ones are trainable?

Yep, this is the idea from the top comment. Perhaps the following could be possible?

def cost(phi1, data, phi2):
    ...

phi1 = np.array([0.1, 0.2], requires_grad=True)  # trainable
data = np.array([1, 2, 3], requires_grad=False)  # non-trainable
phi2 = np.array([[0.56, -0.6], [0.54, 0.123]], requires_grad=True)  # trainable

# phi1 and phi2 are updated, data is not since it is not trainable
phi1, data, phi2 = opt.step(circuit, [phi1, data, phi2])

Chatting with @albi3ro offline, the worry was that the above ^ approach could introduce too many edge cases we would have to cover. So the above suggestion (first param trainable, rest are not) was suggested due to how SciPy works:

def cost(phi1, data):
    ...

phi1 = np.array([0.1, 0.2], requires_grad=True) # trainable
data = np.array([1, 2, 3], requires_grad=False) # non-trainable

# phi1 is updated; data is non-trainable and passed as a keyword argument
phi1 = opt.step(circuit, phi1, data=data)

@mariaschuld
Contributor

mariaschuld commented Dec 9, 2020

Oh I see, sorry for only reading half of the discussion. I know it is annoying when people comment without having a better solution, but if there is any way to avoid introducing a restriction on the cost function's signature, I think that would be really great. Even if SciPy uses it, it feels wrong for an autodiff library. It introduced headaches in templates before, and is also a downside of the keras/torch layers.

Would it not be a much better approach to ask "what do we have to change down below so that designing the optimiser is not so darn difficult"? In a sense, safely tracking which parameters are differentiable should be a core feature (and I remember needing it for a Fourier spectrum extractor before)! No?

@josh146
Member Author

josh146 commented Dec 9, 2020

Would it not be a much better approach to ask "what do we have to change down below so that designing the optimiser is not so darn difficult"?

Yeah for sure!

Looking at it from that point of view, I would say that

# phi1 and phi2 are updated, data is not since it is not trainable
phi1, data, phi2 = opt.step(circuit, [phi1, data, phi2])

is maybe the best we could do? What do you think of this UI?

I am hesitant to have the output be a different shape to the input (e.g., have output phi1, phi2 and drop the data), since that will add a lot of complexity to the code.

avoid introducing a restriction on the cost function's signature I think that would be really great.

I suppose I see the other proposal more as a loosening of an existing restriction; currently the cost function must be strictly cost(params), whereas that proposal allows cost(params, *non_trainable_args) 😆

@albi3ro
Contributor

albi3ro commented Dec 9, 2020

I like

# phi1 and phi2 are updated, data is not since it is not trainable
phi1, data, phi2 = opt.step(circuit, [phi1, data, phi2])

This format fixes the worries I had about returning a different number of variables than were passed to the function. I worried that someone would momentarily forget which ones are differentiable and start assigning variables to the wrong names.

How useful would including non-trainable keywords be? For example,

def circuit(params, printing=False):
     if printing:
          print("I am iterating")

params = opt.step(circuit, params, printing=True)

@josh146
Member Author

josh146 commented Dec 9, 2020

Good point @albi3ro. I suppose in this case it would look like:

def circuit(params, printing=False):
     if printing:
          print("I am iterating")

params, _ = opt.step(circuit, [params, True])

(where the second element in the tuple would just be True?)

@albi3ro
Contributor

albi3ro commented Dec 9, 2020

Any named/keyword arguments passed in the form f(name=value), like printing, would be caught by **kwargs instead of *args. These are not positional; they get stored in a dictionary keyed by name, so their order in the call does not matter.

So useful as configuration flags, but not as variables.
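To illustrate the positional/keyword distinction being made here (the function and keyword names below are purely illustrative):

```python
def circuit(params, *args, **kwargs):
    """Illustrative only: positional args keep their order in `args`;
    keyword args land in the `kwargs` dict, keyed by name."""
    return params, args, kwargs

# 1.0 and 2.0 arrive positionally; printing and shots arrive by name
p, a, kw = circuit([0.1], 1.0, 2.0, printing=True, shots=100)
```

Here `a` is the ordered tuple `(1.0, 2.0)`, while `kw` is `{"printing": True, "shots": 100}` — useful as configuration flags, but with no positional identity the optimizer could rely on for training.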

albi3ro added a commit to albi3ro/pennylane that referenced this issue Dec 9, 2020
@albi3ro
Contributor

albi3ro commented Dec 9, 2020

Let's leave the RotoSelectOptimizer the way it is because the number of parameters and the number of generators have to match. While we could allow non-training parameters and classical parameters, I think that would just confuse the situation too much.

@albi3ro
Contributor

albi3ro commented Dec 11, 2020

For opt.step_and_cost(f, x, y), do we want to return

(x_new, y_new), cost = opt.step_and_cost(f, x, y)

or

x_new, y_new, cost = opt.step_and_cost(f, x, y)
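A toy illustration of how the two options differ at the call site (`step_and_cost` below is a placeholder with a dummy update rule, not the PennyLane implementation):

```python
def step_and_cost(f, *args):
    """Placeholder: returns (updated_args, cost) as a nested tuple."""
    cost = f(*args)
    new_args = tuple(a * 0.9 for a in args)  # dummy "update" for illustration
    return new_args, cost

# nested form: one extra set of parentheses when unpacking
(x_new, y_new), cost = step_and_cost(lambda x, y: x + y, 1.0, 2.0)

# the flat form would instead be unpacked as:
# x_new, y_new, cost = step_and_cost(f, x, y)
```

The nested form keeps the updated arguments grouped together, so the unpacking pattern does not change with the number of cost-function arguments.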

@josh146
Member Author

josh146 commented Dec 14, 2020

I would say definitely the former,

(x_new, y_new), cost = opt.step_and_cost(f, x, y)

@albi3ro
Contributor

albi3ro commented Dec 14, 2020

Are the arguments to the cost function restricted to being NumPy arrays?

If not, can we make that restriction from now on?

I currently have two ways out of a giant "NonDifferentiableError":

  • Restricting arguments to NumPy arrays
  • Specifying the non-trainable parameters by passing them through keywords instead of as arguments with requires_grad=False.

albi3ro added a commit that referenced this issue Jan 6, 2021
@albi3ro
Contributor

albi3ro commented Jan 6, 2021

This discussion should now be resolved with PR #959

@albi3ro albi3ro closed this as completed Jan 6, 2021