Update the provided optimizers to support more complicated cost function signatures. #683
Comments
This is good to fix. The current implementation would "accidentally" update variables which the user has marked as non-trainable.
This shouldn't be the case; the variable refactor was implemented to catch this at a low level. For example, attempting the following raises an exception:

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def circuit(params):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=0)
    return qml.expval(qml.PauliZ(0))

params = np.array([0.11, 0.12], requires_grad=False)

# initialise the optimizer
opt = qml.GradientDescentOptimizer(stepsize=0.4)
params = opt.step(circuit, params)
```
Nice 🙏
Based off of a discussion in PennyLaneAI/qml#103, it might also be nice for there to be a way to extract the value of the cost function from the optimization step.
Allowing arbitrary cost functions might prove error-prone, especially with hard-to-diagnose errors. But allowing non-iterated arguments gives more flexibility to the cost function. So we could restrict the cost function to a form where only the first argument is trained.
I think this is a good call 👍 Another nice side effect is that it conforms to the SciPy interface. Also, if I recall correctly, almost all user feedback had to do with finding it hard to work out how to pass non-trainable keyword arguments to the cost function they were training; I don't recall seeing questions regarding training multiple arguments.
Just to add a thought: would an alternative option be to provide a function that produces a cost function of the form the optimisers will recognise? The function would basically just produce the lambda function we have at the moment in our example. The advantage would be flexibility: if an optimiser needed a different input, we would just adapt the function, not the standard "first argument of cost is the parameter". I am just saying this because it follows the functional style more closely and avoids the minefield of forcing signatures, which I have always regretted in the past... But I am struggling to come up with a convincing code example, to be honest. Something like:

```python
cost = prepare_cost(original_cost, diffable_arguments=[0, 2])
params = optimizer.step(cost, params)
```

Ok, that still feels very wrong. Another idea: could the optimiser not instead go through the parameters to figure out which ones are trainable?
I think this is maybe too complicated; rather than attempting to provide utility functions to work around an unintuitive interface, we should make the interface itself intuitive. Also, wrapping/creating lambda functions on the fly is fraught, from my experience.
😆
Yep, this is the idea from the top comment. Perhaps the following could be possible?

```python
def cost(phi1, data, phi2):
    ...

phi1 = np.array([0.1, 0.2], requires_grad=True)                     # trainable
data = np.array([1, 2, 3], requires_grad=False)                     # non-trainable
phi2 = np.array([[0.56, -0.6], [0.54, 0.123]], requires_grad=True)  # trainable

# phi1 and phi2 are updated, data is not since it is not trainable
phi1, data, phi2 = opt.step(cost, [phi1, data, phi2])
```

Chatting with @albi3ro offline, the worry was that the above ^ approach could introduce too many edge cases we would have to cover. So the following suggestion (first param trainable, rest are not) was made, due to how SciPy works:

```python
def cost(phi1, data):
    ...

phi1 = np.array([0.1, 0.2], requires_grad=True)  # trainable
data = np.array([1, 2, 3], requires_grad=False)  # non-trainable

# phi1 is updated, data is not since it is not trainable
phi1 = opt.step(cost, phi1, data=data)
```
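The trainability-aware update being discussed can be sketched in plain Python. Everything below is a toy stand-in, not PennyLane's actual implementation: `Tensor` mimics a `requires_grad`-carrying array, and `numeric_grad` uses finite differences in place of autograd.

```python
# Toy sketch: an optimizer step that updates only arguments marked
# requires_grad=True and passes the rest through unchanged.
# Tensor, numeric_grad, and step are hypothetical stand-ins.

class Tensor(list):
    def __init__(self, data, requires_grad=True):
        super().__init__(data)
        self.requires_grad = requires_grad

def numeric_grad(fn, args, index, h=1e-6):
    """Central-difference gradient of fn w.r.t. each entry of args[index]."""
    grads = []
    for i in range(len(args[index])):
        shifted = [Tensor(a, a.requires_grad) for a in args]
        shifted[index][i] += h
        plus = fn(*shifted)
        shifted[index][i] -= 2 * h
        minus = fn(*shifted)
        grads.append((plus - minus) / (2 * h))
    return grads

def step(fn, *args, stepsize=0.1):
    """One gradient-descent step over all trainable arguments."""
    new_args = []
    for idx, arg in enumerate(args):
        if arg.requires_grad:
            g = numeric_grad(fn, list(args), idx)
            new_args.append(
                Tensor([x - stepsize * gi for x, gi in zip(arg, g)],
                       requires_grad=True))
        else:
            new_args.append(arg)  # non-trainable args pass through unchanged
    return new_args

def cost(phi, data):
    return sum((p - d) ** 2 for p, d in zip(phi, data))

phi = Tensor([0.0, 0.0], requires_grad=True)
data = Tensor([1.0, 2.0], requires_grad=False)
phi_new, data_new = step(cost, phi, data, stepsize=0.25)
# phi moves toward data; data is returned untouched
```

This is only a sketch of the dispatch logic; a real implementation would use `qml.grad` and PennyLane's tensor types rather than finite differences.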
Oh I see, sorry for only reading half of the discussion. I know it is annoying when people comment without having a better solution, but if there is any way to avoid introducing a restriction on the cost function's signature, I think that would be really great. Even if SciPy uses it, it feels so wrong for an autodiff library. It introduced headaches in the templates before, and is also a downside of the Keras/Torch layers. Would it not be a much better approach to ask "what do we have to change down below so that designing the optimiser is not so darn difficult"? In a sense, safely tracking which parameters are differentiable should be a core feature (and I remember needing it for a Fourier spectrum extractor before)! No?
Yeah for sure! Looking at it from that point of view, I would say that

```python
# phi1 and phi2 are updated, data is not since it is not trainable
phi1, data, phi2 = opt.step(circuit, [phi1, data, phi2])
```

is maybe the best we could do? What do you think of this UI? I am hesitant to have the output be a different shape to the input (e.g., have the output contain only the trainable arguments).
I suppose I see the other proposal as more of a loosening of an existing restriction; currently the cost functions must be strictly of the form `objective_fn(params)`.
I like

```python
# phi1 and phi2 are updated, data is not since it is not trainable
phi1, data, phi2 = opt.step(circuit, [phi1, data, phi2])
```

This format fixes the worries I had about returning a different number of variables than got passed to the function. I worried that someone would momentarily forget which ones are differentiable and start sticking variables into the wrong name.

How useful would including non-trainable keywords be? For example:

```python
def circuit(params, printing=False):
    if printing:
        print("I am iterating")

params = opt.step(circuit, params, printing=True)
```
Good point @albi3ro. I suppose in this case it would look like:

```python
def circuit(params, printing=False):
    if printing:
        print("I am iterating")

params, _ = opt.step(circuit, [params, True])
```

(where the second element in the returned tuple would just be passed through unchanged).
Any named/keyword arguments, passed in the form `opt.step(circuit, params, key=value)`, would be treated as non-differentiable. So they are useful as configuration flags, but not as variables.
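The keyword-arguments-as-configuration idea can be illustrated with a minimal pure-Python sketch (hypothetical `step` using finite differences; not PennyLane's implementation):

```python
# Sketch: keyword arguments act as fixed configuration. They are
# forwarded to the cost function but never differentiated or updated.
# This toy step differentiates only the single positional parameter.

def step(fn, param, stepsize=0.1, h=1e-6, **kwargs):
    # finite-difference gradient w.r.t. the positional parameter;
    # kwargs are passed straight through to the cost function
    grad = (fn(param + h, **kwargs) - fn(param - h, **kwargs)) / (2 * h)
    return param - stepsize * grad

def cost(x, scale=1.0, verbose=False):
    if verbose:
        print("evaluating cost at", x)
    return scale * x ** 2

x = step(cost, 1.0, stepsize=0.5, scale=2.0)
# gradient of 2*x**2 at x=1 is 4, so x -> 1 - 0.5*4 = -1.0
```

Here `scale` and `verbose` behave exactly like the configuration flags described above: they shape the cost function but are never trained.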
Issue PennyLaneAI#683, implements the basic idea
Let's leave the
For `step_and_cost`, should the return be `(x_new, y_new), cost = opt.step_and_cost(f, x, y)` or `x_new, y_new, cost = opt.step_and_cost(f, x, y)`?
I would say definitely `(x_new, y_new), cost = opt.step_and_cost(f, x, y)`.
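The grouped-return convention can be illustrated with a toy `step_and_cost` (a hypothetical stand-in using finite differences, not PennyLane's implementation):

```python
# Toy step_and_cost: returns (new_args, cost_before_step) as a pair,
# so the updated arguments stay grouped in one tuple and the output
# shape mirrors the input arguments.

def step_and_cost(fn, *args, stepsize=0.1, h=1e-6):
    cost = fn(*args)  # cost evaluated before the update
    new_args = []
    for i, a in enumerate(args):
        shifted = list(args)
        shifted[i] = a + h
        plus = fn(*shifted)
        shifted[i] = a - h
        minus = fn(*shifted)
        grad = (plus - minus) / (2 * h)
        new_args.append(a - stepsize * grad)
    return tuple(new_args), cost

f = lambda x, y: x ** 2 + y ** 2
(x_new, y_new), cost = step_and_cost(f, 1.0, 2.0, stepsize=0.5)
# cost is f evaluated at the pre-step point: 1 + 4 = 5
```

Grouping the updated arguments keeps the unpacking pattern identical no matter how many arguments the cost function takes.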
Are the arguments to the cost function restricted to being NumPy arrays? If not, can we make that restriction from now on? I currently have two ways out of a giant "NonDifferentiableError":
Squashed commit summary from the implementing PR:

* allowing more parameters to the cost function in gradient descent (Issue #683, implements the basic idea)
* multiple args and kwargs support for most optimizers
* improved structure and return format
* updated docs about the provided gradient form
* updated `test_optimize` for the new gradient form; tests for multiple arguments, non-training args, and keywords
* fixes for `rotosolve`, `rotoselect` kwargs, and `nesterov_momentum`
* changelog, docstring, and linting updates

Co-authored-by: Josh Izaac, antalszava, Theodor
This discussion should now be resolved with PR #959
Currently, the built-in PennyLane optimizers have the following restrictions:

- They only accept an objective function with the signature `objective_fn(params)`.
- `params` must be a sequence (of arbitrary shape).

This is due to the `autograd.grad()` function requiring, at call time, the argument indices of the trainable parameters. It was decided originally to simply hardcode `argnum=0` and restrict cost function signatures.

As a result, most examples and tutorials that use autograd with the built-in optimizers use the following patterns: (a) they require the trainable parameters to be a single NumPy array, and (b) if the cost function takes additional arguments, the cost function is 'wrapped' to ensure the signature is what the optimizer expects. This is sometimes done by defining a separate cost function, or simply by using an anonymous function.
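The wrapping pattern in (b) typically looks like the following sketch (the names `original_cost` and `data` are illustrative placeholders, not from a specific tutorial):

```python
# Pattern (b): wrap a multi-argument cost function so the optimizer
# sees the single-argument signature it expects.

def original_cost(params, data):
    return sum((p - d) ** 2 for p, d in zip(params, data))

data = [1.0, 2.0, 3.0]

# wrap via a separately defined cost function ...
def cost(params):
    return original_cost(params, data)

# ... or via an anonymous function
cost = lambda params: original_cost(params, data)

result = cost([0.0, 0.0, 0.0])  # -> 14.0
```

Either wrapper closes over `data`, freezing it out of the optimizer's view; this is exactly the boilerplate the proposed change removes.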
With the completion of the Variable refactor, however, we can now avoid this issue; the user can explicitly mark parameters as differentiable or not, and the built-in `qml.grad()` function uses this to determine the differentiable arguments dynamically at call time. This allows us to rethink the optimizers and increase their flexibility.
This likely requires several changes:

- Rather than assuming `objective_fn(x)`, the optimizers should now assume `objective_fn(*args)`.
- Replace `autograd.grad(objective_fn, argnum=0)(x)` with `qml.grad(objective_fn)(*args)`.
- Modify `apply_grad(grad, x)` to `apply_grad(grad, *args)`, and have it inspect `requires_grad` to determine which arguments are trainable.

We should also be able to preserve backwards compatibility, e.g., `params = opt.step(circuit, params)`, simply by applying the previous logic if `isinstance(args, qml.numpy.Tensor) and args.requires_grad`.
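The proposed backwards-compatibility dispatch could be sketched as follows (toy `Tensor` standing in for `qml.numpy.Tensor`; `_legacy_step` and `_multi_arg_step` are hypothetical placeholder names):

```python
# Sketch of the dispatch: if step() receives a single trainable tensor,
# fall back to the old single-argument (argnum=0) logic; otherwise use
# the new *args path. Tensor is a toy stand-in for qml.numpy.Tensor.

class Tensor(list):
    def __init__(self, data, requires_grad=True):
        super().__init__(data)
        self.requires_grad = requires_grad

def _legacy_step(objective_fn, params, **kwargs):
    return "legacy"   # placeholder for the existing hardcoded-argnum path

def _multi_arg_step(objective_fn, *args, **kwargs):
    return "multi"    # placeholder for the new requires_grad-aware path

def step(objective_fn, *args, **kwargs):
    if len(args) == 1 and isinstance(args[0], Tensor) and args[0].requires_grad:
        return _legacy_step(objective_fn, args[0], **kwargs)
    return _multi_arg_step(objective_fn, *args, **kwargs)

cost = lambda *a: 0.0
r1 = step(cost, Tensor([0.1, 0.2]))               # single trainable tensor
r2 = step(cost, Tensor([0.1]), Tensor([0.2]))     # multiple arguments
```

Existing single-array calls would hit the legacy branch unchanged, while the new branch handles everything else.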