
Allowing more flexible cost functions for optimizers #959

Merged: 49 commits, merged Jan 6, 2021. Changes shown from 17 commits.
* fe79cc6: allowing more parameters to cost function in gradient descent (albi3ro, Dec 9, 2020)
* 9362b82: multiple args and kwargs support for most optimizers (albi3ro, Dec 11, 2020)
* 5609f5d: improved structure, return format (albi3ro, Dec 14, 2020)
* 2c14788: edit changelog (albi3ro, Dec 14, 2020)
* 24d30e1: near finished version of the operators (albi3ro, Dec 15, 2020)
* 21e9e01: update doc about provided gradient form (albi3ro, Dec 16, 2020)
* a820b55: update test_optimize for new gradient form (albi3ro, Dec 16, 2020)
* ad0906a: testing multiple arguments, non-training args, keywords (albi3ro, Dec 16, 2020)
* ed4ac91: improved changelog (albi3ro, Dec 16, 2020)
* b30a552: linting (albi3ro, Dec 16, 2020)
* cb2b328: linting (albi3ro, Dec 16, 2020)
* 5c03e65: Merge remote-tracking branch 'origin/optimize_more_parameters' into o… (albi3ro, Dec 16, 2020)
* df0a7a8: Merge branch 'master' into optimize_more_parameters (albi3ro, Dec 16, 2020)
* 4cf358c: black formatting (albi3ro, Dec 16, 2020)
* b9a447d: Merge remote-tracking branch 'origin/optimize_more_parameters' into o… (albi3ro, Dec 16, 2020)
* 7624f24: different black parameters (albi3ro, Dec 16, 2020)
* 5e0c40b: Merge branch 'master' into optimize_more_parameters (josh146, Dec 17, 2020)
* 12598c8: Update .github/CHANGELOG.md (albi3ro, Dec 17, 2020)
* 4f35668: changelog conform to black (albi3ro, Dec 17, 2020)
* 193cb53: wording change (albi3ro, Dec 17, 2020)
* 095de9e: wording change (albi3ro, Dec 17, 2020)
* 9f189f9: comments on code example (albi3ro, Dec 17, 2020)
* b4b0d71: wording change (albi3ro, Dec 17, 2020)
* 83dcf94: Update pennylane/optimize/gradient_descent.py (albi3ro, Dec 17, 2020)
* a3c2534: Update pennylane/optimize/gradient_descent.py (albi3ro, Dec 17, 2020)
* b4ed411: Update pennylane/optimize/momentum.py (albi3ro, Dec 17, 2020)
* b3c3857: docs string wording (albi3ro, Dec 17, 2020)
* 29416d2: Update pennylane/optimize/rotosolve.py (albi3ro, Dec 17, 2020)
* dfc0092: Update pennylane/optimize/nesterov_momentum.py (albi3ro, Dec 17, 2020)
* 6cc787c: Update pennylane/optimize/rotosolve.py (albi3ro, Dec 17, 2020)
* 703942d: Update pennylane/optimize/adam.py (albi3ro, Dec 17, 2020)
* 6ca34cb: fix rotosolve (albi3ro, Dec 17, 2020)
* 4854c6b: improve docstrings (albi3ro, Dec 17, 2020)
* 16c6bde: Apply simple, local suggestions from code review (albi3ro, Dec 18, 2020)
* 7ddbcbc: Most code review comments implemented (albi3ro, Dec 18, 2020)
* c53adb7: black on new tests (albi3ro, Dec 18, 2020)
* d9d03a9: fix nesterov momentum (albi3ro, Dec 18, 2020)
* afd0cfa: Merge branch 'master' into optimize_more_parameters (antalszava, Dec 18, 2020)
* 751a030: Merge remote-tracking branch 'origin/optimize_more_parameters' into o… (albi3ro, Dec 18, 2020)
* 90257ba: actually add rotoselect kwargs this time. nesterov test (albi3ro, Dec 18, 2020)
* a35d782: ran black on rotoselect (albi3ro, Dec 21, 2020)
* 033af1b: minor docstring fixes (albi3ro, Dec 22, 2020)
* 3c58644: Merge branch 'master' into optimize_more_parameters (albi3ro, Dec 22, 2020)
* f39a839: name on changelog, tests in progress changing (albi3ro, Dec 28, 2020)
* 0eb4133: black (albi3ro, Jan 4, 2021)
* bfe0a4b: test rotosolve, fix rotosolve (albi3ro, Jan 4, 2021)
* c3d1e49: Merge branch 'master' into optimize_more_parameters (albi3ro, Jan 4, 2021)
* 761bfed: Merge branch 'master' into optimize_more_parameters (albi3ro, Jan 6, 2021)
* f7e9d67: remove import of mocker (albi3ro, Jan 6, 2021)
.github/CHANGELOG.md: 36 changes (36 additions, 0 deletions)

@@ -2,6 +2,41 @@

<h3>New features since last release</h3>

* Optimizers allow more flexible cost functions. The cost function passed to most optimizers
may accept any combination of trainable arguments, non-trainable arguments, and keywords.
  Any non-trainable, constant argument must be marked with `requires_grad=False`.
  The `RotoselectOptimizer` allows only keyword arguments.
[(#959)](https://github.com/PennyLaneAI/pennylane/pull/959)

The full changes apply to:

* `AdagradOptimizer`
* `AdamOptimizer`
* `GradientDescentOptimizer`
* `MomentumOptimizer`
* `NesterovMomentumOptimizer`
* `RMSPropOptimizer`
* `RotosolveOptimizer`

Example use:

```python
  def cost(x, y, data, scale=1.0):
      return scale * (x[0] - data) ** 2 + scale * (y - data) ** 2

x = np.array([1.], requires_grad=True)
y = np.array([1.0])
data = np.array([2.], requires_grad=False)

opt = qml.GradientDescentOptimizer()
x_new, y_new, data = opt.step(cost, x, y, data, scale=0.5)

(x_new, y_new, data), value = opt.step_and_cost(cost, x, y, data, scale=0.5)

params = (x, y, data)
params = opt.step(cost, *params)
```
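The trainable/non-trainable dispatch shown above can be sketched in plain Python. This is a minimal mock of the behaviour, not PennyLane's implementation; `MockTensor` and `toy_step` are hypothetical names:

```python
class MockTensor(float):
    """Hypothetical stand-in for a tensor that carries a requires_grad flag."""

    def __new__(cls, value, requires_grad=True):
        obj = super().__new__(cls, value)
        obj.requires_grad = requires_grad
        return obj


def toy_step(grads, args, stepsize=0.5):
    """Update only the trainable arguments; constants pass through untouched."""
    new_args = list(args)
    trained_index = 0
    for index, arg in enumerate(args):
        # Objects without the attribute default to trainable, as in the optimizers
        if getattr(arg, "requires_grad", True):
            new_args[index] = MockTensor(arg - stepsize * grads[trained_index])
            trained_index += 1
    return new_args


x = MockTensor(1.0)
data = MockTensor(2.0, requires_grad=False)
# For cost = 0.5 * (x - data)**2, the gradient w.r.t. x at (1.0, 2.0) is -1.0
stepped = toy_step([-1.0], [x, data])  # x moves toward data; data is unchanged
```

Note that the gradient list has one entry per trainable argument, so the optimizer walks it with a separate index from the argument list.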
Review comment (Member): Don't forget to add your name to contributors! (unless you have done that already, and I missed it)

* A new `qml.draw` function is available, allowing QNodes to be easily
drawn without execution by providing example input.
[(#962)](https://github.com/PennyLaneAI/pennylane/pull/962)
@@ -86,6 +121,7 @@
return qml.expval(qml.PauliZ(0))
```


<h3>Improvements</h3>

* A new test series, pennylane/devices/tests/test_compare_default_qubit.py, has been added, allowing to test if
pennylane/optimize/adagrad.py: 49 changes (37 additions, 12 deletions)

@@ -15,6 +15,7 @@
import math

from pennylane.utils import _flatten, unflatten
from pennylane.numpy import ndarray, tensor
from .gradient_descent import GradientDescentOptimizer


@@ -51,7 +52,7 @@ def __init__(self, stepsize=0.01, eps=1e-8):
self.eps = eps
self.accumulation = None

def apply_grad(self, grad, x):
def apply_grad(self, grad, args):
r"""Update the variables x to take a single optimization step. Flattens and unflattens
the inputs to maintain nested iterables as the parameters of the optimization.

@@ -63,21 +64,45 @@
Returns:
array: the new values :math:`x^{(t+1)}`
"""

x_flat = _flatten(x)
grad_flat = list(_flatten(grad))
args_new = list(args)

if self.accumulation is None:
self.accumulation = [g * g for g in grad_flat]
else:
self.accumulation = [a + g * g for a, g in zip(self.accumulation, grad_flat)]
self.accumulation = [None] * len(args)

trained_index = 0
for index, arg in enumerate(args):
if getattr(arg, "requires_grad", True):
x_flat = _flatten(arg)
grad_flat = list(_flatten(grad[trained_index]))
trained_index += 1

self._update_accumulation(index, grad_flat)

x_new_flat = [
e - (self._stepsize / math.sqrt(a + self.eps)) * g
for a, g, e in zip(self.accumulation[index], grad_flat, x_flat)
]

args_new[index] = unflatten(x_new_flat, arg)

x_new_flat = [
e - (self._stepsize / math.sqrt(a + self.eps)) * g
for a, g, e in zip(self.accumulation, grad_flat, x_flat)
]
if isinstance(arg, ndarray):
args_new[index] = args_new[index].view(tensor)
args_new[index].requires_grad = True

return unflatten(x_new_flat, x)
return args_new

def _update_accumulation(self, index, grad_flat):
r"""Update the accumulation at index with gradient

Args:
index (int): location of the arg to update
grad_flat (list): flattened list form of the gradient
"""
if self.accumulation[index] is None:
self.accumulation[index] = [g * g for g in grad_flat]
else:
self.accumulation[index] = [
a + g * g for a, g in zip(self.accumulation[index], grad_flat)
]
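As a sanity check on the update rule above, the Adagrad step the diff implements (accumulate squared gradients, then divide the step size by the root of the accumulation) can be sketched in plain Python. `adagrad_update` is a hypothetical helper for illustration, not part of PennyLane:

```python
import math


def adagrad_update(x_flat, grad_flat, accumulation, stepsize=0.01, eps=1e-8):
    """One Adagrad step on flattened parameters; returns new params and state."""
    if accumulation is None:
        # First step: accumulation starts as the squared gradient
        accumulation = [g * g for g in grad_flat]
    else:
        accumulation = [a + g * g for a, g in zip(accumulation, grad_flat)]
    x_new = [
        e - (stepsize / math.sqrt(a + eps)) * g
        for a, g, e in zip(accumulation, grad_flat, x_flat)
    ]
    return x_new, accumulation


x, acc = [1.0], None
x, acc = adagrad_update(x, [2.0], acc)  # accumulation becomes [4.0]
```

Because the accumulation only grows, the effective step size for each parameter shrinks over time, which is the defining behaviour of Adagrad.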

def reset(self):
"""Reset optimizer by erasing memory of past steps."""
pennylane/optimize/adam.py: 73 changes (50 additions, 23 deletions)

@@ -15,6 +15,7 @@
import math

from pennylane.utils import _flatten, unflatten
from pennylane.numpy import ndarray, tensor
from .gradient_descent import GradientDescentOptimizer


@@ -61,7 +62,7 @@ def __init__(self, stepsize=0.01, beta1=0.9, beta2=0.99, eps=1e-8):
self.sm = None
self.t = 0

def apply_grad(self, grad, x):
def apply_grad(self, grad, args):
r"""Update the variables x to take a single optimization step. Flattens and unflattens
the inputs to maintain nested iterables as the parameters of the optimization.

@@ -73,37 +74,63 @@
Returns:
array: the new values :math:`x^{(t+1)}`
"""

args_new = list(args)
self.t += 1

grad_flat = list(_flatten(grad))
x_flat = _flatten(x)
# Update step size (instead of correcting for bias)
new_stepsize = (
self._stepsize * math.sqrt(1 - self.beta2 ** self.t) / (1 - self.beta1 ** self.t)
)

# Update first moment
if self.fm is None:
self.fm = grad_flat
else:
self.fm = [self.beta1 * f + (1 - self.beta1) * g for f, g in zip(self.fm, grad_flat)]
self.fm = [None] * len(args)

# Update second moment
if self.sm is None:
self.sm = [g * g for g in grad_flat]
else:
self.sm = [
self.beta2 * f + (1 - self.beta2) * g * g for f, g in zip(self.sm, grad_flat)
]
self.sm = [None] * len(args)

# Update step size (instead of correcting for bias)
new_stepsize = (
self._stepsize * math.sqrt(1 - self.beta2 ** self.t) / (1 - self.beta1 ** self.t)
)
trained_index = 0
for index, arg in enumerate(args):
if getattr(arg, "requires_grad", True):
x_flat = _flatten(arg)
grad_flat = list(_flatten(grad[trained_index]))
trained_index += 1

x_new_flat = [
e - new_stepsize * f / (math.sqrt(s) + self.eps)
for f, s, e in zip(self.fm, self.sm, x_flat)
]
self._update_moments(index, grad_flat)

return unflatten(x_new_flat, x)
x_new_flat = [
e - new_stepsize * f / (math.sqrt(s) + self.eps)
for f, s, e in zip(self.fm[index], self.sm[index], x_flat)
]
args_new[index] = unflatten(x_new_flat, arg)

if isinstance(arg, ndarray):
args_new[index] = args_new[index].view(tensor)
args_new[index].requires_grad = True

return args_new

def _update_moments(self, index, grad_flat):
r"""Update the moments

Args:
index (int): the index of the trainable argument to update, out of the trainable params
grad_flat (list): the flattened gradient for that trainable param
"""
# update first moment
if self.fm[index] is None:
self.fm[index] = grad_flat
else:
self.fm[index] = [
self.beta1 * f + (1 - self.beta1) * g for f, g in zip(self.fm[index], grad_flat)
]

# update second moment
if self.sm[index] is None:
self.sm[index] = [g * g for g in grad_flat]
else:
self.sm[index] = [
self.beta2 * f + (1 - self.beta2) * g * g for f, g in zip(self.sm[index], grad_flat)
]
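The two-moment bookkeeping above, with the bias correction folded into the step size as the diff's comment notes, can be checked with a plain-Python sketch; `adam_update` is a hypothetical single-argument helper, not PennyLane's API:

```python
import math


def adam_update(x_flat, grad_flat, fm, sm, t,
                stepsize=0.01, beta1=0.9, beta2=0.99, eps=1e-8):
    """One Adam step on flattened parameters, mirroring the diff's update rule."""
    # Bias correction is folded into the step size instead of into the moments
    new_stepsize = stepsize * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    # First moment: exponential moving average of the gradient
    fm = grad_flat if fm is None else [
        beta1 * f + (1 - beta1) * g for f, g in zip(fm, grad_flat)
    ]
    # Second moment: exponential moving average of the squared gradient
    sm = [g * g for g in grad_flat] if sm is None else [
        beta2 * s + (1 - beta2) * g * g for s, g in zip(sm, grad_flat)
    ]
    x_new = [
        e - new_stepsize * f / (math.sqrt(s) + eps)
        for f, s, e in zip(fm, sm, x_flat)
    ]
    return x_new, fm, sm


x, fm, sm = [1.0], None, None
x, fm, sm = adam_update(x, [2.0], fm, sm, t=1)
```

At `t=1` with these defaults the corrected step size works out to exactly `stepsize`, so the first update is roughly a plain gradient-descent step of size `stepsize` in sign-of-gradient direction.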

def reset(self):
"""Reset optimizer by erasing memory of past steps."""
pennylane/optimize/gradient_descent.py: 67 changes (46 additions, 21 deletions)

@@ -15,6 +15,7 @@

from pennylane._grad import grad as get_gradient
from pennylane.utils import _flatten, unflatten
from pennylane.numpy import ndarray, tensor


class GradientDescentOptimizer:
@@ -47,86 +48,110 @@ def update_stepsize(self, stepsize):
"""
self._stepsize = stepsize

def step_and_cost(self, objective_fn, x, grad_fn=None):
def step_and_cost(self, objective_fn, *args, grad_fn=None, **kwargs):
"""Update x with one step of the optimizer and return the corresponding objective
function value prior to the step.

Review comment (Contributor): Suggested change: replace "Update x" with "Update args" in the docstring.

Review comment (Contributor): There are a few xs in some docstrings that should be updated to args with requires_grad=True. Suggested change: """Update differentiable arguments with one step of the optimizer and return the corresponding objective function value prior to the step."""

Args:
objective_fn (function): the objective function for optimization
x (array): NumPy array containing the current values of the variables to be updated
*args : Variable length argument list for objective function
Review thread:
* Contributor: Suggested change: `*args (list): argument list for objective function`.
* Author (albi3ro): So `*args`, with the star, is neither a tuple nor a list; it's an unpacked tuple. I've looked up examples of how to document `*args` and `**kwargs` in docstrings. At least for the Google style, they don't specify a particular type for `*args` and `**kwargs`, just what they do / get used for.
* Contributor: Oh, right. My bad! Then this is probably the right way to do it. Feel free to mark all of these comments as resolved. 🙂
grad_fn (function): Optional gradient function of the
objective function with respect to the variables ``x``.
If ``None``, the gradient function is computed automatically.
Must match shape of autograd derivative.
Review comment (Contributor): Might be nicer to clarify a bit better what shape means in this case, e.g. "will always be a tuple containing ...".

**kwargs : Variable length of keywords for the cost function

Returns:
tuple: the new variable values :math:`x^{(t+1)}` and the objective function output
prior to the step
"""

g, forward = self.compute_grad(objective_fn, x, grad_fn=grad_fn)
x_out = self.apply_grad(g, x)
g, forward = self.compute_grad(objective_fn, args, kwargs, grad_fn=grad_fn)
new_args = self.apply_grad(g, args)

if forward is None:
forward = objective_fn(x)
forward = objective_fn(*args, **kwargs)

return x_out, forward
if len(new_args) == 1:
return new_args[0], forward
return new_args, forward

def step(self, objective_fn, x, grad_fn=None):
def step(self, objective_fn, *args, grad_fn=None, **kwargs):
"""Update x with one step of the optimizer.

Args:
objective_fn (function): the objective function for optimization
x (array): NumPy array containing the current values of the variables to be updated
*args : Variable length argument list for objective function
Review thread:
* Contributor: Suggested change: `*args (list): argument list for objective function`.
* Author (albi3ro): `args` itself is a tuple, but `*args` is an unpacked tuple and its components are not confined to a particular type.
grad_fn (function): Optional gradient function of the
objective function with respect to the variables ``x``.
If ``None``, the gradient function is computed automatically.
Must match shape of autograd derivative.
Review comment (Member): Rather than saying 'must match shape of...', perhaps it's better to explicitly say what the output of a custom grad function should look like, e.g. "The provided grad function must have output of shape ...".

**kwargs : Variable length of keywords for the cost function

Returns:
array: the new variable values :math:`x^{(t+1)}`
"""
g, _ = self.compute_grad(objective_fn, x, grad_fn=grad_fn)
x_out = self.apply_grad(g, x)

return x_out
g, _ = self.compute_grad(objective_fn, args, kwargs, grad_fn=grad_fn)
new_args = self.apply_grad(g, args)

if len(new_args) == 1:
return new_args[0]

return new_args

@staticmethod
def compute_grad(objective_fn, x, grad_fn=None):
def compute_grad(objective_fn, args, kwargs, grad_fn=None):
r"""Compute gradient of the objective_fn at the point x and return it along with the
objective function forward pass (if available).

Args:
objective_fn (function): the objective function for optimization
x (array): NumPy array containing the current values of the variables to be updated
args (tuple(array)): Tuple of NumPy arrays containing the current values for the
objective function
kwargs (dict): Keywords for the cost function
grad_fn (function): Optional gradient function of the objective function with respect to
the variables ``x``. If ``None``, the gradient function is computed automatically.
Must match shape of autograd derivative.
Review thread:
* Member: Rather than saying 'must match shape of...', perhaps it's better to explicitly say what the output of a custom grad function should look like, e.g. "The provided grad function must have output of shape ...".
* Author (albi3ro): I'm just struggling with a good way to describe the form. (df/d(trainable argument 0), df/d(trainable argument 1), ...) just doesn't look very nice.
* Member: How about: The output of the `grad_fn` function must be a sequence of shape `(num_trainable_args,)`.

Returns:
tuple: The NumPy array containing the gradient :math:`\nabla f(x^{(t)})` and the
objective function output. If ``grad_fn`` is provided, the objective function
will not be evaluated and instead ``None`` will be returned.
"""
g = get_gradient(objective_fn) if grad_fn is None else grad_fn
grad = g(x)
grad = g(*args, **kwargs)
forward = getattr(g, "forward", None)

return grad, forward
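The shape convention discussed in the thread above (one gradient entry per trainable argument, non-trainable arguments skipped) can be illustrated with a finite-difference stand-in for the autograd call; `toy_grad` and `trainable_indices` are hypothetical names for illustration only:

```python
def toy_grad(fn, trainable_indices, *args, h=1e-6, **kwargs):
    """Finite-difference gradient with one entry per trainable scalar argument."""
    base = fn(*args, **kwargs)
    grads = []
    for i in trainable_indices:
        shifted = list(args)
        shifted[i] = args[i] + h  # nudge only this argument
        grads.append((fn(*shifted, **kwargs) - base) / h)
    return tuple(grads)


def cost(x, y, data, scale=1.0):
    return scale * (x - data) ** 2 + scale * (y - data) ** 2


# x and y (indices 0 and 1) are trainable; data is a constant, so it gets no entry
g = toy_grad(cost, (0, 1), 1.0, 1.0, 2.0, scale=0.5)
```

The result has length two, matching the two trainable arguments, which is exactly the "sequence of shape `(num_trainable_args,)`" wording proposed in the review.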

def apply_grad(self, grad, x):
r"""Update the variables x to take a single optimization step. Flattens and unflattens
def apply_grad(self, grad, args):
Review thread:
* Contributor (trbromley): Just some feedback from the hackathon based upon a submission that uses compute_grad and apply_grad: they had some confusion because they needed to switch from passing x=params to args=[params] and also (for compute_grad) they needed to explicitly pass kwargs={}. Should we consider making these methods similar to step(*args, **kwargs)? Or, alternatively, just make them hidden methods.
* Contributor: @trbromley I tend to agree, especially if this would be more of a user-facing method (not sure how much of an issue it would be otherwise). There was a brief discussion re. this when this was under review. Perhaps just making them hidden by preceding with an underscore would be good here. 🤔 @albi3ro @josh146
* Member: I'm somewhat ambivalent here. We definitely considered compute_grad and apply_grad 'private' methods when first designing the optimizers, as they were not intended to be used as the public optimizer API. But it probably makes more sense to keep them public, and standardize the signatures, since the fact that users were finding and using them suggests they add value.
* Author (albi3ro): Unless there's a larger demand from users for this function, it'd be inefficient to unpack and repack the values. It would add operations for no real purpose.
* Member: I'm in favour of 'doing nothing' for now unless there is more feedback and demand.
r"""Update the variables to take a single optimization step. Flattens and unflattens
the inputs to maintain nested iterables as the parameters of the optimization.

Args:
grad (array): The gradient of the objective
function at point :math:`x^{(t)}`: :math:`\nabla f(x^{(t)})`
x (array): the current value of the variables :math:`x^{(t)}`
x (tuple(array)): the current value of the variables :math:`x^{(t)}`

Returns:
array: the new values :math:`x^{(t+1)}`
"""
args_new = list(args)

trained_index = 0
for index, arg in enumerate(args):
if getattr(arg, "requires_grad", True):
x_flat = _flatten(arg)
grad_flat = _flatten(grad[trained_index])
trained_index += 1

x_new_flat = [e - self._stepsize * g for g, e in zip(grad_flat, x_flat)]

x_flat = _flatten(x)
grad_flat = _flatten(grad)
args_new[index] = unflatten(x_new_flat, args[index])

x_new_flat = [e - self._stepsize * g for g, e in zip(grad_flat, x_flat)]
if isinstance(arg, ndarray):
args_new[index] = args_new[index].view(tensor)
args_new[index].requires_grad = True

return unflatten(x_new_flat, x)
return args_new
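Putting the pieces together, a toy version of the whole step (compute one gradient per trainable argument, then update only those arguments) might look like the following; `gd_step` and the explicit `trainable` mask are hypothetical simplifications of the class above, which reads the flag off each argument instead:

```python
def gd_step(grad_fn, args, trainable, stepsize=0.5, **kwargs):
    """Toy gradient-descent step over a mixed trainable/constant argument list."""
    grads = grad_fn(*args, **kwargs)  # one entry per trainable argument
    new_args = list(args)
    trained_index = 0
    for index, is_trainable in enumerate(trainable):
        if is_trainable:
            new_args[index] = args[index] - stepsize * grads[trained_index]
            trained_index += 1
    return new_args


# Gradient of cost = scale*(x - data)**2 + scale*(y - data)**2, w.r.t. x and y only
def grad_fn(x, y, data, scale=1.0):
    return (2 * scale * (x - data), 2 * scale * (y - data))


new_x, new_y, new_data = gd_step(
    grad_fn, [1.0, 1.0, 2.0], [True, True, False], scale=0.5
)
```

As in the changelog example, the constant `data` passes through unchanged while `x` and `y` each take a gradient step toward it.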