Reload opt_state and modify learning rate #262
-
Hi, I'm currently saving my `opt_state` so I can reload it and resume training. Sometimes I'd also want to change the learning rate schedule, but couldn't find a clean way to do it. Depending on the schedule I choose (constant, with warmup/decay, with gradient accumulation), I need to find the correct parameter inside the optimizer state. It seems that even going from a constant learning rate to another constant value is not very straightforward. Is there a cleaner way to do it?
-
Hey @borisdayma

The default way to handle the learning rate is by passing a schedule (rather than a constant) as the optimizer's `learning_rate` argument.
However, if you'd like more control over the learning rate (or any other hyperparameter), you can put the hyperparameters of your optimizer into the optimizer's state and then mutate the state however you would like. This is required because optax optimizers are pure functions, so the only way to dynamically change their behavior is to change the data passed in.

```python
import numpy as np
import optax

# Some fake params and grads.
params = {'w': np.zeros(10)}
grads = {'w': np.ones(10)}

# Use optax.adam, but tell optax that we'd like to move the adam
# hyperparameters into the optimizer's state.
opt = optax.inject_hyperparams(optax.adam)(learning_rate=1e-4)
opt_state = opt.init(params)

# We can now set the learning rate however we want by directly
# mutating the state.
opt_state.hyperparams['learning_rate'] = 1e-5
updates, opt_state = opt.update(grads, opt_state)

# Compute updates given a different learning rate.
opt_state.hyperparams['learning_rate'] = 1e-7
updates, opt_state = opt.update(grads, opt_state)
```

This is also how our meta-learning example is able to meta-learn the optimizer's learning rate using a separate optimizer. Does this help?