scale_by_adam issue #33

Closed
inikishev opened this issue Jan 12, 2025 · 2 comments

Comments

@inikishev

Hi!
I was testing orthograd and found this: scale_by_adam causes TypeError: adam_() takes 6 positional arguments but 7 were given.

@zero_guard("exp_avg", "exp_avg_sq")
@no_state
def scale_by_adam(group, update, grad, param, exp_avg, exp_avg_sq):
    return utils.adam_(exp_avg, exp_avg_sq, update, utils.get_beta1(group), utils.get_beta2(group), group['step'],  #
                       group['eps'])

In adam_, there is no epsilon argument:

def adam_(exp_avg: List[Tensor], exp_avg_sq: List[Tensor], grad: List[Tensor], beta1: float, beta2: float, step: int):
    exp_avg, exp_avg_sq, grad = map(list_guard, (exp_avg, exp_avg_sq, grad))
    beta1, beta2, step = scalar_guard(beta1, beta2, step, exp_avg[0])
    _compilable_adam_(exp_avg, exp_avg_sq, grad, beta1, beta2, step)
    return grad
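
For reference, eps normally enters the Adam update as the denominator stabilizer, so it does need to be accepted somewhere. Here is a minimal, self-contained sketch of a bias-corrected Adam step in plain PyTorch, just as an illustration of where eps belongs (this is not heavyball's _compilable_adam_):

import torch

def adam_update(exp_avg: torch.Tensor, exp_avg_sq: torch.Tensor, grad: torch.Tensor,
                beta1: float, beta2: float, step: int, eps: float) -> torch.Tensor:
    # Standard bias-corrected Adam direction.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    bias1 = 1 - beta1 ** step
    bias2 = 1 - beta2 ** step
    denom = (exp_avg_sq / bias2).sqrt().add_(eps)  # eps stabilizes the denominator here
    return (exp_avg / bias1) / denom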

But heavyball.OrthoAdamW doesn't seem to error. I added a print to scale_by_adam, and it never actually gets called when using OrthoAdamW; only orthogonalize_grad_to_param gets called. There might be something wrong there.
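
(If anyone wants to reproduce the check, a generic tracing wrapper is enough to see which transforms run. This is plain Python, not heavyball-specific, and the patching line at the end is just one possible way to hook it in:)

import functools

def traced(fn):
    # Print the function name on every call, to see which chainable
    # transforms the optimizer actually invokes.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        print(f"called {fn.__name__}")
        return fn(*args, **kwargs)
    return wrapper

# e.g. wrap the transform before handing it to the optimizer:
# C.scale_by_adam = traced(C.scale_by_adam)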

@inikishev (Author) commented Jan 12, 2025

Never mind, I understand now that it changes scale_by_adam to update_by_adam, so Adam does get applied.

So the issue is just with scale_by_adam. When I tried to swap C.scale_by_adam and C.orthogonalize_grad_to_param to get an AdamWOrtho, it still uses scale_by_adam and causes the error:

class OrthoAdamW(C.BaseOpt):
    def __init__(self, params, lr=0.0025, betas=(0.9, 0.99), eps=1e-8, weight_decay=0, warmup_steps=0,
                 foreach: bool = True, storage_dtype: str = 'float32', mars: bool = False, caution: bool = False,
                 mars_gamma: float = 0.0025, gradient_clipping: C.str_or_fn = C.use_default,
                 update_clipping: C.str_or_fn = C.use_default, palm: bool = C.use_default, beta2_scale: float = 0.8):
        defaults = locals()
        defaults.pop("self")
        params = defaults.pop("params")
        super().__init__(params, defaults, foreach, gradient_clipping, update_clipping, palm,
                          C.scale_by_adam, C.orthogonalize_grad_to_param,)
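
To make the masking behaviour concrete, here is a simplified, hypothetical sketch of the "promote the last transform" logic described above. build_chain and the stand-in transforms are made up for illustration and are not heavyball's actual internals:

def scale_by_adam(update):                 # stand-in for C.scale_by_adam
    return update

def update_by_adam(update):                # stand-in for the fused variant
    return update

def orthogonalize_grad_to_param(update):   # stand-in for C.orthogonalize_grad_to_param
    return update

FUSED = {scale_by_adam: update_by_adam}

def build_chain(*fns):
    # The final transform is swapped for its update_by_* counterpart,
    # so scale_by_adam only runs in its original form when it is NOT last.
    fns = list(fns)
    fns[-1] = FUSED.get(fns[-1], fns[-1])
    return fns

# Original OrthoAdamW order: scale_by_adam is last, gets replaced, the eps bug is masked.
print(build_chain(orthogonalize_grad_to_param, scale_by_adam))
# Swapped order (as in the snippet above): scale_by_adam runs as-is and hits the TypeError.
print(build_chain(scale_by_adam, orthogonalize_grad_to_param))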

inikishev changed the title from "scale_by_adam issue and last function doesn't seem to apply" to "scale_by_adam issue" on Jan 12, 2025
@ClashLuke (Collaborator)

Good find, and thank you for the detailed report! I've added eps to the function in 1.5.1.
