scale_by_adam issue #33

Closed
inikishev opened this issue Jan 12, 2025 · 2 comments

Comments

@inikishev

Hi!
I was testing orthograd and found this: scale_by_adam causes TypeError: adam_() takes 6 positional arguments but 7 were given.

@zero_guard("exp_avg", "exp_avg_sq")
@no_state
def scale_by_adam(group, update, grad, param, exp_avg, exp_avg_sq):
    return utils.adam_(exp_avg, exp_avg_sq, update, utils.get_beta1(group), utils.get_beta2(group), group['step'],  #
                       group['eps'])

In adam_, there is no epsilon argument:

def adam_(exp_avg: List[Tensor], exp_avg_sq: List[Tensor], grad: List[Tensor], beta1: float, beta2: float, step: int):
    exp_avg, exp_avg_sq, grad = map(list_guard, (exp_avg, exp_avg_sq, grad))
    beta1, beta2, step = scalar_guard(beta1, beta2, step, exp_avg[0])
    _compilable_adam_(exp_avg, exp_avg_sq, grad, beta1, beta2, step)
    return grad
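
For reference, eps normally enters the Adam update as the denominator stabilizer, so it does need to be accepted somewhere. Here is a minimal, self-contained sketch of a bias-corrected Adam step in plain PyTorch, just as an illustration of where eps belongs (this is not heavyball's _compilable_adam_):

import torch

def adam_update(exp_avg: torch.Tensor, exp_avg_sq: torch.Tensor, grad: torch.Tensor,
                beta1: float, beta2: float, step: int, eps: float) -> torch.Tensor:
    # Standard bias-corrected Adam direction.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    bias1 = 1 - beta1 ** step
    bias2 = 1 - beta2 ** step
    denom = (exp_avg_sq / bias2).sqrt().add_(eps)  # eps stabilizes the denominator here
    return (exp_avg / bias1) / denom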

But heavyball.OrthoAdamW doesn't seem to error. I added a print to scale_by_adam, and it never actually gets called when using OrthoAdamW; only orthogonalize_grad_to_param gets called. There might be something wrong there.
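
(If anyone wants to reproduce the check, a generic tracing wrapper is enough to see which transforms run. This is plain Python, not heavyball-specific, and the patching line at the end is just one possible way to hook it in:)

import functools

def traced(fn):
    # Print the function name on every call, to see which chainable
    # transforms the optimizer actually invokes.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        print(f"called {fn.__name__}")
        return fn(*args, **kwargs)
    return wrapper

# e.g. wrap the transform before handing it to the optimizer:
# C.scale_by_adam = traced(C.scale_by_adam)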

@inikishev (Author) commented Jan 12, 2025

Never mind, I understand now that it changes scale_by_adam to update_by_adam, so Adam does get applied.

So the issue is just with scale_by_adam. When I tried to swap C.scale_by_adam and C.orthogonalize_grad_to_param to get an AdamWOrtho, it still uses scale_by_adam and causes the error:

class OrthoAdamW(C.BaseOpt):
    def __init__(self, params, lr=0.0025, betas=(0.9, 0.99), eps=1e-8, weight_decay=0, warmup_steps=0,
                 foreach: bool = True, storage_dtype: str = 'float32', mars: bool = False, caution: bool = False,
                 mars_gamma: float = 0.0025, gradient_clipping: C.str_or_fn = C.use_default,
                 update_clipping: C.str_or_fn = C.use_default, palm: bool = C.use_default, beta2_scale: float = 0.8):
        defaults = locals()
        defaults.pop("self")
        params = defaults.pop("params")
        super().__init__(params, defaults, foreach, gradient_clipping, update_clipping, palm,
                          C.scale_by_adam, C.orthogonalize_grad_to_param,)
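
To make the masking behaviour concrete, here is a simplified, hypothetical sketch of the "promote the last transform" logic described above. build_chain and the stand-in transforms are made up for illustration and are not heavyball's actual internals:

def scale_by_adam(update):                 # stand-in for C.scale_by_adam
    return update

def update_by_adam(update):                # stand-in for the fused variant
    return update

def orthogonalize_grad_to_param(update):   # stand-in for C.orthogonalize_grad_to_param
    return update

FUSED = {scale_by_adam: update_by_adam}

def build_chain(*fns):
    # The final transform is swapped for its update_by_* counterpart,
    # so scale_by_adam only runs in its original form when it is NOT last.
    fns = list(fns)
    fns[-1] = FUSED.get(fns[-1], fns[-1])
    return fns

# Original OrthoAdamW order: scale_by_adam is last, gets replaced, the eps bug is masked.
print(build_chain(orthogonalize_grad_to_param, scale_by_adam))
# Swapped order (as in the snippet above): scale_by_adam runs as-is and hits the TypeError.
print(build_chain(scale_by_adam, orthogonalize_grad_to_param))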

inikishev changed the title from "scale_by_adam issue and last function doesn't seem to apply" to "scale_by_adam issue" on Jan 12, 2025
@ClashLuke (Collaborator)

Good find, and thank you for the detailed report! I've added eps to the function in 1.5.1.
