-
Hi, I want to combine `optax.ema`, `optax.lion`, and `optax.contrib.mechanize`, and I see two options:

Option 1: `tx = optax.contrib.mechanize(optax.chain(optax.ema(...), optax.lion(...)))`

Option 2: `tx = optax.chain(optax.ema(...), optax.contrib.mechanize(optax.lion(...)))`

I would also much appreciate it if you could share a bit of theory/background on why one option is correct, or point me to any related resources. Thanks in advance!
Replies: 2 comments
-
Perhaps @adefazio has some insights on this one?
-
Mechanic is designed to work with any optimization algorithm as the base optimizer, so you should put the EMA inside Mechanic as part of the base optimizer (option 1) rather than applying it outside Mechanic (option 2). The learning-rate estimation may not work correctly with option 2, or at least the theory doesn't tell us what will happen.