Bi-level optimization and Adam causing NaNs #484
Unanswered · averageFlaxUser asked this question in Q&A
I have a bi-level optimization where model A produces some output vector `z` that is used as a regularization term in the `loss_fn` of model B, which is optimized on some image data. Assuming my implementation is correct, I noticed that when model B is optimized via Adam, the gradients become `nan`, but any other optimizer works fine. After some hours of digging I noticed that the `eps_root` hyper-parameter defaults to `0.0`. Changing this value fixes the issue. My concern is: why is this so? Is this an issue with my implementation, or is this expected in some cases?