Making GIBBON robust to numerical instability #1814
Conversation
This pull request was exported from Phabricator. Differential Revision: D45584690
Codecov Report
@@            Coverage Diff            @@
##               main    #1814   +/-   ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files          170      170
  Lines        14913    14915     +2
=========================================
+ Hits         14913    14915     +2
Summary: Pull Request resolved: pytorch#1814

When using GIBBON for batch acquisition, numerical instabilities can cause a quantity (`V_determinant`) that is mathematically positive to become numerically negative. Taking the logarithm of this quantity then produces NaNs in the information gain tensor, which in turn leads to optimization failures.

This commit clamps a term in the expression for `V_determinant` so that the resultant quantity is positive, while ensuring that gradients continue to flow through the backward pass to all relevant variables.

Differential Revision: D45584690

fbshipit-source-id: 3fe500e98192a054833a5f87389171d9d6a69237
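For intuition, here is a minimal, hypothetical sketch of the clamp-before-log idea (the actual fix clamps a term inside the `V_determinant` expression rather than the final value; `stable_log_term` and the `1e-12` bound here are illustrative only, not the BoTorch implementation):

```python
import torch

CLAMP_LB = 1e-12  # illustrative lower bound (see the discussion below)

def stable_log_term(v_determinant: torch.Tensor) -> torch.Tensor:
    # clamp_min zeroes the gradient only where the clamp is active, so
    # gradients keep flowing through all healthy entries.
    return torch.log(v_determinant.clamp_min(CLAMP_LB))

# A tiny negative value caused by round-off no longer yields NaN:
v = torch.tensor([1e-3, -1e-16], requires_grad=True)
out = stable_log_term(v).sum()
out.backward()
print(out)     # finite (no NaN)
print(v.grad)  # tensor([1000., 0.]) -- gradient flows where unclamped
```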
Force-pushed from 4b553b7 to 8e12b7d
This pull request was exported from Phabricator. Differential Revision: D45584690
@henrymoss Hi Henry, Sebastian "from GitHub" here :) I encountered some NaNs while using GIBBON for batch optimization and traced it to `V_determinant` becoming numerically negative.
Hello Sebastian "from GitHub". I guess the issue came after a fair few optimisation steps? Once we are quite confident where the optimum is, the batch gets very concentrated and we run into numerical precision issues with the ratio of the pdf/cdf in the GIBBON calculation (I think). I think clamping makes sense, but I would keep the lower limit as small as possible. Towards the end of BO the actual size of the information gain can be pretty tiny (hence all the other numerical hacks in the code), so you might want to try 1e-12 or something?
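To illustrate the pdf/cdf instability Henry mentions, here is a generic sketch of an inverse Mills ratio (not GIBBON's actual code): as the batch concentrates, the arguments become extreme, the naive ratio underflows to 0/0 = NaN, while a log-space formulation stays finite.

```python
import torch
from torch.distributions import Normal

def naive_pdf_over_cdf(x: torch.Tensor) -> torch.Tensor:
    normal = Normal(0.0, 1.0)
    # pdf and cdf each underflow to 0 for large negative x -> 0/0 = nan
    return torch.exp(normal.log_prob(x)) / normal.cdf(x)

def stable_pdf_over_cdf(x: torch.Tensor) -> torch.Tensor:
    # Work in log space (log pdf - log cdf) and exponentiate at the end;
    # torch.special.log_ndtr is a stable log of the standard normal CDF.
    normal = Normal(0.0, 1.0)
    return torch.exp(normal.log_prob(x) - torch.special.log_ndtr(x))

x = torch.tensor([-40.0])
print(naive_pdf_over_cdf(x))   # tensor([nan])
print(stable_pdf_over_cdf(x))  # tensor([40.0250]) -- finite
```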
Summary: Pull Request resolved: pytorch#1814

When using GIBBON for batch acquisition, numerical instabilities can cause a quantity (`V_determinant`) that is mathematically positive to become numerically negative. Taking the logarithm of this quantity then produces NaNs in the information gain tensor, which in turn leads to optimization failures.

This commit clamps a term in the expression for `V_determinant` so that the resultant quantity is positive, while ensuring that gradients continue to flow through the backward pass to all relevant variables.

Reviewed By: Balandat

Differential Revision: D45584690

fbshipit-source-id: de464213d8fa10f8bc7000e584d22ccc5ffbe641
This pull request was exported from Phabricator. Differential Revision: D45584690
Force-pushed from 8e12b7d to 43d24c4
Thanks for the feedback Henry! I just updated the clamp to use a smaller lower bound, as suggested.

As far as I can tell, this occurred mostly after a couple of iterations, but I've even observed it in the second batch on a 12-dimensional problem, after initializing with 24 Sobol points and with a candidate set consisting of 1024 points. Maybe the dimensionality also has an effect on this. In any case, it works like a charm now!
This pull request has been merged in f39bde4.