An attempt to fix the current CMAES inconsistencies #351
Conversation
Thanks @SuvarshaChennareddy for looking into these things! I took a look through and left a bunch of comments. :)
> I noticed that this implementation of CMAES does not handle parameters (decision variables) that are represented differently in a consistent fashion. More specifically, parameters that are represented as a column vector are treated differently from the same parameters that are represented as a row vector (with appropriate changes in the fitness/loss function, of course). That is to say, if there existed `P_row` and `P_col` (which are equivalent except in the form that they are represented in) and `L_row` and `L_col` (which are respective fitness functions where `L_row(P_row) = L_col(P_col)`), they do not converge to the same result with the current implementation of CMAES. This wasn't very difficult to solve considering there was a transposed space (for example, row-wise representation) along with the space (column-wise representation) itself.
This might be a good idea to add as a test case, to ensure that you get the same result either way.
I didn't see where exactly this was fixed in the code, though---could you point me towards where those changes are? Maybe I overlooked it.
I haven't updated callbacks_test.cpp yet. I'm not sure if I should take the initial step size as a parameter in Optimize(...).
I left a comment in the PR, but step size should be a constructor parameter to match all the other optimizers we have implemented. I suspect the adaptation of callbacks_test.cpp will be basically trivial, like the other test changes you made, once that's changed. 👍
Let me know if I can clarify any of my comments. I haven't gone through the exact changes completely in detail yet, so I still need to check the mathematical details. I figured let's get the general design right first. :)
Resolved review threads (now outdated) on:
- include/ensmallen_bits/cmaes/transformation_policies/empty_transformation.hpp
- include/ensmallen_bits/cmaes/transformation_policies/boundary_box_constraint.hpp
```c++
// Update Step Size.
if (iterate.n_rows > iterate.n_cols)
{
  ps[idx1] = (1 - cs) * ps[idx0] + std::sqrt(
      cs * (2 - cs) * muEffective) * covLower.t() * step;
```
If the goal is to compute `C^{-1/2}`, it should suffice to use `inv(covLower)`: https://math.stackexchange.com/questions/1230051/inverse-square-root-of-matrix. I think Armadillo will solve the system more quickly too if you specify `trimatl()` (since it is a lower triangular matrix): `inv(trimatl(covLower))`.
I haven't checked this exactly, but it should at least be in the right direction. You should verify my comment here instead of trusting it to be correct... 😄
In Nikolaus Hansen's *The CMA Evolution Strategy: A Tutorial*, `C^{-1/2}` is defined to be `B * D^{-1} * B^T`, which isn't always equal to the inverse of `covLower = B * D^{1/2}`.
My concern here really comes from the fact that we are computing both the Cholesky decomposition and the eigendecomposition. I'd prefer to use one or the other, because these tend to be expensive operations. However, I haven't worked out the math to either (a) express `C^{-1/2}` in terms of the Cholesky decomposition or (b) express the earlier operations that use `covLower` in terms of the eigendecomposition.
I understand and agree. I haven't had much time to think about this either but I'll try to figure something out soon.
Thank you @rcurtin for the comments, I'll definitely look into them. Please let me know if anything doesn't make sense or needs further clarification.
@rcurtin is this ok to merge?
I haven't had a chance to come back to it---it still needs a pretty complete review (sorry @SuvarshaChennareddy! Life has been busy).
No worries!
This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍
Keep open
Can we keep this open?
Reopened.
Looks good to me.
Sorry that I have been so slow to respond on this. It can't be fun on your side waiting for these comments that take so long.
The implementation itself seems fine to me (or at least I have run out of time to check it more deeply), but there are a couple of things that we do have to handle before merge, which should be easy:
- The documentation in `doc/optimizers.md` needs to be updated. Take a look at the `CMAES` section; it should be fairly easy to adapt. You don't have to worry about reverse compatibility here---that's only so that old code compiles; we don't need to advertise deprecated support.
- Add an update to `HISTORY.md` so that something about this PR is in the release notes. :)
- It seems like the CI workers are not working correctly right now, but I am not able to build this branch on my system. When I try to build the tests, I get several compilation errors. Most of them look to be relatively simple to fix.
I think that's it. I had a few other little comments throughout.
@mlpack-jenkins test this please
To trigger the CPU test suite.
That's weird, the accuracy received from the failed test was shown to be 50 percent. It's almost as if CMAES couldn't find a path to a local/global minimum during optimization.
It's not the first time I have seen this; perhaps it just needs some more steps to find a solution. Let me rerun the test.
@mlpack-jenkins test this please
Hmm... this time it wasn't CMAES that failed.
Right, and we test it 8 times in a row with different random seeds. Do you mind increasing the number of iterations for the
@SuvarshaChennareddy see comment above, let me know; I can quickly push a patch as well.
@zoq, I'll update the hyper-parameters once more. I don't think just changing the number of iterations will affect the results.
We could just pass in
I'd still like to keep the test time low; if we set the maximum iterations to zero, we could potentially run for a long time. Let's increase the population size. The main goal is to make sure the tests actually test that the implementation is correct; if the test fails with these hyper-parameters a few times, that's fine, but I'd like to reduce the cases where it fails.
Alright, I'll increase the population size to 120 then.
@mlpack-jenkins test this please
Alright, I have an idea of what might be going on. I believe the step size is diverging to a relatively large value (> 10^14). I'll add a termination condition for this. I'll also add a patience mechanism (similar to the one provided by the `EarlyStopAtMinLossType` callback) to allow the optimizer to explore the search space even if the `overallObjective` doesn't improve immediately after a generation.
Sounds good, thanks.
@mlpack-jenkins test this please
@zoq the tests passed
@rcurtin anything else you want to mention here, or can we merge this? I'll open another PR to fix compiler check complaints.
Nothing else from my side, sorry for the slight holdup! Great to see this finally get across the finish line 👍
Thanks for putting this together.
Do we want to make a release now? (Also, awesome work @SuvarshaChennareddy, great to have this in :))
I'll open a PR to fix some compiler warnings; I think afterwards we can do another release.
I've attempted to fix the inconsistencies listed out in issue #70, along with a couple of other things:

- The handling of parameters (decision variables) that are represented differently. Parameters represented as a column vector are treated differently from the same parameters represented as a row vector (with appropriate changes in the fitness/loss function, of course). That is to say, if there existed `P_row` and `P_col` (which are equivalent except in the form that they are represented in) and `L_row` and `L_col` (which are respective fitness functions where `L_row(P_row) = L_col(P_col)`), they do not converge to the same result with the current implementation of CMAES. This wasn't very difficult to solve considering there was a transposed space (for example, row-wise representation) along with the space (column-wise representation) itself.
- The update of the evolution path `p_sigma`. Here it was considered that `C^(-1/2)`, where `C` is the covariance matrix, was equivalent to `L^T`, where `L` is the matrix satisfying `C = L*L^T` (Cholesky decomposition). I'm not sure the way I fixed it is the best way to go about it (but I guess it works for now?).