Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An attempt to fix the current CMAES inconsistencies #351

Merged
merged 21 commits into from
Jul 24, 2023

Conversation

SuvarshaChennareddy
Copy link
Contributor

I've attempted to fix the inconsistencies listed out in issue #70 with a couple of other things:

  1. I noticed that this implementation of CMAES does not handle parameters (decision variables) that are represented differently, in a similar fashion. More specifically, parameters that are represented as a column vector are treated differently from the SAME parameters that are represented as a row vector (with appropriate changes in the fitness/loss function of course). That is to say, if there existed P_row and P_col (which are equivalent except in the form that they are represented in) and L_row and L_col (which are respective fitness functions where L_row(P_row) = L_col(P_col)), they do not converge to the same result with the current implementation of CMAES. This wasn't very difficult to solve considering there was a transposed space (for example, row-wise representation) along with the space (column wise representation) itself.
  2. Another error I noticed was the update step for p_sigma. Here it was considered that C^(-1/2), where C is the covariance matrix, was equivalent to L^T where L is the matrix satisfying C = L*L^T (Cholesky Decomposition). I'm not sure the way I fixed it is the best way to go about it (but I guess it works for now?).
  3. I added Transformation policies (Boundary Box Constraint and Empty Transformation). I used Pull Request Fix for CMA-ES inconsistencies. #193 by @gaurav-singh1998 as a foundation.

@SuvarshaChennareddy
Copy link
Contributor Author

I haven't updated callbacks_test.cpp yet. I'm not sure if I should take the initial step size as a parameter in Optimize(...).

Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @SuvarshaChennareddy for looking into these things! I took a look through and left a bunch of comments. :)

I noticed that this implementation of CMAES does not handle parameters (decision variables) that are represented differently, in a similar fashion. More specifically, parameters that are represented as a column vector are treated differently from the SAME parameters that are represented as a row vector (with appropriate changes in the fitness/loss function of course). That is to say, if there existed P_row and P_col (which are equivalent except in the form that they are represented in) and L_row and L_col (which are respective fitness functions where L_row(P_row) = L_col(P_col)), they do not converge to the same result with the current implementation of CMAES. This wasn't very difficult to solve considering there was a transposed space (for example, row-wise representation) along with the space (column wise representation) itself.

This might be a good idea to add as a test case, to ensure that you get the same result either way.

I didn't see where exactly this was fixed in the code, though---could you point me towards where those changes are? Maybe I overlooked it.

I haven't updated callbacks_test.cpp yet. I'm not sure if I should take the initial step size as a parameter in Optimize(...).

I left a comment in the PR, but step size should be a constructor parameter to match all the other optimizers we have implemented. I suspect the adaptation of callbacks_test.cpp will be basically trivial, like the other test changes you made, once that's changed. 👍

Let me know if I can clarify any of my comments. I haven't gone through the exact changes completely in detail yet, so I still need to check the mathematical details. I figured let's get the general design right first. :)

include/ensmallen_bits/cmaes/cmaes.hpp Outdated Show resolved Hide resolved
include/ensmallen_bits/cmaes/cmaes.hpp Outdated Show resolved Hide resolved
include/ensmallen_bits/cmaes/cmaes.hpp Show resolved Hide resolved
include/ensmallen_bits/cmaes/cmaes.hpp Outdated Show resolved Hide resolved
include/ensmallen_bits/cmaes/cmaes.hpp Show resolved Hide resolved
tests/cmaes_test.cpp Outdated Show resolved Hide resolved
}

// Update Step Size.
if (iterate.n_rows > iterate.n_cols)
{
ps[idx1] = (1 - cs) * ps[idx0] + std::sqrt(
cs * (2 - cs) * muEffective) * covLower.t() * step;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the goal is the compute C^{-1/2}, it should suffice to use inv(covLower): https://math.stackexchange.com/questions/1230051/inverse-square-root-of-matrix

I think Armadillo will solve the system more quickly too if you specify trimatl() (since it is a lower triangular matrix): inv(trimatl(covLower)).

I haven't checked this exactly, but it should at least be in the right direction. You should verify my comment here instead of trusting it to be correct... 😄

Copy link
Contributor Author

@SuvarshaChennareddy SuvarshaChennareddy Jan 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Nikolaus Hansen's The CMA Evolution Strategy: A Tutorial C^{-1/2} is defined to be B * D^{-1} * B^T which isn't always equal to the inverse of covLower = B * D^{1/2}.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern here really comes from the fact that we are computing both the Cholesky decomposition and the eigendecomposition. I'd prefer to use one or the other, because these tend to be expensive operations. However, I haven't worked out the math to either (a) express C^{-1/2} in terms of the Cholesky decomposition or (b) express the earlier operations that use covLower in terms of the eigendecomposition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand and agree. I haven't had much time to think about this either but I'll try to figure something out soon.

@SuvarshaChennareddy
Copy link
Contributor Author

Thank you @rcurtin for the comments, I'll definitely look into them. Please let me know if anything doesn't make sense or needs further clarification.

@SuvarshaChennareddy
Copy link
Contributor Author

@rcurtin @zoq Let me know if there is anything else that that you all would like me to change or add.

@conradsnicta
Copy link
Contributor

@rcurtin is this ok to merge?

@rcurtin
Copy link
Member

rcurtin commented Feb 8, 2023

I haven't had a chance to come back to it---it still needs a pretty complete review (sorry @SuvarshaChennareddy! Life has been busy).

@SuvarshaChennareddy
Copy link
Contributor Author

No worries!

@mlpack-bot
Copy link

mlpack-bot bot commented Mar 10, 2023

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

@mlpack-bot mlpack-bot bot added the s: stale label Mar 10, 2023
@SuvarshaChennareddy
Copy link
Contributor Author

Keep open

@mlpack-bot mlpack-bot bot closed this Mar 20, 2023
@SuvarshaChennareddy
Copy link
Contributor Author

Can we keep this open?

@SuvarshaChennareddy
Copy link
Contributor Author

@rcurtin @zoq

@rcurtin rcurtin reopened this Mar 20, 2023
@mlpack-bot mlpack-bot bot closed this Mar 27, 2023
@zoq zoq reopened this Apr 3, 2023
@zoq
Copy link
Member

zoq commented Apr 3, 2023

Reopened.

Copy link
Member

@zoq zoq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry that I have been so slow to respond on this. It can't be fun on your side waiting for these comments that take so long.

The implementation itself seems fine to me (or at least I have run out of time to check it more deeply), but there are a couple of things that we do have to handle before merge, which should be easy:

  • The documentation in doc/optimizers.md needs to be updated. Take a look at the CMAES section; it should be fairly easy to adapt. You don't have to worry about reverse compatibility here---that's only so that old code compiles; we don't need to advertise deprecated support.
  • Add an update to HISTORY.md so that something about this PR is in the release notes. :)
  • It seems like the CI workers are not working correctly right now, but I am not able to build this branch on my system. When I try to build the tests, I get several compilation errors. Most of them look to be relatively simple to fix.

I think that's it. I had a few other little comments throughout.

include/ensmallen_bits/cmaes/cmaes.hpp Outdated Show resolved Hide resolved
include/ensmallen_bits/cmaes/cmaes.hpp Show resolved Hide resolved
include/ensmallen_bits/cmaes/cmaes.hpp Show resolved Hide resolved
include/ensmallen_bits/cmaes/cmaes.hpp Show resolved Hide resolved
include/ensmallen_bits/cmaes/cmaes_impl.hpp Outdated Show resolved Hide resolved
include/ensmallen_bits/cmaes/cmaes_impl.hpp Outdated Show resolved Hide resolved
include/ensmallen_bits/cmaes/cmaes_impl.hpp Outdated Show resolved Hide resolved
include/ensmallen_bits/cmaes/cmaes_impl.hpp Outdated Show resolved Hide resolved
include/ensmallen_bits/cmaes/cmaes_impl.hpp Outdated Show resolved Hide resolved
}

// Update Step Size.
if (iterate.n_rows > iterate.n_cols)
{
ps[idx1] = (1 - cs) * ps[idx0] + std::sqrt(
cs * (2 - cs) * muEffective) * covLower.t() * step;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern here really comes from the fact that we are computing both the Cholesky decomposition and the eigendecomposition. I'd prefer to use one or the other, because these tend to be expensive operations. However, I haven't worked out the math to either (a) express C^{-1/2} in terms of the Cholesky decomposition or (b) express the earlier operations that use covLower in terms of the eigendecomposition.

@zoq
Copy link
Member

zoq commented Jul 11, 2023

@mlpack-jenkins test this please

@zoq
Copy link
Member

zoq commented Jul 11, 2023

To trigger the CPU test suite.

@SuvarshaChennareddy
Copy link
Contributor Author

That's weird, the accuracy received from the failed test was shown to be 50 percent. It's almost as if CMAES couldn't find a path to a local/global minimum during optimization.
This wasn't the case with other test cases tho :/.

@zoq
Copy link
Member

zoq commented Jul 11, 2023

Details

It's not the first time I have seen this, perhaps it just needs some more steps to find a solution. Let me rerun the test.

@zoq
Copy link
Member

zoq commented Jul 11, 2023

@mlpack-jenkins test this please

@SuvarshaChennareddy
Copy link
Contributor Author

Hmm...This time it wasn't CMAES that failed.

@zoq
Copy link
Member

zoq commented Jul 11, 2023

Right, and we test it 8 times in a row with different random seeds, do you mind to increase the number of iterations for the CMAESLogisticRegressionFMatTest test?

@zoq
Copy link
Member

zoq commented Jul 12, 2023

@SuvarshaChennareddy see comment above, let me know, I can quickly push a patch as well.

@SuvarshaChennareddy
Copy link
Contributor Author

@zoq, I'll update the hyper-parameters once more. I don't think just changing the number of iterations will affect the results.

@SuvarshaChennareddy
Copy link
Contributor Author

We could just pass in 0 as the maximum number of iterations. This would make CMAES terminate only when the overallObjective stops improving.
Increasing the population size would also help.

@zoq
Copy link
Member

zoq commented Jul 12, 2023

I still like to keep the test time low, if we set the maximum iterations to zero, we could potentially run for a long time. Let's increase the population size. The main goal is to make sure the tests actually test that the implementation is correct, if the test fails with the hyp parameters a few times, that's fine but I like to reduce the cases it fails.

@SuvarshaChennareddy
Copy link
Contributor Author

SuvarshaChennareddy commented Jul 12, 2023

Alright I'll increase the population size to 120 then.

@zoq
Copy link
Member

zoq commented Jul 12, 2023

@mlpack-jenkins test this please

@SuvarshaChennareddy
Copy link
Contributor Author

Alright, I have an idea on what might be going on. I believe the step size is diverging to a relatively large value ( > 10^14). I'll add a termination condition for this. I'll also add a patience (similar to the one provided by the EarlyStopAtMinLossType Callback) to allow the optimizer to explore the search space even if the overallObjective doesn't improve immediately after a generation.

@zoq
Copy link
Member

zoq commented Jul 12, 2023

Sounds good, thanks.

@SuvarshaChennareddy
Copy link
Contributor Author

@mlpack-jenkins test this please

@SuvarshaChennareddy
Copy link
Contributor Author

@zoq the tests passed

@zoq
Copy link
Member

zoq commented Jul 17, 2023

@rcurtin anything else you want to mention here, or can we merge this? I'll open another PR to fix compiler check complaints.

Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing else from my side, sorry for the slight holdup! Great to see this finally get across the finish line 👍

@zoq zoq merged commit 52e4491 into mlpack:master Jul 24, 2023
@zoq
Copy link
Member

zoq commented Jul 24, 2023

Thanks for putting this together.

@rcurtin
Copy link
Member

rcurtin commented Jul 24, 2023

Do we want to make a release now? (also, awesome work @SuvarshaChennareddy, great to have this in :))

@zoq
Copy link
Member

zoq commented Jul 24, 2023

Do we want to make a release now? (also, awesome work @SuvarshaChennareddy, great to have this in :))

I'll open a PR to fix some compiler warnings, I think afterwards we can do another release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants