An attempt to fix the current CMAES inconsistencies #351
Conversation
Thanks @SuvarshaChennareddy for looking into these things! I took a look through and left a bunch of comments. :)
> I noticed that this implementation of CMAES does not handle parameters (decision variables) that are represented differently in a consistent fashion. More specifically, parameters that are represented as a column vector are treated differently from the same parameters that are represented as a row vector (with appropriate changes in the fitness/loss function, of course). That is to say, if there existed `P_row` and `P_col` (which are equivalent except in the form that they are represented in) and `L_row` and `L_col` (which are respective fitness functions where `L_row(P_row) = L_col(P_col)`), they do not converge to the same result with the current implementation of CMAES. This wasn't very difficult to solve considering there was a transposed space (for example, row-wise representation) along with the space (column-wise representation) itself.
This might be a good idea to add as a test case, to ensure that you get the same result either way.
I didn't see where exactly this was fixed in the code, though---could you point me towards where those changes are? Maybe I overlooked it.
I haven't updated callbacks_test.cpp yet. I'm not sure if I should take the initial step size as a parameter in Optimize(...).
I left a comment in the PR, but step size should be a constructor parameter to match all the other optimizers we have implemented. I suspect the adaptation of callbacks_test.cpp will be basically trivial, like the other test changes you made, once that's changed. 👍
Let me know if I can clarify any of my comments. I haven't gone through the exact changes completely in detail yet, so I still need to check the mathematical details. I figured let's get the general design right first. :)
Resolved review threads (now outdated) on:
- include/ensmallen_bits/cmaes/transformation_policies/empty_transformation.hpp
- include/ensmallen_bits/cmaes/transformation_policies/boundary_box_constraint.hpp
```c++
// Update Step Size.
if (iterate.n_rows > iterate.n_cols)
{
  ps[idx1] = (1 - cs) * ps[idx0] + std::sqrt(
      cs * (2 - cs) * muEffective) * covLower.t() * step;
```
If the goal is to compute `C^{-1/2}`, it should suffice to use `inv(covLower)`: https://math.stackexchange.com/questions/1230051/inverse-square-root-of-matrix. I think Armadillo will solve the system more quickly too if you specify `trimatl()` (since it is a lower triangular matrix): `inv(trimatl(covLower))`.
I haven't checked this exactly, but it should at least be in the right direction. You should verify my comment here instead of trusting it to be correct... 😄
In Nikolaus Hansen's *The CMA Evolution Strategy: A Tutorial*, `C^{-1/2}` is defined to be `B * D^{-1} * B^T`, which isn't always equal to the inverse of `covLower = B * D^{1/2}`.
My concern here really comes from the fact that we are computing both the Cholesky decomposition and the eigendecomposition. I'd prefer to use one or the other, because these tend to be expensive operations. However, I haven't worked out the math to either (a) express `C^{-1/2}` in terms of the Cholesky decomposition or (b) express the earlier operations that use `covLower` in terms of the eigendecomposition.
I understand and agree. I haven't had much time to think about this either but I'll try to figure something out soon.
Thank you @rcurtin for the comments, I'll definitely look into them. Please let me know if anything doesn't make sense or needs further clarification.
@rcurtin is this ok to merge?
I haven't had a chance to come back to it---it still needs a pretty complete review (sorry @SuvarshaChennareddy! Life has been busy).
No worries!
This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍
Keep open
Can we keep this open?
Reopened.
Looks good to me.
Sorry that I have been so slow to respond on this. It can't be fun on your side waiting for these comments that take so long.
The implementation itself seems fine to me (or at least I have run out of time to check it more deeply), but there are a couple of things that we do have to handle before merge, which should be easy:
- The documentation in `doc/optimizers.md` needs to be updated. Take a look at the `CMAES` section; it should be fairly easy to adapt. You don't have to worry about reverse compatibility here---that's only so that old code compiles; we don't need to advertise deprecated support.
- Add an update to `HISTORY.md` so that something about this PR is in the release notes. :)
- It seems like the CI workers are not working correctly right now, but I am not able to build this branch on my system. When I try to build the tests, I get several compilation errors. Most of them look to be relatively simple to fix.
I think that's it. I had a few other little comments throughout.
@mlpack-jenkins test this please
To trigger the CPU test suite.
That's weird, the accuracy received from the failed test was shown to be 50 percent. It's almost as if CMAES couldn't find a path to a local/global minimum during optimization.
It's not the first time I have seen this; perhaps it just needs some more steps to find a solution. Let me rerun the test.
@mlpack-jenkins test this please
Hmm... this time it wasn't CMAES that failed.
Right, and we test it 8 times in a row with different random seeds. Do you mind increasing the number of iterations for the
@SuvarshaChennareddy see comment above, let me know; I can quickly push a patch as well.
@zoq, I'll update the hyper-parameters once more. I don't think just changing the number of iterations will affect the results.
We could just pass in
I'd still like to keep the test time low; if we set the maximum iterations to zero, we could potentially run for a long time. Let's increase the population size. The main goal is to make sure the tests actually test that the implementation is correct; if the test fails with these hyper-parameters a few times, that's fine, but I'd like to reduce the cases where it fails.
Alright, I'll increase the population size to 120 then.
@mlpack-jenkins test this please
Alright, I have an idea of what might be going on. I believe the step size is diverging to a relatively large value (> 10^14). I'll add a termination condition for this. I'll also add a patience mechanism (similar to the one provided by the `EarlyStopAtMinLossType` callback) to allow the optimizer to explore the search space even if the `overallObjective` doesn't improve immediately after a generation.
Sounds good, thanks.
@mlpack-jenkins test this please
@zoq the tests passed
@rcurtin anything else you want to mention here, or can we merge this? I'll open another PR to fix compiler check complaints.
Nothing else from my side, sorry for the slight holdup! Great to see this finally get across the finish line 👍
Thanks for putting this together.
Do we want to make a release now? (Also, awesome work @SuvarshaChennareddy, great to have this in :))
I'll open a PR to fix some compiler warnings; I think afterwards we can do another release.
I've attempted to fix the inconsistencies listed out in issue #70, along with a couple of other things:

- The handling of parameters (decision variables) that are represented differently. Parameters represented as a column vector are treated differently from the same parameters represented as a row vector (with appropriate changes in the fitness/loss function, of course). That is to say, if there existed `P_row` and `P_col` (which are equivalent except in the form that they are represented in) and `L_row` and `L_col` (which are respective fitness functions where `L_row(P_row) = L_col(P_col)`), they do not converge to the same result with the current implementation of CMAES. This wasn't very difficult to solve considering there was a transposed space (for example, row-wise representation) along with the space (column-wise representation) itself.
- The update of the evolution path `p_sigma`. Here it was considered that `C^(-1/2)`, where `C` is the covariance matrix, was equivalent to `L^T`, where `L` is the matrix satisfying `C = L*L^T` (Cholesky decomposition). I'm not sure the way I fixed it is the best way to go about it (but I guess it works for now?).