-
Notifications
You must be signed in to change notification settings - Fork 781
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pose2SLAMExample_g2o and Pose3SLAMExample_g2o, the GN solver cannot proceed on some common benchmark datasets #846
Comments
Thanks for the report! I think you could look at the Chordal initialization papers (by Dr. Carlone), and #248, and borglab/SwiftFusion#42. I am pretty sure this is caused by bad initialization, but it would be great if you can compare the result with other solvers so we can know whether there is a problem :) |
@yiping : also worth trying to add "#define SLOW_BUT_CORRECT_BETWEENFACTOR" at the beginning of the file. That would use the correct jacobians for the between factors that should improve convergence. Let me know if this helps! |
@yiping please ask questions related to functionality on the Google Group. It should have been linked in the issue template. |
@varunagrawal : this might be a bug since the dataset is supposed to converge easily. I suggest reopening and doing some investigation. I'll be happy to help post ICRA. |
@yiping @varunagrawal : I carefully looked into this and everything looks good. The Jacobians for the Pose2 case match exactly my manual implementation in Matlab and converge to the same solution. The only issue is that with GN the cost increases after the first iteration on the intel dataset. This triggers GTSAM to terminate early. If that stopping condition is disabled, everything converges as expected. To verify what I'm saying, try setting:
in the example. Then the optimization for the Intel dataset converges to a nice and small cost (~107) after 4-5 iterations. (The downside is that if the error tolerances are disabled, then GTSAM always performs the maximum number of iterations, but I guess that's a different issue). For the Pose3 case, I'm already convinced that the Jacobians are correct as long as "SLOW_BUT_CORRECT_BETWEENFACTOR" is defined when including the between factors. At this point, if @yiping is ok, I suggest closing this issue. |
Awesome! Thanks @lucacarlone |
On that note, we should consider making progress on #88 |
@lucacarlone Thanks a lot for the investigation! (And apology for the delayed response) I confirm the result for input_INTEL_g2o dataset using GN solver (after applying the change you proposed) on a Debian 10 machine.
However, if you switch to the Levenberg-Marquardt optimizer (
I also tried the same dataset with the Ceres pose_graph_2d example (The example uses Levenberg-Marquardt solver, which is default in Ceres, I don't think Ceres has a vanilla version of GN)
The solver also does not converge after 100 iterations (see details below), and the optimized poses are off. The residual cost formulation in Ceres 2d pose graph example seems pretty-straightforward and correct, so I am guessing it's the solver? Do you think this is a case in which GN can perform better than LM?
|
That shouldn't be the case since depending on the lambda value, LM switches between Gradient Descent and GN. With the right lambda, LM would simply behave like GN. |
I think the initialization point here might be tricky: GN fails to decrease the cost from the initial point, but then it is able to recover. LM on the other side might end up taking smaller and smaller steps without making it very far from the initialization. |
I know this has been a long time, but I am looking to close outstanding issues as we converge on 4.2. From the discussion above I still have no idea as to why ceres converges but GTSAM does not. They both just implement GN/LM. Any ideas? |
Hi Frank: see my comments above. The issue for the 2D problem is that GTSAM terminates too early (since the stopping condition forces GN to stop if the cost increases) - is that stopping condition is disabled, GN works as expected. In the 3D case, the Jacobians might be the problem and enabling SLOW_BUT_CORRECT_BETWEENFACTOR should solve it. |
Yeah, I am looking at that PR also. |
Closing this based on @lucacarlone's recommendation. |
Description
The shipped pose SLAM examples (both the 2D version and the 3D version) seem to have trouble with some of the commonly used benchmark datasets that are typically considered "easy to solve". Is this expected? I understand these are all local solvers so sometimes they get stuck at local minima, but the solver quits after the 1st iteration due to error increased? Feels like something is wrong with the line search algorithm?
Also, should the example consider using the LM solver or other flavor of trust region algorithms by default?
Two example datasets can be found on Dr. Luca Carlone's website.
input_INTEL_g2o.g2o (SE2 poses)
sphere_bignoise_vertex3.g2o (SE3 poses)
Steps to reproduce
Pose2SLAMExample_g2o by default uses Gauss Newton solver and when tested with INTEL dataset, it gets stuck at the first iteration due to increased error. If I switch to the LM solver in the example, it seems to converge after 42 iterations but the result is garbage. The LM solver itself might be working just fine. I guess the poor result is very likely due to the wild information matrices in this g2o file (values ranging from the magnitude of 10^2 to magnitude of 10^12). The information matrices may cause poorly conditioned hessian, thus bad optimization result. But I would like to hear some insights from other people.
Another widely used dataset, input_M3500_g2o.g2o, works just fine with this example using the default GN solver.
Pose3SLAMExample_g2o also gets stuck at the first iteration with the sphere_bignoise_vertex3.g2o dataset (using GN solver). Note this one is not the sphere2500 dataset (similar, but with a lot more connections, so is a bit more difficult).
Expectation
This may not be a bug - I just want to better understand the situation when the examples don't quite work on some commonly seen datasets - is it expected due to the solver limitation, or something is not set up/configured correctly, or it's a bug?
Environment
Latest GTSAM version compiled on Debian 10
The text was updated successfully, but these errors were encountered: