[core] Simulation speedup (again!)
Simulation speed has been increased by at least 15%, mainly by avoiding memory allocation during simulation, postponing acceleration and force updates until a stepper step succeeds, and skipping terms that are not needed for internal stepper steps.
- [core] Avoid algebraic loop in internal stepper steps by postponing accel/force updates until after a successful outer step.
- [core] Move centroidal kinematics computations into the stepper outer loop for efficiency.
- [core] Assume slow variation of subtree inertia during one integration step for efficiency.
- [core] Avoid memory allocation for 'computeAcceleration', 'updateTelemetry' and 'computeAllEffort'.
- [misc] Fix double_pendulum C++ example by using Eigen::Ref for controller handles instead of vectorN_t.
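
Illustration of the "postpone updates until a successful step" idea, as a minimal standalone sketch. This is not the actual engine code: `ToySystem`, `derivative`, `updateDerived` and `runAcceptedSteps` are hypothetical names, and the integrator is a plain Heun scheme with step halving, chosen only to keep the example short. Internal stages call a cheap, side-effect-free derivative; the derived quantity (standing in for accelerations/forces) is refreshed once per accepted outer step, and nothing is allocated inside the loop.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical toy system x' = -x, for illustration only.
struct ToySystem
{
    double x = 1.0;               // integrated state
    double accel = 0.0;           // derived quantity, refreshed on accepted steps only
    std::size_t derivEvals = 0;   // number of cheap internal evaluations
    std::size_t fullUpdates = 0;  // number of full (expensive) updates

    // Cheap evaluation used by internal stepper stages.
    // It never touches 'accel', so trial states cannot feed back into it.
    double derivative(double xTrial)
    {
        ++derivEvals;
        return -xTrial;
    }

    // Full update of derived quantities, called once per accepted outer step.
    void updateDerived()
    {
        ++fullUpdates;
        accel = -x;
    }
};

// Take 'nSteps' accepted Heun steps, halving 'dt' whenever the difference
// between the Heun and Euler estimates exceeds 'tol'. Returns the final 'dt'.
inline double runAcceptedSteps(ToySystem & sys, std::size_t nSteps, double dt, double tol)
{
    std::size_t accepted = 0;
    while (accepted < nSteps)
    {
        const double k1 = sys.derivative(sys.x);
        const double k2 = sys.derivative(sys.x + dt * k1);
        const double xEuler = sys.x + dt * k1;
        const double xHeun = sys.x + 0.5 * dt * (k1 + k2);
        if (std::abs(xHeun - xEuler) > tol)
        {
            dt *= 0.5;  // step rejected: retry without updating derived quantities
            continue;
        }
        sys.x = xHeun;        // step accepted: commit the new state...
        sys.updateDerived();  // ...and only now refresh the derived quantities
        ++accepted;
    }
    return dt;
}
```

Calling `runAcceptedSteps(sys, 4, 0.5, 0.01)` on a default `ToySystem` rejects the first two trial steps, so the derivative is evaluated more often than the derived quantity is updated; the latter is refreshed exactly once per accepted step.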