Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issues on Windows that didn't exist before #10693

Closed
MaEtUgR opened this issue Oct 16, 2018 · 11 comments
Closed

Performance issues on Windows that didn't exist before #10693

MaEtUgR opened this issue Oct 16, 2018 · 11 comments
Assignees
Labels

Comments

@MaEtUgR
Copy link
Member

MaEtUgR commented Oct 16, 2018

Describe the bug
Inconsistent and unexpected simulation performance issues when running SITL jMAVsim on Windows that end up in messages like this and very small to huge visible twitches in the simulation result. It can even lead to loss of global position and a crash of the simulated vehicle.

ERROR [sensors] Accel #0 fail:  TIMEOUT!
ERROR [sensors] Sensor Accel #0 failed. Reconfiguring sensor priorities.
WARN  [sensors] Remaining sensors after failover event 0: Accel #0 priority: 1
WARN  [sensors] Remaining sensors after failover event 0: Accel #1 priority: 1
ERROR [sensors] Gyro #0 fail:  TIMEOUT!
ERROR [sensors] Sensor Gyro #0 failed. Reconfiguring sensor priorities.
WARN  [sensors] Remaining sensors after failover event 0: Gyro #0 priority: 1
WARN  [ekf2] accel id changed, resetting IMU bias
WARN  [ekf2] accel id changed, resetting IMU bias
ERROR [sensors] Accel #0 fail:  TIMEOUT!
ERROR [sensors] Sensor Accel #0 failed. Reconfiguring sensor priorities.
WARN  [sensors] Remaining sensors after failover event 0: Accel #0 priority: 1
WARN  [sensors] Remaining sensors after failover event 0: Accel #1 priority: 1
ERROR [sensors] Gyro #0 fail:  TIMEOUT!
ERROR [sensors] Sensor Gyro #0 failed. Reconfiguring sensor priorities.
WARN  [sensors] Remaining sensors after failover event 0: Gyro #0 priority: 1
WARN  [ekf2] accel id changed, resetting IMU bias

To Reproduce
Steps to reproduce the behavior:

  1. Fast desktop computer with Windows 10
  2. Installed Cygwin Toolchain
  3. start make posix jmavsim on latest master
  4. Sometimes it works fine even for longer periods, sometimes the errors and twitches occur

Expected behavior
Simulation should work just fine on a modern computer that isn't overloaded. It also worked fine before, I'm not sure but I think on 1.8 I didn't have any obvious problem not even on a slower laptop.

Log Files and Screenshots
I can provide a simulation log if that helps.

Add screenshots to help explain your problem.

Additional context
If I look closely at the description of this issue it looks very similar: #10098

Machine is fast enough and not overloaded and it doesn't always happen but regularly/consistently.

I'd suspect threading priority problems with Windows if none of this happens on any other platform but there might be a general problem that just shows off on Windows. While I ported SITL to Cygwin I had to adjust some priority numbers. https://github.com/PX4/Firmware/pull/8407/files#diff-b3ac3b41912e12a40f44757b561a0e4dR216 could be related.

Another general question is why should the simulation not work fine just slower if the physics simulation cannot run in real time. I think this was discussed before.

@dagar
Copy link
Member

dagar commented Oct 16, 2018

What do you see in task manager while this is happening? CPU, memory, and network utilization for both PX4 and jmavsim.

What do you see in uorb top?

@hamishwillee
Copy link
Contributor

@julianoes Will your SITL changes likely help with this issue?

@julianoes
Copy link
Contributor

@julianoes Will your SITL changes likely help with this issue?

Not in principle but I'm now debugging issues like this hoping to find out why uorb drops messages.

@MaEtUgR
Copy link
Member Author

MaEtUgR commented Nov 3, 2018

@dagar Thanks for the hints. CPU usage in task manager looks fine with PX4 taking 0-0.2% and jmavsim 2-5%. Here's an uorb top output snapshot:

update: 1s, num topics: 83
TOPIC NAME                        INST #SUB #MSG #LOST #QSIZE
actuator_controls_0                  0    7  171   287 1
battery_status                       0    6   79   206 1
ekf2_innovations                     0    1  114     0 1
ekf2_timestamps                      0    1  169     0 1
ekf_gps_drift                        0    1    9     0 1
estimator_status                     0    4  114   392 1
geofence_result                      0    1    4     0 1
rate_ctrl_status                     0    1  171     0 1
sensor_accel                         0    2  174   133 1
sensor_accel                         1    1  250    86 1
sensor_baro                          0    1   99    16 1
sensor_bias                          0    4  114   221 1
sensor_combined                      0    4  169   236 1
sensor_gyro                          0    3  174   136 1
sensor_mag                           0    2   99    71 1
sensor_preflight                     0    1  169     0 1
vehicle_air_data                     0    5   83    91 1
vehicle_attitude                     0    9  169   334 1
vehicle_attitude_groundtruth         0    1  140     0 1
vehicle_global_position              0    5  114   271 1
vehicle_global_position_groundtruth  0    1  140     0 1
vehicle_gps_position                 0    5    9     8 1
vehicle_local_position               0    8  114   321 1
vehicle_local_position_groundtruth   0    1  140     0 1
vehicle_magnetometer                 0    4   83    95 1
vehicle_rates_setpoint               0    4  152     0 1

Is there something striking?

@MaEtUgR
Copy link
Member Author

MaEtUgR commented Dec 3, 2018

Update: I talked to @Stifael who uses linux jmavsim and there seems to be a problem in simulation in general, it probably just hides better on linux. But it can always happen with a certain CPU load and timing jitter that your sensors are producing errors and especially that you loose global position and do not regain it until you restart simulation...

@julianoes
Copy link
Contributor

I have seen that too. Let's hope that it will be fixed with #10648.

@zulufoxtrot
Copy link

zulufoxtrot commented Dec 3, 2018

I think I have the same issue.

Environment:
MacOS 10.14.1
Macbook Pro, i7

Branch:
tag/v1.8.2

Command:

make posix jmavsim

Output:

ERROR [sensors] Accel #0 fail: TIMEOUT!
ERROR [sensors] Sensor Accel #0 failed. Reconfiguring sensor priorities.
WARN [sensors] Remaining sensors after failover event 0: Accel #0 priority: 1
WARN [sensors] Remaining sensors after failover event 0: Accel #1 priority: 1
ERROR [sensors] Gyro #0 fail: TIMEOUT!
ERROR [sensors] Sensor Gyro #0 failed. Reconfiguring sensor priorities.
WARN [sensors] Remaining sensors after failover event 0: Gyro #0 priority: 1

Performance:
JarSrcLoader takes about 40% of a core
px4 takes another 20%

Behavior:
Vehicle tends to crash into the ground randomly, although I'm not sure this is related as I am not familiar with the sim and I may be sending bad commands.

edit: the issue is gone in the same environment with latest master.

@MaEtUgR
Copy link
Member Author

MaEtUgR commented Jan 10, 2019

Just as an update. Since the lockstep was introduced the result of this problem is the whole simulation stops. At least you can assume now that before it stops nothing unexpected like a simulated drone crash happens. I'm not sure why it should stop but a first guess would be that a packet and hence the "lockstep token" gets lost e.g. in the simulator and there's a deadlock.

@julianoes
Copy link
Contributor

Fun 😭, so when does it stop? Straight away or after a while of flying? And does it always happen consistently?

@MaEtUgR
Copy link
Member Author

MaEtUgR commented Jan 14, 2019

It always happens but not consistently after the same amount of time. I just compared to linux, there I can easily let jmavsim fly for an hour without problem but on windows it happens quite often that the simulation completely locks before I did my tests which usually take about a minute or two. It also happens when I don't fly at all: I just start sitl jmavsim and do something else when I come back I see the symptom "global position loss" all the time.

@MaEtUgR
Copy link
Member Author

MaEtUgR commented Jan 14, 2019

In my tests after #11177 this issue is gone. Thanks to @bkueng!
For more details read: #11177 (comment)

@MaEtUgR MaEtUgR closed this as completed Jan 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants