-
Notifications
You must be signed in to change notification settings - Fork 13.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Navigator running erratic or very slow #13536
Comments
I just tried in SITL and did some rough measurements about what navigator is doing just sitting in manual:
FYI @sfuhrer. We'll have to look into this. |
RTL - constantly re-reading the safepoints when inactiveOne potential quick fix would be to add something to throttle https://github.com/PX4/Firmware/blob/master/src/modules/navigator/rtl.cpp#L66 |
@julianoes yes we should look into this. Let's though first get in #13320 . It's mainly about enabling the rally points also in other than RTL_TYPE = 3 mode. Note that the flight with which we had the issue was actually already with this PR. |
@dagar guess we should change that RTL::find_closest_landing_point() is only executed when the rally points are updated, right? Could you point me to the easiest way to do that? |
We don't really have a mechanism for that, but something similar to mission (https://github.com/PX4/Firmware/blob/master/msg/mission.msg) would probably make sense. In this case it's a bit different though because it's rechecking based on current position. |
not sure if this info helps: we also have large delays with a merge that is about 3/4 years behind current master. We have logs where there is more than a second delay between vehicle_status and triplet message. |
@dagar this hints that it has nothing to do with the recent addition of rally points or even your bigger architectural changes recently. It sounds more like something off with how/when navigator runs. |
@Stifael when this happened, did you have a geofence set, and or a mission? |
The people in kunshan reproduced the error several times by:
Not sure if geofence was active. Let me check |
Geofence was set to default, which is |
@Stifael right but were there geofence items set? This would cause SD card access. |
This is in HITL:
I read this as 1.4 seconds worst case. |
How do I see that from the log? |
I don't think you do. But it doesn't matter. I can reproduce it without a geofence set. According to my profiling using |
@dagar @bkueng @davids5 I narrowed the call talking in the order of 1s to: more specifically the actual I can reproduce it in HITL on a Pixracer flying a mission. It probably happens without flying a mission, the main point is that position is initialized and navigator runs. Edit: I could not reproduce read times above about 5ms when logger is not running! |
The safe points are tiny, let's just keep a full copy of them in RTL. Then let's add another orb message with full dataman metadata so we can limit updates. After that I'd still like to see what the remaining dataman access pattern is like for typical missions. Some simple caching and prefetching in dataman might be interesting. |
@bkueng just mentioned that this could be happening with cheaper SD cards, and indeed I can reproduce it with a cheap SanDisk EDGE 8GB class 4 (which presumably comes with Pixhawk 4) but I don't see reads above 11ms with a more expensive SanDisk Extreme PRO U3 card. |
This is currently not even about safe points. This is about geofence points. I think we need to implement some caching at some level but also think about moving the read calls to the background and access them async. |
It's both. For RTL we can store all the safe points in a small array. In geofence (IIRC) the polygon is already stored, but dataman is still checked constantly for an update. This is the part we get rid of with something similar to ORB_ID(mission) (https://github.com/PX4/Firmware/blob/master/msg/mission.msg). The mission message could turn into a more general dataman message that includes info about all the dataman entries, or we could add separate messages for each type (safe points, fence, waypoints).
Dataman is already like a poor man's async. Caching the first entry (where safe points and geofence store the number of items) might be a quick way to eliminate a lot of the disk reads (similar to the orb idea). |
TODO: review dataman locking. We don't seem to be using it consistently. |
This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions. |
Regarding geofence. We performed some tests with 60-point poly fence. Read+calculations takes from 30ms to 200ms on different UAVs. Looks like it depends on SD card condition. But 200ms is pretty high value. @dagar Do you have any plans to rework this logic? |
I have plans but no time. @dagar might know better. |
This didn't age well.... I'll be taking at look at as time allows within the next few weeks to see if any changes on master have fixed this issue since it was first noted here. Hopefully it's fixed 🤞🏻; I'll post again if not |
Describe the bug
We had a flight where the
vehicle_status.nav_state
was set toNAVIGATION_STATE_AUTO_RTL
for about 0.6 seconds and no message or triplet from navigator was published, as if it was not running.In the logfile below there are times where RTL should have been engaged by navigator because the nav state was set:
To Reproduce
Not sure.
Expected behavior
RTL should have been initiated.
Log Files and Screenshots
https://logs.px4.io/plot_app?log=7c4e2956-6ab5-409f-a2ac-0b908652a1de
Additional context
Navigator is polling on local_position and there are regular updates of this topic.
We need to investigate the effects of accessing the SD card in some of the navigator modes when not inactive. E.g. Mission and Rtl modes check mission items respective rally points even if not active! Also, we check for geofences set in every iteration.
FYI @MaEtUgR
The text was updated successfully, but these errors were encountered: