-
Notifications
You must be signed in to change notification settings - Fork 13.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve robustness to bad and lost airspeed data #10733
Conversation
What about airspeed in the fixed wing attitude controller (gain scheduling)? Either pause the airspeed update preserving the current/last value or mark it invalid to force the gain scheduling to use trim airspeed. |
hrt_abstime _time_tas_good_declared{0}; /**< time TAS use was started (uSec) */ | ||
hrt_abstime _time_tas_bad_declared{0}; /**< time TAS use was stopped (uSec) */ | ||
hrt_abstime _time_last_airspeed{0}; /**< time last airspeed measurement was received (uSec) */ | ||
char *_airspeed_fault_type = new char[7]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this with flags instead of strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but that will take more time than I have to spend on this PR at this point in time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-reading your comment, I may have misunderstood it first time. Can you describe what you mean by using flags and why the objection to using strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of something like using more status flag booleans for each of the possibilities instead of strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this uses some more memory with the 7 character class variable, however it avoids the necessity to duplicate branching wherever the message is printed. If you have an idea how this can be avoided, please provide an outline. TXS
The behaviour should probably be dependant on the vehicle control mode so that RTL isn't engaged in manual or stabilized mode. |
|
||
# airspeed fault and airspeed use status | ||
bool aspd_check_failing # true when airspeed consistency checks are failing | ||
bool aspd_fault_declared # true when an airspeed fault has been declared |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we think about handling this similarly to position and velocity checks? https://github.com/PX4/Firmware/blob/master/msg/vehicle_status_flags.msg#L10
- condition_airspeed_valid
- similar probation logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is also publishing separate booleans so I don't understand the point you are making.
src/modules/commander/Commander.cpp
Outdated
status.aspd_use_inhibit = true; | ||
status.aspd_fail_rtl = true; | ||
// let us send the critical message even if already in RTL | ||
if (TRANSITION_DENIED != main_state_transition(status, commander_state_s::MAIN_STATE_AUTO_RTL, status_flags, &internal_state)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only if in an airspeed mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment.
We will be available to help flight test this starting in a couple weeks. |
19f1837
to
7baf9d4
Compare
This has been addressed by
|
Jenkins is failing on the omnibus-f4sd target build due to lack of flash space. |
latest flight test: https://review.px4.io/plot_app?log=672b97f6-1ba8-4426-a07a-aa51925253bb Firmware using this commit: 5600afe Failsafe still not triggering, _load_factor_ratio still stuck at 0 the whole time. |
Great progress, but there are a couple open issues/bugs: 1) Return mode does not send/log proper warnings (should send/log all messages):Return mode correctly triggers in this log (COM_ASPD_FS_ACT set to 4), but PX4 doesn't send down and log the proper RTL warning messages. 2) Silently switches to non-airspeed TECS mode in some cases (should not silently switch):Negative airspeed: The pitot is fully blocked and airspeed is reading negative. There is no indication the airspeed failsafe triggered, however, the vehicle seems to fly normally, possibly in non-airspeed TECS mode? Full block test. I messed up and had COM_ASPD_FS_ACT set to 0. However for some reason the plane flew as if it had failsafed? Killing airspeed driver (simulating airspeed sensor electrical fault): We attempted to stop the airspeed with the console, but it was not registered. Airspeed stoppped, but the vehicle kept flying well. What is happening here? There is no information/feedback. I feel the failsafe should be the mechanism for the TECS switch. Either way there shouldn't be silent switching. I think we will be in really good shape if we can address these issues and then do some more flight testing to establish good default parameters. |
I can see a " AIRSPEED DATA FAULTY - stopping use and returning" warning message in the log: |
yes, however, the RTL messages seem to be bypassed somehow. Some examples of bypassed messages: @dagar might understand why these are being bypassed? |
OK, so you want a separate RTL message. Did the " AIRSPEED DATA FAULTY - stopping use and returning" message appear on the GCS during flight? |
What if the message sent to the ground and logged read: "RTL: AIRSPEED DATA FAULTY" ? |
If I recall correctly, yes, but it was a while ago we did the test. Also, QGC audibly announces the switch to RETURN mode. The issue is that the other RTL related messages that normally are logged (and show up in console) are being improperly bypassed or suppressed or something. |
If the goal is to reduce characters <50 I prefer: "Airspeed data faulty, stopping use and returning" |
after some more ground testing is seems that after stopping the airpseed driver, restarting it does not allow the airspeed sensor/estimate to fully reinitialize. It seems to require an actual full system reboot. I'm not sure if or how this might factor in with the failsafe. Just commenting on it in case it does factor. |
@dagar I ran a test in SITL to test the negative pressure case with COM_ASPD_FS_ACT = 4, see https://logs.px4.io/plot_app?log=51e7247d-fc91-4ca3-bd83-076a5c0c056e The Mavlink GCS console received the following messages: WARN [commander] AIRSPEED DATA FAULTY - stopping use and returning However only the 'WARN [commander] AIRSPEED DATA FAULTY - stopping use and returning' message was logged to flash. The 'RTL HOME activated' message is produced here https://github.com/priseborough/Firmware/blob/pr-airSpdFailSafe/src/modules/navigator/navigator_main.cpp#L679 However the name of the function is misleading as it does not log. Edit: @Antiheavy I do not think this is a case of the RTL message being bypassed. It is being set here: https://github.com/priseborough/Firmware/blob/pr-airSpdFailSafe/src/modules/navigator/navigator_main.cpp#L679 @dagar Do you have suggestions on how we can ensure that that call results in a message being logged ad not just sent to console and to ground over Mavlink. |
I've done a first pass through the This is because the negative airspeed has triggered another protection mechanism here: https://github.com/priseborough/Firmware/blob/pr-airSpdFailSafe/src/modules/fw_pos_control_l1/FixedwingPositionControl.cpp#L385 This is a legacy mechanism and always runs. |
So if you want messages that are visible on the console, mavlink, and logged they need to be critical (mavlink_log_critical()) or emergency (mavlink_log_emergency()). |
FWIW there seem to be RTL related messages sent from at least two different modules: Example from "navigator_main.cpp" module: Examples from "RTL.cpp" module: |
Can we roll this into the failsafe functionality such that the the appropriate user notification and action is performed (set by COM_ASPD_FS_ACT)? COM_TAS_FS_T1 should still be respected for any failsafe action. What would be a good approach here? |
Protection against using a negative or NaN airspeed is something the fixed wing position controller (TECS) always requires and should not be something that should be bypassed by a parameter and therefore is not something that I would delegate totally to the commander. I will add a check for this to the commander so that there will be immediate setting of aspd_use_inhibit to true, followed by a notification after 1 second. |
The switch to RTL is always logged as part of the vehicle_status message, so we need to decide on what circumstances we actually need need an additional console message over and above a fault notification. |
Following discussions with @Antiheavy I have added a parameter to enable delaying the return into RTL. This will prevent unnecessary mission failure if wind shear or other wind estimation errors cause the EKF innovation check to temporarily fail. This will allow the mission to continue for a few minutes using the TECS pitch to throttle, albeit at a less efficient flight condition whilst giving a change for wind estimates to re-converge. The following figure shows a flight where launch and wind conditions caused a temporary wind estimation failure. This wind speed estimation error in the North direction was associated with a large state variance until sufficient turns had been completed, eg: I will investigate the possibility of delaying use of airspeed innovation checks until wind state variances have converged. |
* @max 2.0 | ||
* @group Commander | ||
*/ | ||
PARAM_DEFINE_FLOAT(COM_TAS_FS_INNOV, 1.0f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on flight testing and review of flight data, we feel like 1.0 causes too many false positives for COM_TAS_INNOV. It is one thing to throw a flag indicating airspeed readings aren't in agreement with expected values and it is another to declare them bad enough to trigger a failsafe. Hence, I suggested the default for COM_TAS_FS_INNOV = 2.0. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be acceptable because we still have the load factor test. What value did the airspeed innovation ratio peak at for the false positives you experienced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes the airspeed innovations can get pretty large, especially in windy conditions with a lot of turning.
Here are two flight logs with a lot of false positives. Airspeed was generally working, might be not calibrated perfectly, but it was reasonable. The biggest contributor here is high winds. You will see that we were changing params mid-flight to test the sensitivity to false positives.
This flight was flown in wind speeds exceeding the specified limit of the airframe:
https://review.px4.io/plot_app?log=d9a42e33-32b2-4530-ad71-bc9d917465f6
This flight was somewhat less windy, airspeed innovation test ratios are not as bad, but still exceed 1.0 regularly even though airspeed is functioning properly.
https://review.px4.io/plot_app?log=fa18a869-1a7b-4a5f-88ef-8dd5e6c22e79
In this short flight, the learned wind vector start out incorrect and caused a different kind of false positive.
https://review.px4.io/plot_app?log=8b9df1c8-82ca-417b-9f76-30fde4dd190a
Here is another log from an actual crash. This pitot had some moisture inside it and cause the airspeed to read "something", but it was not correct. This vehicle's firmware was similar to v1.8.0, so it did not have the benefit of this PR. You can see the airspeed test ratio slowly grow to peak at ~3.75.
https://review.px4.io/plot_app?log=1f6bc663-44e5-4bbc-8ecf-6f3fe4e51d27
note: airspeed is scaled by 0.1 in the above plots.
Edit: I fixed the plots above to show the tas_test_ratio instead of the airspeed_innov since it is actually the test_ratio that we care about. I updated the text above as well.
Edit 2: I added another example above of a real crash caused by a blocked pitot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Antiheavy Thanks for sharing those.
Besides increasing the threshold we can also reduce false positives by integrating the error when above the threshold so that larger transient conditions do not trigger, but a smaller persistent condition does.
I will have a go at this and also look at the log where the wind vector initialised incorrectly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides increasing the threshold we can also reduce false positives by integrating the error when above the threshold so that larger transient conditions do not trigger, but a smaller persistent condition does.
@priseborough this is a neat idea, but maybe it isn't worth the effort. It could be worth it if the solution allowed us to eliminate the need to parameterize COM_TAS_FS_INNOV, making this feature simpler for users/integrators. However, if we need a parameter to tune COM_TAS_FS_INNOV for different types of vehicles and users, then we might as well keep it a bit simpler to understand as a simple scaler on the airspeed test ratio. I feel with COM_ASPD_FS_DLY the solution is equally "safe" even with some false positives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory with a sensible integral error threshold we should be able to have the same COM_TAS_FS_INNOV for different vehicles because that is based on the innovation test ratio which if above 1 indicates the airspeed has been rejected by the EKF.
However in practice there will be many users that are flying with badly calibrated airspeed installations that are likely to fail the test. For example see the ones supplied by a FW tester for the 1.9 release here: #11785 where there was no awareness of the large impact of static source position error. That log would have triggered any sensible threshold, eg:
However in this example I think the user should have been alerted that they have a bad calibration. One way around this is for the initial implementation would be to use the innovation check in a 'warning only' capacity and relay on the load factor check for the failsafe.
I will implement the 'integral' check method and provide a link to it for further discussion here: #11846
@RomanBapst Could you take a pass here and review / comment? If possible I'd love to have this in 1.9, but it might be too big to bring in. |
Attempted rebase got a bit nasty. I squashed the branch, manually resolved the conflicts, and pushed to Let's continue in #11846. |
After talking with @dagar I now undertand that the logged RTL messages are working as intended. This item can be set aside for a separate and unrelated discussion. |
Description
Added in response to #9869
Enables detection and recovery from disconnected and blocked airspeed sensors.
Uses a combined airspeed innovation check and normal load factor sanity check.
False positives to bad EKF operation are reduced by requiring that GPS and magnetomer innovations are within limits from other sensors are within limits before declaring airspeed innovations bad.
The normal load factor sanity check will fail the airspeed sensor if the normal load factor sustains a value that is greater than what is physically possible given the measured airspeed.
The feature is off by default and controlled by the following parameters:
COM_ASPD_FS_ACT
Sets failsafe action when bad airspeed measurements are detected:
0 do nothing
1 log a message
2 log a message, warn the user
3 log a message, warn the user, switch to non-airspeed TECS mode
4 log a message, warn the user, switch to non-airspeed TECS mode, switch to Return mode
COM_ASPD_STALL
This is the minimum indicated airspeed at which the wing can produce 1g of lift. It is used by the airspeed sensor fault detection and failsafe calculation to detect a significant airspeed low measurement error condition and should be set based on flight test for reliable operation. The failsafe response is controlled by the COM_ASPD_FS_ACT parameter.
COM_TAS_FS_INNOV
This scales the minimum airspeed innovation level required to trigger a failsafe. Increase to make the check less sensitive, decrease to make it more sensitive. The default value of 1.0 uses the same threshold as the EKF does for airspeed acceptance.
COM_TAS_FS_T1
Number of seconds checks need to fail before an airspeed fault is declared and any failsafe response is actioned. Note - if both
COM_TAS_FS_T2
Number of seconds checks need to pass before an airspeed fault is cleared.
Test data / coverage
Preliminary testing in SITL was performed by modifying it here :https://github.com/PX4/sitl_gazebo/blob/3d80f63562c24c1537e1f8423ce649c99ebc15ea/src/gazebo_mavlink_interface.cpp#L549-L555 to fail the airspeed at specified times or heights, eg
SITL test logs
COM_ASPD_FS_ACT = 4
COM_ASPD_STALL = 10
COM_TAS_FS_INNOV = 1.0
COM_TAS_FS_T1 = 3
COM_TAS_FS_T2 = 100
Pitot permanently blocked immediately after takeoff.
https://logs.px4.io/plot_app?log=569f7887-378a-465d-87fe-2e22fbd1163b
Pitot blocked immediately after takeoff and unblocked after 30 sec. Airspeed fault status eventually clears, but vehicle remains in RTL loiter as expected.
https://logs.px4.io/plot_app?log=b42fbf34-25aa-42d6-ab16-52baf613a81a
Flight testing will be required.