-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CAN cleanup #8987
CAN cleanup #8987
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AP_Param::setup_object_defaults(this, var_info); | ||
#if CONFIG_HAL_BOARD == HAL_BOARD_SITL | ||
if (_singleton != nullptr) { | ||
AP_HAL::panic("AP_Proximity must be singleton"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AP_Proximity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ups, copy paste done wrong 😞 I'll fix it.
@OXINARF As far as the size issue goes, I think its going to be discussed in dev call by @tridge. I think the issue is inly pertaining to px4-v2 builds, which we can make defunct and replace them with either fmuv2 or get rid of altogether. Meaning boards with 1M+ size will be the only ones to support CAN Protocol, which includes all existing boards with CAN support except for Pixhawks containing uC's with Flash USB errata. |
I've tested the branch from @bugobliterator that is on top of this and I LOVE IT. Tested it with a babel, a Zubax GNSS and an Emlid Edge. The class abstraction and AP_UAVCAN.cpp cleanup is greatly appreciated and very good. However, I'm seeing some heavy cpu scheduler numbers. I've never done proper comparisons so i don't know if it's big or small but on plane if I change the scheduler rate from 50 to 300 the mavlink connection really slows down. takes awhile to get a param download. At 400 I can barely connect enough to get the param changed |
@magicrub I answered in the other PR, but that's a clear sign that you haven't compiled the ChibiOS driver change in. Make sure to update the submodules before building the firmware. |
same problem here as in #8972, I find the zubax v1 is detected with plane, but not with copter. I suspect it is timing related. |
@tridge That's a known issue I've told you about. You can't use the STM32 CAN2 device without enabling CAN1 - because of our multiple CANManager instances it's not very safe to force CAN1 to be initialized, although it's doable; I'd prefer to wait until I've removed the multiple CANManager stuff later though. One thing you can test is initializing CAN1 device (port CAN2 in the Cube) without any protocol attached to it:
Is this with both ports enabled? And do you have ESCs enabled? I've found this problem too, but I don't know how to fix it; the issue is that because one of the ports doesn't have a device attached, all messages fail sending and so libuavcan will continuously try to re-send them during the spin - so the CAN thread doesn't sleep. It's basically the same issue we had before I fixed the ChibiOS driver, except there it also didn't work with just one port enabled and a device attached. I've thought about changing spin to spinOnce and then doing the sleep ourselves, but the issue is that when things are working, that could delay sending messages as spinOnce will take care of all received messages, but won't try to send all the pending messages, instead only sends one message out plus one per received message. @pavel-kirienko Any ideas here? |
Profiling on-target should help. Libuavcan is supposed to sleep even if there are unused interfaces because from the viewpoint of the library the only significant difference between a functioning interface and a floating interface is that in the latter case all frames timeout (it's not entirely correct to say that they "fail sending", they are aborted by the driver). If a floating interface creates a higher CPU load, something is probably wrong with the driver, hence I recommend profiling. |
@pavel-kirienko @tridge I did some experiments to check performance of this PR:
|
Not sure if the function call instrumentation is enabled on your target, but it is known to cause dramatic slowdown for libuavcan due to its design. I discovered this a long time ago in PX4; more info here: PX4/PX4-Autopilot#1660. It was fixed in PX4 since then, but I don't know whether it affects ArduPilot. I can't pass on this opportunity to emphasize the benefits of regular global profiling; if system performance is a concern, it must be done: PX4/PX4-Autopilot#6829 (comment) |
@pavel-kirienko there are no function instrumentation for the ChibiOS builds, that I am using. Further analysis revealed that maximum time is being spent in message's decode method ~300us, and the check was done ensuring that interrupt and context switches are postponed when between the measurements. Digging deeper it was found that around ~10us of CPU time is consumed per field and most of the time is spent here at this line https://github.com/ArduPilot/uavcan/blob/master/libuavcan/src/transport/uc_transfer_buffer.cpp#L175 which leads to https://github.com/ArduPilot/uavcan/blob/master/libuavcan/src/transport/uc_transfer_buffer.cpp#L101 |
Looks like I was onto something here https://github.com/ArduPilot/uavcan/blob/70df64480a7512fbbb85b442234dbd9b3bdae548/libuavcan/src/transport/uc_transfer_buffer.cpp#L168. The linked list based buffer requires sequential iteration of the buffer segments in order to reach the desirable offset, which makes it slow. Additionally, the code that you identified as the culprit makes a lot of auxiliary operations per every byte read in order to facilitate the segment traversal logic at the outer layer. One could get rid of the list traversal problem by using some sort of a stateful reader object which would keep references to the current chunk. Is anybody willing to give it a try? |
@pavel-kirienko If it helps with resolution of timing issue, certainly. I will give it a try. Also let me know if you have sample implementation lying around. |
No such thing. All I have is an idea.
…On Tue, Jul 31, 2018, 20:11 Siddharth Bharat Purohit < ***@***.***> wrote:
Is anybody willing to give it a try?
@pavel-kirienko <https://github.com/pavel-kirienko> If it helps with
resolution of timing issue, certainly. I will give it a try. Also let me
know if you have sample implementation lying around.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8987 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ADJUZOLJRrHdT-zLZ3hlpiou9VDpLgdeks5uMI-xgaJpZM4VaKIn>
.
|
81adacb
to
80882fa
Compare
I've updated this PR. I haven't been able to reproduce my previous issue where it seemed that having both ports assigned to one driver and one port without anything attached, would lead to the previous issue of high CPU utilization. I don't know what I did wrong back in that day, but I've tried multiple options and couldn't lose MAVLink connection in any again. I've also found out that #7863 introduced a bug by making ESC and Actuator messages have a timeout of 20ms instead of 2ms. This is now fixed as well. I've left LEDs with a timeout of 20ms as it shouldn't hurt, given that those messages are sent at 20Hz. @tridge Can you please re-test this? I've been testing with the Zubax GNSS and I don't see any problem - besides the occasional bad CRC messages in MAVProxy (not sure if they are too frequent). Regarding the findings from @bugobliterator, they seem quite important and I hope we can reach some solution. |
rebased on master |
I've pushed a rebased version here: https://github.com/tridge/ardupilot/tree/pr/can-cleanup |
Also added more getter methods
And disable it by default
../../libraries/AP_BattMonitor/AP_BattMonitor_UAVCAN.cpp: In member function ‘virtual void AP_BattMonitor_UAVCAN::init()’: ../../libraries/AP_BattMonitor/AP_BattMonitor_UAVCAN.cpp:15:123: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘AP_Int32 {aka AP_ParamT<int, (ap_var_type)3u>}’ [-Wformat=] #define debug_bm_uavcan(level, fmt, args...) do { if ((level) <= AP_BoardConfig_CAN::get_can_debug()) { printf(fmt, ##args); }} while (0) ^ ../../libraries/AP_BattMonitor/AP_BattMonitor_UAVCAN.cpp:36:33: note: in expansion of macro ‘debug_bm_uavcan’ debug_bm_uavcan(2, "UAVCAN BattMonitor BatteryInfo registered id: %d\n\r", _params._serial_number);
Also change the order of logical OR so that led_write in UAVCAN_RGB_LED is called for all UAVCAN instances and not only first one
This includes creating own thread Also adapts example
Also a bit more cleanup
1441eb7
to
a53c848
Compare
Woohoo! |
Awesome job team!!! |
Great to see this happening.. yay good work |
This is a CAN cleanup:
This has been verified as working with a Zubax Babel. I'm waiting arrival of a Zubax GNSS this week and I'll do further testing. Any more testing anybody can do is appreciated.
I don't have an Emlid Edge to test the UAVCAN LED, but in the Babel messages look fine. Hopefully @staroselskii can test it.
As @bugobliterator (#8972), I've done some experiments on breaking AP_UAVCAN message handling into parts, but it really increases size so I've left it out for now. I have some ideas there, but no time to pursue them at the moment.