Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

investigate px4fmu-v5 cpu spikes #9852

Merged
merged 1 commit into from
Dec 22, 2018
Merged

investigate px4fmu-v5 cpu spikes #9852

merged 1 commit into from
Dec 22, 2018

Conversation

dagar
Copy link
Member

@dagar dagar commented Jul 6, 2018

For fmu-v5 testing.

LorenzMeier
LorenzMeier previously approved these changes Jul 6, 2018
@dagar
Copy link
Member Author

dagar commented Jul 6, 2018

@LorenzMeier LorenzMeier requested a review from bkueng July 6, 2018 20:52
@dagar dagar requested a review from davids5 July 7, 2018 12:47
@dagar
Copy link
Member Author

dagar commented Jul 7, 2018

Questions

  • is this a valid cpuload measurement or should we look at NuttX CONFIG_SCHED_CPULOAD_EXTCLK or similar?
  • could we add instrumentation to capture the offending task_info or even the entire system_load?

@davids5
Copy link
Member

davids5 commented Jul 9, 2018

Questions

is this a valid cpuload measurement or should we look at NuttX CONFIG_SCHED_CPULOAD_EXTCLK or similar?

We are not running round-robin. So I think is is not an issue. We do not terminate on system clock ticks. So something running long is logged as such.

could we add instrumentation to capture the offending task_info or even the entire system_load?

Why not? Can we at mode to top to track and log?

@dagar
Copy link
Member Author

dagar commented Jul 9, 2018

could we add instrumentation to capture the offending task_info or even the entire system_load?

Why not? Can we at mode to top to track and log?

I meant in a we could live with on by default. The concern being triggering something relatively expensive as a result of abnormally high cpu usage.

Copy link
Member

@davids5 davids5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you looks at the differnece at the instruction level for the math changes?

@@ -207,26 +190,31 @@ void LoadMon::_compute()
}

/* compute system load */
const hrt_abstime interval_idletime = system_load.tasks[0].total_runtime - _last_idle_time;
const float interval = hrt_elapsed_time(&_last_idle_time_sample);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would try to keep to integer math as much as possible.

// Run it at 1 Hz.
const unsigned LOAD_MON_INTERVAL_US = 1000000;
// Run it at 10 Hz.
const unsigned LOAD_MON_INTERVAL_US = 100000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good for bench-testing but I would not increase it in general.

We can add spike-detection instrumentation to the top command. Then we can also do something that is a bit more expensive.

@dagar dagar force-pushed the pr-loadmon branch 5 times, most recently from 9e98fa9 to d397720 Compare November 16, 2018 15:59
@dagar dagar requested a review from a team November 16, 2018 15:59
@dagar
Copy link
Member Author

dagar commented Nov 16, 2018

@PX4/testflights could you please do a quick flight test on any px4fmu-v5 platform with a comparison to master? The only difference should be in the cpu logging on flight review.

@dagar dagar changed the title [WIP] load_mon improve cpuload calculation and cleanup load_mon improve cpuload calculation and cleanup Nov 16, 2018
@Avysuarez
Copy link

Tested on Pixhawk 4 mini. good flight, there is no difference with the master.
Master.
https://review.px4.io/plot_app?log=31625693-cbdd-4b6e-8847-5114789173c0
PR.
https://review.px4.io/plot_app?log=096948f4-f163-4125-8f24-e53bfee8ea33

@dagar
Copy link
Member Author

dagar commented Nov 16, 2018

Thanks @Avysuarez

Still seeing cpu spikes. @davids5

image

@dagar dagar changed the title load_mon improve cpuload calculation and cleanup px4fmu-v5 cpu spikes Nov 16, 2018
@dagar dagar added bug Board: Pixhawk fmu-v5 FMU-v5 including pixhawk 4, pixhack v5, etc labels Nov 16, 2018
@dagar dagar added this to the Release v1.9.0 milestone Nov 16, 2018
@dagar
Copy link
Member Author

dagar commented Nov 16, 2018

These don't seem to be present in similar stm32f4 logs.

@dagar dagar changed the title px4fmu-v5 cpu spikes investigate px4fmu-v5 cpu spikes Nov 16, 2018
@dagar
Copy link
Member Author

dagar commented Nov 21, 2018

Here's a possibly interesting data point.

The AV-X board (another stm32f7 board) doesn't appear to suffer from this. In this test flight it wasn't using i2c for anything. https://review.px4.io/plot_app?log=273dd3d5-c530-4ab1-bdac-f36b8a816421

image

@davids5
Copy link
Member

davids5 commented Nov 21, 2018

@dagar that would confirm the suspicion I had. I will instrument it today.

@davids5
Copy link
Member

davids5 commented Nov 23, 2018

@dagar I started looking at this and no obvious smoking gun on the I2C. More on Monday,...

@dagar
Copy link
Member Author

dagar commented Nov 26, 2018

Seems a bit more inconsistent than I initially thought. Here are 2 more logs from earlier today on a pixhawk 4 and 4 mini. Otherwise these are nearly identical setups, flying the exact same firmware revision, same vehicle, same place, same time.

Here's a pixhawk 4 mini flight from today. https://review.px4.io/plot_app?log=8b8adf6b-0bcb-49c0-a684-0f23685c4e71
image

Here's a pixhawk 4 flight from today. https://review.px4.io/plot_app?log=a01ee82e-fe67-4755-98d8-649cc51ecd02
image

EDIT: I went through a random sampling of old flights for both of those boards and the presence of cpu spikes seems consistent per board.

@dagar
Copy link
Member Author

dagar commented Dec 14, 2018

Related #11013 (comment)

@dagar
Copy link
Member Author

dagar commented Dec 14, 2018

@davids5 good to merge? The only functional change here is using the actual measured interval rather than the LOAD_MON_INTERVAL_US constant for calculating the load.

Copy link
Member

@davids5 davids5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dagar - looks good, I think there are 2 schools of thought on the task_stack_info becoming automatic. It has all the info in one place that is now not persistent. But if needed the we can make it static to see the deltas in the debugger so this is an OK change.

@dagar dagar merged commit 49be26b into master Dec 22, 2018
@dagar dagar deleted the pr-loadmon branch December 22, 2018 06:28
@davids5
Copy link
Member

davids5 commented Feb 13, 2019

@dagar are we still seeing the spikes post nuttx upgrade?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Board: Pixhawk fmu-v5 FMU-v5 including pixhawk 4, pixhack v5, etc bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants