Reported CPU usage is confusing #1194

CAFxX · 2017-08-07T07:09:06Z

Command

cf app my_app

What occurred

The CPU usage is displayed as a percentage, but nowhere is it defined what the percentage refers to. Users are puzzled by this: they often assume that the range is [0,100]%, whereas in reality the range is [0,cores*100]%, where cores is a number they have no real visibility of or control over, since it's operator-defined.
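To make the range concrete: if the displayed figure is summed across cores, dividing it by the core count gives a share of the whole cell in [0,100]%. A minimal Go sketch; `cellShare` and the 4-core cell are hypothetical, and the core count is exactly the operator-defined value users usually cannot rely on knowing.

```go
package main

import "fmt"

// cellShare converts a CPU figure whose range is [0, cores*100]%
// into a share of the whole cell in the range [0, 100]%.
// Hypothetical helper: cores is the operator-defined core count.
func cellShare(reportedPercent float64, cores int) float64 {
	if cores <= 0 {
		return 0 // unknown core count: no meaningful share
	}
	return reportedPercent / float64(cores)
}

func main() {
	// 155.2% on an assumed 4-core cell.
	fmt.Printf("%.1f%% of the cell\n", cellShare(155.2, 4)) // 38.8% of the cell
}
```

Note that even this normalized number says nothing about the instance's quota, which is the point of the next paragraph.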

In addition, this number does not reflect the CPU quotas at all, so whether, e.g., "25%" means the application has plenty of idle resources or is actually CPU-starved depends on:

the number of cores on the cell (operator controlled, user visible)
the container size (user controlled)
the mem->cpu quota mapping (operator controlled, non user visible)
other applications on the same cell (operator controlled, non user visible)
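One way to see why the same number can mean "idle" or "starved" is to estimate the instance's guaranteed share. The sketch below assumes, purely for illustration, that CPU weight is proportional to the container memory limit (one possible "mem->cpu quota mapping") and that every container on the cell is busy; `entitlementPercent` and all inputs are hypothetical, and real platforms may compute this differently.

```go
package main

import "fmt"

// entitlementPercent estimates the CPU an instance is guaranteed when every
// container on the cell contends for CPU, in the same units as the displayed
// figure (100% = one core).
// Assumption: CPU weight is proportional to the container's memory limit.
func entitlementPercent(cores int, containerMemMB, cellMemMB float64) float64 {
	if cellMemMB <= 0 {
		return 0
	}
	return float64(cores) * 100 * containerMemMB / cellMemMB
}

func main() {
	// Hypothetical cell: 4 cores, 32 GiB of container memory; a 2 GiB instance.
	e := entitlementPercent(4, 2048, 32768)
	fmt.Printf("entitlement: %.1f%%\n", e) // entitlement: 25.0%
}
```

Under these assumptions a displayed "25%" would mean the instance is using exactly its guaranteed share; on a cell with more cores or less memory, the same "25%" would mean something else.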

This means that currently the displayed CPU usage is basically not "actionable" at all, i.e. it doesn't tell users much about what's going on inside their application.

An extreme example: two instances that should have the same CPU usage because they handle the same workload:

instance 1: 200%
instance 2: 50%

Which instance is working correctly? Which one is not?

if we assume that the two instances are on two non-overloaded cells, instance 1 most likely has an issue, because it uses 4 times as much CPU for the same workload
if we know that instance 2 is on an overloaded cell, so it's actually CPU-starved by other applications (i.e. instance 2 would like to use resources over its quota), then the problem is on instance 2

cf app myapp
Showing health and status for app myapp in org myorg / space myspace as me@example.com...
name:              myapp
requested state:   started
instances:         5/5
usage:             2G x 5 instances
routes:            myapp.example.com
last uploaded:     Thu 30 Mar 10:30:01 JST 2017
stack:             cflinuxfs2
buildpack:         https://github.com/cloudfoundry/java-buildpack.git
      state     since                  cpu      memory         disk           details
#0    running   2017-07-28T11:26:45Z   22.7%    1.3G of 2G     200.6M of 1G
#1    running   2017-08-04T16:43:52Z   17.3%    1.3G of 2G     200.6M of 1G
#2    running   2017-08-02T04:15:21Z   19.5%    1.3G of 2G     200.6M of 1G
#3    running   2017-08-07T04:15:21Z   20.1%    775.1M of 2G   200.6M of 1G
#4    running   2017-08-05T16:28:55Z   155.2%   1.2G of 2G     200.6M of 1G

What you expected to occur

CPU usage should be relative to the CPU quota assigned to the container: 100% should therefore map to "100% of the allocated quota". (Alternatively, the allocated CPU quota should be reported together with CPU usage, as is already done for memory and disk.)
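The alternative presentation could look like the memory and disk columns ("X of Y"), optionally with usage expressed as a fraction of the quota. A minimal Go sketch; `formatCPU` and the 25% quota are hypothetical, and both inputs are assumed to use the same units (100% = one core).

```go
package main

import "fmt"

// formatCPU renders CPU usage next to the allocated quota, in the same
// "X of Y" style as the memory and disk columns, plus usage relative to
// the quota. Falls back to the bare figure when no quota is known.
func formatCPU(usagePercent, quotaPercent float64) string {
	if quotaPercent <= 0 {
		return fmt.Sprintf("%.1f%%", usagePercent)
	}
	rel := usagePercent / quotaPercent * 100
	return fmt.Sprintf("%.1f%% of %.1f%% (%.0f%% of quota)", usagePercent, quotaPercent, rel)
}

func main() {
	fmt.Println(formatCPU(22.7, 25)) // 22.7% of 25.0% (91% of quota)
}
```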

Instances running over 100% of the allocated quota should be highlighted in red because they are using best-effort resources that are not guaranteed to be available. User documentation should be updated to make this clear.
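A sketch of the highlighting rule, assuming usage and quota share units and that the terminal understands standard ANSI color escapes. `overQuota`, `cpuCell` and the escape constants are hypothetical; the CLI has its own color conventions, which are not shown here.

```go
package main

import "fmt"

const (
	ansiRed   = "\033[31m" // assumed ANSI escape for red text
	ansiReset = "\033[0m"
)

// overQuota reports whether usage exceeds the allocated quota,
// i.e. the instance is relying on best-effort CPU.
func overQuota(usagePercent, quotaPercent float64) bool {
	return quotaPercent > 0 && usagePercent > quotaPercent
}

// cpuCell formats the CPU column, wrapping it in a highlight only when
// color output is enabled and the instance is above its quota.
func cpuCell(usagePercent, quotaPercent float64, color bool) string {
	s := fmt.Sprintf("%.1f%%", usagePercent)
	if color && overQuota(usagePercent, quotaPercent) {
		return ansiRed + s + ansiReset
	}
	return s
}

func main() {
	quota := 25.0 // hypothetical allocated quota
	for _, u := range []float64{22.7, 17.3, 155.2} {
		fmt.Println(cpuCell(u, quota, true))
	}
}
```

Keeping the threshold check separate from the formatting makes it easy to swap red for whatever highlight the CLI considers appropriate.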

CLI Version

6.29.0

CC API Endpoint Version

2.74


cf-gitbot · 2017-08-07T07:09:08Z

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/149996510

The labels on this github issue will be updated when the story is started.

dkoper · 2017-08-07T13:28:18Z

Hi @CAFxX

Thanks for this feedback.
Where in the user documentation do you think this explanation would fit well and is easy to find? I'm sure the Docs team would appreciate a pull request from you.

Red in CLI output is reserved for errors that warrant user attention. I don't see how a usage of >100% due to best-effort resources requires attention?
Note that the CLI is just displaying the numbers returned by CC. I can CC @zrob on this issue, but it might be more efficient to submit a feature request or issue to CAPI's issue tracker?

Regards,
Dies Koper
CF CLI PM

CAFxX · 2017-08-09T14:39:39Z

Where in the user documentation do you think this explanation would fit well and is easy to find? I'm sure the Docs team would appreciate a pull request from you.

I'm not sure what exactly we should document... The point I'm trying to make is that the CPU usage number right now is not very useful from a user perspective and that we should replace it with something more actionable. Put another way, this is not a documentation bug.

(If I really have to answer the question about where to put the documentation, I would argue that the most discoverable places for it are cf app -help and the online docs. Since I don't have access to the usage statistics for the docs, I can't really argue whether priority should be given to the CLI docs or the CF docs.)

I don't see how a usage of >100% due to best-effort resources requires attention?

Because an application that is routinely using over 100% of its capacity definitely needs scaling up/out, or it is at risk of failing when the best-effort resources become unavailable for reasons completely outside the application itself (e.g. because a different application on the same cell starts spinning).
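"Routinely" implies looking at more than one reading. A minimal sketch of such a check over a window of samples; `routinelyOverQuota`, the 50% threshold and the sample values are all hypothetical, and samples and quota are assumed to share units.

```go
package main

import "fmt"

// routinelyOverQuota reports whether at least minFraction of the CPU
// samples exceed the quota, i.e. the instance depends on best-effort CPU
// most of the time rather than during an occasional spike.
func routinelyOverQuota(samples []float64, quota, minFraction float64) bool {
	if len(samples) == 0 || quota <= 0 {
		return false
	}
	over := 0
	for _, s := range samples {
		if s > quota {
			over++
		}
	}
	return float64(over)/float64(len(samples)) >= minFraction
}

func main() {
	samples := []float64{30, 28, 22, 35, 31, 19}     // hypothetical readings
	fmt.Println(routinelyOverQuota(samples, 25, 0.5)) // true: 4 of 6 over
}
```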

Red in CLI output is reserved for errors that warrant user attention.

This may not be an "error", but as I argued above it should definitely warrant not just user attention but likely user action (scale out/up). In any case, red was just a suggestion; the point is that it should be highlighted to suggest that it warrants user attention/action.

Note that the CLI is just displaying the numbers returned by CC. I can CC @zrob on this issue, but it might be more efficient to submit a feature request or issue to CAPI's issue tracker?

This is a long-standing issue with the CF project. When something as big as a user story (e.g. this ticket) is reported, its scope normally spans not a single component (e.g. only cli or only capi) but many (in this case probably at least cli and capi, maybe diego as well). But AFAIK there's no single issue tracker that can be used for this, so we're forced to open the issue, at least at first, on a somewhat arbitrary component (normally the one where the problem "surfaces" the most).

So yes, eventually this will also have to propagate to capi and likely beyond. But if we don't agree that the issue is also in the CLI (again, mostly because it's where it's most likely to surface), then there's no point in opening the corresponding issues in the other components.

CAFxX · 2017-08-30T01:05:50Z

Updated the "what to expect" section to make it clear that "over 100%" refers to the allocated quota discussed beforehand, not to the currently displayed number.

dkoper · 2017-09-04T23:59:31Z

I think I understand the issue(s), but I'm not sure how to proceed. I don't have any stats on how or whether users refer to the CPU stats.
As you mentioned, the displayed CPU usage is currently not "actionable" at all, and while it may warrant user attention/action, there doesn't seem to be much we can do on the CLI side alone.
Would the CF Dev mailing list be a better place to get input from all relevant PMs, and support from other users, to prioritise an exploration of this?
Apologies for not taking ownership and driving this on your behalf: it seems you understand the issue much better and have ideas on how it should be, so it should be more effective if you lead the conversation with the relevant teams and users.

XenoPhex · 2018-04-24T22:40:19Z

Adding a link to this thread for additional context: https://lists.cloudfoundry.org/g/cf-dev/topic/16273332

Callisto13 · 2018-10-25T18:54:59Z

The Garden team (container runtime) have started work on a track which will:

change how CPU sharing is handled
produce a new metric which will (hopefully) make more sense in CLI metrics

You can check out our progress in our tracker by searching for the better-cpu-sharing tag.

abbyachau · 2019-04-24T20:54:03Z

Hi @Callisto13 @julz, please could you provide an update on this issue? Thanks.

Callisto13 · 2019-04-25T10:33:56Z

Hi @abbyachau

From cf-deployment v3.6.0 (garden-runc-release 1.16.3) operators can deploy with operations/experimental/set-cpu-weight.yml which will turn on the new way CPU shares are calculated in Garden.

From cf-deployment v7.8.0 (garden-runc-release 1.19.0), having deployed with the same ops file mentioned above, CLI users can install the CPU Entitlement Plugin to get accurate CPU metrics for their apps.

Please note that this is all still highly experimental and the user experience could yet change.

abbyachau · 2019-04-25T16:35:14Z

Thanks @Callisto13! Appreciate the update.

@CAFxX, if you haven't already and are able to update to the aforementioned version of cf-deployment, please let us know what you think of the plugin. Also let us know if you are happy to close this GitHub issue, as there doesn't appear to be anything the CLI team can do at the moment until the plugin gains more traction for feedback and the Garden team are able to iterate on it.

CAFxX · 2019-04-26T00:55:10Z

I'll defer to @giner as, unfortunately, I'm not working on CF anymore.

abbyachau · 2019-08-21T21:12:24Z

cc @emalm for visibility.

@gsiener please let us know if you've been able to try the plugin. Thanks.

gsiener · 2019-08-21T23:14:19Z

I think the request was intended for @giner. Thanks

heyjcollins · 2020-07-14T00:26:38Z

With the GA of the v7 CLI, we're no longer actively developing against the v6 line. In the interest of the overall hygiene of the CLI project, we're closing this issue.
If this issue is still occurring in v7 please feel free to comment and re-open. Thank you!

b10s · 2020-07-14T05:33:25Z

@heyjcollins hi,

Is the percentage in v7 in the range [0,100] or [0,cores*100], and is it relative to the CPU quota assigned to the app?

a-b · 2020-07-14T19:05:47Z

The CLI reports whatever CAPI https://v3-apidocs.cloudfoundry.org/version/3.86.0/index.html#the-process-stats-object reports back to us. You may want to reach out to the CAPI team for more insights: https://cloudfoundry.slack.com/archives/C07C04W4Q

univ0298 · 2020-08-19T14:44:03Z

@heyjcollins I don't believe this should have been closed with V7.

XenoPhex · 2020-08-24T18:09:06Z

@univ0298 I think what Josh is trying to say is that the CF CLI displays an unmodified version of what the CF [V3] API provides. In general, the CF CLI should not modify the contents of that data when presenting it to the user.

If an adjustment should be made, then it should be done on the API side so it will be consistent with all API clients for the foundation. So it's better to file a ticket against the Cloud Controller instead of the CLI.

ywei2017 · 2020-09-19T04:54:19Z

+1 for the issue. It's especially unfortunate given how long ago the issue was raised and that it's still not addressed.

cf-gitbot added the unscheduled label Aug 7, 2017

dkoper added enhancement needs discussion labels Sep 4, 2017

heyjcollins closed this as completed Jul 14, 2020

cf-gitbot removed the unscheduled label Jul 14, 2020

cf-gitbot added the delivered label Jul 14, 2020

cf-gitbot added accepted and removed delivered labels Jul 14, 2020

acrmp mentioned this issue Mar 15, 2024 in "Output CPU Entitlement metrics rather than the old CPU metric #2812" (Closed)
