Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: humanize memory quantities using binary SI units #7428

Merged
merged 2 commits into from
Nov 7, 2024

Conversation

omerap12
Copy link
Member

What type of PR is this?

/kind feature

What this PR does / why we need it:

Improves resource reporting by converting raw memory values (bytes) into human-readable binary units (KiB, MiB, GiB, TiB). Allowing users to quickly understand memory quantities without interpreting large byte numbers.

Which issue(s) this PR fixes:

Fixes #7417

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Additional flag to recommender --humanize-memory

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 24, 2024
@k8s-ci-robot k8s-ci-robot requested a review from krzysied October 24, 2024 12:46
@k8s-ci-robot k8s-ci-robot requested a review from voelzmo October 24, 2024 12:46
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 24, 2024
@omerap12
Copy link
Member Author

cc @adrianmoisey

@adrianmoisey
Copy link
Member

In general I like this and will likely turn it on in production.
I think it needs some documentation (explaining the rounding, since the VPA's recommendation is now a bit more fuzzy)
I'm curious what the VPA approvers think.

@omerap12
Copy link
Member Author

In general I like this and will likely turn it on in production.

I think it needs some documentation (explaining the rounding, since the VPA's recommendation is now a bit more fuzzy)

I'm curious what the VPA approvers think.

Thanks! If the approvers are satisfied, I’ll be happy to document everything. :)

@voelzmo
Copy link
Contributor

voelzmo commented Oct 25, 2024

/retest

@omerap12
Copy link
Member Author

/retest

Passed , thanks 😀

@adrianmoisey
Copy link
Member

Jotting down some thoughts that I haven't had the time to dig into:

  1. How does this interact with the min/max options?
  2. I may be wrong here, but the precision may get lower as the recommended memory is higher, I wonder if this can be tweaked.

@omerap12
Copy link
Member Author

Jotting down some thoughts that I haven't had the time to dig into:

  1. How does this interact with the min/max options?
  2. I may be wrong here, but the precision may get lower as the recommended memory is higher, I wonder if this can be tweaked.
  1. I haven't fully considered how this ties into the min/max settings yet. However, since we're only adjusting the Si value, I believe it might be better to round down instead of rounding up. Here's an example reference: PR #7428.
  2. I think we could introduce a precision-point integer flag to make this adjustable. but, I’m a bit concerned about adding too many flags.

@adrianmoisey
Copy link
Member

2. I think we could introduce a precision-point integer flag to make this adjustable. but, I’m a bit concerned about adding too many flags.

My gut feel here is that may be we allow 2 digits after the decimal place only? (ie: not configurable).

For example, 1.5Gi is actually easy for a human and more precise than rounding up to 2Gi

@omerap12
Copy link
Member Author

  1. I think we could introduce a precision-point integer flag to make this adjustable. but, I’m a bit concerned about adding too many flags.

My gut feel here is that may be we allow 2 digits after the decimal place only? (ie: not configurable).

For example, 1.5Gi is actually easy for a human and more precise than rounding up to 2Gi

Fine with me. I can adjust :)

@omerap12
Copy link
Member Author

PTAL @adrianmoisey

Signed-off-by: Omer Aplatony <omerap12@gmail.com>
@adrianmoisey
Copy link
Member

I like this, but am also curious what others think.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 31, 2024
Copy link
Member

@kwiesmueller kwiesmueller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally lgtm, just a nit about the log

vertical-pod-autoscaler/pkg/recommender/model/types.go Outdated Show resolved Hide resolved
Co-authored-by: Adrian Moisey <adrian@changeover.za.net>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 7, 2024
@omerap12
Copy link
Member Author

omerap12 commented Nov 7, 2024

Generally lgtm, just a nit about the log

Done :)

@adrianmoisey
Copy link
Member

/lgtm

Let's see if @kwiesmueller agrees with the log level

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 7, 2024
@kwiesmueller
Copy link
Member

/lgtm
Thanks!
/approve
/unhold

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kwiesmueller, omerap12

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 7, 2024
@k8s-ci-robot k8s-ci-robot merged commit 3abfbcb into kubernetes:master Nov 7, 2024
7 checks passed
@alekc
Copy link

alekc commented Jan 20, 2025

Was this not released with 1.2.2 for some reason?

❯ docker run --rm -it registry.k8s.io/autoscaling/vpa-recommender:1.2.2 --humanize-memory
Unable to find image 'registry.k8s.io/autoscaling/vpa-recommender:1.2.2' locally
1.2.2: Pulling from autoscaling/vpa-recommender
6d44aedb091c: Download complete
Digest: sha256:813c28c0b08a984e1c3e421cde7bbb7444fee213386e305488043ee8e5ef6e69
Status: Downloaded newer image for registry.k8s.io/autoscaling/vpa-recommender:1.2.2
unknown flag: --humanize-memory
Usage of /recommender:
      --add-dir-header                                       If true, adds the file directory to the header of the log messages
      --address string                                       The address to expose Prometheus metrics. (default ":8942")
      --alsologtostderr                                      log to standard error as well as files (no effect when -logtostderr=true)
      --checkpoints-gc-interval duration                     How often orphaned checkpoints should be garbage collected (default 10m0s)
      --checkpoints-timeout duration                         Timeout for writing checkpoints since the start of the recommender's main loop (default 1m0s)
      --container-name-label string                          Label name to look for container names (default "name")
      --container-namespace-label string                     Label name to look for container namespaces (default "namespace")
      --container-pod-name-label string                      Label name to look for container pod names (default "pod_name")
      --cpu-histogram-decay-half-life duration               The amount of time it takes a historical CPU usage sample to lose half of its weight. (default 24h0m0s)
      --cpu-integer-post-processor-enabled                   Enable the cpu-integer recommendation post processor. The post processor will round up CPU recommendations to a whole CPU for pods which were opted in by setting an appropriate label on VPA object (experimental)
      --external-metrics-cpu-metric string                   ALPHA.  Metric to use with external metrics provider for CPU usage.
      --external-metrics-memory-metric string                ALPHA.  Metric to use with external metrics provider for memory usage.
      --history-length string                                How much time back prometheus have to be queried to get historical metrics (default "8d")
      --history-resolution string                            Resolution at which Prometheus is queried for historical metrics (default "1h")
      --ignored-vpa-object-namespaces string                 Comma separated list of namespaces to ignore. Must not be used if vpa-object-namespace is used.
      --kube-api-burst float                                 QPS burst limit when making requests to Kubernetes apiserver (default 10)
      --kube-api-qps float                                   QPS limit when making requests to Kubernetes apiserver (default 5)
      --kubeconfig string                                    Path to a kubeconfig. Only required if out-of-cluster.
      --leader-elect                                         Start a leader election client and gain leadership before executing the main loop. Enable this when running replicated components for high availability.
      --leader-elect-lease-duration duration                 The duration that non-leader candidates will wait after observing a leadership renewal until attempting to acquire leadership of a led but unrenewed leader slot. This is effectively the maximum duration that a leader can be stopped before it is replaced by another candidate. This is only applicable if leader election is enabled. (default 15s)
      --leader-elect-renew-deadline duration                 The interval between attempts by the acting master to renew a leadership slot before it stops leading. This must be less than the lease duration. This is only applicable if leader election is enabled. (default 10s)
      --leader-elect-resource-lock string                    The type of resource object that is used for locking during leader election. Supported options are 'leases', 'endpointsleases' and 'configmapsleases'. (default "leases")
      --leader-elect-resource-name string                    The name of resource object that is used for locking during leader election. (default "vpa-recommender")
      --leader-elect-resource-namespace string               The namespace of resource object that is used for locking during leader election. (default "kube-system")
      --leader-elect-retry-period duration                   The duration the clients should wait between attempting acquisition and renewal of a leadership. This is only applicable if leader election is enabled. (default 2s)
      --log-backtrace-at traceLocation                       when logging hits line file:N, emit a stack trace (default :0)
      --log-dir string                                       If non-empty, write log files in this directory (no effect when -logtostderr=true)
      --log-file string                                      If non-empty, use this log file (no effect when -logtostderr=true)
      --log-file-max-size uint                               Defines the maximum size a log file can grow to (no effect when -logtostderr=true). Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
      --logtostderr                                          log to standard error instead of files (default true)
      --memory-aggregation-interval duration                 The length of a single interval, for which the peak memory usage is computed. Memory usage peaks are aggregated in multiples of this interval. In other words there is one memory usage sample per interval (the maximum usage over that interval) (default 24h0m0s)
      --memory-aggregation-interval-count int                The number of consecutive memory-aggregation-intervals which make up the MemoryAggregationWindowLength which in turn is the period for memory usage aggregation by VPA. In other words, MemoryAggregationWindowLength = memory-aggregation-interval * memory-aggregation-interval-count. (default 8)
      --memory-histogram-decay-half-life duration            The amount of time it takes a historical memory usage sample to lose half of its weight. In other words, a fresh usage sample is twice as 'important' as one with age equal to the half life period. (default 24h0m0s)
      --memory-saver                                         If true, only track pods which have an associated VPA
      --metric-for-pod-labels string                         Which metric to look for pod labels in metrics (default "up{job=\"kubernetes-pods\"}")
      --min-checkpoints int                                  Minimum number of checkpoints to write per recommender's main loop (default 10)
      --one-output                                           If true, only write logs to their native severity level (vs also writing to each lower severity level; no effect when -logtostderr=true)
      --oom-bump-up-ratio float                              The memory bump up ratio when OOM occurred, default is 1.2. (default 1.2)
      --oom-min-bump-up-bytes float                          The minimal increase of memory when OOM occurred in bytes, default is 100 * 1024 * 1024 (default 1.048576e+08)
      --password string                                      The password used in the prometheus server basic auth
      --pod-label-prefix string                              Which prefix to look for pod labels in metrics (default "pod_label_")
      --pod-name-label string                                Label name to look for pod names (default "kubernetes_pod_name")
      --pod-namespace-label string                           Label name to look for pod namespaces (default "kubernetes_namespace")
      --pod-recommendation-min-cpu-millicores float          Minimum CPU recommendation for a pod (default 25)
      --pod-recommendation-min-memory-mb float               Minimum memory recommendation for a pod (default 250)
      --prometheus-address string                            Where to reach for Prometheus metrics
      --prometheus-cadvisor-job-name string                  Name of the prometheus job name which scrapes the cAdvisor metrics (default "kubernetes-cadvisor")
      --prometheus-query-timeout string                      How long to wait before killing long queries (default "5m")
      --recommendation-lower-bound-cpu-percentile float      CPU usage percentile that will be used for the lower bound on CPU recommendation. (default 0.5)
      --recommendation-lower-bound-memory-percentile float   Memory usage percentile that will be used for the lower bound on memory recommendation. (default 0.5)
      --recommendation-margin-fraction float                 Fraction of usage added as the safety margin to the recommended request (default 0.15)
      --recommendation-upper-bound-cpu-percentile float      CPU usage percentile that will be used for the upper bound on CPU recommendation. (default 0.95)
      --recommendation-upper-bound-memory-percentile float   Memory usage percentile that will be used for the upper bound on memory recommendation. (default 0.95)
      --recommender-interval duration                        How often metrics should be fetched (default 1m0s)
      --recommender-name string                              Set the recommender name. Recommender will generate recommendations for VPAs that configure the same recommender name. If the recommender name is left as default it will also generate recommendations that don't explicitly specify recommender. You shouldn't run two recommenders with the same name in a cluster. (default "default")
      --skip-headers                                         If true, avoid header prefixes in the log messages
      --skip-log-headers                                     If true, avoid headers when opening log files (no effect when -logtostderr=true)
      --stderrthreshold severity                             logs at or above this threshold go to stderr when writing to files and stderr (no effect when -logtostderr=true or -alsologtostderr=false) (default 2)
      --storage string                                       Specifies storage mode. Supported values: prometheus, checkpoint (default)
      --target-cpu-percentile float                          CPU usage percentile that will be used as a base for CPU target recommendation. Doesn't affect CPU lower bound, CPU upper bound nor memory recommendations. (default 0.9)
      --target-memory-percentile float                       Memory usage percentile that will be used as a base for memory target recommendation. Doesn't affect memory lower bound nor memory upper bound. (default 0.9)
      --use-external-metrics                                 ALPHA.  Use an external metrics provider instead of metrics_server.
      --username string                                      The username used in the prometheus server basic auth
  -v, --v Level                                              number for the log level verbosity
      --vmodule moduleSpec                                   comma-separated list of pattern=N settings for file-filtered logging
      --vpa-object-namespace string                          Namespace to search for VPA objects and pod stats. Empty means all namespaces will be used. Must not be used if ignored-vpa-object-namespaces is set.
unknown flag: --humanize-memory

Maybe its missing from https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/pkg/recommender/main.go#L58-L115? Or did I get it wrong in regard of how to use it?

@adrianmoisey
Copy link
Member

There is a change log here: https://github.com/kubernetes/autoscaler/releases/tag/vertical-pod-autoscaler-1.2.2

Patch releases don't get new features, only bug fixes or regressions.
The feature will come out in 1.3.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convert resources to a more human relevant SI
6 participants