This repository has been archived by the owner on Oct 23, 2024. It is now read-only.

Commit

Add cpu.num_processors, memory.total metric and make k8s limit metrics default (#1344)

We will need these for dashboards that use the new kubelet-metrics monitor, since the new endpoint it uses doesn't provide that information; it must instead be gathered from other monitors.
benkeith-splunk authored Jun 4, 2020
1 parent ce9f1ea commit cbab71a
Showing 17 changed files with 58 additions and 14 deletions.
1 change: 1 addition & 0 deletions docs/monitors/cpu.md
@@ -57,6 +57,7 @@ Metrics that are categorized as

- `cpu.nice` (*cumulative*)<br> CPU time spent in userspace running 'nice'-ed processes. In order to get a percentage this value must be compared against the sum of all CPU states. A sustained high value for this metric may be caused by: 1) The server not having enough CPU capacity for a process, 2) A programming error which causes a process to use an unexpected amount of CPU

- ***`cpu.num_processors`*** (*gauge*)<br> The number of logical processors on the host.
- `cpu.softirq` (*cumulative*)<br> CPU time spent while servicing software interrupts. Unlike a hardware interrupt, a software interrupt happens at the software layer. Usually it is a userspace program requesting a service of the kernel. This metric measures how many jiffies were spent by the CPU handling these interrupts. In order to get a percentage this value must be compared against the sum of all CPU states. A sustained high value for this metric may be caused by a programming error which causes a process to unexpectedly request too many services from the kernel.

- `cpu.steal` (*cumulative*)<br> CPU time spent waiting for a hypervisor to service requests from other virtual machines. This metric is only present on virtual machines. This metric records how much time this virtual machine had to wait to have the hypervisor kernel service a request. In order to get a percentage this value must be compared against the sum of all CPU states. A sustained high value for this metric may be caused by: 1) Another VM on the same hypervisor using too many resources, or 2) An underpowered hypervisor
4 changes: 2 additions & 2 deletions docs/monitors/kubernetes-cluster.md
@@ -76,11 +76,11 @@ Metrics that are categorized as
(*default*) are ***in bold and italics*** in the list below.


- `kubernetes.container_cpu_limit` (*gauge*)<br> Maximum CPU limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- ***`kubernetes.container_cpu_limit`*** (*gauge*)<br> Maximum CPU limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- `kubernetes.container_cpu_request` (*gauge*)<br> CPU requested for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- `kubernetes.container_ephemeral_storage_limit` (*gauge*)<br> Maximum ephemeral storage set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available. See https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#local-ephemeral-storage for details.
- `kubernetes.container_ephemeral_storage_request` (*gauge*)<br> Ephemeral storage requested for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available. See https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#local-ephemeral-storage for details
- `kubernetes.container_memory_limit` (*gauge*)<br> Maximum memory limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- ***`kubernetes.container_memory_limit`*** (*gauge*)<br> Maximum memory limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- `kubernetes.container_memory_request` (*gauge*)<br> Memory requested for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- ***`kubernetes.container_ready`*** (*gauge*)<br> Whether a container has passed its readiness probe (0 for no, 1 for yes)
- ***`kubernetes.container_restart_count`*** (*gauge*)<br> How many times the container has restarted in the recent past. This value is pulled directly from [the K8s API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.11/#containerstatus-v1-core) and the value can go indefinitely high and be reset to 0 at any time depending on how your [kubelet is configured to prune dead containers](https://kubernetes.io/docs/concepts/cluster-administration/kubelet-garbage-collection/). It is best to not depend too much on the exact value but rather look at it as either `== 0`, in which case you can conclude there were no restarts in the recent past, or `> 0`, in which case you can conclude there were restarts in the recent past, and not try and analyze the value beyond that.
1 change: 1 addition & 0 deletions docs/monitors/memory.md
@@ -53,6 +53,7 @@ Metrics that are categorized as
- ***`memory.free`*** (*gauge*)<br> (Linux Only) Bytes of memory available for use.
- ***`memory.slab_recl`*** (*gauge*)<br> (Linux Only) Bytes of memory, used for SLAB-allocation of kernel objects, that can be reclaimed.
- ***`memory.slab_unrecl`*** (*gauge*)<br> (Linux Only) Bytes of memory, used for SLAB-allocation of kernel objects, that can't be reclaimed.
- ***`memory.total`*** (*gauge*)<br> Total bytes of system memory on the system.
- ***`memory.used`*** (*gauge*)<br> Bytes of memory in use by the system.
- ***`memory.utilization`*** (*gauge*)<br> Percent of memory in use on this host. This does NOT include buffer or cache memory on Linux.

4 changes: 2 additions & 2 deletions docs/monitors/openshift-cluster.md
@@ -77,11 +77,11 @@ Metrics that are categorized as
(*default*) are ***in bold and italics*** in the list below.


- `kubernetes.container_cpu_limit` (*gauge*)<br> Maximum CPU limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- ***`kubernetes.container_cpu_limit`*** (*gauge*)<br> Maximum CPU limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- `kubernetes.container_cpu_request` (*gauge*)<br> CPU requested for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- `kubernetes.container_ephemeral_storage_limit` (*gauge*)<br> Maximum ephemeral storage set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available. See https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#local-ephemeral-storage for details.
- `kubernetes.container_ephemeral_storage_request` (*gauge*)<br> Ephemeral storage requested for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available. See https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#local-ephemeral-storage for details
- `kubernetes.container_memory_limit` (*gauge*)<br> Maximum memory limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- ***`kubernetes.container_memory_limit`*** (*gauge*)<br> Maximum memory limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- `kubernetes.container_memory_request` (*gauge*)<br> Memory requested for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- ***`kubernetes.container_ready`*** (*gauge*)<br> Whether a container has passed its readiness probe (0 for no, 1 for yes)
- ***`kubernetes.container_restart_count`*** (*gauge*)<br> How many times the container has restarted in the recent past. This value is pulled directly from [the K8s API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.11/#containerstatus-v1-core) and the value can go indefinitely high and be reset to 0 at any time depending on how your [kubelet is configured to prune dead containers](https://kubernetes.io/docs/concepts/cluster-administration/kubelet-garbage-collection/). It is best to not depend too much on the exact value but rather look at it as either `== 0`, in which case you can conclude there were no restarts in the recent past, or `> 0`, in which case you can conclude there were restarts in the recent past, and not try and analyze the value beyond that.
4 changes: 4 additions & 0 deletions pkg/monitors/cpu/cpu.go
@@ -143,6 +143,10 @@ func (m *Monitor) generateDatapoints() []*datapoint.Datapoint {
))
}

if cpuCount, err := cpu.Counts(true); err == nil {
dps = append(dps, sfxclient.Gauge(cpuNumProcessors, nil, int64(cpuCount)))
}

// store current as previous value for next time
m.previousTotal = current

7 changes: 5 additions & 2 deletions pkg/monitors/cpu/genmetadata.go

Some generated files are not rendered by default.

5 changes: 5 additions & 0 deletions pkg/monitors/cpu/metadata.yaml
@@ -104,5 +104,10 @@ monitors:
group:
default: false
type: cumulative

cpu.num_processors:
description: The number of logical processors on the host.
default: true
type: gauge
monitorType: cpu
properties:
2 changes: 2 additions & 0 deletions pkg/monitors/kubernetes/cluster/meta/genmetadata.go

Some generated files are not rendered by default.

4 changes: 2 additions & 2 deletions pkg/monitors/kubernetes/cluster/metadata.yaml
@@ -26,7 +26,7 @@ common:
description: Maximum CPU limit set for the container. This value is derived from
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which
comes from the pod spec and is reported only if a non null value is available.
default: false
default: true
type: gauge
kubernetes.container_cpu_request:
description: CPU requested for the container. This value is derived from
@@ -54,7 +54,7 @@ common:
description: Maximum memory limit set for the container. This value is derived from
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core
which comes from the pod spec and is reported only if a non null value is available.
default: false
default: true
type: gauge
kubernetes.container_memory_request:
description: Memory requested for the container. This value is derived from
3 changes: 3 additions & 0 deletions pkg/monitors/memory/genmetadata.go

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions pkg/monitors/memory/memory_darwin.go
@@ -15,5 +15,6 @@ func (m *Monitor) makeMemoryDatapoints(memInfo *mem.VirtualMemoryStat, dimension
datapoint.New("memory.inactive", dimensions, datapoint.NewIntValue(int64(memInfo.Inactive)), datapoint.Gauge, time.Time{}),
datapoint.New("memory.wired", dimensions, datapoint.NewIntValue(int64(memInfo.Wired)), datapoint.Gauge, time.Time{}),
datapoint.New("memory.free", dimensions, datapoint.NewIntValue(int64(memInfo.Free)), datapoint.Gauge, time.Time{}),
datapoint.New("memory.total", dimensions, datapoint.NewIntValue(int64(memInfo.Total)), datapoint.Gauge, time.Time{}),
}
}
1 change: 1 addition & 0 deletions pkg/monitors/memory/memory_linux.go
@@ -19,5 +19,6 @@ func (m *Monitor) makeMemoryDatapoints(memInfo *mem.VirtualMemoryStat, dimension
datapoint.New("memory.slab_recl", dimensions, datapoint.NewIntValue(int64(memInfo.SReclaimable)), datapoint.Gauge, time.Time{}),
datapoint.New("memory.slab_unrecl", dimensions, datapoint.NewIntValue(int64(memInfo.Slab-memInfo.SReclaimable)), datapoint.Gauge, time.Time{}),
datapoint.New("memory.free", dimensions, datapoint.NewIntValue(int64(memInfo.Free)), datapoint.Gauge, time.Time{}),
datapoint.New("memory.total", dimensions, datapoint.NewIntValue(int64(memInfo.Total)), datapoint.Gauge, time.Time{}),
}
}
1 change: 1 addition & 0 deletions pkg/monitors/memory/memory_windows.go
@@ -12,5 +12,6 @@ func (m *Monitor) makeMemoryDatapoints(memInfo *mem.VirtualMemoryStat, dimension
datapoint.New("memory.utilization", dimensions, datapoint.NewFloatValue(memInfo.UsedPercent), datapoint.Gauge, time.Time{}),
datapoint.New("memory.used", dimensions, datapoint.NewIntValue(int64(memInfo.Used)), datapoint.Gauge, time.Time{}),
datapoint.New("memory.available", dimensions, datapoint.NewIntValue(int64(memInfo.Available)), datapoint.Gauge, time.Time{}),
datapoint.New("memory.total", dimensions, datapoint.NewIntValue(int64(memInfo.Total)), datapoint.Gauge, time.Time{}),
}
}
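All three platform files (darwin, linux, windows) append the new `memory.total` gauge the same way, from the `Total` field of gopsutil's memory stats. As an illustration only — the `Datapoint` and `VirtualMemoryStat` stubs below are simplified stand-ins, since the real types come from the external signalfx/golib and gopsutil packages — the shared pattern looks like:

```go
package main

import "fmt"

// Datapoint is a simplified stand-in for the agent's datapoint type.
type Datapoint struct {
	Metric string
	Value  int64
}

// VirtualMemoryStat is a simplified stand-in for gopsutil's struct.
type VirtualMemoryStat struct {
	Used, Free, Total uint64
}

// makeMemoryDatapoints mirrors the diff: each platform implementation
// now appends a memory.total gauge built from memInfo.Total.
func makeMemoryDatapoints(memInfo *VirtualMemoryStat) []Datapoint {
	return []Datapoint{
		{Metric: "memory.used", Value: int64(memInfo.Used)},
		{Metric: "memory.free", Value: int64(memInfo.Free)},
		{Metric: "memory.total", Value: int64(memInfo.Total)},
	}
}

func main() {
	dps := makeMemoryDatapoints(&VirtualMemoryStat{Used: 3 << 30, Free: 1 << 30, Total: 4 << 30})
	fmt.Println(dps[len(dps)-1].Metric) // memory.total
}
```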
5 changes: 5 additions & 0 deletions pkg/monitors/memory/metadata.yaml
@@ -48,5 +48,10 @@ monitors:
include buffer or cache memory on Linux.
default: true
type: gauge
memory.total:
description: Total bytes of system memory on the system.
default: true
type: gauge

monitorType: memory
properties:
22 changes: 18 additions & 4 deletions selfdescribe.json
@@ -22479,6 +22479,7 @@
"cpu.idle",
"cpu.interrupt",
"cpu.nice",
"cpu.num_processors",
"cpu.softirq",
"cpu.steal",
"cpu.system",
@@ -22508,6 +22509,12 @@
"group": null,
"default": false
},
"cpu.num_processors": {
"type": "gauge",
"description": "The number of logical processors on the host.",
"group": null,
"default": true
},
"cpu.softirq": {
"type": "cumulative",
"description": "CPU time spent while servicing software interrupts. Unlike a hardware interrupt, a software interrupt happens at the sofware layer. Usually it is a userspace program requesting a service of the kernel. This metric measures how many jiffies were spent by the CPU handling these interrupts. In order to get a percentage this value must be compared against the sum of all CPU states. A sustained high value for this metric may be caused by a programming error which causes a process to unexpectedly request too many services from the kernel.\n",
@@ -37464,7 +37471,7 @@
"type": "gauge",
"description": "Maximum CPU limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.",
"group": null,
"default": false
"default": true
},
"kubernetes.container_cpu_request": {
"type": "gauge",
@@ -37488,7 +37495,7 @@
"type": "gauge",
"description": "Maximum memory limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.",
"group": null,
"default": false
"default": true
},
"kubernetes.container_memory_request": {
"type": "gauge",
@@ -40574,6 +40581,7 @@
"memory.free",
"memory.slab_recl",
"memory.slab_unrecl",
"memory.total",
"memory.used",
"memory.utilization"
]
@@ -40616,6 +40624,12 @@
"group": null,
"default": true
},
"memory.total": {
"type": "gauge",
"description": "Total bytes of system memory on the system.",
"group": null,
"default": true
},
"memory.used": {
"type": "gauge",
"description": "Bytes of memory in use by the system.",
@@ -41616,7 +41630,7 @@
"type": "gauge",
"description": "Maximum CPU limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.",
"group": null,
"default": false
"default": true
},
"kubernetes.container_cpu_request": {
"type": "gauge",
@@ -41640,7 +41654,7 @@
"type": "gauge",
"description": "Maximum memory limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.",
"group": null,
"default": false
"default": true
},
"kubernetes.container_memory_request": {
"type": "gauge",
4 changes: 4 additions & 0 deletions test-services/nginx/nginx-k8s.yaml
@@ -45,6 +45,10 @@ spec:
resources:
requests:
cpu: 100m
memory: 100M
limits:
cpu: 200m
memory: 100M
readinessProbe:
httpGet:
path: /nginx_status
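The added memory request and new limits give the now-default `kubernetes.container_cpu_limit` and `kubernetes.container_memory_limit` metrics non-null values to report for this test service. Reconstructed from the hunk, the container's resources block after this change reads:

```yaml
resources:
  requests:
    cpu: 100m
    memory: 100M
  limits:
    cpu: 200m
    memory: 100M
```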