This repository has been archived by the owner on Oct 23, 2024. It is now read-only.

Commit

Add cpu.num_processors, memory.total metric and make k8s limit metrics default (#1344)

We will need these for dashboards that use the new kubelet-metrics monitor, since the new endpoint it uses doesn't provide that information; it must instead be gathered from other monitors.
benkeith-splunk authored Jun 4, 2020
1 parent ce9f1ea commit cbab71a
Showing 17 changed files with 58 additions and 14 deletions.
1 change: 1 addition & 0 deletions docs/monitors/cpu.md
@@ -57,6 +57,7 @@ Metrics that are categorized as

- `cpu.nice` (*cumulative*)<br> CPU time spent in userspace running 'nice'-ed processes. In order to get a percentage this value must be compared against the sum of all CPU states. A sustained high value for this metric may be caused by: 1) The server not having enough CPU capacity for a process, 2) A programming error which causes a process to use an unexpected amount of CPU

- ***`cpu.num_processors`*** (*gauge*)<br> The number of logical processors on the host.
- `cpu.softirq` (*cumulative*)<br> CPU time spent while servicing software interrupts. Unlike a hardware interrupt, a software interrupt happens at the software layer. Usually it is a userspace program requesting a service of the kernel. This metric measures how many jiffies were spent by the CPU handling these interrupts. In order to get a percentage this value must be compared against the sum of all CPU states. A sustained high value for this metric may be caused by a programming error which causes a process to unexpectedly request too many services from the kernel.

- `cpu.steal` (*cumulative*)<br> CPU time spent waiting for a hypervisor to service requests from other virtual machines. This metric is only present on virtual machines. This metric records how much time this virtual machine had to wait to have the hypervisor kernel service a request. In order to get a percentage this value must be compared against the sum of all CPU states. A sustained high value for this metric may be caused by: 1) Another VM on the same hypervisor using too many resources, or 2) An underpowered hypervisor
4 changes: 2 additions & 2 deletions docs/monitors/kubernetes-cluster.md
@@ -76,11 +76,11 @@ Metrics that are categorized as
(*default*) are ***in bold and italics*** in the list below.


- `kubernetes.container_cpu_limit` (*gauge*)<br> Maximum CPU limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- ***`kubernetes.container_cpu_limit`*** (*gauge*)<br> Maximum CPU limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- `kubernetes.container_cpu_request` (*gauge*)<br> CPU requested for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- `kubernetes.container_ephemeral_storage_limit` (*gauge*)<br> Maximum ephemeral storage set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available. See https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#local-ephemeral-storage for details.
- `kubernetes.container_ephemeral_storage_request` (*gauge*)<br> Ephemeral storage requested for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available. See https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#local-ephemeral-storage for details
- `kubernetes.container_memory_limit` (*gauge*)<br> Maximum memory limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- ***`kubernetes.container_memory_limit`*** (*gauge*)<br> Maximum memory limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- `kubernetes.container_memory_request` (*gauge*)<br> Memory requested for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- ***`kubernetes.container_ready`*** (*gauge*)<br> Whether a container has passed its readiness probe (0 for no, 1 for yes)
- ***`kubernetes.container_restart_count`*** (*gauge*)<br> How many times the container has restarted in the recent past. This value is pulled directly from [the K8s API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.11/#containerstatus-v1-core) and the value can go indefinitely high and be reset to 0 at any time depending on how your [kubelet is configured to prune dead containers](https://kubernetes.io/docs/concepts/cluster-administration/kubelet-garbage-collection/). It is best to not depend too much on the exact value but rather look at it as either `== 0`, in which case you can conclude there were no restarts in the recent past, or `> 0`, in which case you can conclude there were restarts in the recent past, and not try and analyze the value beyond that.
1 change: 1 addition & 0 deletions docs/monitors/memory.md
@@ -53,6 +53,7 @@ Metrics that are categorized as
- ***`memory.free`*** (*gauge*)<br> (Linux Only) Bytes of memory available for use.
- ***`memory.slab_recl`*** (*gauge*)<br> (Linux Only) Bytes of memory, used for SLAB-allocation of kernel objects, that can be reclaimed.
- ***`memory.slab_unrecl`*** (*gauge*)<br> (Linux Only) Bytes of memory, used for SLAB-allocation of kernel objects, that can't be reclaimed.
- ***`memory.total`*** (*gauge*)<br> Total bytes of system memory on the system.
- ***`memory.used`*** (*gauge*)<br> Bytes of memory in use by the system.
- ***`memory.utilization`*** (*gauge*)<br> Percent of memory in use on this host. This does NOT include buffer or cache memory on Linux.

4 changes: 2 additions & 2 deletions docs/monitors/openshift-cluster.md
@@ -77,11 +77,11 @@ Metrics that are categorized as
(*default*) are ***in bold and italics*** in the list below.


- `kubernetes.container_cpu_limit` (*gauge*)<br> Maximum CPU limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- ***`kubernetes.container_cpu_limit`*** (*gauge*)<br> Maximum CPU limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- `kubernetes.container_cpu_request` (*gauge*)<br> CPU requested for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- `kubernetes.container_ephemeral_storage_limit` (*gauge*)<br> Maximum ephemeral storage set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available. See https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#local-ephemeral-storage for details.
- `kubernetes.container_ephemeral_storage_request` (*gauge*)<br> Ephemeral storage requested for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available. See https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#local-ephemeral-storage for details
- `kubernetes.container_memory_limit` (*gauge*)<br> Maximum memory limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- ***`kubernetes.container_memory_limit`*** (*gauge*)<br> Maximum memory limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- `kubernetes.container_memory_request` (*gauge*)<br> Memory requested for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.
- ***`kubernetes.container_ready`*** (*gauge*)<br> Whether a container has passed its readiness probe (0 for no, 1 for yes)
- ***`kubernetes.container_restart_count`*** (*gauge*)<br> How many times the container has restarted in the recent past. This value is pulled directly from [the K8s API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.11/#containerstatus-v1-core) and the value can go indefinitely high and be reset to 0 at any time depending on how your [kubelet is configured to prune dead containers](https://kubernetes.io/docs/concepts/cluster-administration/kubelet-garbage-collection/). It is best to not depend too much on the exact value but rather look at it as either `== 0`, in which case you can conclude there were no restarts in the recent past, or `> 0`, in which case you can conclude there were restarts in the recent past, and not try and analyze the value beyond that.
4 changes: 4 additions & 0 deletions pkg/monitors/cpu/cpu.go
@@ -143,6 +143,10 @@ func (m *Monitor) generateDatapoints() []*datapoint.Datapoint {
))
}

if cpuCount, err := cpu.Counts(true); err == nil {
dps = append(dps, sfxclient.Gauge(cpuNumProcessors, nil, int64(cpuCount)))
}

// store current as previous value for next time
m.previousTotal = current

7 changes: 5 additions & 2 deletions pkg/monitors/cpu/genmetadata.go

Some generated files are not rendered by default.

5 changes: 5 additions & 0 deletions pkg/monitors/cpu/metadata.yaml
@@ -104,5 +104,10 @@ monitors:
group:
default: false
type: cumulative

cpu.num_processors:
description: The number of logical processors on the host.
default: true
type: gauge
monitorType: cpu
properties:
2 changes: 2 additions & 0 deletions pkg/monitors/kubernetes/cluster/meta/genmetadata.go

Some generated files are not rendered by default.

4 changes: 2 additions & 2 deletions pkg/monitors/kubernetes/cluster/metadata.yaml
@@ -26,7 +26,7 @@ common:
description: Maximum CPU limit set for the container. This value is derived from
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which
comes from the pod spec and is reported only if a non null value is available.
default: false
default: true
type: gauge
kubernetes.container_cpu_request:
description: CPU requested for the container. This value is derived from
@@ -54,7 +54,7 @@ common:
description: Maximum memory limit set for the container. This value is derived from
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core
which comes from the pod spec and is reported only if a non null value is available.
default: false
default: true
type: gauge
kubernetes.container_memory_request:
description: Memory requested for the container. This value is derived from
3 changes: 3 additions & 0 deletions pkg/monitors/memory/genmetadata.go

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions pkg/monitors/memory/memory_darwin.go
@@ -15,5 +15,6 @@ func (m *Monitor) makeMemoryDatapoints(memInfo *mem.VirtualMemoryStat, dimension
datapoint.New("memory.inactive", dimensions, datapoint.NewIntValue(int64(memInfo.Inactive)), datapoint.Gauge, time.Time{}),
datapoint.New("memory.wired", dimensions, datapoint.NewIntValue(int64(memInfo.Wired)), datapoint.Gauge, time.Time{}),
datapoint.New("memory.free", dimensions, datapoint.NewIntValue(int64(memInfo.Free)), datapoint.Gauge, time.Time{}),
datapoint.New("memory.total", dimensions, datapoint.NewIntValue(int64(memInfo.Total)), datapoint.Gauge, time.Time{}),
}
}
1 change: 1 addition & 0 deletions pkg/monitors/memory/memory_linux.go
@@ -19,5 +19,6 @@ func (m *Monitor) makeMemoryDatapoints(memInfo *mem.VirtualMemoryStat, dimension
datapoint.New("memory.slab_recl", dimensions, datapoint.NewIntValue(int64(memInfo.SReclaimable)), datapoint.Gauge, time.Time{}),
datapoint.New("memory.slab_unrecl", dimensions, datapoint.NewIntValue(int64(memInfo.Slab-memInfo.SReclaimable)), datapoint.Gauge, time.Time{}),
datapoint.New("memory.free", dimensions, datapoint.NewIntValue(int64(memInfo.Free)), datapoint.Gauge, time.Time{}),
datapoint.New("memory.total", dimensions, datapoint.NewIntValue(int64(memInfo.Total)), datapoint.Gauge, time.Time{}),
}
}
1 change: 1 addition & 0 deletions pkg/monitors/memory/memory_windows.go
@@ -12,5 +12,6 @@ func (m *Monitor) makeMemoryDatapoints(memInfo *mem.VirtualMemoryStat, dimension
datapoint.New("memory.utilization", dimensions, datapoint.NewFloatValue(memInfo.UsedPercent), datapoint.Gauge, time.Time{}),
datapoint.New("memory.used", dimensions, datapoint.NewIntValue(int64(memInfo.Used)), datapoint.Gauge, time.Time{}),
datapoint.New("memory.available", dimensions, datapoint.NewIntValue(int64(memInfo.Available)), datapoint.Gauge, time.Time{}),
datapoint.New("memory.total", dimensions, datapoint.NewIntValue(int64(memInfo.Total)), datapoint.Gauge, time.Time{}),
}
}
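All three platform files (darwin, linux, windows) append the new `memory.total` gauge the same way, from the `Total` field of gopsutil's memory stats. As an illustration only — the `Datapoint` and `VirtualMemoryStat` stubs below are simplified stand-ins, since the real types come from the external signalfx/golib and gopsutil packages — the shared pattern looks like:

```go
package main

import "fmt"

// Datapoint is a simplified stand-in for the agent's datapoint type.
type Datapoint struct {
	Metric string
	Value  int64
}

// VirtualMemoryStat is a simplified stand-in for gopsutil's struct.
type VirtualMemoryStat struct {
	Used, Free, Total uint64
}

// makeMemoryDatapoints mirrors the diff: each platform implementation
// now appends a memory.total gauge built from memInfo.Total.
func makeMemoryDatapoints(memInfo *VirtualMemoryStat) []Datapoint {
	return []Datapoint{
		{Metric: "memory.used", Value: int64(memInfo.Used)},
		{Metric: "memory.free", Value: int64(memInfo.Free)},
		{Metric: "memory.total", Value: int64(memInfo.Total)},
	}
}

func main() {
	dps := makeMemoryDatapoints(&VirtualMemoryStat{Used: 3 << 30, Free: 1 << 30, Total: 4 << 30})
	fmt.Println(dps[len(dps)-1].Metric) // memory.total
}
```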
5 changes: 5 additions & 0 deletions pkg/monitors/memory/metadata.yaml
@@ -48,5 +48,10 @@ monitors:
include buffer or cache memory on Linux.
default: true
type: gauge
memory.total:
description: Total bytes of system memory on the system.
default: true
type: gauge

monitorType: memory
properties:
22 changes: 18 additions & 4 deletions selfdescribe.json
@@ -22479,6 +22479,7 @@
"cpu.idle",
"cpu.interrupt",
"cpu.nice",
"cpu.num_processors",
"cpu.softirq",
"cpu.steal",
"cpu.system",
@@ -22508,6 +22509,12 @@
"group": null,
"default": false
},
"cpu.num_processors": {
"type": "gauge",
"description": "The number of logical processors on the host.",
"group": null,
"default": true
},
"cpu.softirq": {
"type": "cumulative",
"description": "CPU time spent while servicing software interrupts. Unlike a hardware interrupt, a software interrupt happens at the sofware layer. Usually it is a userspace program requesting a service of the kernel. This metric measures how many jiffies were spent by the CPU handling these interrupts. In order to get a percentage this value must be compared against the sum of all CPU states. A sustained high value for this metric may be caused by a programming error which causes a process to unexpectedly request too many services from the kernel.\n",
@@ -37464,7 +37471,7 @@
"type": "gauge",
"description": "Maximum CPU limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.",
"group": null,
"default": false
"default": true
},
"kubernetes.container_cpu_request": {
"type": "gauge",
@@ -37488,7 +37495,7 @@
"type": "gauge",
"description": "Maximum memory limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.",
"group": null,
"default": false
"default": true
},
"kubernetes.container_memory_request": {
"type": "gauge",
@@ -40574,6 +40581,7 @@
"memory.free",
"memory.slab_recl",
"memory.slab_unrecl",
"memory.total",
"memory.used",
"memory.utilization"
]
@@ -40616,6 +40624,12 @@
"group": null,
"default": true
},
"memory.total": {
"type": "gauge",
"description": "Total bytes of system memory on the system.",
"group": null,
"default": true
},
"memory.used": {
"type": "gauge",
"description": "Bytes of memory in use by the system.",
@@ -41616,7 +41630,7 @@
"type": "gauge",
"description": "Maximum CPU limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.",
"group": null,
"default": false
"default": true
},
"kubernetes.container_cpu_request": {
"type": "gauge",
@@ -41640,7 +41654,7 @@
"type": "gauge",
"description": "Maximum memory limit set for the container. This value is derived from https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#resourcerequirements-v1-core which comes from the pod spec and is reported only if a non null value is available.",
"group": null,
"default": false
"default": true
},
"kubernetes.container_memory_request": {
"type": "gauge",
4 changes: 4 additions & 0 deletions test-services/nginx/nginx-k8s.yaml
@@ -45,6 +45,10 @@ spec:
resources:
requests:
cpu: 100m
memory: 100M
limits:
cpu: 200m
memory: 100M
readinessProbe:
httpGet:
path: /nginx_status
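The added memory request and new limits give the now-default `kubernetes.container_cpu_limit` and `kubernetes.container_memory_limit` metrics non-null values to report for this test service. Reconstructed from the hunk, the container's resources block after this change reads:

```yaml
resources:
  requests:
    cpu: 100m
    memory: 100M
  limits:
    cpu: 200m
    memory: 100M
```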