Add missing items to Prometheus container metrics table #2927

Merged 3 commits on Aug 25, 2021
80 changes: 43 additions & 37 deletions docs/storage/prometheus.md
@@ -22,16 +22,16 @@ Metric name | Type | Description | Unit (where applicable) | option parameter |
`container_accelerator_memory_total_bytes` | Gauge | Total accelerator memory | bytes | accelerator |
`container_accelerator_memory_used_bytes` | Gauge | Total accelerator memory allocated | bytes | accelerator |
`container_blkio_device_usage_total` | Counter | Blkio device bytes usage | bytes | diskIO |
`container_cpu_cfs_periods_total` | Counter | Number of elapsed enforcement period intervals | | |
`container_cpu_cfs_throttled_periods_total` | Counter | Number of throttled period intervals | | |
`container_cpu_cfs_throttled_seconds_total` | Counter | Total time duration the container has been throttled | seconds | |
`container_cpu_load_average_10s` | Gauge | Value of container cpu load average over the last 10 seconds | | |
`container_cpu_cfs_periods_total` | Counter | Number of elapsed enforcement period intervals | | cpu |
`container_cpu_cfs_throttled_periods_total` | Counter | Number of throttled period intervals | | cpu |
`container_cpu_cfs_throttled_seconds_total` | Counter | Total time duration the container has been throttled | seconds | cpu |
`container_cpu_load_average_10s` | Gauge | Value of container cpu load average over the last 10 seconds | | cpuLoad |
`container_cpu_schedstat_run_periods_total` | Counter | Number of times processes of the cgroup have run on the cpu | | sched |
`container_cpu_schedstat_run_seconds_total` | Counter | Time duration the processes of the container have run on the CPU | seconds | sched |
`container_cpu_schedstat_runqueue_seconds_total` | Counter | Time duration processes of the container have been waiting on a runqueue | seconds | sched |
`container_cpu_system_seconds_total` | Counter | Cumulative system cpu time consumed | seconds | |
`container_cpu_usage_seconds_total` | Counter | Cumulative cpu time consumed | seconds | |
`container_cpu_user_seconds_total` | Counter | Cumulative user cpu time consumed | seconds | |
`container_cpu_schedstat_run_seconds_total` | Counter | Time duration the processes of the container have run on the CPU | seconds | sched |
`container_cpu_system_seconds_total` | Counter | Cumulative system cpu time consumed | seconds | cpu |
`container_cpu_usage_seconds_total` | Counter | Cumulative cpu time consumed | seconds | cpu |
`container_cpu_user_seconds_total` | Counter | Cumulative user cpu time consumed | seconds | cpu |
`container_file_descriptors` | Gauge | Number of open file descriptors for the container | | process |
`container_fs_inodes_free` | Gauge | Number of available Inodes | | disk |
`container_fs_inodes_total` | Gauge | Total number of Inodes | | disk |
@@ -40,60 +40,66 @@ Metric name | Type | Description | Unit (where applicable) | option parameter |
`container_fs_io_time_weighted_seconds_total` | Counter | Cumulative weighted I/O time | seconds | diskIO |
`container_fs_limit_bytes` | Gauge | Number of bytes that can be consumed by the container on this filesystem | bytes | disk |
`container_fs_reads_bytes_total` | Counter | Cumulative count of bytes read | bytes | diskIO |
`container_fs_reads_total` | Counter | Cumulative count of reads completed | | diskIO |
`container_fs_read_seconds_total` | Counter | Cumulative count of seconds spent reading | | diskIO |
`container_fs_reads_merged_total` | Counter | Cumulative count of reads merged | | diskIO |
`container_fs_reads_total` | Counter | Cumulative count of reads completed | | diskIO |
`container_fs_sector_reads_total` | Counter | Cumulative count of sector reads completed | | diskIO |
`container_fs_sector_writes_total` | Counter | Cumulative count of sector writes completed | | diskIO |
`container_fs_usage_bytes` | Gauge | Number of bytes that are consumed by the container on this filesystem | bytes | disk |
`container_fs_write_seconds_total` | Counter | Cumulative count of seconds spent writing | seconds | diskIO |
`container_fs_writes_bytes_total` | Counter | Cumulative count of bytes written | bytes | diskIO |
`container_fs_write_seconds_total` | Counter | Cumulative count of seconds spent writing | seconds | diskIO |
`container_fs_writes_merged_total` | Counter | Cumulative count of writes merged | | diskIO |
`container_fs_writes_total` | Counter | Cumulative count of writes completed | | diskIO |
`container_hugetlb_failcnt` | Counter | Number of hugepage usage hits limits | | hugetlb |
`container_hugetlb_max_usage_bytes` | Gauge | Maximum hugepage usages recorded | bytes | hugetlb |
`container_hugetlb_usage_bytes` | Gauge | Current hugepage usage | bytes | hugetlb |
`container_last_seen` | Gauge | Last time a container was seen by the exporter | timestamp | |
`container_last_seen` | Gauge | Last time a container was seen by the exporter | timestamp | - |
`container_llc_occupancy_bytes` | Gauge | Last level cache usage statistics for container counted with RDT Memory Bandwidth Monitoring (MBM). | bytes | resctrl |
`container_memory_bandwidth_bytes` | Gauge | Total memory bandwidth usage statistics for container counted with RDT Memory Bandwidth Monitoring (MBM). | bytes | resctrl |
`container_memory_bandwidth_local_bytes` | Gauge | Local memory bandwidth usage statistics for container counted with RDT Memory Bandwidth Monitoring (MBM). | bytes | resctrl |
`container_memory_cache` | Gauge | Total page cache memory | bytes | |
`container_memory_failcnt` | Counter | Number of memory usage hits limits | | |
`container_memory_failures_total` | Counter | Cumulative count of memory allocation failures | | |
`container_memory_numa_pages` | Gauge | Number of used pages per NUMA node | | memory_numa |
`container_memory_max_usage_bytes` | Gauge | Maximum memory usage recorded | bytes | |
`container_memory_rss` | Gauge | Size of RSS | bytes | |
`container_memory_swap` | Gauge | Container swap usage | bytes | |
`container_memory_mapped_file` | Gauge | Size of memory mapped files | bytes | |
`container_memory_cache` | Gauge | Total page cache memory | bytes | memory |
`container_memory_failcnt` | Counter | Number of memory usage hits limits | | memory |
`container_memory_failures_total` | Counter | Cumulative count of memory allocation failures | | memory |
`container_memory_mapped_file` | Gauge | Size of memory mapped files | bytes | memory |
`container_memory_max_usage_bytes` | Gauge | Maximum memory usage recorded | bytes | memory |
`container_memory_migrate` | Gauge | Memory migrate status | | cpuset |
`container_memory_usage_bytes` | Gauge | Current memory usage, including all memory regardless of when it was accessed | bytes | |
`container_memory_working_set_bytes` | Gauge | Current working set | bytes | |
`container_memory_numa_pages` | Gauge | Number of used pages per NUMA node | | memory_numa |
`container_memory_rss` | Gauge | Size of RSS | bytes | memory |
`container_memory_swap` | Gauge | Container swap usage | bytes | memory |
`container_memory_usage_bytes` | Gauge | Current memory usage, including all memory regardless of when it was accessed | bytes | memory |
`container_memory_working_set_bytes` | Gauge | Current working set | bytes | memory |
`container_network_advance_tcp_stats_total` | Gauge | advanced tcp connections statistic for container | | advtcp |
`container_network_receive_bytes_total` | Counter | Cumulative count of bytes received | bytes | network |
`container_network_receive_errors_total` | Counter | Cumulative count of errors encountered while receiving | | network |
`container_network_receive_packets_dropped_total` | Counter | Cumulative count of packets dropped while receiving | | network |
`container_network_receive_packets_total` | Counter | Cumulative count of packets received | | network |
`container_network_receive_errors_total` | Counter | Cumulative count of errors encountered while receiving | | network |
`container_network_tcp6_usage_total` | Gauge | tcp6 connection usage statistic for container | | tcp |
`container_network_tcp_usage_total` | Gauge | tcp connection usage statistic for container | | tcp |
`container_network_transmit_bytes_total` | Counter | Cumulative count of bytes transmitted | bytes | network |
`container_network_transmit_packets_total` | Counter | Cumulative count of packets transmitted | | network |
`container_network_transmit_packets_dropped_total` | Counter | Cumulative count of packets dropped while transmitting | | network |
`container_network_transmit_errors_total` | Counter | Cumulative count of errors encountered while transmitting | | network |
`container_network_tcp_usage_total` | Gauge | tcp connection usage statistic for container | | tcp |
`container_network_tcp6_usage_total` | Gauge | tcp6 connection usage statistic for container | | tcp |
`container_network_udp_usage_total` | Gauge | udp connection usage statistic for container | | udp |
`container_network_transmit_packets_dropped_total` | Counter | Cumulative count of packets dropped while transmitting | | network |
`container_network_transmit_packets_total` | Counter | Cumulative count of packets transmitted | | network |
`container_network_udp6_usage_total` | Gauge | udp6 connection usage statistic for container | | udp |
`container_perf_events_total` | Counter | Scaled counter of perf core event (event can be identified by `event` label and `cpu` indicates the core for which event was measured). See [perf event configuration](../runtime_options.md#perf-events). | | | libpfm
`container_perf_metric_scaling_ratio` | Gauge | Scaling ratio for perf event counter (event can be identified by `event` label and `cpu` indicates the core for which event was measured). See [perf event configuration](../runtime_options.md#perf-events). | | | libpfm
`container_network_udp_usage_total` | Gauge | udp connection usage statistic for container | | udp |
`container_oom_events_total` | Counter | Count of out of memory events observed for the container | | oom_event |
`container_perf_events_scaling_ratio` | Gauge | Scaling ratio for perf event counter (event can be identified by `event` label and `cpu` indicates the core for which event was measured). See [perf event configuration](../runtime_options.md#perf-events). | | perf_event | libpfm
`container_perf_events_total` | Counter | Scaled counter of perf core event (event can be identified by `event` label and `cpu` indicates the core for which event was measured). See [perf event configuration](../runtime_options.md#perf-events). | | perf_event | libpfm
`container_perf_uncore_events_scaling_ratio` | Gauge | Scaling ratio for perf uncore event counter (event can be identified by `event` label, `pmu` and `socket` labels indicate the PMU and the CPU socket for which event was measured). See [perf event configuration](../runtime_options.md#perf-events). Metric exists only for main cgroup (id="/"). | | perf_event | libpfm
`container_perf_uncore_events_total` | Counter | Scaled counter of perf uncore event (event can be identified by `event` label, `pmu` and `socket` labels indicate the PMU and the CPU socket for which event was measured). See [perf event configuration](../runtime_options.md#perf-events). Metric exists only for main cgroup (id="/"). | | perf_event | libpfm
`container_processes` | Gauge | Number of processes running inside the container | | process |
`container_referenced_bytes` | Gauge | Bytes referenced by the container during the last measurement cycle, based on the Referenced field in the /proc/smaps file, with /proc/PIDs/clear_refs set to 1 after a defined number of cycles configured through the `referenced_reset_interval` cAdvisor parameter.<br>Warning: this is intrusive collection because it can influence kernel page reclaim policy and add latency. Refer to https://github.com/brendangregg/wss#wsspl-referenced-page-flag for more details. | bytes | referenced_memory |
`container_spec_cpu_period` | Gauge | CPU period of the container | | |
`container_spec_cpu_quota` | Gauge | CPU quota of the container | | |
`container_spec_cpu_shares` | Gauge | CPU share of the container | | |
`container_spec_memory_limit_bytes` | Gauge | Memory limit for the container | bytes | |
`container_spec_memory_swap_limit_bytes` | Gauge | Memory swap limit for the container | bytes | |
`container_sockets` | Gauge | Number of open sockets for the container | | process |
`container_spec_cpu_period` | Gauge | CPU period of the container | | - |
`container_spec_cpu_quota` | Gauge | CPU quota of the container | | - |
`container_spec_cpu_shares` | Gauge | CPU share of the container | | - |
`container_spec_memory_limit_bytes` | Gauge | Memory limit for the container | bytes | - |
`container_spec_memory_reservation_limit_bytes` | Gauge | Memory reservation limit for the container | bytes | |
`container_spec_memory_swap_limit_bytes` | Gauge | Memory swap limit for the container | bytes | |
`container_start_time_seconds` | Gauge | Start time of the container since unix epoch | seconds | |
`container_tasks_state` | Gauge | Number of tasks in given state (`sleeping`, `running`, `stopped`, `uninterruptible`, or `ioawaiting`) | | |
`container_perf_uncore_events_total` | Counter | Scaled counter of perf uncore event (event can be identified by `event` label, `pmu` and `socket` labels indicate the PMU and the CPU socket for which event was measured). See [perf event configuration](../runtime_options.md#perf-events). Metric exists only for main cgroup (id="/"). | | | libpfm
`container_perf_uncore_events_scaling_ratio` | Gauge | Scaling ratio for perf uncore event counter (event can be identified by `event` label, `pmu` and `socket` labels indicate the PMU and the CPU socket for which event was measured). See [perf event configuration](../runtime_options.md#perf-events). Metric exists only for main cgroup (id="/"). | | | libpfm
`container_tasks_state` | Gauge | Number of tasks in given state (`sleeping`, `running`, `stopped`, `uninterruptible`, or `ioawaiting`) | | cpuLoad |
`container_threads` | Gauge | Number of threads running inside the container | | process |
`container_threads_max` | Gauge | Maximum number of threads allowed inside the container | | process |
`container_ulimits_soft` | Gauge | Soft ulimit values for the container root process. Unlimited if -1, except priority and nice | | process |

## Prometheus hardware metrics
