diff --git a/enhancements/windows-containers/monitoring-windows-nodes.md b/enhancements/windows-containers/monitoring-windows-nodes.md
new file mode 100644
index 00000000000..410bc6aad06
--- /dev/null
+++ b/enhancements/windows-containers/monitoring-windows-nodes.md
@@ -0,0 +1,239 @@
+---
+title: monitoring-windows-nodes
+authors:
+  - "@VaishnaviHire"
+  - "@PratikMahajan"
+reviewers:
+  - "@@openshift/openshift-team-windows-containers"
+  - "@simonpasquier"
+  - "@spadgett"
+approvers:
+  - "@aravindhp"
+  - "@simonpasquier"
+creation-date: 2021-02-08
+last-updated: 2021-03-04
+status: implementable
+---
+
+# Monitoring Windows Nodes
+
+## Release Signoff Checklist
+
+- [x] Enhancement is `implementable`
+- [x] Design details are appropriately documented from clear requirements
+- [x] Test plan is defined
+- [ ] Operational readiness criteria is defined
+- [x] Graduation criteria for dev preview, tech preview, GA
+- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)
+
+## Summary
+
+The intent of this enhancement is to enable performance monitoring on Windows
+nodes created by the Windows Machine Config Operator (WMCO) in an OpenShift cluster.
+
+## Motivation
+
+Monitoring is critical for identifying issues with nodes and the containers running
+on them. The main motivation behind this enhancement is to enable monitoring on
+the Windows nodes.
+
+### Goals
+
+As part of this enhancement, we plan to do the following:
+* Run [windows_exporter](https://github.com/prometheus-community/windows_exporter)
+  as a service on Windows nodes
+* Upgrade the windows_exporter on the Windows nodes
+* Leverage the cluster monitoring operator, which sets up Prometheus, Alertmanager,
+  and other components
+
+### Non-Goals
+
+As part of this enhancement, we do not plan to do the following:
+* Integrate windows_exporter with the cluster monitoring operator
+* Ship Grafana dashboards for Windows nodes
+
+## Proposal
+
+The main idea here is to run windows_exporter as a Windows Service and let
+the Prometheus instance that was provisioned as part of the OpenShift install
+collect data from windows_exporter. The metrics exposed by windows_exporter
+will be used to display console graphs for Windows nodes.
+
+## Justification
+
+Unlike [Node exporter](https://github.com/prometheus/node_exporter) on Linux
+nodes, windows_exporter cannot run as a container on the Windows nodes since
+Windows container images contain a Windows kernel and Red Hat has a policy not
+to ship third-party kernels for support reasons. Please refer to the
+[WMCO enhancement](https://github.com/openshift/enhancements/blob/master/enhancements/windows-containers/windows-machine-config-operator.md#justification)
+for more details.
+
+### Risks and Mitigations
+
+* Running `windows_exporter` as a Windows Service poses a risk of having
+  inadequate resources to run the service if the Windows node is overwhelmed
+  with workload containers. This can be mitigated by leveraging [priority
+  classes](https://docs.microsoft.com/en-us/windows/win32/procthread/scheduling-priorities) for
+  Windows processes. This is similar to what is being done for other [Windows
+  services](https://issues.redhat.com/browse/WINC-534).
+
+* One of the risks with the current approach is its reliance on renaming Windows
+  metrics to display pod graphs. The pod metrics for Linux come from cAdvisor.
+  However, we do not get the same metrics from cAdvisor for Windows nodes, which
+  makes it hard to display pod graphs by creating custom recording rules that
+  reuse the console queries written for Linux workloads. To mitigate this, pod
+  graphs need to be built from metrics exposed by windows_exporter, as described
+  in [Future Plans](#future-plans). This also requires console queries that
+  support OS-specific metrics.
+
+## Design Details
+
+Because windows_exporter cannot run as a [container](#justification) on the
+Windows node, WMCO captures its data by creating a
+`windows-machine-config-operator-metrics` Service without selectors and
+manually defining an Endpoints object for that Service. The Endpoints object has
+an entry for the endpoint `:9182/metrics`, exposed by
+windows_exporter, for every Windows node. Once the Service and Endpoints
+objects are created, WMCO ensures that a Service Monitor for the `windows-machine-config-operator-metrics`
+Service exists so that the Prometheus operator can discover the targets
+created above and scrape Windows metrics. The following design details reflect the
+current approach and future plans to enable monitoring support for Windows.
+
+### Current State
+
+To enable basic monitoring support for Windows nodes, WMCO performs the
+following steps:
+
+* Build the windows_exporter binary and add it to the WMCO payload.
+* Install windows_exporter on the Windows nodes and ensure
+  that it runs as a Windows service.
+* Add the `openshift.io/cluster-monitoring=true` label to the
+  `openshift-windows-machine-config-operator` namespace so that the cluster
+  monitoring stack will pick up the Service Monitor created by WMCO.
+* Add privileges to WMCO to create Services, Endpoints, and Service Monitors in
+  the `openshift-windows-machine-config-operator` namespace.
+* Create a Service and an Endpoints object in the
+  `openshift-windows-machine-config-operator` namespace that point to the windows_exporter endpoint.
+  WMCO uses default values to define the metrics endpoint, `:9182/metrics`,
+  exposed by windows_exporter for every Windows node. The Endpoints object
+  created in the namespace consists of subsets of endpoints from all the
+  Windows nodes.
+* Create a Service Monitor in the `openshift-windows-machine-config-operator`
+  namespace for the Service created above.
+
+To display node graphs, WMCO performs the following steps:
+
+* Add custom Prometheus rules in the `openshift-windows-machine-config-operator`
+  namespace. The custom recording rules are created using Windows metrics
+  exposed by windows_exporter and have the same names as the Linux
+  recording rules, so that the same console queries work for both.
+* Note that WMCO is unable to display pod graphs for the Windows nodes
+  with the current implementation. See [Risks and Mitigations](#risks-and-mitigations)
+  for details.
+
+### Future Plans
+
+#### Displaying Console Graphs
+
+* As we move forward, our plan to display monitoring graphs is to create a
+  [common interface](https://issues.redhat.com/browse/WINC-530) for Windows
+  and Linux recording rules. The monitoring team will define recording rules for the
+  metrics that have different `metric labels` for Linux and Windows. The
+  differences in `metric labels` for metrics used for node graphs and pod graphs
+  are displayed in the tables below.
+  The Windows team will align the Windows recording rules with these new
+  recording rules. The recording rules for Windows will be managed by
+  WMCO. This set of common recording rules for monitoring will return results
+  for both Linux and Windows nodes for a single query. The console queries
+  currently use some raw metrics such as `node_filesystem_size_bytes` and
+  `node_filesystem_free_bytes`. They would need to be updated to use
+  the new recording rules in place of raw metrics. This will ensure that
+  we have a consistent user experience for monitoring across Linux and Windows.
+* In the cases where `metric labels` are equivalent, we plan to relabel the
+  Windows metrics to align with the Linux metrics.
+
+**Node Metrics:**
+
+| Node Exporter                  | Windows Exporter                 | Label Difference |
+|--------------------------------|----------------------------------|------------------|
+| node_memory_MemTotal_bytes     | windows_cs_physical_memory_bytes | - |
+| node_memory_MemAvailable_bytes | windows_memory_available_bytes   | - |
+| node_filesystem_size_bytes     | windows_logical_disk_size_bytes  | Missing labels: (device, mountpoint, fstype); additional label: (volume) |
+| node_filesystem_free_bytes     | windows_logical_disk_free_bytes  | Missing labels: (device, mountpoint, fstype); additional label: (volume) |
+| node_cpu_seconds_total         | windows_cpu_time_total           | Missing label: (cpu); additional label: (core) |
+
+**Pod Metrics:**
+
+| Kubelet metrics                        | Windows Kubelet      | Windows Exporter                                         | Label Difference |
+|----------------------------------------|----------------------|----------------------------------------------------------|------------------|
+| kubelet_running_pods                   | kubelet_running_pods | windows_container_available                              | - |
+| container_memory_working_set_bytes     | -                    | windows_container_memory_usage_private_working_set_bytes | Missing label: (image); additional label: (container_id), equivalent to (id) on Linux |
+| container_cpu_usage_seconds_total      | -                    | windows_container_cpu_usage_seconds_total                | Missing labels: (image, metrics_path); additional label: (container_id), equivalent to (id) on Linux |
+| container_fs_usage_bytes               | -                    | -                                                        | |
+| container_network_receive_bytes_total  | -                    | windows_container_network_receive_bytes_total            | Missing labels: (image, metrics_path); additional label: (container_id), equivalent to (id) on Linux |
+| container_network_transmit_bytes_total | -                    | windows_container_network_transmit_bytes_total           | Missing labels: (image, metrics_path); additional label: (container_id), equivalent to (id) on Linux |
+
+#### Moving towards EndpointSlices
+
+* Since the metrics Endpoints object is managed by WMCO, we plan to replace
+  the Endpoints object with [EndpointSlices](https://kubernetes.io/docs/concepts/services-networking/endpoint-slices/#motivation)
+  to improve performance. This can be done once the `prometheus-operator` has
+  [support](https://github.com/prometheus-operator/prometheus-operator/issues/3862)
+  for EndpointSlices objects.
+
+#### Securing windows_exporter endpoint
+
+* Since windows_exporter is not running as a [pod](#justification), the
+  endpoint is not secure. The reason is that, when running inside a pod, we
+  could use the CA signer to provide a TLS cert/key to the service for
+  authentication. We plan to leverage windows_exporter's support for `https`
+  configuration. WMCO will be responsible for adding the [web config](https://github.com/prometheus/exporter-toolkit/blob/master/docs/web-configuration.md)
+  for TLS. This will ensure that the metrics endpoint is able to
+  authenticate requests.
+
+#### Telemetry Rules
+
+* We plan to ensure that [telemetry rules](https://docs.openshift.com/container-platform/4.7/support/remote_health_monitoring/showing-data-collected-by-remote-health-monitoring.html#showing-data-collected-from-the-cluster_showing-data-collected-by-remote-health-monitoring)
+  also use metrics from Windows. This can be done by renaming the Windows
+  metrics to align with the metrics used in telemetry rules. For example, the
+  `memory_usage_bytes:sum` rule uses `node_memory_MemTotal_bytes`, which is
+  defined in the Windows rules. We also need to test whether the existing telemetry
+  rules need to be updated, similar to console queries, if they contain
+  Linux-specific queries, for example rules with the `job=node-exporter` attribute.
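
The renaming idea behind the telemetry and console alignment can be pictured with a small sketch. In the cluster this would be expressed as Prometheus recording rules managed by WMCO, not as application code, so the Python below only illustrates the mapping. It covers the two node metrics that the Node Metrics table lists with no label difference. The names `WINDOWS_TO_LINUX` and `rename_sample`, and the `instance` label value, are hypothetical and do not come from WMCO.

```python
# Illustrative sketch only: WMCO would express this as Prometheus recording
# rules. The mapping covers the node metrics whose label difference is "-".
WINDOWS_TO_LINUX = {
    "windows_cs_physical_memory_bytes": "node_memory_MemTotal_bytes",
    "windows_memory_available_bytes": "node_memory_MemAvailable_bytes",
}


def rename_sample(name, labels, value):
    """Return the sample under its Linux-equivalent name, if one is known.

    Metrics without a label-compatible Linux counterpart are returned
    unchanged, since they would need a dedicated recording rule instead.
    """
    return WINDOWS_TO_LINUX.get(name, name), dict(labels), value


if __name__ == "__main__":
    # "win-node-1" is a made-up instance label used only for this example.
    sample = ("windows_cs_physical_memory_bytes", {"instance": "win-node-1"}, 17179869184.0)
    print(rename_sample(*sample))
```

Metrics such as `windows_logical_disk_size_bytes` are deliberately left out of the mapping: their labels differ from the Linux equivalents, so a plain rename would not be enough.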
+
+### Test Plan
+
+The current tests verify that:
+* The operator namespace, `openshift-windows-machine-config-operator`, uses
+  the `openshift.io/cluster-monitoring=true` label.
+* The Service, Endpoints, and Service Monitor objects are created as expected.
+* Prometheus is able to collect data from windows_exporter.
+* Custom Prometheus rules return Windows data.
+
+The test plan for the [future implementation](#future-plans)
+will reuse the existing tests to verify creation of the windows_exporter service and
+the metrics Service, Endpoints, and Service Monitor objects. WMCO will also be
+responsible for testing the Prometheus rules created for Windows. We also
+plan to add tests in the console repo that exercise the common recording rules and
+ensure that they return results for Windows.
+
+### Graduation Criteria
+
+This enhancement will start as GA.
+
+### Upgrade / Downgrade Strategy
+
+* WMCO is responsible for upgrading the [windows_exporter](https://github.com/prometheus-community/windows_exporter/tags)
+  binary to the latest release. Downgrades are [not supported](https://github.com/operator-framework/operator-lifecycle-manager/issues/1177)
+  by OLM.
+
+## Implementation History
+
+v1: Initial Proposal
+
+### Drawbacks
+
+Running windows_exporter as a Windows service instead of as a DaemonSet
+pod makes it harder for Prometheus to monitor Windows nodes. The
+inability to run windows_exporter on Windows nodes as a pod is
+due to support reasons, as mentioned in the [WMCO enhancement](https://github.com/openshift/enhancements/blob/master/enhancements/windows-containers/windows-machine-config-operator.md#justification).
\ No newline at end of file