Update cos_agent lib with generic HostHealth rules #232

Open: wants to merge 2 commits into base: main

MichaelThamm (Contributor) commented Jan 13, 2025

Issue

Currently, the grafana-agent host health rules are hard-coded in a .rule file. Once the tandem PR is merged, the UX will differ between vm and k8s charms.

Solution

Match the UX of the k8s charms by injecting the alert rules on the fly in the cos_agent Provider. Remove the host_health .rule file so the injected rules do not collide with, or get deduplicated against, the hard-coded ones.
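For reviewers unfamiliar with the idea, here is a minimal sketch of what "injecting on the fly" could look like. It is not the code in this PR: the group name, expressions, durations and the `inject_generic_rules` helper are assumptions chosen for illustration, and only the alert names HostDown and HostMetricsMissing come from the testing steps in this description.

```python
import copy

# Assumed definitions for illustration only; the real expressions,
# durations and labels are whatever the lib in this PR ships.
GENERIC_HOST_HEALTH_GROUP = {
    "name": "HostHealth",
    "rules": [
        {
            "alert": "HostDown",
            "expr": "up < 1",
            "for": "5m",
            "labels": {"severity": "critical"},
        },
        {
            "alert": "HostMetricsMissing",
            "expr": "absent(up)",
            "for": "5m",
            "labels": {"severity": "critical"},
        },
    ],
}


def inject_generic_rules(charm_rules: dict) -> dict:
    """Return a copy of charm_rules with the generic group present exactly once."""
    merged = copy.deepcopy(charm_rules)
    groups = merged.setdefault("groups", [])
    # Drop any existing group with the same name so repeated calls stay idempotent.
    groups[:] = [g for g in groups if g.get("name") != GENERIC_HOST_HEALTH_GROUP["name"]]
    groups.append(copy.deepcopy(GENERIC_HOST_HEALTH_GROUP))
    return merged


charm_rules = {"groups": [{"name": "zookeeper", "rules": []}]}
merged = inject_generic_rules(charm_rules)
print([g["name"] for g in merged["groups"]])
```

Because the generic group is built in code rather than read from a .rule file, there is a single source for it, which is why the hard-coded file can go.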

Context

In tandem with:

Testing Instructions

cos_agent

In a microk8s controller:

  1. Download the offers overlay and deploy cos-lite with it:
     curl -L https://raw.githubusercontent.com/canonical/cos-lite-bundle/main/overlays/offers-overlay.yaml -O
     juju deploy cos-lite --trust --overlay ./offers-overlay.yaml

In a lxd controller:

  1. Deploy the bundle
default-base: ubuntu@22.04/stable
saas:
  prometheus-receive-remote-write:
    url: microk8s:admin/prom.prometheus-receive-remote-write
applications:
  gagent:
    charm: local:grafana-agent-0
    trust: true
  zoo:
    charm: local:zookeeper-0
    num_units: 2
    to:
    - "1"
    - "2"
    constraints: arch=amd64
    storage:
      data: rootfs,1,1024M
    trust: true
machines:
  "1":
    constraints: arch=amd64
  "2":
    constraints: arch=amd64
relations:
- - prometheus-receive-remote-write:receive-remote-write
  - gagent:send-remote-write
- - gagent:cos-agent
  - zoo:cos-agent
  2. Copy the cos_agent.py lib changes into zookeeper (since it is the provider side of the relation)
  3. juju ssh zoo/0 "sudo snap stop charmed-zookeeper"
    1. Wait for the HostDown alert to fire
    2. Start the charmed-zookeeper snap again
  4. juju ssh zoo/0 "sudo snap stop grafana-agent"
    1. Wait for the HostMetricsMissing alert to fire
    2. Start the grafana-agent snap again
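Rather than watching the Prometheus UI while waiting, the "wait for the alert to fire" steps could be checked with a small script against the Prometheus HTTP API (`/api/v1/alerts`). This is a sketch under assumptions: `fetch_alerts` and its `base_url` argument are hypothetical, and the sample payload is hand-written in the shape of the API response, not captured from a real deployment.

```python
import json
import urllib.request


def firing_alerts(payload: dict) -> set:
    """Return the names of alerts in state 'firing' from an /api/v1/alerts response."""
    alerts = payload.get("data", {}).get("alerts", [])
    return {a["labels"]["alertname"] for a in alerts if a.get("state") == "firing"}


def fetch_alerts(base_url: str) -> dict:
    """Fetch the alerts payload; base_url is a placeholder for your Prometheus address."""
    with urllib.request.urlopen(f"{base_url}/api/v1/alerts", timeout=10) as resp:
        return json.load(resp)


# Hand-written sample in the shape of the API response (not real output).
sample = {
    "status": "success",
    "data": {
        "alerts": [
            {"labels": {"alertname": "HostDown"}, "state": "firing"},
            {"labels": {"alertname": "HostMetricsMissing"}, "state": "pending"},
        ]
    },
}
print(firing_alerts(sample))
```

Against a live deployment, `firing_alerts(fetch_alerts(url))` can be polled until it contains HostDown or HostMetricsMissing; an alert stays pending until its `for` duration has elapsed.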

cos-proxy

  • Complete the manual tests for cos-proxy and ensure they all pass
    • Pack cos-proxy with the new lib
    • Stop vector, either with systemctl stop vector or by killing its PID

Upgrade Notes

Fetching the new libs adds a set of new alerts automatically. Charms that already define their own up/absent alerts will end up with duplicate alerts and rules.

  • up/absent alerts are ubiquitous and are now handled by the libs modified in this PR. Any custom alerts duplicating this behaviour can be removed.
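To find candidates for removal, a charm author could scan their existing rule groups for plain up/absent expressions. The sketch below is a heuristic and not part of this PR: the regular expression is not a PromQL parser, and the `HostHealth` group name and the `find_redundant_rules` helper are assumptions.

```python
import re

# Heuristic match for `up < 1`, `up{...} == 0`, `absent(up...)` and similar.
_UP_ABSENT = re.compile(
    r"^\s*(absent\(\s*up\b.*\)|up\b(\{[^}]*\})?\s*(==|<|!=)\s*[01])\s*$"
)

GENERIC_GROUP = "HostHealth"  # assumed name of the injected group


def find_redundant_rules(rule_file: dict) -> list:
    """Return (group, alert) pairs whose expr looks like a plain up/absent check."""
    redundant = []
    for group in rule_file.get("groups", []):
        if group.get("name") == GENERIC_GROUP:
            continue  # skip the injected generic rules themselves
        for rule in group.get("rules", []):
            if _UP_ABSENT.match(str(rule.get("expr", ""))):
                redundant.append((group.get("name"), rule.get("alert")))
    return redundant


rules = {
    "groups": [
        {
            "name": "zookeeper",
            "rules": [
                {"alert": "ZkDown", "expr": 'up{job="zk"} == 0'},
                {"alert": "ZkSlow", "expr": "rate(zk_requests[5m]) > 100"},
            ],
        },
        {"name": "HostHealth", "rules": [{"alert": "HostDown", "expr": "up < 1"}]},
    ]
}
print(find_redundant_rules(rules))
```

Anything the scan reports still needs a manual look before deletion, since a custom rule may carry labels or thresholds the generic ones do not.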

* Inject generic alert rules via cos_agent