This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Prometheus #4940

Merged
merged 3 commits into from
Sep 29, 2020

Conversation

shaiic-pai
Contributor

@shaiic-pai shaiic-pai commented Sep 29, 2020

  • Alert-manager: kill jobs with low GPU utilization and tag abnormal jobs
    • add virtual cluster info to job-exporter
    • configure monitoring rules in Prometheus
    • send action requests through a webhook
    • job-handler: handle webhook requests and redirect them to the RestServer
    • implement a customized SMTP service in alert-handler, send alert emails to users when possible, and switch the email template to EJS
    • document how to customize alerts and actions
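
To illustrate the webhook step in the list above, here is a minimal sketch of how a handler might turn an Alertmanager webhook payload into a list of job actions. It is not the code from this PR. The payload fields used (`alerts`, `status`, `labels`, `alertname`) follow the standard Alertmanager webhook format, but the alert names, the action names, and the `job_name` label are assumptions made for the example.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mapping from alert name to action. These names are
# illustrative assumptions, not identifiers taken from the PR.
ACTIONS_BY_ALERT: Dict[str, str] = {
    "JobLowGpuUtilization": "stop-job",
    "JobAbnormal": "tag-job",
}


def plan_actions(payload: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (action, job_name) pairs for the firing alerts in a payload."""
    planned: List[Tuple[str, str]] = []
    for alert in payload.get("alerts", []):
        # Resolved alerts need no action; only firing ones are handled.
        if alert.get("status") != "firing":
            continue
        labels = alert.get("labels", {})
        action = ACTIONS_BY_ALERT.get(labels.get("alertname", ""))
        # "job_name" is an assumed label; the real exporter may differ.
        job_name = labels.get("job_name")
        if action and job_name:
            planned.append((action, job_name))
    return planned


if __name__ == "__main__":
    example = {
        "status": "firing",
        "alerts": [
            {
                "status": "firing",
                "labels": {
                    "alertname": "JobLowGpuUtilization",
                    "job_name": "alice~train",
                },
            },
            {
                "status": "resolved",
                "labels": {"alertname": "JobAbnormal", "job_name": "bob~eval"},
            },
        ],
    }
    print(plan_actions(example))
```

Keeping the planning step a pure function separates the decision (which action applies to which job) from the side effect (the request forwarded to the RestServer), which makes the rule mapping easy to test on its own.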

@coveralls

Coverage Status

Coverage remained the same at 34.383% when pulling c573146 on shaiic-pai:prometheus into 9755553 on microsoft:master.

@suiguoxin suiguoxin merged commit cf4e6a8 into microsoft:master Sep 29, 2020
@shaiic-pai shaiic-pai deleted the prometheus branch September 29, 2020 08:53