Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial terraform manifests for monitoring #1877

Closed
wants to merge 2 commits into from

Conversation

rikatz
Copy link
Contributor

@rikatz rikatz commented Apr 6, 2021

Add initial manifest for gitops monitoring.

Some points of attention for this PR:

  • This is not repeatable. Although I think adding too much modules for terraform can turn things more complicated, maybe we can define how and what we monitor in URLs. Assuming we always use SSL/TLS we could create a module and have this simpler in a future, like:
module "thockin_test1_monitoring" {
  source = "../../../modules/monitoring"
  project_name       = "kubernetes-public"
  url       = "thockin-test1.k8s.io"
  cert_expiration_threshold = 15
  // Channel needs to be manually created in GCP interface with the following display name
  channel_notification = "Kubernetes.io Cert Alert"
}
  • We should probably want to define the structure for this. I'm proposing inside infra/monitoring just because got no better idea :D
  • We can add more parameters, and more notification channel/options (like, we can add a list of emails that can turn into Email Notification Channel on GCP)
  • I'm using all the GCP locations, but getting only metrics from usa-iowa as valid to cert_expiration. This can be changed later as well, depending on what we want to do (like, properly monitoring and alerting in case of some downtime/increased latency)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 6, 2021
@k8s-ci-robot k8s-ci-robot requested review from dims and nikhita April 6, 2021 19:32
@k8s-ci-robot k8s-ci-robot added wg/k8s-infra size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 6, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rikatz
To complete the pull request process, please assign spiffxp after the PR has been reviewed.
You can assign the PR to them by writing /assign @spiffxp in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

*/

terraform {
required_version = "~> 0.14.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still use the 0.13 version

// Manual step: Create a StackDriver alert channel pointing to a channel in Slack
// It will select the channel here by its display name
data "google_monitoring_notification_channel" "alertchannel" {
display_name = "Kubernetes.io Cert Alert"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the existing channel : #k8s-infra-alerts. Can you mention the type ?

@rikatz
Copy link
Contributor Author

rikatz commented Apr 7, 2021

@ameukam thanks!

I've changed the channel searcher to use the channel name (label) and the type slack, and did changed the terraform to >= 0.13.0 :)

@rikatz
Copy link
Contributor Author

rikatz commented May 12, 2021

@ameukam anything else on this PR? I wanted to check if monitoring is working :)

terraform {

backend "gcs" {
bucket = "k8s-infra-clusters-terraform"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rikatz the terraform states are now in different buckets : #1952
Feel free to add a new one.

required_providers {
google = {
source = "hashicorp/google"
version = "~> 3.63.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
version = "~> 3.63.0"
version = "~> 3.66.0"

*/

terraform {
required_version = ">= 0.13.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
required_version = ">= 0.13.0"
required_version = ">= 0.14.0"

See: #2019
Sorry for make you revert it. :-D

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need a rebase / update I think now, should be:

Suggested change
required_version = ">= 0.13.0"
required_version = "~> 1.0.0"

@ameukam
Copy link
Member

ameukam commented May 12, 2021

@rikatz I dropped some comments.

@spiffxp
Copy link
Member

spiffxp commented May 19, 2021

I have a meta-level nit which is that right now I think @ameukam has preferred to keep all of our terraform stuff in infra/gcp/clusters

I feel like we need to chat a bit about what we want organization of files to look like, but until then I think having them all homed under one directory (even if its name is misleading) makes more sense than allowing sprawl

So move to something like infra/gcp/clusters/projects/kubernetes-public/monitoring

@spiffxp
Copy link
Member

spiffxp commented May 19, 2021

I'm ok with merge and iterate if we immediately follow up with that move

@ameukam
Copy link
Member

ameukam commented Jun 15, 2021

@rikatz What do you think about closing this PR now the domains are gone : #1838 ?

@rikatz
Copy link
Contributor Author

rikatz commented Jun 15, 2021

Let's do this :)
/close

@k8s-ci-robot
Copy link
Contributor

@rikatz: Closed this PR.

In response to this:

Let's do this :)
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants