This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Update prometheus operator 0.43.2 #1162

Merged
merged 5 commits into from
Nov 26, 2020

Conversation

ipochi
Member

@ipochi ipochi commented Nov 4, 2020

This PR updates the Prometheus operator component to the latest upstream chart that is now renamed from prometheus-operator to kube-prometheus-stack.

Release notes:

Steps 1-4 must be performed before updating the cluster, i.e. before running lokoctl cluster apply.
Steps 5 and 6 must be performed after updating the cluster, i.e. after running lokoctl cluster apply.

Upgrading this component requires downtime, and some steps must be executed manually before upgrading:

  1. Patch the PersistentVolumes created/used by the prometheus-operator chart to the Retain reclaim policy:

    kubectl patch pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-prometheus-prometheus-operator-prometheus-0")].metadata.name}')
    
    kubectl patch pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-alertmanager-prometheus-operator-alertmanager-0")].metadata.name}')

    Note: To execute the above commands, the user must have cluster-wide permissions.

  2. Uninstall the prometheus-operator release, delete the existing PersistentVolumeClaims, and verify that the PVs become Released:

    lokoctl component delete prometheus-operator
    kubectl delete pvc/data-prometheus-prometheus-operator-prometheus-0 -n monitoring
    kubectl delete pvc/data-alertmanager-prometheus-operator-alertmanager-0 -n monitoring

    You can also choose to remove all your existing custom resources (ServiceMonitors, PodMonitors, etc.) if you want to.

  3. Remove the current spec.claimRef values to change the PVs' status from Released to Available:

    kubectl patch pv --type json -p='[{"op": "remove", "path": "/spec/claimRef"}]' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-prometheus-prometheus-operator-prometheus-0")].metadata.name}')
    
    kubectl patch pv --type json -p='[{"op": "remove", "path": "/spec/claimRef"}]' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-alertmanager-prometheus-operator-alertmanager-0")].metadata.name}')

    Note: To execute the above commands, the user must have cluster-wide permissions.

  4. Make sure that the storage_class and prometheus.storage_size are unchanged during the upgrade process.

  5. If monitoring was enabled for the rook, contour or metallb components, then the user needs to update those components as well, otherwise metrics from them won't be collected. Backwards compatibility exists, so it is not mandatory to update them before installing the updated prometheus-operator component.

  6. Proceed with a fresh prometheus-operator component installation. The new release should re-attach your previously released PVs with their contents.
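For readers who want the manual part in one place, steps 1-3 can be sketched as a loop over the two default PVC names. This is a hypothetical sketch: the kubectl/lokoctl calls are shown as comments, and only a small helper that builds the jsonpath query is executable here.

```shell
# Hypothetical sketch of steps 1-3, assuming the default PVC names created
# by the prometheus-operator chart in the "monitoring" namespace.
# The helper builds the jsonpath query that resolves a PVC name to its
# backing PV, so the expression is not repeated for every command.
pv_jsonpath() {
  printf '{.items[?(@.spec.claimRef.name=="%s")].metadata.name}' "$1"
}

for pvc in data-prometheus-prometheus-operator-prometheus-0 \
           data-alertmanager-prometheus-operator-alertmanager-0; do
  # Step 1 (requires cluster-wide permissions): retain the backing PV.
  #   pv=$(kubectl get pv -o jsonpath="$(pv_jsonpath "$pvc")")
  #   kubectl patch pv "$pv" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
  # Step 2: after `lokoctl component delete prometheus-operator`, delete the PVC.
  #   kubectl delete "pvc/$pvc" -n monitoring
  # Step 3: clear claimRef so the PV moves from Released to Available.
  #   kubectl patch pv "$pv" --type json -p='[{"op": "remove", "path": "/spec/claimRef"}]'
  pv_jsonpath "$pvc"; echo
done
```

The helper is only for readability; the expanded commands are identical to the ones listed in the steps above.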

Closes #1079

@ipochi ipochi requested review from surajssd and invidian November 4, 2020 09:44
@ipochi ipochi force-pushed the imran/update-prometheus-operator-0.43.0 branch from c669822 to ba2b390 Compare November 4, 2020 10:32
Member

@invidian invidian left a comment


Just one question, otherwise LGTM.

@ipochi ipochi force-pushed the imran/update-prometheus-operator-0.43.0 branch 4 times, most recently from 7c95444 to 7e68844 Compare November 5, 2020 14:20
@ipochi ipochi requested a review from invidian November 5, 2020 16:40
invidian
invidian previously approved these changes Nov 5, 2020
Member

@invidian invidian left a comment


LGTM

In step 5 of the release notes, I would add "... otherwise metrics from those components won't be collected."

@ipochi
Member Author

ipochi commented Nov 5, 2020

LGTM

In step 5 of the release notes, I would add "... otherwise metrics from those components won't be collected."

done.

@ipochi ipochi requested a review from iaguis November 5, 2020 16:50
Member

@surajssd surajssd left a comment


In step 2 of the PR description:

kubectl delete pvc/<PersistenceVolumeClaim name> -n monitoring

Since we know what the PVCs are called in the monitoring namespace, we should list them explicitly. If users have their own resources there, we just add a caveat that they handle those on their own.

In step 1 & step 3:

Since we already know the names of the PVCs, we can extract the PV names from them; the user does not have to bother with finding names.

The names we found in step 1 can be used in step 3.

@ipochi
Member Author

ipochi commented Nov 9, 2020

In step 2 of the PR description:

kubectl delete pvc/<PersistenceVolumeClaim name> -n monitoring

Since we know what the PVCs are called in the monitoring namespace, we should list them explicitly. If users have their own resources there, we just add a caveat that they handle those on their own.

In step 1 & step 3:

Since we already know the names of the PVCs, we can extract the PV names from them; the user does not have to bother with finding names.

The names we found in step 1 can be used in step 3.

@surajssd For step 1, would this be legible?

kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-alertmanager-prometheus-operator-alertmanager-0")].metadata.name}' | xargs kubectl patch pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-prometheus-prometheus-operator-prometheus-0")].metadata.name}' | xargs kubectl patch pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

and similar for step 3.

@surajssd
Member

surajssd commented Nov 9, 2020

@ipochi yes, for step 1 it works fine. But then in step 3 we won't have those PVs, so the above steps won't work. We might need to save them in intermediate vars.
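The intermediate-variable approach could look like this (a hypothetical sketch: the kubectl lookups are shown as comments, with placeholder values standing in for the real PV names):

```shell
# Hypothetical sketch: resolve the PV names once in step 1 and keep them in
# shell variables, so step 3 can reuse them after the PVCs are deleted.
# In a real cluster the variables would come from the jsonpath lookups:
#   PROM_PV=$(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-prometheus-prometheus-operator-prometheus-0")].metadata.name}')
#   ALERT_PV=$(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-alertmanager-prometheus-operator-alertmanager-0")].metadata.name}')
PROM_PV=pvc-1111   # placeholder PV name
ALERT_PV=pvc-2222  # placeholder PV name

# Step 1: both PVs can be patched in a single call, and the names are kept.
#   kubectl patch pv "$PROM_PV" "$ALERT_PV" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# Step 3: the saved names remain valid even after the PVCs are deleted.
#   kubectl patch pv "$PROM_PV" "$ALERT_PV" --type json -p='[{"op": "remove", "path": "/spec/claimRef"}]'
echo "$PROM_PV $ALERT_PV"
```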

@ipochi
Member Author

ipochi commented Nov 9, 2020

@ipochi yes, for step 1 it works fine. But then in step 3 we won't have those PVs, so the above steps won't work. We might need to save them in intermediate vars.

@surajssd For step 3, we need to change the name of the PVCs.

 kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-alertmanager-prometheus-operator-kube-p-alertmanager-0")].metadata.name}' | xargs kubectl patch pv --type json -p='[{"op": "remove", "path": "/spec/claimRef"}]'

@invidian
Member

invidian commented Nov 9, 2020

Why use xargs instead of $()?

@ipochi
Member Author

ipochi commented Nov 9, 2020

Why use xargs instead of $()?

That works too. My question is "Is it still legible?"

@invidian
Member

invidian commented Nov 9, 2020

That works too. My question is "Is it still legible?"

Seems like a common pattern to me, so I don't mind. $() is more common than xargs, IMO.

@ipochi ipochi force-pushed the imran/update-prometheus-operator-0.43.0 branch from 7e68844 to ed46b72 Compare November 9, 2020 10:53
@ipochi
Member Author

ipochi commented Nov 9, 2020

@surajssd @invidian Updated release notes with the desired changes.

@ipochi ipochi dismissed surajssd’s stale review November 9, 2020 10:54

Addressed. Please re-review.

@ipochi ipochi requested review from surajssd and invidian November 9, 2020 10:54
@invidian
Member

invidian commented Nov 9, 2020

I think the changelog can still be simplified by using an OR operator to run just one command instead of two.

@ipochi
Member Author

ipochi commented Nov 9, 2020

@surajssd @invidian

This step doesn't work if the PVC name is used to get the PV.

  1. Remove current spec.claimRef values to change the PV's status from Released to Available.
    kubectl patch pv --type json -p='[{"op": "remove", "path": "/spec/claimRef"}]' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-prometheus-prometheus-operator-kube-p-prometheus-0")].metadata.name}')
    
    kubectl patch pv --type json -p='[{"op": "remove", "path": "/spec/claimRef"}]' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-alertmanager-prometheus-operator-kube-p-alertmanager-0")].metadata.name}')
    

Not sure what the problem is here. The command works, but the status of the second PVC changes to Bound if the first command is applied, and vice versa.

Changing the claimRef with the PV name works fine. The same problem occurs with xargs.

@invidian
Member

invidian commented Nov 9, 2020

Not sure what the problem is here. The command works, but the status of the second PVC changes to Bound if the first command is applied, and vice versa.

I'll have a look.

@invidian
Member

invidian commented Nov 9, 2020

We could do the following to perform just one patch operation 😄 But it seems less readable to me:

kubectl patch pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' $(for i in data-prometheus-prometheus-operator-prometheus-0 data-alertmanager-prometheus-operator-alertmanager-0; do kubectl get pv -o jsonpath="{.items[?(@.spec.claimRef.name=='$i')].metadata.name}"; echo -n " "; done; echo)

@invidian
Member

invidian commented Nov 9, 2020

The names in step 3 are definitely wrong:

-kubectl patch pv --type json -p='[{"op": "remove", "path": "/spec/claimRef"}]' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-prometheus-prometheus-operator-kube-p-prometheus-0")].metadata.name}')
-kubectl patch pv --type json -p='[{"op": "remove", "path": "/spec/claimRef"}]' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-alertmanager-prometheus-operator-kube-p-alertmanager-0")].metadata.name}')
+kubectl patch pv --type json -p='[{"op": "remove", "path": "/spec/claimRef"}]' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-prometheus-prometheus-operator-prometheus-0")].metadata.name}')
+kubectl patch pv --type json -p='[{"op": "remove", "path": "/spec/claimRef"}]' $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.name=="data-alertmanager-prometheus-operator-alertmanager-0")].metadata.name}')

@ipochi ipochi force-pushed the imran/update-prometheus-operator-0.43.0 branch 2 times, most recently from 0cbc489 to 7645a64 Compare November 24, 2020 09:14
This commit also updates sub-charts: grafana, kube-state-metrics and
node-exporter

New versions:
  * Grafana to 7.2.1
  * Prometheus Operator to v0.43.2
  * kube-state-metrics to v1.9.7
  * node_exporter to v1.0.1

Signed-off-by: Imran Pochi <imran@kinvolk.io>
This commit updates the names of resources such as Deployments,
StatefulSets, DaemonSets and Pods wherever needed to reflect the new
names in the Helm chart.

Signed-off-by: Imran Pochi <imran@kinvolk.io>
@ipochi
Member Author

ipochi commented Nov 24, 2020

@invidian @iaguis
The build is passing. Feedback received on metrics backward compatibility is retained, as mentioned here.

Please review.

@ipochi ipochi requested review from johananl and invidian November 24, 2020 13:26
invidian
invidian previously approved these changes Nov 25, 2020
Member

@surajssd surajssd left a comment


small nits otherwise LGTM

@ipochi ipochi force-pushed the imran/update-prometheus-operator-0.43.0 branch 2 times, most recently from 7fb6924 to d788908 Compare November 25, 2020 11:10
@ipochi ipochi requested review from invidian and surajssd November 25, 2020 11:11
surajssd
surajssd previously approved these changes Nov 25, 2020
Member

@surajssd surajssd left a comment


LGTM

invidian
invidian previously approved these changes Nov 25, 2020
Member

@invidian invidian left a comment


LGTM

I added a comment in #1212 that maybe we can structure the commits here so that for the next release we only need to revert this commit, without manual changes.

This commit updates the labels used by dependent components and etcd for
scraping metrics.

This change is needed because the Prometheus operator Helm chart is now
renamed to kube-prometheus-stack and as a result uses a new label to
pick up custom alerts.

old label:
```
app: prometheus-operator
```

new label:
```
app: kube-prometheus-stack
```

Signed-off-by: Imran Pochi <imran@kinvolk.io>
NOTE - This commit is to be removed from the branch once a newer version
of Lokomotive post v0.6.0 is released. It is added to ensure backwards
compatibility with the older prometheus-operator component by undoing
the changes made in the previous commit; hence it is a temporary commit
in the branch.

This commit creates the following additional resources:

- Endpoint and Service objects for etcd.
- PrometheusRule object for Contour.
- PrometheusRule object for Rook.
- PrometheusRule object for Metallb.

The new resources created above differ only in name and label, offering
compatibility with the updated Prometheus Operator. The old resources
offer backward compatibility, meaning that if the cluster is updated,
etcd metrics collection won't be broken, and if the prometheus-operator
component is updated, the metrics collection of contour, rook and
metallb won't be broken even if those components are updated before
updating prometheus-operator.

Old label:
```
app: prometheus-operator
```

New label:
```
app: kube-prometheus-stack
```

Signed-off-by: Imran Pochi <imran@kinvolk.io>
Signed-off-by: Imran Pochi <imran@kinvolk.io>
@ipochi ipochi dismissed stale reviews from invidian and surajssd via 63fa436 November 25, 2020 15:36
@ipochi ipochi force-pushed the imran/update-prometheus-operator-0.43.0 branch from d788908 to 63fa436 Compare November 25, 2020 15:36
@ipochi
Member Author

ipochi commented Nov 25, 2020

LGTM

I added a comment in #1212 that maybe we can structure the commits here so that for the next release we only need to revert this commit, without manual changes.

@invidian Just did that :)

Now the commit should be removed from the branch once we are ready to roll out a newer version of Lokomotive > 0.6.0

Member

@invidian invidian left a comment


LGTM

@invidian Just did that :)

Thanks! It's looking good as far as I see 👍

@ipochi ipochi requested a review from surajssd November 25, 2020 15:46
@ipochi ipochi merged commit 2e2e430 into master Nov 26, 2020
@ipochi ipochi deleted the imran/update-prometheus-operator-0.43.0 branch November 26, 2020 08:07
@invidian invidian mentioned this pull request Dec 7, 2020