Add prepare-shutdown endpoint to ingesters for down scaling #4718

56quarters · 2023-04-12T20:55:01Z

What this PR does

This change adds a new HTTP endpoint to ingesters that changes their in-memory configuration (with an on-disk backup) such that they will:

* Unregister from the ring when they stop
* Flush all in-memory data to object storage when they stop

This differs from the shutdown endpoint because it does not actually stop the ingesters, just modifies their configuration in preparation for being permanently stopped. This is a requirement of using the rollout-operator for gracefully scaling down ingesters in Kubernetes.
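
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `shutdownConfig` type, its field names, the `markerName` file name, and all helper functions are hypothetical. It assumes the on-disk backup is a marker file that is checked again on startup, which matches the review discussion about the marker surviving restarts.

```go
package main

import (
	"errors"
	"os"
	"path/filepath"
	"sync"
)

// shutdownConfig is a hypothetical stand-in for the ingester's in-memory settings.
type shutdownConfig struct {
	mu                   sync.Mutex
	unregisterOnShutdown bool
	flushOnShutdown      bool
}

// markerName is an assumed file name, not taken from the PR.
const markerName = "shutdown-requested.txt"

// enable switches on both behaviours needed for a permanent stop.
func (c *shutdownConfig) enable() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.unregisterOnShutdown = true
	c.flushOnShutdown = true
}

// prepareShutdown writes the on-disk backup first, then changes the
// in-memory settings. It does not stop the process.
func prepareShutdown(cfg *shutdownConfig, dataDir string) error {
	path := filepath.Join(dataDir, markerName)
	if err := os.WriteFile(path, []byte("prepared\n"), 0o644); err != nil {
		return err
	}
	cfg.enable()
	return nil
}

// restoreOnStartup re-applies the settings if a marker from a previous run exists.
func restoreOnStartup(cfg *shutdownConfig, dataDir string) error {
	_, err := os.Stat(filepath.Join(dataDir, markerName))
	if errors.Is(err, os.ErrNotExist) {
		return nil // no marker: keep the normal configuration
	}
	if err != nil {
		return err
	}
	cfg.enable()
	return nil
}

// exampleRestart prepares a shutdown, then simulates a restart with a fresh
// config and reports whether the settings were restored from the marker.
func exampleRestart() (bool, error) {
	dir, err := os.MkdirTemp("", "ingester-marker")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	if err := prepareShutdown(&shutdownConfig{}, dir); err != nil {
		return false, err
	}
	restarted := &shutdownConfig{}
	if err := restoreOnStartup(restarted, dir); err != nil {
		return false, err
	}
	return restarted.unregisterOnShutdown && restarted.flushOnShutdown, nil
}
```

Writing the marker before flipping the in-memory flags is a design choice of this sketch: if the write fails, the ingester's behaviour is left unchanged and the caller can retry.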

Which issue(s) this PR fixes or relates to

See grafana/rollout-operator#47

Checklist

* Tests updated
* Documentation added
* CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

colega

I think ShutdownMarker should export a metric, so we can add an alert if for some reason there's an ingester continuously running with a shutdown marker. As I understand it, the marker is only removed when we remove the PVC, so if the sts isn't shut down after all (because we upscaled quickly?), it will continue shutting down between restarts.

Which makes me think: do we need a revert mechanism?

56quarters · 2023-04-13T13:25:39Z

> I think ShutdownMarker should export a metric, so we can add an alert if for some reason there's an ingester continuously running with a shutdown marker. As I understand it, the marker is only removed when we remove the PVC, so if the sts isn't shut down after all (because we upscaled quickly?), it will continue shutting down between restarts.
>
> Which makes me think: do we need a revert mechanism?

I'll add a metric. We could build a "revert" mechanism but we'd have to add that to the rollout operator as well. Let's avoid that until we need it.

56quarters · 2023-04-13T15:32:58Z

> I think ShutdownMarker should export a metric, so we can add an alert if for some reason there's an ingester continuously running with a shutdown marker. As I understand it, the marker is only removed when we remove the PVC, so if the sts isn't shut down after all (because we upscaled quickly?), it will continue shutting down between restarts.
>
> Which makes me think: do we need a revert mechanism?

~~I'll plan on doing this as a follow-up to the open rollout-operator PR and in Mimir. I'd rather get something out to test before handling reverting of scale downs.~~

Done: I added a DELETE method as the revert mechanism while adding support for GET.

pstibrany

Thank you. I think the direction here is fine, but I'd like to see the PR simplified before merging.

pstibrany · 2023-04-14T16:59:20Z

docs/sources/mimir/references/http-api/index.md

+### Prepare for Shutdown
+
+```
+GET,POST /ingester/prepare-shutdown


Let's mention DELETE too and explain what different methods do on this endpoint.
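
As a rough illustration of how the three methods could differ on this endpoint, here is a minimal sketch. It is not the handler from this PR: the `prepareShutdownHandler` type is hypothetical, it keeps the state only in memory (no marker file), and the `set`/`unset` response bodies and the 204 status codes are assumptions made for the example.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// prepareShutdownHandler is a hypothetical handler holding only an in-memory flag.
type prepareShutdownHandler struct {
	prepared atomic.Bool
}

func (h *prepareShutdownHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	switch r.Method {
	case http.MethodGet:
		// Report the current state without changing it.
		if h.prepared.Load() {
			io.WriteString(w, "set\n")
		} else {
			io.WriteString(w, "unset\n")
		}
	case http.MethodPost:
		// Prepare the ingester for a permanent stop.
		h.prepared.Store(true)
		w.WriteHeader(http.StatusNoContent)
	case http.MethodDelete:
		// Revert a previous POST, for example after a cancelled scale down.
		h.prepared.Store(false)
		w.WriteHeader(http.StatusNoContent)
	default:
		w.Header().Set("Allow", "GET, POST, DELETE")
		w.WriteHeader(http.StatusMethodNotAllowed)
	}
}

// exampleCalls sends GET, POST, GET, DELETE, GET to the handler in order
// and returns the response body of each call.
func exampleCalls() []string {
	h := &prepareShutdownHandler{}
	var bodies []string
	for _, method := range []string{"GET", "POST", "GET", "DELETE", "GET"} {
		rec := httptest.NewRecorder()
		req := httptest.NewRequest(method, "/ingester/prepare-shutdown", nil)
		h.ServeHTTP(rec, req)
		bodies = append(bodies, rec.Body.String())
	}
	return bodies
}
```

In this sketch POST and DELETE are idempotent, so a client such as the rollout-operator could safely retry either call.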

pstibrany

Thank you for addressing my feedback!

pracucci

Nice work! 👏

**What this PR does / why we need it**:

This updates the PrepareShutdown method so it supports GET and DELETE methods as well. This makes it similar to Mimir: grafana/mimir#4718. The status is now stored in a local file. A new config setting had to be added for this file as there is no obvious place to store it.

**Checklist**

- [X] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**)
- [X] Documentation added
- [X] Tests updated
- [x] `CHANGELOG.md` updated
- [x] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`

---------

Signed-off-by: Michel Hollands <michel.hollands@grafana.com>
Co-authored-by: Dylan Guedes <djmgguedes@gmail.com>

56quarters force-pushed the 56quarters/prepare-shutdown branch from 397201e to 1909070 on April 12, 2023 22:52

56quarters force-pushed the 56quarters/prepare-shutdown branch from 1909070 to 67337cc on April 12, 2023 22:53

56quarters marked this pull request as ready for review April 12, 2023 22:58

56quarters requested review from a team as code owners April 12, 2023 22:58

56quarters requested review from aknuds1 and colega April 12, 2023 22:59

charleskorn reviewed Apr 13, 2023

colega reviewed Apr 13, 2023

Code review changes

cc02983

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

56quarters requested review from charleskorn and colega April 13, 2023 15:39

charleskorn reviewed Apr 14, 2023

pstibrany reviewed Apr 14, 2023

56quarters added 2 commits April 14, 2023 12:12

Code review changes

d0fe669

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

Fix import typo

234308b

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

56quarters requested review from pstibrany and charleskorn April 14, 2023 16:39

pstibrany reviewed Apr 14, 2023

pstibrany approved these changes Apr 14, 2023

Update documentation

c09c7fc

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

56quarters merged commit 804c1dc into main Apr 14, 2023

56quarters deleted the 56quarters/prepare-shutdown branch April 14, 2023 20:40

56quarters restored the 56quarters/prepare-shutdown branch April 17, 2023 14:49

56quarters added a commit that referenced this pull request Apr 17, 2023

Add DELETE verb to prepare-shutdown docs

e7f6f37

Forgot to update the list of verbs supported in #4718 after review feedback. Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

56quarters deleted the 56quarters/prepare-shutdown branch April 17, 2023 14:49

56quarters mentioned this pull request Apr 17, 2023

Add DELETE verb to prepare-shutdown docs #4755

Merged

56quarters added a commit that referenced this pull request Apr 17, 2023

Add DELETE verb to prepare-shutdown docs (#4755)

b5519ef

Forgot to update the list of verbs supported in #4718 after review feedback. Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

pracucci reviewed Apr 17, 2023

MichelHollands mentioned this pull request Apr 18, 2023

Update prepare shutdown grafana/loki#9175

Merged

5 tasks

duricanikolic mentioned this pull request May 9, 2023

Introducing prepare-shutdown endpoint for store-gateway #4955

Merged

3 tasks
