Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alertmanager: Support UTF-8 #6898

Merged
merged 7 commits into from
Jan 25, 2024
Merged

Alertmanager: Support UTF-8 #6898

merged 7 commits into from
Jan 25, 2024

Conversation

grobinson-grafana
Copy link
Contributor

@grobinson-grafana grobinson-grafana commented Dec 12, 2023

What this PR does

This pull request adds support for UTF-8 in the Mimir Alertmanager (please see prometheus/alertmanager#3486).

Unlike the Prometheus Alertmanager, which operates both parsers at the same time using a fallback mechanism, the Mimir Alertmanager instead operates in one of two modes: classic mode using the pkg/labels parser, and strict UTF-8 mode using the matchers/parse parser. It uses classic mode as the default mode.

To help transition a Mimir installation from classic mode to strict UTF-8 mode, the Mimir Alertmanager contains additional code that runs all API requests and configurations through the fallback parser, incrementing metrics and emitting logs should it encounter configurations that are not forwards compatible or cause disagreement.

The operator of a Mimir cluster is expected to use these metrics and logs to observe incompatible configurations, or configurations that cause disagreement, and fix them. Once all configurations have been fixed the operator can then migrate the installation to strict UTF-8 mode. From there on, the pkg/labels parser will no longer be used, and it will no longer be possible to have configurations that are incompatible with the matchers/parse parser unless the mode be reverted.

A Mimir installation can be switched to the strict UTF-8 mode using the alertmanager.utf8-strict-mode flag.

Dashboards

Here are a couple of screenshots from the dashboards that we will use to manage the rollout of UTF-8 in Mimir.

The first two screenshots show an ideal cell. There are no configurations with incompatibilities or disagreement. This cluster can be migrated to UTF-8 strict mode.

Screenshot 2024-01-24 at 3 24 54 PM Screenshot 2024-01-24 at 3 25 04 PM

The next couple screenshots show a cell that has both disagreement and incompatibilities. The metrics in this screenshot are exaggerated as Mimir reloads configurations at regular intervals.

Screenshot 2024-01-24 at 3 34 47 PM

However, in the per tenant dashboard, we can see disagreement and incompatibilities deduplicated by tenant. In this case, we will fix the incompatible configuration but no the disagreement, as the disagreement in this screenshot is 🙂 encoded as a byte sequence.

Screenshot 2024-01-24 at 3 51 49 PM

Fixes #

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@CLAassistant
Copy link

CLAassistant commented Dec 12, 2023

CLA assistant check
All committers have signed the CLA.

@grobinson-grafana grobinson-grafana changed the title WIP: UTF-8 in Mimir UTF-8 in Mimir Jan 5, 2024
@grobinson-grafana grobinson-grafana force-pushed the grobinson/utf8-mimir branch 2 times, most recently from 804f02c to 36c8886 Compare January 11, 2024 10:11
@grobinson-grafana grobinson-grafana force-pushed the grobinson/utf8-mimir branch 5 times, most recently from b3327a5 to 2767767 Compare January 19, 2024 13:00
@grobinson-grafana grobinson-grafana force-pushed the grobinson/utf8-mimir branch 12 times, most recently from d84aeb0 to b9d47e9 Compare January 24, 2024 14:14
@grobinson-grafana grobinson-grafana self-assigned this Jan 24, 2024
Copy link
Contributor

@gotjosh gotjosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code-wise this LGTM, but I reckcon we need a few integrations tests:

  1. Integration tests that asserts on the metrics you're mentioning with the various scenarios: no disagreement, some disagreement and all but disagreement of the various objects.
  2. Integration tests that assert on matchers that are now accepted that wouldn't be accepted before.

cmd/mimir/config-descriptor.json Show resolved Hide resolved
Copy link
Contributor

@gotjosh gotjosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

CHANGELOG.md Outdated
@@ -4,6 +4,7 @@

### Grafana Mimir

* [CHANGE] Alertmanager: CLI flag `-utf8-strict-mode` has been added, and its default value is set to `true`. #6898
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* [CHANGE] Alertmanager: CLI flag `-utf8-strict-mode` has been added, and its default value is set to `true`. #6898
* [FEATURE] Alertmanager: Added `-alertmanager.utf8-strict-mode` to control support for any UTF-8 character as part of Alertmanager configuration/API matchers and labels. It's default value is set to `false`. #6898

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a feature more than a change, we tend to use whole flag and the default value is actually false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on the true/false. I copied another CHANGELOG as an example and forgot to change this.

// Enable UTF-8 strict mode. This means Alertmanager uses the matchers/parse parser
// to parse configurations and API requests, instead of pkg/labels. Use this mode
// once you are confident that your configuration is forwards compatible.
UTF8StrictMode bool `yaml:"utf8_strict_mode" category:"advanced"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking this flag as advanced - means it'll need to follow the regular deprecation flow of 2 minor versions. So, you'll need at least three versions to remove it. Marking it as experimental, though, will allow you to remove it whenever you want.

See config docs for more.

I'm OK with this as is as I think there's value in making it permanent for a few versions to enable folks do to do their own migrations but thought it was worth mentioning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL. Let's just make it experimental. Even if it is experimental, doesn't mean we can't follow the regular deprecation flow if we so choose?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

Comment on lines 886 to 893
mode := featurecontrol.FeatureClassicMode
if t.Cfg.Alertmanager.UTF8StrictMode {
level.Debug(util_log.Logger).Log("msg", "Starting Alertmanager in UTF-8 strict mode")
mode = featurecontrol.FeatureUTF8StrictMode
} else {
level.Debug(util_log.Logger).Log("msg", "Starting Alertmanager in classic mode")
}
features, err := featurecontrol.NewFlags(util_log.Logger, mode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, this is pretty important and only logged once - I'd consider changing his to info.

Suggested change
mode := featurecontrol.FeatureClassicMode
if t.Cfg.Alertmanager.UTF8StrictMode {
level.Debug(util_log.Logger).Log("msg", "Starting Alertmanager in UTF-8 strict mode")
mode = featurecontrol.FeatureUTF8StrictMode
} else {
level.Debug(util_log.Logger).Log("msg", "Starting Alertmanager in classic mode")
}
features, err := featurecontrol.NewFlags(util_log.Logger, mode)
mode := featurecontrol.FeatureClassicMode
if t.Cfg.Alertmanager.UTF8StrictMode {
level.Info(util_log.Logger).Log("msg", "Starting Alertmanager in UTF-8 strict mode")
mode = featurecontrol.FeatureUTF8StrictMode
} else {
level.Info(util_log.Logger).Log("msg", "Starting Alertmanager in classic mode")
}
features, err := featurecontrol.NewFlags(util_log.Logger, mode)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding the tests, they look beautiful!

@grobinson-grafana
Copy link
Contributor Author

I had to force push to fix conflicts in CHANGELOG.md.

@gotjosh gotjosh marked this pull request as ready for review January 25, 2024 12:39
@gotjosh gotjosh requested review from a team as code owners January 25, 2024 12:39
Copy link
Contributor

@gotjosh gotjosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gotjosh gotjosh merged commit 9c8f825 into main Jan 25, 2024
28 checks passed
@gotjosh gotjosh deleted the grobinson/utf8-mimir branch January 25, 2024 12:41
@gotjosh gotjosh changed the title UTF-8 in Mimir Alertmanager: Support any UTF-8 character in matchers and labels Jan 25, 2024
gotjosh added a commit that referenced this pull request Jan 25, 2024
…ollowup

Some follow up from #6898

- Move the changelog entry to the right place
- Add documentation about experimental features on `about-versioning`
- New boolean flags tend to be suffixed with the word "enabled" if it makes sense
- Minor fix to the flag documentation to aling with the changelog
@grobinson-grafana grobinson-grafana changed the title Alertmanager: Support any UTF-8 character in matchers and labels Alertmanager: Support UTF-8 Jan 29, 2024
@charleskorn charleskorn mentioned this pull request Jun 4, 2024
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants