Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alertmanager Distributor #3671

Merged
merged 23 commits into from
Feb 18, 2021
Merged

Conversation

codesome
Copy link
Contributor

@codesome codesome commented Jan 8, 2021

What this PR does:

Adds sharding capabilities to alertmanager for the /alerts API and introduces a new Distributor component in it based on the design here.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@gotjosh
Copy link
Contributor

gotjosh commented Jan 8, 2021

Depends on #3664

@codesome codesome force-pushed the am-distributors branch 2 times, most recently from de1bc94 to 5f303ca Compare January 12, 2021 11:04
@codesome codesome force-pushed the am-distributors branch 2 times, most recently from 44306db to 9852490 Compare January 14, 2021 15:04
@pracucci
Copy link
Contributor

Let's get #3664 merged and then we'll review this PR, otherwise it's getting quite complicated to follow. Merging of #3664 should happen relatively soon (looks in a good shape to me).

@gotjosh
Copy link
Contributor

gotjosh commented Jan 19, 2021

#3664 is now merged, feel free to rebase.

@pracucci
Copy link
Contributor

@codesome Josh's PR #3664 has been merged. Could you rebase this PR, please?

@codesome
Copy link
Contributor Author

I have rebased, but I need to fix some TODOs, add some metrics and more tests. I will tell when it is ready for review.

@codesome codesome marked this pull request as ready for review January 20, 2021 13:29
@codesome
Copy link
Contributor Author

It is ready for a review. I still have to fix the metrics part of unit test and have to test running the distributor and alertmanager in a single binary to see if routing is happening properly (with no conflicting route registrations).

@codesome codesome marked this pull request as draft January 27, 2021 15:31
@codesome codesome marked this pull request as ready for review January 28, 2021 10:53
@codesome
Copy link
Contributor Author

PR is ready for review. I am still testing some bits locally to make sure everything is working fine. I will be adding more tests to alertmanager.

The current idea is to handle sharding only for /alerts API and incrementally add support for others. cc @gotjosh

codesome and others added 4 commits January 28, 2021 22:48
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
@codesome codesome force-pushed the am-distributors branch 2 times, most recently from 1bb7349 to 2159ef5 Compare February 5, 2021 14:18
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for addressing all comments. Change LGTM and most of the left comments are last nit, but there are a couple of serious ones reason why I haven't marked as approved yet. Thanks!

pkg/distributor/distributor.go Outdated Show resolved Hide resolved
integration/alertmanager_test.go Outdated Show resolved Hide resolved
pkg/api/api.go Outdated Show resolved Hide resolved
pkg/alertmanager/multitenant.go Outdated Show resolved Hide resolved
pkg/alertmanager/multitenant.go Outdated Show resolved Hide resolved
pkg/alertmanager/distributor_test.go Outdated Show resolved Hide resolved
pkg/alertmanager/distributor_test.go Outdated Show resolved Hide resolved
pkg/alertmanager/distributor_test.go Outdated Show resolved Hide resolved
pkg/alertmanager/distributor_test.go Outdated Show resolved Hide resolved
pkg/alertmanager/distributor_test.go Outdated Show resolved Hide resolved
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
@codesome codesome force-pushed the am-distributors branch 2 times, most recently from 27ad663 to 0ae3b27 Compare February 11, 2021 12:53
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job, LGTM. Thanks! (just left few minor nits)

CHANGELOG.md Outdated Show resolved Hide resolved
pkg/alertmanager/distributor.go Outdated Show resolved Hide resolved
pkg/alertmanager/distributor.go Outdated Show resolved Hide resolved
pkg/alertmanager/distributor.go Outdated Show resolved Hide resolved
pkg/alertmanager/multitenant.go Outdated Show resolved Hide resolved

sp, ctx := opentracing.StartSpanFromContext(r.Context(), "Distribute.doRead")
defer sp.Finish()
// Until we have a mechanism to combine the results from multiple alertmanagers,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would combining mean in terms of behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combining would deduplicate the results from those alertmanagers, basically a union

Copy link
Contributor

@ranton256 ranton256 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

// DistributeRequest shards the writes and returns as soon as the quorum is satisfied.
// In case of reads, it proxies the request to one of the alertmanagers.
// DistributeRequest assumes that the caller has verified IsPathSupported return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// DistributeRequest assumes that the caller has verified IsPathSupported return
// DistributeRequest assumes that the caller has verified IsPathSupported returns

Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job, this looks good! I've left some last comments.

pkg/alertmanager/distributor.go Outdated Show resolved Hide resolved
pkg/alertmanager/distributor.go Outdated Show resolved Hide resolved
defer sp.Finish()
// Until we have a mechanism to combine the results from multiple alertmanagers,
// we forward the request to only only of the alertmanagers.
amDesc := replicationSet.Ingesters[rand.Intn(len(replicationSet.Ingesters))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure we don't panic in rand.Intn, we should verify that len(replicationSet) > 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The strategy used for AM ring makes sure there is at least 1 AM, so do we need the check here?

pkg/alertmanager/distributor.go Outdated Show resolved Hide resolved
pkg/alertmanager/distributor.go Show resolved Hide resolved
pkg/util/http.go Outdated Show resolved Hide resolved
pkg/alertmanager/distributor.go Outdated Show resolved Hide resolved
pkg/util/http.go Outdated Show resolved Hide resolved
pkg/alertmanager/distributor.go Outdated Show resolved Hide resolved
@codesome codesome force-pushed the am-distributors branch 4 times, most recently from 44bbf73 to 2f4e05c Compare February 17, 2021 12:02
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM. Time to merge it!

CHANGELOG.md Show resolved Hide resolved
CHANGELOG.md Show resolved Hide resolved
@pracucci pracucci merged commit 6c0bebf into cortexproject:master Feb 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants