Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ruler: remote rule evaluation #8744

Merged
merged 12 commits into from
Mar 13, 2023

Conversation

dannykopping
Copy link
Contributor

@dannykopping dannykopping commented Mar 8, 2023

What this PR does / why we need it:
Adds the ability to evaluate recording & alerting rules against a given query-frontend, allowing these queries to be executed with all the parallelisation & optimisation that regular adhoc queries have. This is important because with local evaluation all queries are single-threaded, and rules that evaluate a large range/volume of data may timeout or OOM the ruler itself, leading to missed metrics or alerts.

When remote evaluation mode is enabled, the ruler effectively just becomes a gRPC client for the query-frontend, which will dramatically improve the reliability of the ruler and also drastically reduce its resource requirements.

Which issue(s) this PR fixes:
This PR implements the feature discussed in #8129 (LID 0002: Remote Rule Evaluation).

Special notes for your reviewer:
This PR is the first of two; in this PR I have just implemented the happy path. In a subsequent PR I will address all the unhappy paths and add some configurability to this feature. I didn't want to complicate this (already large) PR with all of those failure cases, as a couple of these will require some deliberation. That subsequent PR will also contain documentation.

Also, I added a couple drive-by improvements along the way:

  • refactored our integration tests a little so the ruler config could be controlled by tests in a cleaner way
    • also added the ability to provide extra config fragments which are merged into the main Loki config; this makes tests quite clean and clear now I think
  • added a loki-backend component to our production/docker docker-compose setup, as well as configured the ruler and added a couple sample rules
    • also added remote-writing of metrics to the prometheus instance

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md

@dannykopping dannykopping force-pushed the dannykopping/remote-eval branch 5 times, most recently from 81d5c61 to 7a9e511 Compare March 10, 2023 08:35
@dannykopping dannykopping changed the title [DRAFT] Ruler: remote rule evaluation Ruler: remote rule evaluation Mar 10, 2023
@dannykopping dannykopping marked this pull request as ready for review March 10, 2023 08:44
@dannykopping dannykopping requested a review from a team as a code owner March 10, 2023 08:44
Danny Kopping added 9 commits March 10, 2023 11:33
Implemented local evaluator

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
…theus

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
@dannykopping dannykopping force-pushed the dannykopping/remote-eval branch from f326d2d to 597bc32 Compare March 10, 2023 09:35
Danny Kopping added 2 commits March 10, 2023 12:02
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
@dannykopping dannykopping requested a review from JStickler as a code owner March 10, 2023 10:12
@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Mar 10, 2023
integration/cluster/ruler.go Show resolved Hide resolved
integration/util/merger.go Show resolved Hide resolved
pkg/querier/queryrange/codec.go Show resolved Hide resolved
pkg/ruler/evaluator_remote.go Show resolved Hide resolved
Copy link
Contributor

@kavirajk kavirajk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks super nice 👍

integration/client/client.go Show resolved Hide resolved
pkg/loki/loki.go Show resolved Hide resolved
pkg/querier/queryrange/codec.go Show resolved Hide resolved
Copy link
Contributor

@chaudum chaudum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀
I added a few comments of minor importance, but don't want to block merging.

integration/loki_rule_eval_test.go Outdated Show resolved Hide resolved
integration/loki_rule_eval_test.go Outdated Show resolved Hide resolved
pkg/loki/modules.go Outdated Show resolved Hide resolved
pkg/querier/queryrange/codec.go Show resolved Hide resolved
pkg/ruler/evaluator_remote.go Outdated Show resolved Hide resolved
Co-authored-by: Christian Haudum <christian.haudum@gmail.com>
@dannykopping dannykopping merged commit 33e44ed into grafana:main Mar 13, 2023
@dannykopping dannykopping mentioned this pull request Mar 21, 2023
5 tasks
dannykopping pushed a commit that referenced this pull request Mar 22, 2023
**What this PR does / why we need it**:
This PR is part 2 of 2 implementing remote rule evaluation.
See [part 1](#8744) for more
context.
@dannykopping dannykopping deleted the dannykopping/remote-eval branch March 22, 2023 10:28
dannykopping pushed a commit that referenced this pull request Mar 22, 2023
**What this PR does / why we need it**:
This PR is part 2 of 2 implementing remote rule evaluation.
See [part 1](#8744) for more
context.
@weizhengde
Copy link

when this feature will be released

@dannykopping
Copy link
Contributor Author

We have a 2.9 release coming soon which will include this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/XXL type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants