Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metrics reporting for mutation #1435

Merged
merged 78 commits into from
Jul 29, 2021

Conversation

julianKatz
Copy link
Contributor

@julianKatz julianKatz commented Jul 13, 2021

This PR adds reporting functions to gatekeeper's mutator contollers
(assign, assignmetadata) and the mutation System. These metrics track
requests to the mutation webhook, the ingestion of mutator objects, the
number of active mutators in the system, and the number of iterations
required by the System to mutate objects.

Fixes #1214

Signed-off-by: juliankatz juliankatz@google.com

@codecov-commenter
Copy link

codecov-commenter commented Jul 13, 2021

Codecov Report

Merging #1435 (4115741) into master (c961ac6) will decrease coverage by 0.04%.
The diff coverage is 53.69%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1435      +/-   ##
==========================================
- Coverage   52.05%   52.00%   -0.05%     
==========================================
  Files          80       83       +3     
  Lines        7429     7593     +164     
==========================================
+ Hits         3867     3949      +82     
- Misses       3195     3263      +68     
- Partials      367      381      +14     
Flag Coverage Δ
unittests 52.00% <53.69%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/controller/config/config_controller.go 65.33% <0.00%> (ø)
pkg/controller/constraint/constraint_controller.go 5.72% <0.00%> (+0.07%) ⬆️
...onstrainttemplate/constrainttemplate_controller.go 53.93% <0.00%> (-1.91%) ⬇️
pkg/controller/mutators/cache.go 0.00% <0.00%> (ø)
pkg/webhook/namespacelabel.go 70.00% <0.00%> (ø)
pkg/webhook/policy.go 30.83% <0.00%> (ø)
pkg/webhook/webhook.go 0.00% <0.00%> (ø)
pkg/mutation/system.go 60.41% <52.94%> (-1.02%) ⬇️
...kg/controller/mutators/assign/assign_controller.go 47.42% <61.29%> (ø)
...tators/assignmetadata/assignmetadata_controller.go 47.42% <62.50%> (ø)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aad6c27...4115741. Read the comment docs.

main.go Outdated Show resolved Hide resolved
pkg/mutation/reporter/stats_reporter.go Outdated Show resolved Hide resolved
pkg/mutation/system.go Outdated Show resolved Hide resolved
pkg/controller/assign/assign_controller.go Outdated Show resolved Hide resolved
@julianKatz julianKatz marked this pull request as ready for review July 22, 2021 17:32
@julianKatz julianKatz changed the title Mutation metrics Add metrics reporting for mutation Jul 22, 2021
pkg/controller/mutators/stats_reporter.go Outdated Show resolved Hide resolved
pkg/metrics/reporter.go Outdated Show resolved Hide resolved
pkg/mutation/stats_reporter.go Outdated Show resolved Hide resolved

// NewStatsReporter creaters a reporter for webhook metrics.
func NewStatsReporter() (StatsReporter, error) {
ctx, err := tag.New(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto above. Just use context.TODO().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sozercan Removing the use of tag.New() removes the possibility of creating an error from the instantiation of a reporter. tag.New() can produce an error because it has the ability to mutate the key/value pairs stored inside of a context.Context type. As we feed in an empty context.Context and add no new information, there is no way for an error to be produced.

This really has two implications:

  1. NewStatsReporter is reduced to having essentially no logic. Perhaps we should just remove it? We could use the reporter type (renamed to Reporter) and still specify the StatsReporter interface in function signatures.
  2. reporter no longer needs to contain a context. That context is useless and is essentially cruft. We could just as easily create a fresh context in each of the reporting functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pkg/mutation/stats_reporter_test.go Outdated Show resolved Hide resolved
pkg/mutation/stats_reporter_test.go Outdated Show resolved Hide resolved
pkg/webhook/stats_reporter.go Outdated Show resolved Hide resolved
pkg/mutation/stats_reporter_test.go Outdated Show resolved Hide resolved
@julianKatz
Copy link
Contributor Author

@sozercan @shomron PTAL :)

verifyTags(t, expectedTags, row.Tags)
}

func checkData(t *testing.T, name string, rowLength int) *view.Row {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Prefer returning error and calling t.Errorf/t.Fatalf from the test itself for clarity. (In general, avoid assertion methods)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roger. I stole this from other stats_reporter files

Copy link
Member

@willbeason willbeason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly nits and making sure bases are covered. No changes here are required for approval.

pkg/controller/mutators/stats_reporter_test.go Outdated Show resolved Hide resolved
pkg/mutation/stats_reporter.go Show resolved Hide resolved
Description: systemIterationsM.Description(),
Measure: systemIterationsM,
// JULIAN - We'll need to tune this. I'm not sure if these histogram sections are valid.
Aggregation: view.Distribution(2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is these are milliseconds? If so, I'd recommend:

1, 2, 5, 7, 10, 20, 50, 70, 100, 200, 500, 700, 1000

That results in more human-readable output that's easier to reason about.

Copy link
Contributor Author

@julianKatz julianKatz Jul 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the number of times we go through the mutators when mutating an incoming object. The upper-bound on iterations is (number of mutators) + 1.

As most systems will have something like 1-100 mutators, I figured that having more granularity in the lower buckets would be a good idea. That said, both of these bucket sizes are growing exponentially, which should represent the data well.

1000 mutators seems unlikely to me, but not impossible.

@maxsmythe can you add some color here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, then as-is looks good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we're in "not enough information" territory, as we don't know how users will use mutations.

Forwards compatibility is a nice (if vague) target here.

I'd expect most values at the low numbers (more iterations would be due to multiple mutators affecting the same object).

We can start with 1...10, 20, 50, 100 and see where that lands us, with the idea that those buckets are the most likely to be reused in the future (assume most people will have less than 20 overlapping mutators with some granularity to identify larger cases, hit round numbers that will play well with logarithmic buckets).

pkg/controller/mutators/assign/assign_controller.go Outdated Show resolved Hide resolved
@maxsmythe
Copy link
Contributor

Cannot reply to pre-existing comment (thanks github), so closing and adding as a convo thread:

@willbeason willbeason 4 days ago Member
ditto above. Just use context.TODO().

@julianKatz julianKatz 5 hours ago Author Member
@sozercan Removing the use of tag.New() removes the possibility of creating an error from the instantiation of a reporter. > tag.New() can produce an error because it has the ability to mutate the key/value pairs stored inside of a context.Context type. > As we feed in an empty context.Context and add no new information, there is no way for an error to be produced.

This really has two implications:

NewStatsReporter is reduced to having essentially no logic. Perhaps we should just remove it? We could use the reporter type (renamed to Reporter) and still specify the StatsReporter interface in function signatures.
reporter no longer needs to contain a context. That context is useless and is essentially cruft. We could just as easily create a fresh context in each of the reporting functions.

Removing context SGTM, but let's keep the New() function, it's not doing any harm (we can remove the error return value). If we need to create tags in the future for creating "categories" of reporters or similar, we can always panic, as these reporters are created during bootstrapping.

Copy link
Contributor

@maxsmythe maxsmythe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replying to outstanding comment threads, havent dug into the current version of the code yet.

main.go Outdated Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
pkg/mutation/stats_reporter.go Show resolved Hide resolved
Description: systemIterationsM.Description(),
Measure: systemIterationsM,
// JULIAN - We'll need to tune this. I'm not sure if these histogram sections are valid.
Aggregation: view.Distribution(2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we're in "not enough information" territory, as we don't know how users will use mutations.

Forwards compatibility is a nice (if vague) target here.

I'd expect most values at the low numbers (more iterations would be due to multiple mutators affecting the same object).

We can start with 1...10, 20, 50, 100 and see where that lands us, with the idea that those buckets are the most likely to be reused in the future (assume most people will have less than 20 overlapping mutators with some granularity to identify larger cases, hit round numbers that will play well with logarithmic buckets).

Copy link
Contributor

@maxsmythe maxsmythe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some nits, the biggest one being changing the fibbonaci-sized buckets to logarithmically scaled ones.

pkg/controller/mutators/assign/assign_controller.go Outdated Show resolved Hide resolved
pkg/mutation/system.go Outdated Show resolved Hide resolved
julianKatz and others added 12 commits July 27, 2021 11:54
This PR incorporates frameworks commit
1dbe2618668df5f48b6a256bdcd724eeb2b1c130
Add ConstraintTemplate v1 (#121)

An imperative install CRD install step was added to the
Helm Upgrade GH Workflow, as Helm does not update
CRDs.

Contributes to open-policy-agent#550

Signed-off-by: juliankatz juliankatz@google.com
Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>

Co-authored-by: Sertaç Özercan <852750+sozercan@users.noreply.github.com>
Co-authored-by: Max Smythe <smythe@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
…-policy-agent#1404)

Add prefix-based glob matching support across
Gatekeeper's various match sections.

These include:

- Constraints (via the regolib match)
- Config resource (via the process excluder)
- Mutations (via the match package)

Fixes open-policy-agent#1009

Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
As g8r once had only a validation webhook, certain metrics
(request_count, request_count_duration_seconds) were implemented without
qualifiers.  i.e., the fact that they applied to the validation webhook
was _assumed_.

As g8r now has both validation and mutation webhooks, separate metrics
for both are required.  These metrics have already been implemented, and
have been released since v3.4.0.  The original metrics were deprecated
(but retained) at the time of that release.

As we are approaching v3.6, it is finally time to remove these metrics.
This PR removes the code and updates the metrics README.

Fixes open-policy-agent#1010

Signed-off-by: juliankatz <juliankatz@google.com>
…1364)

When multiple Pods are running mutators, it is possible that they will
ingest mutators in different orders. Before this PR, we would
essentially randomly pick which mutator to honor and discard future
mutators which conflicted.

Fixes: open-policy-agent#1216

Signed-off-by: Will Beason <willbeason@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
julianKatz and others added 22 commits July 27, 2021 11:54
Signed-off-by: juliankatz <juliankatz@google.com>
… a test

Signed-off-by: juliankatz <juliankatz@google.com>
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 2.2.0 to 2.3.0.
- [Release notes](https://github.com/actions/setup-node/releases)
- [Commits](actions/setup-node@v2.2.0...v2.3.0)

---
updated-dependencies:
- dependency-name: actions/setup-node
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
… a test

Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
This PR adds reporting functions to gatekeeper's mutator contollers
(assign, assignmetadata) and the mutation System.  These metrics track
requests to the mutation webhook, the ingestion of mutator objects, the
number of active mutators in the system, and the number of iterations
required by the System to mutate objects.

Fixes open-policy-agent#1214

Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
* Benchmark System.Mutate

Signed-off-by: Will Beason <willbeason@google.com>

* Early exit Mutate if no mutations

Signed-off-by: Will Beason <willbeason@google.com>

Co-authored-by: Oren Shomron <shomron@gmail.com>
Signed-off-by: juliankatz <juliankatz@google.com>
…rror return

Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
- Make mutation.System take an options struct instead of injection func
- put defer anon func in assign_controller into named method
- a bit of test cleanup

Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 2.0.1 to 2.0.2.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/master/CHANGELOG.md)
- [Commits](codecov/codecov-action@v2.0.1...v2.0.2)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Will Beason <willbeason@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
… mutation-metrics

Signed-off-by: juliankatz <juliankatz@google.com>
Signed-off-by: juliankatz <juliankatz@google.com>
… mutation-metrics

Signed-off-by: juliankatz <juliankatz@google.com>
… mutation-metrics

Signed-off-by: juliankatz <juliankatz@google.com>
… mutation-metrics

Signed-off-by: juliankatz <juliankatz@google.com>
@julianKatz julianKatz merged commit 07e2fd0 into open-policy-agent:master Jul 29, 2021
@julianKatz julianKatz deleted the mutation-metrics branch July 29, 2021 23:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mutation metrics
6 participants