Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retry support to objectTracker #1014

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -295,6 +295,14 @@ The `data.inventory` document has the following format:
* For namespace-scoped objects: `data.inventory.namespace[<namespace>][groupVersion][<kind>][<name>]`
* Example referencing the Gatekeeper pod: `data.inventory.namespace["gatekeeper"]["v1"]["Pod"]["gatekeeper-controller-manager-d4c98b788-j7d92"]`

### Customize Startup Behavior

#### Allow retries when adding objects to OPA

Gatekeeper's webhook servers undergo a bootstrapping period during which they are unavailable until the initial set of resources (constraints, templates, synced objects, etc...) have been ingested. This prevents Gatekeeper's webhook from validating based on an incomplete set of policies. This wait-for-bootstrapping behavior can be configured.

The `--readiness-retries` flag defines the number of retry attempts allowed for an object (a Constraint, for example) to be successfully added to OPA. The default is `0`. A value of `-1` allows for infinite retries, blocking the webhook until all objects have been added to OPA. This guarantees complete enforcement, but has the potential to indefinitely block the webhook from serving requests.

### Audit

The audit functionality enables periodic evaluations of replicated resources against the policies enforced in the cluster to detect pre-existing misconfigurations. Audit results are stored as violations listed in the `status` field of the failed constraint.
Expand Down
2 changes: 1 addition & 1 deletion apis/mutations/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion pkg/controller/constraint/constraint_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -387,7 +387,7 @@ func (r *ReconcileConstraint) cacheConstraint(instance *unstructured.Unstructure
unstructured.RemoveNestedField(obj.Object, "status")
_, err := r.opa.AddConstraint(context.Background(), obj)
if err != nil {
t.CancelExpect(obj)
t.TryCancelExpect(obj)
return err
}

Expand Down
17 changes: 7 additions & 10 deletions pkg/readiness/noop_expectations.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,20 +18,17 @@ package readiness
import "k8s.io/apimachinery/pkg/runtime"

// noopExpectations returns an Expectations that is always satisfied.
type noopExpectations struct {
}
type noopExpectations struct{}

func (n noopExpectations) Expect(o runtime.Object) {
}
func (n noopExpectations) Expect(o runtime.Object) {}

func (n noopExpectations) CancelExpect(o runtime.Object) {
}
func (n noopExpectations) CancelExpect(o runtime.Object) {}

func (n noopExpectations) ExpectationsDone() {
}
func (n noopExpectations) TryCancelExpect(o runtime.Object) {}

func (n noopExpectations) Observe(o runtime.Object) {
}
func (n noopExpectations) ExpectationsDone() {}

func (n noopExpectations) Observe(o runtime.Object) {}

func (n noopExpectations) Satisfied() bool {
return true
Expand Down
67 changes: 57 additions & 10 deletions pkg/readiness/object_tracker.go
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ limitations under the License.
package readiness

import (
"flag"
"fmt"
"sync"

Expand All @@ -29,13 +30,16 @@ import (
"k8s.io/apimachinery/pkg/types"
)

var readinessRetries = flag.Int("readiness-retries", 0, "The number of resource ingestion attempts allowed before the resource is disregarded. A value of -1 will retry indefinitely.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add this flag to the readme so people know how to use it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ritazh I've added some information to the README. Could you take a look?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @julianKatz!


// Expectations tracks expectations for runtime.Objects.
// A set of Expect() calls are made, demarcated by ExpectationsDone().
// Expectations are satisfied by calls to Observe().
// Once all expectations are satisfied, Satisfied() will begin returning true.
type Expectations interface {
Expect(o runtime.Object)
CancelExpect(o runtime.Object)
TryCancelExpect(o runtime.Object)
ExpectationsDone()
Observe(o runtime.Object)
Satisfied() bool
Expand All @@ -51,20 +55,28 @@ type objectTracker struct {
gvk schema.GroupVersionKind
cancelled objSet // expectations that have been cancelled
expect objSet // unresolved expectations
tryCancelled objRetrySet // tracks TryCancelExpect calls, decrementing allotted retries for an object
seen objSet // observations made before their expectations
satisfied objSet // tracked to avoid re-adding satisfied expectations and to support unsatisfied()
populated bool // all expectations have been provided
allSatisfied bool // true once all expectations have been satisfied. Acts as a circuit-breaker.
kindsSnapshot []schema.GroupVersionKind // Snapshot of kinds before freeing memory in Satisfied.
tryCancelObj objDataFactory // Function that creates objData types used in tryCancelled
}

func newObjTracker(gvk schema.GroupVersionKind) *objectTracker {
func newObjTracker(gvk schema.GroupVersionKind, fn objDataFactory) *objectTracker {
if fn == nil {
fn = objDataFromFlags
}

return &objectTracker{
gvk: gvk,
cancelled: make(objSet),
expect: make(objSet),
seen: make(objSet),
satisfied: make(objSet),
gvk: gvk,
cancelled: make(objSet),
expect: make(objSet),
tryCancelled: make(objRetrySet),
seen: make(objSet),
satisfied: make(objSet),
tryCancelObj: fn,
}
}

Expand Down Expand Up @@ -106,6 +118,14 @@ func (t *objectTracker) Expect(o runtime.Object) {
t.expect[k] = struct{}{}
}

func (t *objectTracker) cancelExpectNoLock(k objKey) {
delete(t.expect, k)
delete(t.seen, k)
delete(t.satisfied, k)
delete(t.tryCancelled, k)
t.cancelled[k] = struct{}{}
}

// CancelExpect cancels an expectation and marks it so it
// cannot be expected again going forward.
func (t *objectTracker) CancelExpect(o runtime.Object) {
Expand All @@ -123,10 +143,36 @@ func (t *objectTracker) CancelExpect(o runtime.Object) {
return
}

delete(t.expect, k)
delete(t.seen, k)
delete(t.satisfied, k)
t.cancelled[k] = struct{}{}
t.cancelExpectNoLock(k)
}

func (t *objectTracker) TryCancelExpect(o runtime.Object) {
t.mu.Lock()
defer t.mu.Unlock()

// Respect circuit-breaker.
if t.allSatisfied {
return
}

k, err := objKeyFromObject(o)
if err != nil {
log.Error(err, "skipping")
return
}

// Check if it's time to delete an expectation or just decrement its allotted retries
obj, ok := t.tryCancelled[k]
if !ok {
// If the item isn't in the map, add it. This is the only place t.newObjData() should be called.
obj = t.tryCancelObj()
}
shouldDel := obj.decrementRetries()
t.tryCancelled[k] = obj // set the changed obj back to the map, as the value is not a pointer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not decrement retries here and if shouldCancel is true call cancelExpectNoLock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to duplicate this section:

k, err := objKeyFromObject(o)            
if err != nil {                          
    log.Error(err, "skipping")           
    return                               
}                                        

Would it be better to duplicate it? I could also move this logic into CancelExpect so that this logic isn't called twice in any codepath

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO error handling code is just kind of duplicative anyway, I don't sweat it too much. IMO the added code complexity here is more costly than duplicating trivial code.

Personally, I'd have the interior function take an object key and duplicate the key generation on the caller function.

I'd also duplicate the t.allSatisfied check to occur before the call to objKeyFromObject() (the way it is in the original code), just to make sure the short-circuited state (active 99.9% of the time if the pod is long-lived) is as cheap as possible.

Copy link
Contributor

@maxsmythe maxsmythe Dec 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference between copying those bits and copying the various deletes() is that the deletes are highly coordinated, where one missing delete could put the system in a bad state. In that situation, avoiding copying lowers the likelihood that a future change misses one copy, introducing a bug.

For the get-key logic and the short-circuit logic, the risk of duplication is annoying a programmer and lower performance if a copy is mishandled, respectively.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great feedback. I like the part about trying to minimize complexity in the most crucial parts of the logic. DRYing up trivial logic is less valuable, as that's easier to grok/harder to break.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After moving all of this out of cancelExpectNoLock(), it made more sense to me to remove all conditional logic from that function. All of the "should delete" business is now held in TryCancelExpect() where it belongs. That way none of the complexity of that codepath is bleeding into the simpler CancelExpect()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, awesome!

if shouldDel {
t.cancelExpectNoLock(k)
}
}

// ExpectationsDone tells the tracker to stop accepting new expectations.
Expand Down Expand Up @@ -268,6 +314,7 @@ func (t *objectTracker) Satisfied() bool {
t.expect = nil
t.satisfied = nil
t.cancelled = nil
t.tryCancelled = nil
}
return t.allSatisfied
}
Expand Down
Loading