-
Notifications
You must be signed in to change notification settings - Fork 743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add retry support to objectTracker #1014
Add retry support to objectTracker #1014
Conversation
875c024
to
54b66cd
Compare
Note that this PR overlaps with #952 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look at this!
pkg/readiness/object_tracker.go
Outdated
} | ||
|
||
func newObjTracker(gvk schema.GroupVersionKind) *objectTracker { | ||
func newObjTracker(gvk schema.GroupVersionKind, fns ...objDataMutator) *objectTracker { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Golang only lets you have one variadic argument type, I'm not sure we want to spend it exclusively on configuring object data specifically.
We could just have:
func newObjTracker(gvk schema.GroupVersionKind, mutator objDataMutator) *objectTracker {
if mutator == nil {
mutator = setRetriesFromFlag
}
}
Also, if we are mutating anyway, why not just pass a constructor method that creates an appropriately formed object? That would save the need to pass arguments to mutatedObjData()
on line 115 of the new code (though the name would likely change).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did my way because I wouldn't have to change the usages of newObjTracker
around the code, though admittedly that isn't much of a reason.
I've changed it to what you have above.
Given that there isn't the ability to pass in arbitrary objDataMutator
s, I went ahead and changed objectTracker.mutators
(of type []objDataMutator
) to a singular objectTracker.mutator
(type objDataMutator
). LMK if you think I should switch it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are functional options still cool these days? That's what I usually use. Fine either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used a functional options pattern initially, and @maxsmythe advocated for saving the variadic parameter for the future if possible. I figure this method leaves things open-ended for the future and perhaps follows a "YAGNI" mentality.
pkg/readiness/object_tracker.go
Outdated
@@ -249,7 +281,7 @@ func (t *objectTracker) Satisfied() bool { | |||
} | |||
delete(t.seen, k) | |||
delete(t.expect, k) | |||
t.satisfied[k] = struct{}{} | |||
t.satisfied[k] = mutatedObjData(t.mutators...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want to leave this as an empty struct to avoid unnecessary memory usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually can't do that anymore, because the value is now of type objData
(since it has the retries
field in it). Is there a different implementation you'd suggest that dodges this?
Passing in a raw objData
would save a call to a mutation function, but it's O(1).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just have the different arrays have different types. That being said, I'm not sure the extra memory savings is worth the extra complexity.
@shomron any opinions on uniformity vs. lower memory usage here? IIRC this gets GCd after bootstrapping anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just have the different arrays have different types
I made the same suggestion for other reasons (see my review). I'm not worried about the memory usage (and yes we nuke these maps after bootstrapping) - but more about how we can reason about the code.
pkg/readiness/objset.go
Outdated
|
||
func (o objSet) decrementOrDelete(key objKey) bool { | ||
val := o[key] | ||
if val.retries > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should set this so a retries
value of -1
always returns false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. I'll add a test for this as well
pkg/readiness/objset.go
Outdated
|
||
func (o objSet) decrementOrDelete(key objKey) bool { | ||
val := o[key] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if the key is missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would ultimately return a zero-valued objData
, which would trigger the else
clause below. The delete
would do nothing (key doesn't exist in map) and then the function would return true
.
This really isn't a break from the previous behavior, which was a call to delete
. That said, it's not as clear as it could be. I can make it val, ok := o[key]
and return true
(given the non-existent record is not decremented).
That said, I think this is poking at a question that's bigger than this function: how should objectTracker
deal with bogus keys? With or without my change, those keys end up in t.cancelled
. Is that a problem?
pkg/readiness/objset.go
Outdated
o[key] = val | ||
return false | ||
} else { | ||
delete(o, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just return a recommendation to delete or not so the actual management of all the different maps is handled in one place by the code?
pkg/readiness/objset.go
Outdated
|
||
func (o objSet) decrementOrDelete(key objKey) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if these methods were on objData instead? Would that give any benefits to the consumer code? Greater uniformity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the "Should we just return..." comment are related. Let's discuss here.
I think it would be good to make this a method of objData
, since that's what it's really about. This makes it logical to move the delete
call out of this function, as the objData
does not have access to the map it is part of.
I think this is a better separation of concerns, and makes it easier to understand the objectTracker
without having to jump to the objData
code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, it looks much nicer like this!
It looks like there is a linter failure as well |
54b66cd
to
24b0a75
Compare
Codecov Report
@@ Coverage Diff @@
## master #1014 +/- ##
==========================================
+ Coverage 47.66% 47.92% +0.25%
==========================================
Files 62 62
Lines 4225 4259 +34
==========================================
+ Hits 2014 2041 +27
- Misses 1956 1962 +6
- Partials 255 256 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
24b0a75
to
71b1658
Compare
pkg/readiness/objset.go
Outdated
|
||
type objDataMutator func(*objData) *objData | ||
|
||
func setRetriesFromFlag(o *objData) *objData { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setRetriesFromFlag
and mutatedObjData
could be:
func newObjDataFromFlags() *objData {
return &objData{retries: *readinessRetries}
}
and for tests you could pass in the following function as an argument:
func() *objData { return &objData{ retries: <<whatever # of retries>> } }
This avoids having mutators embedded in the code, which is a bit more confusing to follow b/c of the extra layer of indirection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to this implementation (which I called a Factory, since it generates a new instance of a type)
Why return a pointer? It seems better to just return an objData
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was avoiding Factory
because it seems like the cool golang kids don't use it (too Java), but the analogy is apt. Personally, I usually don't name the functional argument types unless the function signature is long or there is a lot of repetition, but both ways work.
WRT why return a pointer... I suppose it doesn't really matter either way, as this is a small struct that only returns an int
and I don't really see us taking advantage of any mutate-in-place behavior.
I tend to default to struct pointers unless I'm returning something I'm persisting to a library consumer. Then I'll return a ref or a deepcopied pointer in order to protect the library's internal state.
TL;DR pointers can help for large structs, they may have a performance/memory penalty for smaller structs. Pointers vs structs also lead to shared state behavior that can be very useful (lots of mutate-in-place logic happening in mutation), or very annoying.
For an internal-to-the-package API I really don't put much thought into it b/c it's so cheap to change later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I commented in the review that I think the tip off here is that every place we're using the return value, we immediately dereference it.
As an aside, I like geeking out on the value vs pointer debate. I tend to default to values to avoid accidentally allowing mutations, but you can make bugs either way you cut it.
In terms of performance, pointers will escape to the heap and put more pressure on the GC, but also provide less data locality which leads to all kinds of cache line misses. Completely irrelevant micro-optimizations in our case :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does golang make a heap vs. stack distinction for pointers vs. values? I would have thought the ability to send &value
and have GC still work meant that they are treated the same WRT memory management?
There may be a golang subtlety I'm missing here though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there is a stack vs. heap optimization. I wonder when the decision is made... compile time?
https://medium.com/@meeusdylan/when-to-use-pointers-in-go-44c15fe04eac
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, my understanding is that the compiler performs escape analysis, not during runtime. You can see some of the decisions it's making by passing -gcflags '-m'
to your build.
71b1658
to
d82198d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting close!
pkg/readiness/object_tracker.go
Outdated
|
||
obj := t.expect[k] | ||
shouldDelete := obj.decrementRetries() | ||
t.expect[k] = obj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than copying the code from CancelExpect
, I think we should call it directly.
We'll need to move CancelExpect
to a different function that doesn't acquire a lock, say cancelExpectNoLock
and rewrite CancelExpect()
to acquire a lock and call the same function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the decrementRetries()
call in the middle, I had to pass that part in as a callback. Let me know if you think it makes it too complicated.
|
||
ot.TryCancelExpect(ct[0]) | ||
g.Expect(ot.Satisfied()).To(gomega.BeTrue(), "should be satisfied") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should test the infinite behavior too...
I realize that we can't test for infinite retries, but if we set retry count to -1, cancel, and retry count is still -1 that implies that we will retry indefinitely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test that sets retries
to -1
and then does 20 TryCancelExpect
calls. This was the best I could think of as far as testing the .*Expect()
functions.
The test you're suggesting seems to be more of a test of the objData
struct itself. Happy to add that, just didn't because it seemed to break from the existing testing pattern.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the 20x test is sufficient
6ff2696
to
8964247
Compare
func (t *objectTracker) TryCancelExpect(o runtime.Object) { | ||
t.mu.Lock() | ||
defer t.mu.Unlock() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not decrement retries here and if shouldCancel
is true call cancelExpectNoLock
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to duplicate this section:
k, err := objKeyFromObject(o)
if err != nil {
log.Error(err, "skipping")
return
}
Would it be better to duplicate it? I could also move this logic into CancelExpect
so that this logic isn't called twice in any codepath
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO error handling code is just kind of duplicative anyway, I don't sweat it too much. IMO the added code complexity here is more costly than duplicating trivial code.
Personally, I'd have the interior function take an object key and duplicate the key generation on the caller function.
I'd also duplicate the t.allSatisfied
check to occur before the call to objKeyFromObject()
(the way it is in the original code), just to make sure the short-circuited state (active 99.9% of the time if the pod is long-lived) is as cheap as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference between copying those bits and copying the various deletes() is that the deletes are highly coordinated, where one missing delete could put the system in a bad state. In that situation, avoiding copying lowers the likelihood that a future change misses one copy, introducing a bug.
For the get-key logic and the short-circuit logic, the risk of duplication is annoying a programmer and lower performance if a copy is mishandled, respectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great feedback. I like the part about trying to minimize complexity in the most crucial parts of the logic. DRYing up trivial logic is less valuable, as that's easier to grok/harder to break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After moving all of this out of cancelExpectNoLock()
, it made more sense to me to remove all conditional logic from that function. All of the "should delete" business is now held in TryCancelExpect()
where it belongs. That way none of the complexity of that codepath is bleeding into the simpler CancelExpect()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, awesome!
8964247
to
21f17af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@shomron LGTY?
func (t *objectTracker) TryCancelExpect(o runtime.Object) { | ||
t.mu.Lock() | ||
defer t.mu.Unlock() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, awesome!
dd5d6ad
to
57cfb2f
Compare
pkg/readiness/objset.go
Outdated
} | ||
|
||
type objDataFactory func() *objData |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: from what I can tell, all uses of these factory functions always immediately dereference the pointer. Should we just return a value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's dereferences across the board. Personally, I vote in favor of returning objData
by value, as that's consistent with the usage in the package (values of an objSet
are objData
structs, not pointers to said structs).
I'm going to switch this to returning a value unless @maxsmythe feels strongly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
pkg/readiness/object_tracker.go
Outdated
// Check if it's time to delete an expectation or just decrement its allotted retries | ||
obj := t.expect[k] | ||
shouldDel := obj.decrementRetries() | ||
t.expect[k] = obj // set the changed obj back to the map, as the value is not a pointer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the counter is exhausted, subsequent calls to TryCancelExpect
cause the expectation to be added and immediately removed. I think it's fine, but a comment might help future us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to add explicit conditional logic around this, as opposed to relying on the zero-value behavior? Something like:
// Check if it's time to delete an expectation or just decrement its allotted retries
obj, exists := t.expect[k]
if !exists {
// Do a log.Info() here?
return
}
shouldDel := obj.decrementRetries()
t.expect[k] = obj // set the changed obj back to the map, as the value is not a pointer
if shouldDel {
t.cancelExpectNoLock(k)
}
This has more code, but it prevents the behavior that you mentioned. Perhaps the verbosity is worth the clarity? LMK your thoughts @shomron
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option would be to just use pointers in the map to avoid the extra code in the first place.
If you add the if-statement, no need for the log.Info() IMO.
I like the if-statement. It seems less brittle as it wont break if decrementRetries()
changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to be careful - if you skip calling cancelExpectNoLock
then the object is not marked in t.cancelled
- which changes the old behavior for certain out-of-order scenarios.
A comment and a test would work for me, otherwise we need to define the correct behavior for cancelling unexpected objects in light of the new code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The out-of-order execution is a good point... I had forgotten about that. Given that, I agree the comment is the better choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait... out-of-order also adds a new layer of difficulty here.
If we perform a TryCancel
on something that has no expectations yet, we should probably track the # of attempts. We shouldn't short-circuit the retry behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So @maxsmythe and I have been discussing the right way to implement this and came up with the idea of tracking TryCancelExpect
calls in their own separate map. Our thinking was this would allow us to track the number of attempts made to cancel, regardless of the other actions that have been called on an object.
It turns out that this will provide a fix for a nasty edge case. Because of this section in TryCancelExpect()
:
obj := t.expect[k]
shouldDel := obj.decrementRetries()
t.expect[k] = obj // set the changed obj back to the map, as the value is not a pointer
An as yet unexpected obj will actually be added to t.expect
as a zero-value version of the struct. This over-writes the retry behavior, which is set via a constructor function.
This provides further indication that separating these ideas into two maps will make the code easier to grok, as I obviously did this by mistake. Yay for TDD!
pkg/readiness/object_tracker.go
Outdated
} | ||
|
||
func newObjTracker(gvk schema.GroupVersionKind) *objectTracker { | ||
func newObjTracker(gvk schema.GroupVersionKind, fns ...objDataMutator) *objectTracker { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are functional options still cool these days? That's what I usually use. Fine either way.
pkg/readiness/objset.go
Outdated
return &objData{retries: *readinessRetries} | ||
} | ||
|
||
type objSet map[objKey]objData |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong, but only the expect
set uses counters. I'm finding it confusing reading the code and checking whether Observe
is still idempotent because it overwrites a counter nobody is looking at. Should we create a separate type for expect
and all the rest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to me.
I've added a second type: objRetrySet
. This type is used only for t.expect
, while t.seen
and others will be assigned like:
t.seen[k] = struct{}{}
as they were done before. I believe makes the distinction clear in code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per this other comment, I've now moved the retries tracking into its own map.
So, t.expect
is back to using the original objSet
.
pkg/readiness/object_tracker.go
Outdated
@@ -249,7 +281,7 @@ func (t *objectTracker) Satisfied() bool { | |||
} | |||
delete(t.seen, k) | |||
delete(t.expect, k) | |||
t.satisfied[k] = struct{}{} | |||
t.satisfied[k] = mutatedObjData(t.mutators...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just have the different arrays have different types
I made the same suggestion for other reasons (see my review). I'm not worried about the memory usage (and yes we nuke these maps after bootstrapping) - but more about how we can reason about the code.
LGTM, no blockers. |
928ce30
to
1c3050d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@julianKatz lgtm, thanks for working on this enhancement!
Thanks for the thorough review!! @shomron |
1c3050d
to
3633049
Compare
I think I've addressed all the comments on this. Could @maxsmythe or @shomron merge it for me? |
Thanks for fixing that order-of-ops issue @shomron do you want to take a second look or are we good to merge? |
lgtm. Only nit is that we could release the tryCancelled map in |
@@ -29,13 +30,16 @@ import ( | |||
"k8s.io/apimachinery/pkg/types" | |||
) | |||
|
|||
var readinessRetries = flag.Int("readiness-retries", 0, "The number of resource ingestion attempts allowed before the resource is disregarded. A value of -1 will retry indefinitely.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add this flag to the readme so people know how to use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ritazh I've added some information to the README. Could you take a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @julianKatz!
Good call. Will do. |
f298a8f
to
fda0b7a
Compare
This is done |
pkg/readiness/object_tracker.go
Outdated
@@ -51,20 +55,28 @@ type objectTracker struct { | |||
gvk schema.GroupVersionKind | |||
cancelled objSet // expectations that have been cancelled | |||
expect objSet // unresolved expectations | |||
tryCancelled objRetrySet // TryCancelExpect calls, decrementing allotted retries for an object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: comment should read: tracks TryCancelExpect
calls, ...
attempts should be made to injest a resource into OPA while blocking the webhook. Previously, a failure to injest an object into the OPA cache would strike that object from the ObjectTracker, a type tasked with blocking the webhook from serving requests until all the necessary objects had been observed on the API server. That setup optimizes for availability (the webhook serving requests ASAP) over security. By adding retry functionality, we configure gatekeeper to block the webhook for longer, but unblock with a more complete set of constraints. fixes: #213 Signed-off-by: juliankatz <juliankatz@google.com>
fda0b7a
to
6bad8bc
Compare
Looks like all outstanding comments have been addressed, merging. |
Add the readiness-retries flag, allowing the user to configure how many
attempts should be made to injest a resource into OPA while blocking the
webhook.
Previously, a failure to injest an object into the OPA cache would
strike that object from the ObjectTracker, a type tasked with blocking
the webhook from serving requests until all the necessary objects had
been observed on the API server. This setup optimizes for availability
(the webhook serving requests ASAP) over security.
By adding retry functionality, we configure gatekeeper to block the
webhook for longer, but unblock with a more complete set of constraints.
Signed-off-by: juliankatz juliankatz@google.com