-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xds: Outlier Detection configuration in Cluster Resolver Balancer #5371
Conversation
9c7e762
to
f01bc93
Compare
Please resolve the conflict and reassign, thanks! |
f01bc93
to
72138c7
Compare
type outlierDetectionBalancer struct { | ||
} | ||
|
||
func (b *outlierDetectionBalancer) UpdateClientConnState(s balancer.ClientConnState) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just return nil
from Build
instead of having this type.
Or don't implement balancer.Builder
fully yet and call bb.ParseConfig
directly in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried the second option, it broke since this needs to be registered in config validation at a higher level. Thus, I kept implementing balancer.Builder and returned nil from build instead of the type, and deleted this type.
bb := balancer.Get(Name) | ||
if bb == nil { | ||
t.Fatalf("balancer.Get(%q) returned nil", Name) | ||
} | ||
parser, ok := bb.(balancer.ConfigParser) | ||
if !ok { | ||
t.Fatalf("balancer %q does not implement the ConfigParser interface", Name) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to the above: parser := bb{}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point. Switched.
|
||
tests := []struct { | ||
name string | ||
input json.RawMessage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call this a string
so you can remove all the json.RawMessage
literals (cast before passing to ParseConfig
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched.
name string | ||
input json.RawMessage | ||
wantCfg serviceconfig.LoadBalancingConfig | ||
wantErr bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a boolean, let's check for a string to be a substring of the error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. This actually lead me to discover that I was actually getting a parsing error not a validation error (see the 100 below, it shouldn't error - and it's also the wrong field name haha).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, another thing I noticed is that I never check if there is a child/the child is a clusterimpl/the child config is valid. These are assumptions I made for my od balancer, which if broken would cause nil panic. What should I do about this? Should I add a validation? Defensively program in my Outlier Detection PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't ever be called otherwise since we prepare the config ourselves but still.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added check for child being there (validating child was already there implicitly from unmarshalJSON if present)
retConfig.Priorities = append(retConfig.Priorities, names...) | ||
retAddrs = append(retAddrs, addrs...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these be moved up above the if envconfig
, and the two if envconfig
s merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, great catch. Switched.
retConfig.Priorities = append(retConfig.Priorities, name) | ||
retAddrs = append(retAddrs, addrs...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to one if as well.
} | ||
} | ||
return retConfig, retAddrs, nil | ||
} | ||
|
||
func convertClusterImplMapToOutlierDetection(ciCfgs map[string]*clusterimpl.LBConfig, odCfg *outlierdetection.LBConfig) map[string]*outlierdetection.LBConfig { | ||
odCfgs := make(map[string]*outlierdetection.LBConfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make(..., len(ciCfgs))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched. Why is this faster/better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-allocating is faster so it doesn't need to grow as you add entries if you go over the default initial size. Or if you have fewer entries than the default initial size, then you save memory.
|
||
func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg outlierdetection.LBConfig) *outlierdetection.LBConfig { | ||
odCfgRet := odCfg | ||
odCfgRet.ChildPolicy = &internalserviceconfig.BalancerConfig{Name: clusterimpl.Name, Config: &ciCfg} // This can panic if odCfg is nil. This shouldn't be nil though, as per CDS balancer. I can add check if you want. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass *clusterimpl.LBConfig
then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get how whether clusterimpl.LBConfig relates to whether this panics or not. It panics on odCfgReturn.ChildPolicy. I needed to make the function arguments value types to make new memory and not write over heap memory used elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see how it panics, callsite at 161. Pointer to nil.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it. Let me know if you like solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments boss man :D!
retConfig.Priorities = append(retConfig.Priorities, names...) | ||
retAddrs = append(retAddrs, addrs...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, great catch. Switched.
retConfig.Priorities = append(retConfig.Priorities, name) | ||
retAddrs = append(retAddrs, addrs...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to one if as well.
} | ||
} | ||
return retConfig, retAddrs, nil | ||
} | ||
|
||
func convertClusterImplMapToOutlierDetection(ciCfgs map[string]*clusterimpl.LBConfig, odCfg *outlierdetection.LBConfig) map[string]*outlierdetection.LBConfig { | ||
odCfgs := make(map[string]*outlierdetection.LBConfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched. Why is this faster/better?
|
||
func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg outlierdetection.LBConfig) *outlierdetection.LBConfig { | ||
odCfgRet := odCfg | ||
odCfgRet.ChildPolicy = &internalserviceconfig.BalancerConfig{Name: clusterimpl.Name, Config: &ciCfg} // This can panic if odCfg is nil. This shouldn't be nil though, as per CDS balancer. I can add check if you want. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get how whether clusterimpl.LBConfig relates to whether this panics or not. It panics on odCfgReturn.ChildPolicy. I needed to make the function arguments value types to make new memory and not write over heap memory used elsewhere.
|
||
tests := []struct { | ||
name string | ||
input json.RawMessage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched.
name string | ||
input json.RawMessage | ||
wantCfg serviceconfig.LoadBalancingConfig | ||
wantErr bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. This actually lead me to discover that I was actually getting a parsing error not a validation error (see the 100 below, it shouldn't error - and it's also the wrong field name haha).
name string | ||
input json.RawMessage | ||
wantCfg serviceconfig.LoadBalancingConfig | ||
wantErr bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, another thing I noticed is that I never check if there is a child/the child is a clusterimpl/the child config is valid. These are assumptions I made for my od balancer, which if broken would cause nil panic. What should I do about this? Should I add a validation? Defensively program in my Outlier Detection PR?
bb := balancer.Get(Name) | ||
if bb == nil { | ||
t.Fatalf("balancer.Get(%q) returned nil", Name) | ||
} | ||
parser, ok := bb.(balancer.ConfigParser) | ||
if !ok { | ||
t.Fatalf("balancer %q does not implement the ConfigParser interface", Name) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point. Switched.
type outlierDetectionBalancer struct { | ||
} | ||
|
||
func (b *outlierDetectionBalancer) UpdateClientConnState(s balancer.ClientConnState) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried the second option, it broke since this needs to be registered in config validation at a higher level. Thus, I kept implementing balancer.Builder and returned nil from build instead of the type, and deleted this type.
name string | ||
input json.RawMessage | ||
wantCfg serviceconfig.LoadBalancingConfig | ||
wantErr bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't ever be called otherwise since we prepare the config ourselves but still.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments :D!
|
||
func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg outlierdetection.LBConfig) *outlierdetection.LBConfig { | ||
odCfgRet := odCfg | ||
odCfgRet.ChildPolicy = &internalserviceconfig.BalancerConfig{Name: clusterimpl.Name, Config: &ciCfg} // This can panic if odCfg is nil. This shouldn't be nil though, as per CDS balancer. I can add check if you want. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see how it panics, callsite at 161. Pointer to nil.
|
||
func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg outlierdetection.LBConfig) *outlierdetection.LBConfig { | ||
odCfgRet := odCfg | ||
odCfgRet.ChildPolicy = &internalserviceconfig.BalancerConfig{Name: clusterimpl.Name, Config: &ciCfg} // This can panic if odCfg is nil. This shouldn't be nil though, as per CDS balancer. I can add check if you want. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it. Let me know if you like solution.
name string | ||
input json.RawMessage | ||
wantCfg serviceconfig.LoadBalancingConfig | ||
wantErr bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added check for child being there (validating child was already there implicitly from unmarshalJSON if present)
odCfgRet := odCfg | ||
odCfgRet.ChildPolicy = &internalserviceconfig.BalancerConfig{Name: clusterimpl.Name, Config: &ciCfg} // This can panic if odCfg is nil. This shouldn't be nil though, as per CDS balancer. I can add check if you want. | ||
return &odCfgRet | ||
func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg *outlierdetection.LBConfig) *outlierdetection.LBConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not pass ciCfg
by pointer? Generally structs should be passed by pointer unless there's a good reason not to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See: go/go-style/decisions#pass-values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to passing by pointer.
@@ -90,6 +91,11 @@ func (bb) ParseConfig(s json.RawMessage) (serviceconfig.LoadBalancingConfig, err | |||
if lbCfg.FailurePercentageEjection != nil && lbCfg.FailurePercentageEjection.EnforcementPercentage > 100 { | |||
return nil, fmt.Errorf("LBConfig.FailurePercentageEjection.EnforcementPercentage = %v; must be <= 100", lbCfg.FailurePercentageEjection.EnforcementPercentage) | |||
} | |||
|
|||
if lbCfg.ChildPolicy == nil { | |||
return nil, errors.New("LBConfig.ChildPolicy needs to be present") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/needs to/must/?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg *outlierdetection.LBConfig) *outlierdetection.LBConfig { | ||
odCfgRet := &outlierdetection.LBConfig{} | ||
if odCfg != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it valid for this to be nil? Why? For the JSON config, can we make the field a value instead of a pointer so it's never nil? Then pass by value to avoid the explicit copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not valid. This config is prepared by the cluster resolver, so we're guaranteed for this not to be nil, but this handles the nil panic I discussed earlier with you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had the discovery mechanism hold onto a value, and pass it by value around. I kept this function returning a pointer though, as that is the value of internalserviceconfig.BalancerConfig that I populate.
odCfgRet := odCfg | ||
odCfgRet.ChildPolicy = &internalserviceconfig.BalancerConfig{Name: clusterimpl.Name, Config: &ciCfg} // This can panic if odCfg is nil. This shouldn't be nil though, as per CDS balancer. I can add check if you want. | ||
return &odCfgRet | ||
func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg *outlierdetection.LBConfig) *outlierdetection.LBConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See: go/go-style/decisions#pass-values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments! I'll wait to rebase my full PR since I don't want to force push in case one of you is in the middle of a review on the full one. Maybe I should wait until this is merged to rebase. The balancer is all orthogonal.
@@ -90,6 +91,11 @@ func (bb) ParseConfig(s json.RawMessage) (serviceconfig.LoadBalancingConfig, err | |||
if lbCfg.FailurePercentageEjection != nil && lbCfg.FailurePercentageEjection.EnforcementPercentage > 100 { | |||
return nil, fmt.Errorf("LBConfig.FailurePercentageEjection.EnforcementPercentage = %v; must be <= 100", lbCfg.FailurePercentageEjection.EnforcementPercentage) | |||
} | |||
|
|||
if lbCfg.ChildPolicy == nil { | |||
return nil, errors.New("LBConfig.ChildPolicy needs to be present") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg *outlierdetection.LBConfig) *outlierdetection.LBConfig { | ||
odCfgRet := &outlierdetection.LBConfig{} | ||
if odCfg != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not valid. This config is prepared by the cluster resolver, so we're guaranteed for this not to be nil, but this handles the nil panic I discussed earlier with you.
odCfgRet := odCfg | ||
odCfgRet.ChildPolicy = &internalserviceconfig.BalancerConfig{Name: clusterimpl.Name, Config: &ciCfg} // This can panic if odCfg is nil. This shouldn't be nil though, as per CDS balancer. I can add check if you want. | ||
return &odCfgRet | ||
func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg *outlierdetection.LBConfig) *outlierdetection.LBConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to passing by pointer.
|
||
func makeClusterImplOutlierDetectionChild(ciCfg clusterimpl.LBConfig, odCfg *outlierdetection.LBConfig) *outlierdetection.LBConfig { | ||
odCfgRet := &outlierdetection.LBConfig{} | ||
if odCfg != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had the discovery mechanism hold onto a value, and pass it by value around. I kept this function returning a pointer though, as that is the value of internalserviceconfig.BalancerConfig that I populate.
return err | ||
} | ||
gotCCS := ccs.(balancer.ClientConnState) | ||
if diff := cmp.Diff(gotCCS, wantCCS, cmpopts.IgnoreFields(resolver.State{}, "Addresses", "ServiceConfig", "Attributes")); diff != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this work instead?
if diff := cmp.Diff(gotCCS, wantCCS, cmpopts.IgnoreFields(balancer.ClientConnState{}, "ResolverState")); diff != "" {
...
}
Also, why do we have to ignore the resolver state part of the update? Just so that we dont have to specify it in the wantCCS
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, that works :). I just commented this out to test since I forgot since it's been so long, and yeah super verbose resolver state I would have to match in wantCCS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment :D!
return err | ||
} | ||
gotCCS := ccs.(balancer.ClientConnState) | ||
if diff := cmp.Diff(gotCCS, wantCCS, cmpopts.IgnoreFields(resolver.State{}, "Addresses", "ServiceConfig", "Attributes")); diff != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, that works :). I just commented this out to test since I forgot since it's been so long, and yeah super verbose resolver state I would have to match in wantCCS.
This PR adds branching logic for constructing the Priority LB's config. In the case that Outlier Detection is configured on with the corresponding Environment variable, the top level balancer's configuration for each priority will be of type Outlier Detection with a Cluster Impl child rather than a Cluster Impl. This had to include the Outlier Detection's balancer to pass Cluster Resolver tests, as the Outlier Detection policy needed to be registered.
Branched off #5370.
RELEASE NOTES: None