-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add endpoint slice watcher #1528
base: main
Are you sure you want to change the base?
Conversation
2d11975
to
0dada8d
Compare
if derivedWorkload == "" { | ||
w.logger.Warn("failed to infer workload name from Pod name", zap.String("podName", podName)) | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure the chance of failure on this? if it happens, can we directly use service name to resolve the ip? My concern, will we skip the IP resolving if something goes wrong here from deployment name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inferWorkloadName return the podName in the worst case. I think it's very unlikely that the podname will be empty. I think if a pod shows up in endpoint slice, it will at least have some name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we saying that we will resolve RemoteService to the podname in the worst case? Wondering if this is the user experience we want in the trace map since it has high cardinality?
wondering if we can do something like remove the uid...not sure what the edge cases are.
} | ||
|
||
// 3) If none of the patterns matched, return the entire podName. | ||
return podName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hope this wont happen. LOL. high cardinality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I think the chance is low. Even when it happens, we have the metric limiter to mitigate the high cardinality issue.
plugins/processors/awsapplicationsignals/internal/resolver/endpointslicewatcher.go
Outdated
Show resolved
Hide resolved
0dada8d
to
191bd5a
Compare
if endpoint.TargetRef.Kind == "Pod" { | ||
podName = endpoint.TargetRef.Name | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If inferWorkloadName
relies on the podName
being set and this is the only place that sets the podName, couldn't we skip processing the endpoint if the TargetRef.Kind
isn't Pod?
if endpoint.TargetRef.Kind == "Pod" { | |
podName = endpoint.TargetRef.Name | |
} | |
if endpoint.TargetRef.Kind != "Pod" { | |
continue | |
} | |
podName := endpoint.TargetRef.Name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, we should skip if it's not pod. Will update in next commit.
podName := "" | ||
ns := "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Why are these declared outside of the if? They are only used if the endpoint.TargetRef != nil
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will move it inside the if block.
// Service reference: "kubernetes.io/service-name" | ||
svcName := slice.Labels["kubernetes.io/service-name"] | ||
if svcName != "" { | ||
svcFull := svcName + "@" + slice.Namespace | ||
|
||
// If there's at least one endpoint, derive the workload from it | ||
if len(slice.Endpoints) > 0 { | ||
if endpoint := slice.Endpoints[0]; endpoint.TargetRef != nil { | ||
derived := inferWorkloadName(endpoint.TargetRef.Name) | ||
if derived != "" { | ||
firstWl := derived + "@" + endpoint.TargetRef.Namespace | ||
pairs = append(pairs, kvPair{ | ||
key: svcFull, | ||
value: firstWl, | ||
isService: true, | ||
}) | ||
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Feels like this could be done within the loop so we don't have to derive the workload name again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. I will merge it to the loop.
pairs := w.extractEndpointSliceKeyValuePairs(newSlice) | ||
|
||
// Insert them into our ipToWorkload / serviceToWorkload, and track the keys. | ||
var keys []string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can preallocate space for this slice since we know the size.
var keys []string | |
keys := make([]string, 0, len(pairs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update in next commit.
podInformer := sharedInformerFactory.Core().V1().Pods().Informer() | ||
err = podInformer.SetTransform(minimizePod) | ||
if err != nil { | ||
logger.Error("failed to minimize Pod objects", zap.Error(err)) | ||
} | ||
serviceInformer := sharedInformerFactory.Core().V1().Services().Informer() | ||
err = serviceInformer.SetTransform(minimizeService) | ||
if err != nil { | ||
logger.Error("failed to minimize Service objects", zap.Error(err)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Is there a reason this isn't part of newPodWatcher
and newServiceWatcher
similar to newEndpointSliceWatcher
? Would be nice to have them be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to have the pod/service watcher to hide the details of transform inside. For consistency, I will change them accordingly.
if derivedWorkload == "" { | ||
w.logger.Warn("failed to infer workload name from Pod name", zap.String("podName", podName)) | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we saying that we will resolve RemoteService to the podname in the worst case? Wondering if this is the user experience we want in the trace map since it has high cardinality?
wondering if we can do something like remove the uid...not sure what the edge cases are.
if err != nil { | ||
logger.Error("failed to minimize Service objects", zap.Error(err)) | ||
} | ||
useListPod := (os.Getenv("USE_LIST_POD") == "true") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why use an env variable instead of a config option in the processor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to remove the usage of list pods api eventually. That's why I don't want to make the config change which is more permanent. In the worst case, the old logic is kept here so that customers can quickly switch to it. But I don't want too much publicity for the environmental variable for now :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we saying that we will resolve RemoteService to the podname in the worst case? Wondering if this is the user experience we want in the trace map since it has high cardinality?
We have the metric limiter that will mitigate the high cardinality for metrics. For traces, the high cardinality issue is handled at backend. I remember that the backend will check how many nodes in the trace and cut off if there are too many. From this perspective, I am not very worried about the worst case. Anyway, the chance for the worst case to happen is pretty low in my opinion.
wondering if we can do something like remove the uid...not sure what the edge cases are.
I am not sure what you mean by "remove the uid".
25b3510
to
42f2f53
Compare
Description of the issue
Describe the problem or feature in addition to a link to the issues.
Description of changes
To reduce the overhead of ListPod calls on Kubernetes API server at cloudwatch agent startup, we want to switch to use endpoint slice API. The endpoint slice API has much smaller response size. We will use it to figure out the ip to workload mapping for Kubernetes services.
License
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Tests
Describe what tests you have done.
Requirements
Before commit the code, please do the following steps.
make fmt
andmake fmt-sh
make lint