dropping store, external labels are not unique while using AWS NLB #1356
Comments
Hi, thanos-query doesn't use the IP address to compare the connected nodes, only the external labels. I believe the problem here is that the leaf nodes have the same external labels, so they get marked as duplicates. It's hard to tell from your diagram what is happening, but it seems like both of the leaf nodes get picked up through DNS discovery. Is this what's happening?
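To make the point about label-based comparison concrete, here is a minimal sketch, not taken from the Thanos source, of how endpoints could be grouped by external label set regardless of address. The function name, the addresses, and the label values are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def find_duplicate_label_sets(stores):
    """Group store addresses by their external label set.

    `stores` maps an address string to a dict of external labels.
    Returns only the label sets announced by more than one address.
    This is an illustration, not Thanos's actual implementation.
    """
    by_labels = defaultdict(list)
    for addr, labels in stores.items():
        # The address plays no part in the key; only the labels do.
        key = frozenset(labels.items())
        by_labels[key].append(addr)
    return {key: sorted(addrs) for key, addrs in by_labels.items() if len(addrs) > 1}


# Hypothetical addresses: two IPs that answer with the same labels,
# plus one unrelated store with different labels.
stores = {
    "10.0.1.10:10901": {"cluster": "leaf-a", "replica": "0"},
    "10.0.2.10:10901": {"cluster": "leaf-a", "replica": "0"},
    "10.0.3.10:10901": {"cluster": "leaf-b", "replica": "0"},
}
duplicates = find_duplicate_label_sets(stores)
```

In this sketch the first two addresses collapse into one group because their labels are identical, which is the situation that would be reported as non-unique external labels.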
I believe this answers your question:
That is the exact same pod replying from 2 different IP addresses (as the NLB has backend nodes).
Edit: If I remove the NLB and point to the node IP directly, everything works fine.
EDIT: Never mind, those had some DNS issues trying to resolve their Kubernetes services.
So, going back to the issue of having 2 pods exposing the labels through their NLB IP address: I've tried using the NLB setting Proxy Protocol v2 to expose the IP address of the nodes instead of the IP address of the NLB nodes, but it seems like it breaks gRPC:
I think this is related to #1338. I'll try a newer version of Thanos that includes this fix.
Using the latest master fixes the issue, but I'm concerned about the fix: it seems like it just chooses one of the IP addresses and keeps using it forever (or until it becomes unhealthy), which can cause issues because the traffic will always go to a single NLB node. The solution is not ideal.
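The behaviour described in this comment (pick one address per label set and stay on it until it becomes unhealthy) can be sketched as follows. This is a guess at the shape of the logic based only on the description above, not the actual fix in Thanos; `pick_sticky_endpoints`, the `is_healthy` callback, and the addresses are hypothetical.

```python
def pick_sticky_endpoints(candidates, current, is_healthy):
    """Choose one address per label set, preferring the previous choice.

    `candidates` maps a label-set key to the list of addresses announcing it.
    `current` maps a label-set key to the address chosen last time.
    `is_healthy` is a callable that takes an address and returns a bool.
    """
    chosen = {}
    for labels_key, addrs in candidates.items():
        prev = current.get(labels_key)
        # Stay on the previous address for as long as it is usable.
        if prev in addrs and is_healthy(prev):
            chosen[labels_key] = prev
            continue
        # Otherwise fall back to the first healthy address, if any.
        healthy = [addr for addr in sorted(addrs) if is_healthy(addr)]
        if healthy:
            chosen[labels_key] = healthy[0]
    return chosen


# Hypothetical NLB node addresses fronting the same pod.
candidates = {"cluster=leaf-a": ["10.0.1.10:10901", "10.0.2.10:10901"]}

first = pick_sticky_endpoints(candidates, {}, lambda addr: True)
second = pick_sticky_endpoints(candidates, first, lambda addr: True)
third = pick_sticky_endpoints(
    candidates, second, lambda addr: addr != "10.0.1.10:10901"
)
```

Under these assumptions the choice only moves to the second address once the first is reported unhealthy, which matches the concern that traffic keeps going to a single NLB node.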
Yes, and that's why you need a load balancer like Envoy/Nginx in front of those two nodes with identical labels. Thanos here is doing the correct thing and protecting you from needless 2x load. What do you think Thanos should do in such cases?
These are not two nodes; it's only one Kubernetes node behind 2 NLB nodes (AWS creates a load balancer and adds AWS nodes to it; they have their own IPs, which are the ones being resolved when you query). Then, my Kubernetes node (the one that has the thanos-query pod) will be behind that NLB, but the IP addresses exposed to the thanos-query proxy are the ones from the NLB nodes. So thanos-query can receive the same traffic from 1 pod, with 2 different IP addresses.
I use a load balancer so that I can add a group of nodes that can run the thanos-query pods; I can't maintain a list of nodes manually if the pods are moving around between nodes.
If I add an Envoy/nginx load balancer behind an NLB, I'll have the same problem: the thanos-query proxy node will see these nginx/Envoy nodes with the NLB IP addresses, of which there are usually more than 1.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hello, I'm having some issues using AWS's NLB (Network Load Balancer). It seems like the IP addresses of the NLB nodes are confusing the thanos-query service:
Below is a diagram of my scenario
The problem, I believe, is that thanos-query uses the IP address (which is the IP address of the AWS NLB workers, not of my Kubernetes workers) to decide whether the labels are unique (comparing IP address vs. labels).
If I modify the SRV records to point to the nodes directly (using a single thanos-query pod per cluster), everything works fine, but that means I can't move that pod to another node.
Is there a way to change thanos-query so that it does not use the IP address for the comparison?
Thanks!
PS: