-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve address before writing UDP packets #264
base: master
Are you sure you want to change the base?
Conversation
Hi @carlosroman, can you help review this pr? This is really a useful feature. |
@guidao I've had a look and very curious if this will resolve #263 or not. Just reading the docs on net.ResolveUDPAddr and if you give the function a "literal IP address" (such as By any chance were you able to test this out to see if it did indeed fix the issue with Cilium? I don't think this will try and resolve Another concern is the overhead of doing that resolve each time we wish to write data. I'll need to run some benchmarks to have a look and see if it does indeed come with a penalty. |
The example in #263 is a bit incorrect, in fact in our environment we are using a dns name, not a literal IP address. In #263 I was trying to determine if we could use the returned error to re-establish a connection, but maybe there is some case I don't know about. So I think bessb's pr might work better. On the performance issue, it might be possible to have the user configure the option to resolve every time or not to resolve or to resolve periodically. |
@guidao yeah, this will probably fix your issue if using a DNS name (assuming you're not using the CGO DNS resolver as that has some quirks). I'll look into seeing how we can release this with a nice config flag for people to use + benchmark it to check if we even need to use a flag. |
Here's a simple example where I'm using the current client code: package main
import (
"github.com/DataDog/datadog-go/v5/statsd"
"log"
"time"
)
func main() {
statsd, err := statsd.New("127.0.0.1:8125")
if err != nil {
log.Fatal(err)
}
log.Println("client initialized")
defer statsd.Close()
for range time.Tick(30 * time.Second) {
statsd.Flush()
}
} If I use
From what I understand, Cilium is hooking into the If I use the changes from the PR, this is what the system calls look like:
In the second case, the original IP address is set on all the I also tested using a hostname (e.g.
So, I agree this may incur some additional overhead, but should account for Pod or DNS changes happening after the client is initialized. |
@bessb Will be checking out if this does help with your issue. I don't believe it will because net.ResolveUDPAddr will just return us |
@bessb Sorry it has been awhile. I've just verified your PR and it does indeed help when running Cilium with a CiliumLocalRedirectPolicy pointing to |
@carlosroman Thank you for your continued updates. |
@bessb Got around to performance testing this and looks like both memory and CPU increased significantly (almost double) with this change. I'm gonna double check everything again and get someone to verify what I'm seeing. If it does indeed cause this performance change then I'll look at refactoring your PR so that it becomes a config option. That way it has to be enabled by a user with the knowledge that it will cause a performance issue. |
return writer, nil | ||
} | ||
|
||
// Write data to the UDP connection with no error handling | ||
func (w *udpWriter) Write(data []byte) (int, error) { | ||
return w.conn.Write(data) | ||
dst, err := net.ResolveUDPAddr("udp", w.addr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an expensive operation that you do not want to do before every packet write. If you want to do such a change, it needs to be done infrequently to avoid substantially increasing the overhead of the datadog statsd client.
The current UDP connection logic will resolve an address only when the writer is created. This is undesirable in cases where the address may change at a later point.
In our case, we have agents deployed in Kubernetes clusters using Cilium CNI. Cilium has an eBPF hook for the
connect()
syscall that will resolve to the pod IP the of the agent. When an agent pod is terminated, a new agent pod is spun up with a different IP address. In order to pick up the new IP address it needs to be resolved again whenWrite()
is called.This should also fix #263.