nsqd: buffer and spread UDP statsd writes #1025
Conversation
👍 seems reasonable; steady packet rates are always preferable to huge bursts every N seconds.
```go
		continue
	}
	sw := writers.NewSpreadWriter(conn, interval-time.Second)
```
This `-time.Second` looks like a tricky bit. It's an assumption that everything besides the deliberate sleeps in `SpreadWriter.flush()` takes less than a second on average. That's probably a generally reasonable assumption. I wonder if it should be made more obvious though, maybe set as a `const` at the top.
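For context, here is a minimal sketch of what a `SpreadWriter` along these lines might look like, reconstructed from the diff and the discussion above; the exact fields and method bodies are assumptions, not the PR's actual code:

```go
package writers

import (
	"io"
	"time"
)

// SpreadWriter buffers whole packets and, on Flush, writes them spaced
// evenly across the interval instead of in one burst. Sketch only;
// reconstructed from the diff above, details are assumptions.
type SpreadWriter struct {
	w        io.Writer
	interval time.Duration
	buf      [][]byte
}

func NewSpreadWriter(w io.Writer, interval time.Duration) *SpreadWriter {
	return &SpreadWriter{w: w, interval: interval}
}

func (s *SpreadWriter) Write(p []byte) (int, error) {
	b := make([]byte, len(p)) // copy; the caller may reuse p
	copy(b, p)
	s.buf = append(s.buf, b)
	return len(p), nil
}

// Flush emits each buffered packet with a deliberate sleep in between,
// spreading len(buf) packets across the whole interval. This is why the
// caller passes interval-time.Second: everything outside these sleeps
// is assumed to take less than a second per loop.
func (s *SpreadWriter) Flush() {
	if len(s.buf) == 0 {
		return
	}
	sleep := s.interval / time.Duration(len(s.buf))
	for _, b := range s.buf {
		s.w.Write(b)
		time.Sleep(sleep)
	}
	s.buf = s.buf[:0]
}
```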
nice 👍
nsqd/statsd.go
```go
)

const MaxUDPPacketSize = 508
```
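For reference, a buffered writer that cuts flushes at a byte boundary like this might look roughly as follows. This is a sketch; the name `BoundaryBufferedWriter` and its details are assumptions, not necessarily the PR's code:

```go
package writers

import "io"

// BoundaryBufferedWriter accumulates small statsd payloads and flushes
// the underlying writer just before a write would cross the size limit,
// so each flushed UDP packet stays at or under limit bytes.
// (Sketch; names and details are assumptions.)
type BoundaryBufferedWriter struct {
	w     io.Writer
	buf   []byte
	limit int
}

func NewBoundaryBufferedWriter(w io.Writer, limit int) *BoundaryBufferedWriter {
	return &BoundaryBufferedWriter{w: w, limit: limit}
}

// Write buffers p, flushing first if appending p would exceed the limit.
// A single payload larger than the limit is still sent whole on the next
// Flush; in practice statsd lines are far smaller than 508 bytes.
func (b *BoundaryBufferedWriter) Write(p []byte) (int, error) {
	if len(b.buf)+len(p) > b.limit {
		if err := b.Flush(); err != nil {
			return 0, err
		}
	}
	b.buf = append(b.buf, p...)
	return len(p), nil
}

// Flush writes the accumulated buffer as one packet and resets it.
func (b *BoundaryBufferedWriter) Flush() error {
	if len(b.buf) == 0 {
		return nil
	}
	_, err := b.w.Write(b.buf)
	b.buf = b.buf[:0]
	return err
}
```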
Been thinking about this…
Looking at typical packet sizes from nsqd -> statsd over a period of time, ~70% are in the 80-159 byte range, and ~30% in the 160-319 byte range:
```
Packet Lengths:
Topic / Item        Count  Average  Min val  Max val  Rate (ms)  Percent  Burst rate  Burst start
--------------------------------------------------------------------------------------------------
Packet Lengths       8908   143.07       61      246     0.2646     100%      7.3800       31.450
0-19                    0        -        -        -     0.0000    0.00%           -            -
20-39                   0        -        -        -     0.0000    0.00%           -            -
40-79                  21    73.62       61       79     0.0006    0.24%      0.0800        1.588
80-159               6057   126.76       81      159     0.1799   68.00%      4.4000       21.435
160-319              2830   178.48      160      246     0.0840   31.77%      3.5200        1.456
320-639                 0        -        -        -     0.0000    0.00%           -            -
640-1279                0        -        -        -     0.0000    0.00%           -            -
1280-2559               0        -        -        -     0.0000    0.00%           -            -
2560-5119               0        -        -        -     0.0000    0.00%           -            -
5120 and greater        0        -        -        -     0.0000    0.00%           -            -
--------------------------------------------------------------------------------------------------
```
I agree that 508 bytes would be a solid default: with headers etc., it'd be guaranteed to fit within the 576-byte minimum IP reassembly size (508 bytes of payload + 8 bytes of UDP header + up to 60 bytes of IP header), and thus avoid potential fragmentation if the statsd server were remote over unknown connectivity.
Based on the stats above, it would also give a 2-5x reduction in packet rate, which is not to be sniffed at!
What I was wondering is: for cases where statsd is listening on a loopback interface, or hosted on a local network where MTUs can be known or easily determined, would it make sense to make this configurable?
In such environments it may be preferable to set this to 1400 bytes, or even larger if jumbo frames are enabled, which could lead to further improvements.
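If it were made configurable, the option might be surfaced as a flag along these lines. This is a sketch only; the flag name is an assumption, with 508 kept as the conservative default:

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// Hypothetical flag; 508 stays the safe default, while loopback or
	// jumbo-frame deployments could raise it (e.g. to 1400 or more).
	size := flag.Int("statsd-udp-packet-size", 508,
		"max bytes per statsd UDP packet")
	flag.Parse()
	fmt.Println("statsd UDP packet size:", *size)
}
```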
@itwasntandy pushed a commit making it configurable, PTAL. I'll squash if we're good.
LGTM
force-pushed from 7df6779 to f445ec8
We've been experiencing some significant UDP packet loss between `nsqd` and our metrics agent on our clusters. These nodes handle 100s of topics/channels. This set of changes does two things:

1. Buffers the (extremely) granular statsd UDP packets and flushes them at the 508 byte mark.
2. Spreads the packets out over the configured statsd interval rather than bursting each loop.
In combination, this reduces both the number and rate of UDP packets, which should significantly reduce packet loss due to various buffer overflows and whatnot.
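Roughly, the two writers compose like this inside the statsd loop. This is a sketch of the call site: `NewBoundaryBufferedWriter` is an assumed name, while `NewSpreadWriter` and the one-second headroom appear in the diff above:

```go
// Wrap the UDP conn: cut packets at the 508-byte boundary, and spread
// the resulting packets evenly across the interval.
sw := writers.NewSpreadWriter(conn, interval-time.Second)
bw := writers.NewBoundaryBufferedWriter(sw, 508)

// ... write each stat line to bw ...

bw.Flush() // push any partial packet into the spread writer
sw.Flush() // emit the buffered packets evenly across the interval
```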
Thoughts? @ploxiln @jehiah @itwasntandy