Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nsqd: buffer and spread UDP statsd writes #1025

Merged
merged 2 commits into from
Apr 22, 2018

Conversation

mreiferson
Copy link
Member

We've been experiencing some significant UDP packet loss between nsqd and our metrics agent on our clusters. These nodes handle 100s of topics/channels.

This set of changes does two things:

  1. Buffers the (extremely) granular statsd UDP packets and flushes them at the 508 byte mark.

  2. Spreads the packets out over the configured statsd interval rather than bursting each loop.

In combination, this reduces both the number and rate of UDP packets, which should significantly reduce packet loss due to various buffer overflows and whatnot.

Thoughts @ploxiln @jehiah @itwasntandy

@jehiah
Copy link
Member

jehiah commented Apr 18, 2018

👍 seems reasonable; steady packet rates are always preferable to huge bursts every N seconds.

continue
}
sw := writers.NewSpreadWriter(conn, interval-time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This -time.Second looks like a tricky bit. It's an assumption that everything besides the deliberate sleeps in SpreadWriter.flush() takes less than a second on average. That's probably a generally a reasonable assumption. I wonder if it should be made more obvious though, maybe set as a "const" at the top.

@ploxiln
Copy link
Member

ploxiln commented Apr 18, 2018

nice 👍

nsqd/statsd.go Outdated
)

const MaxUDPPacketSize = 508

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Been thinking about this…

Looking a typical packet sizes from nsq-> statsd over a period of time, ~70% are in the 80-159 byte range, and ~30% in the 160-319 byte range.


==================================================================================================================================
Packet Lengths:
Topic / Item       Count         Average       Min val       Max val       Rate (ms)     Percent       Burst rate    Burst start  
----------------------------------------------------------------------------------------------------------------------------------
Packet Lengths     8908          143.07        61            246           0.2646        100%          7.3800        31.450       
 0-19              0             -             -             -             0.0000        0.00%         -             -            
 20-39             0             -             -             -             0.0000        0.00%         -             -            
 40-79             21            73.62         61            79            0.0006        0.24%         0.0800        1.588        
 80-159            6057          126.76        81            159           0.1799        68.00%        4.4000        21.435       
 160-319           2830          178.48        160           246           0.0840        31.77%        3.5200        1.456        
 320-639           0             -             -             -             0.0000        0.00%         -             -            
 640-1279          0             -             -             -             0.0000        0.00%         -             -            
 1280-2559         0             -             -             -             0.0000        0.00%         -             -            
 2560-5119         0             -             -             -             0.0000        0.00%         -             -            
 5120 and greater  0             -             -             -             0.0000        0.00%         -             -            

----------------------------------------------------------------------------------------------------------------------------------

I agree that 508 bytes would be a solid default as with headers etc, it'd be guaranteed to fit within 572 bytes, and thus avoid potential fragmentation if the statsd server was remote over unknown connectivity.

Based on the the stats above, it would also give a 2-5x reduction in packet rate, which is not to be sniffed at!

What I was wondering is for cases where statsd is listening on a loopback interface, or hosted on a local network where MTUs can be known or easily determined, would it make sense to make this configurable?

As in such environments, it may be preferable to set this to 1400 bytes, or even larger if jumbo frames are enabled and lead to further improvements?

@mreiferson
Copy link
Member Author

@itwasntandy pushed a commit making it configurable, PTAL

I'll squash if we're good

@itwasntandy
Copy link

LGTM

@mreiferson mreiferson merged commit 49b7e3d into nsqio:master Apr 22, 2018
@mreiferson mreiferson deleted the spread-udp-writes branch April 22, 2018 22:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants