Discard ping replies if rtt larger than timeout (-t) #32
I've tested 3.16-rc1. I don't want to be annoying, but this "issue" is NOT fixed.
So, you mean that responses received after the timeout specified in -t should be discarded?
I can understand the argument, but I wonder whether the current behavior is also preferred by others and possibly relied upon.
Yes, it's an arguable thing - should they be discarded or not. I belong to the Zabbix team (an NMS application), where fping (in a "non-default" mode, i.e. with the -C option) is used to perform ICMP ping checks. As for "backward compatibility" - I'm not sure how to resolve that, but the current behavior is illogical to me, so I consider it a bug which is to be fixed, rather than keeping things "as is" because someone may rely on it.
Here is an alternative proposal:
This is probably easier for users to understand than having a separate timeout option. What do you think?
@schweikert, thanks for considering this; @zalexua, thanks for reporting :) Also, if I wanted to do 5 pings with 5 seconds between them and a one-second timeout, the -p proposal would make that impossible (without a somewhat messy wrapper script).
@schweikert, after thinking about your suggestion and discussing it with one more Zabbix colleague here, we do not consider it an optimal solution. More specifically - it would partially break fping usage in Zabbix. Today you updated the issue title, and I want to say that I like it - it clearly reflects what should be changed. I suppose it should not be technically hard.
The problem with that is that in loop/count mode, the default timeout of 500ms is really low. In non-loop/count mode you have retries and the backoff factor, but not in loop/count mode. So this would mean starting to discard perfectly fine ping replies, and it would surprise fping users. In which way would it break Zabbix? Here is another proposal:
You said that the default value for -t is 1 second, but "man fping" says it's 500 ms. I see the same in the sources (#define DEFAULT_TIMEOUT 500). So I'll use 500 ms from here on; please correct me if I'm wrong. Thinking about the people who use fping with the -C option... I don't think these people want to measure ICMP pings that take longer than 500 ms, by default. If they use a packet count (-C) of more than 1 (which is logical), then they should understand that there is some timeout and that it has some effect. If I expected replies of more than 500 ms, I'd need to explicitly allow that with a big value for the -t option, not by juggling the value of the -C option. I said it would partially break fping usage in Zabbix: in Zabbix it's possible to specify custom values for the -t and -p options separately. If they were "merged" on the fping side, then the current Zabbix functionality for ICMP ping checks would be partially broken. Your latest proposal sounds good to me. Let me collect more opinions here; I will get back to you soon.
The proposal in #32 (comment) seems reasonable to me.
You are right: it's 500ms by default, not 1000ms. I have corrected it in my comment. Also, I agree that "forbidding" the usage of -t together with -c/-C/-l would be bad and would indeed break many installations, so that's certainly a no-go. My main concern is not to break existing installations (and I argue that people might not expect pings > 500ms to be discarded when in loop mode).
The decision to drop a packet if its RTT is over the timeout specified with -t is right, IMO. However, I don't see the idea of connecting the -t and -p options in any way as a good one.
@dimir, thanks for your feedback. Note that -t and -p are supposed to be "connected" only if -t is not customized; that doesn't sound so bad. @schweikert, I think I've collected all the opinions I wanted to get. Hope to see some further movement :)
Sure, I understand that this applies only when -t is not specified. And I agree, the proposed solution is not so bad, but not perfect either, IMO. I also agree that it will be better than the current behavior.
If you want to test: I have released fping-4.0-rc2 with these changes. Testing would be much appreciated!
I'll do it, definitely.
Similarly to the man page, maybe in --help it's worth adding ", up to 2000 ms" after "where it is the -p period"? Also, the word "for" is duplicated ("for for") in the ChangeLog.
"for for" is still in the Changelog |
@zalexua: indeed, sorry. fixed now |
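For reference, the count-mode default mentioned a couple of comments up (-t following the -p period, capped at 2000 ms, unless -t is given explicitly) can be sketched roughly like this; the names and structure are illustrative only, not fping's actual source:

    /* Illustrative sketch, not fping's actual source: how the effective
     * per-reply timeout is chosen under the rc2 rules discussed above. */
    #define DEFAULT_TIMEOUT_MS   500   /* classic-mode default for -t */
    #define MAX_AUTO_TIMEOUT_MS 2000   /* cap when -t follows -p */

    long effective_timeout_ms(int t_given, long t_opt_ms,
                              int count_mode, long period_ms)
    {
        if (t_given)
            return t_opt_ms;               /* an explicit -t always wins */
        if (count_mode)                    /* -c/-C/-l: follow the -p period */
            return period_ms < MAX_AUTO_TIMEOUT_MS ? period_ms
                                                   : MAX_AUTO_TIMEOUT_MS;
        return DEFAULT_TIMEOUT_MS;         /* otherwise keep the 500 ms default */
    }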
I extensively tested the changes yesterday and today. Yesterday, at the beginning (before I ran "ping zabbix.jp"), I had a weird result of lost replies which I could not explain. After a few unexpected ping losses, I ran Wireshark and saw the replies (I guess; unfortunately I didn't pay attention to the packets themselves, only to the timing), i.e. with -c2 I had 4 packets appear in Wireshark and their timing was fine! After "ping zabbix.jp" I also ran tcpdump in the shell, but all further attempts were successful and/or correct.
Thank you! |
So just chiming in with the inevitable "you broke my use case" comment: I am using fping in Flent to measure bufferbloat, among other things. In this use case, the RTT can vary wildly within a single test, and we want to capture this. So discarding late pings is absolutely the wrong thing to do. If I'm reading the code right, I can get the old behaviour by just passing a really large value to -t, right? Will that affect anything on old versions (when running in -c mode)? |
Actually, playing with v4.0 a bit more, I'd argue that the current handling is wrong. If you're going to honour -t in counter mode (which is a good idea!), it should only apply as a timeout after the last packet has been sent. I.e., if the test runs long enough for earlier packets to be received, don't ignore them but report the result. That would make the new behaviour useful in my case as well :) |
The new behavior was implemented to make the results more consistent and predictable. In particular, it was considered problematic that if you sent 3 pings in count mode, you would possibly get a loss only on the third ping, because the timeout would only affect the last one. The recommended way to measure is to set a long-enough period, so that you don't run into this problem. |
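To make that rule concrete, the per-reply check being described amounts to something like the following (a minimal sketch with made-up names, not the actual implementation):

    #include <stdbool.h>

    /* Minimal sketch (made-up names): a reply whose RTT exceeds the
     * per-packet timeout is counted exactly like a lost packet, even
     * though it did eventually arrive. Setting a long-enough -p period
     * (which -t follows by default) keeps slow-but-real replies under
     * the threshold. */
    static bool reply_counted_as_received(long rtt_ms, long timeout_ms)
    {
        return rtt_ms <= timeout_ms;
    }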
Yeah, I did read the discussion; I'm just disagreeing that this is more "consistent and predictable" :) If you are ignoring late replies, you are basically reporting a host as down even though it isn't. And for my measurements I specifically want a high sampling rate, but I still want to be able to tell when latency increases above the sampling interval (which it quite frequently does). If you're only running a few pings, I can see the point of the new behaviour, but if you're running a long test (I usually run with -p 200 -c 300 for a full one-minute test), it doesn't make sense... On the other hand, having the timeout parameter relate only to the end of the test would make it possible to enforce a maximum duration for the whole test (which I currently have to do by means of a watchdog timer). Alternatively, a time-based parameter instead of -c (i.e., "keep sending pings at this interval for x seconds") would be useful...
I second @tohojo's comment. The current behavior of using -t as a per-packet timeout isn't desirable in most cases. RTT can jump suddenly for many reasons, and that information shouldn't be discarded by default. Perhaps there is a use case for a per-packet timeout, but it could be a separate flag. Useful timeout values depend on the environment under test. I think a generally effective default for the final timeout is to wait, after the last request has been sent, for some multiple of the max RTT (say 3x max RTT), falling back to a fixed timeout if there were no replies, and possibly placing a limit on the timeout if the max RTT is something unreasonably high.
@tohojo: you are free to disagree, of course... For me, having a different measurement for the last ping was a convincing enough argument to change that behavior. Note that you can still set a longer -t value if you want to; just be aware that it might not produce the results that you expect.
Right, I can see how having the last few measurements disappear when latency increases can be odd. But the right way to fix that is doing something like what @peteheist is suggesting, and adjusting the wait interval to the actual observed RTT, instead of just arbitrarily discarding measurement points as you are currently doing. Latency variations are quite common in the internet, actually, and can be an order of magnitude above the base RTT. But yeah, since this behaviour is in a release version of fping, I am going to have to work around it anyway... |
It seems to me that adjusting the timeout based on the observed RTT would be extremely confusing. It could be some additional feature (market it as "adaptive AI"...), but the normal flags should be easily understandable and testable by the average user. Sorry if I missed this in the comments, but what is the issue with specifying as large a timeout as you want to handle?
I want to support @schweikert 's words that "The new behavior was implemented to make the results more consistent and predictable". That's the key point. |
The argument for using a factor of max RTT is that you minimize both the total test time and the likelihood that you discard the last packet erroneously, which I think was the original problem reported. I think most users would be happy with this behavior without having to understand it. The issue with specifying a large timeout is that in case the last reply from the server is lost, there could be an unnecessarily long wait time. You might choose 15 seconds as a time that you think you'll "never" see, but why wait 15 seconds at the end of the test when the maximum round-trip time you saw was 250 milliseconds? But I think the choice of wait strategy at the end of the test is less consequential than the choice to time out individual round-trips. Knowing when and how often there are high round-trip times is often exactly what you're interested in, as it characterizes any bufferbloat, WiFi interference, hardware problems, etc. If those results are clipped, you can't differentiate between high RTT and packet loss. Lastly, I empathize. Anyone who thinks that writing/maintaining a ping-like utility is "easy" either hasn't done it or hasn't been dragged through various corner cases. So, thanks for fping! :) |
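The end-of-test wait proposed here could look roughly like this (a sketch of the proposal only; the multiplier, fallback, and cap are placeholder values, and nothing like this exists in fping):

    /* Sketch of the proposed end-of-test wait (not implemented in fping;
     * the constants are placeholders). */
    long final_wait_ms(long max_rtt_ms, int got_any_reply)
    {
        const long fallback_ms = 2000;   /* no replies at all: fixed timeout */
        const long cap_ms = 10000;       /* limit absurdly high max RTTs */
        long wait;

        if (!got_any_reply)
            return fallback_ms;
        wait = 3 * max_rtt_ms;           /* "say 3x max RTT" */
        return wait < cap_ms ? wait : cap_ms;
    }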
I hate to ask for clarification on a closed issue, but does this change mean that if my period is 20ms, my rtt is 40ms, and my count is 100, fping will return 100% packet loss, rather than permitting multiple requests to be in flight simultaneously? |
You are right: in your case, 100% of the packets will be considered as having received no reply - packet loss.
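The arithmetic behind that answer, assuming (per the rc discussion above) that -t defaults to the -p period in count mode, can be restated with a toy program:

    /* Toy restatement of the scenario above, assuming -t defaults to the
     * -p period in count mode: every 40 ms reply misses the 20 ms
     * timeout, so all 100 pings count as lost. */
    #include <stdio.h>

    int main(void)
    {
        long period_ms = 20, rtt_ms = 40, count = 100, lost = 0;
        long timeout_ms = period_ms;     /* no explicit -t given */

        for (long i = 0; i < count; i++)
            if (rtt_ms > timeout_ms)
                lost++;                  /* reply discarded as too late */
        printf("lost %ld of %ld replies (%.0f%%)\n",
               lost, count, 100.0 * lost / count);
        return 0;
    }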
If I understand it correctly: in non-default mode ("batch mode" from here on), fping will receive and process ECHO replies to previous requests as successes until it sends the last request, disregarding that those replies were received much later than specified by the -t option.
For example, pinging a distant site.
As we see above, the last response is always lost, because fping finished its work before that response came.
And with a non-default -p option, the last two responses are lost while the first response has been caught:
I'd consider this a bug in fping, because it's a nasty inconsistency which depends on the number of requests sent, and it's absolutely not obvious.
I believe that timeouts in the non-default "batch" mode should work as expected, regardless of the number of requests.
So in all my examples I should get 100% lost responses.
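Restating the reported pre-fix behavior as code (hypothetical names; a sketch of the described symptom, not the actual source), acceptance depended on process lifetime rather than on -t:

    #include <stdbool.h>

    /* Sketch of the reported old behavior (hypothetical names): a late
     * reply to an earlier request was counted as a success as long as
     * fping had not yet exited, regardless of the -t timeout. */
    static bool old_reply_accepted(bool fping_still_running,
                                   long rtt_ms, long timeout_ms)
    {
        (void)rtt_ms;                    /* RTT vs. -t was not checked... */
        (void)timeout_ms;
        return fping_still_running;      /* ...only process lifetime mattered */
    }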