-
Notifications
You must be signed in to change notification settings - Fork 298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
loadtest find-limit: fix some edge cases and bugs #1357
Conversation
- Do not succeed if txpackets - txdrop == rxpackets - Do not exclude rxdrop from drop percentage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first sight LGTM, but I prefer to wait for @xray7224's review before merging it.
If I understand correctly what the patch does is relaxing the condition that checks for a successful state. Before if more packets were received than sent, the app reported no success. However, I think this is possible in some scenarios (for instance when receiving garbage control packets).
About whether to use the NIC registers or the links stats, I don't have an opinion.
success = (diff.rxpackets == diff.txpackets and diff.rxdrop == 0) and success | ||
success = (diff.rxpackets >= diff.txpackets) | ||
and diff.rxdrop == 0 and diff.txdrop == 0 | ||
and success |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shoudn't it still be diff.rxpackets == diff.txpackets
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does diff.rxpackets > diff.txpackets
tell us? Should it signify failure? In my experience this does happen (I have so far attributed it to measurement inaccuracy, due to queueing and whatnot) and feel like it does not imply failure.
In my mind, the only definitive predicate we have is that
diff.rxpackets < diff.txpackets
→ failure
so the predicate for success should be the inverse of that, i.e.
diff.rxpackets >= diff.txpackets
→ success
I might be missing something but there goes my train of thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, for me for two boxes that are directly linked together, I expect rxpackets == txpackets as the sole success condition. That's what I see when I run tests. I never see rxpackets > txpackets. When would that be OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I experience different behavior of snabb loadtest
on master, e.g. here on a Intel i350:
dyser$ sudo ./snabb loadtest find-limit -b 1e9 ./program/snabbnfv/test_fixtures/pcap/http_google.pcap A B 22:00.1 ./program/snabbnfv/test_fixtures/pcap/http_google.pcap B A 23:00.1
Warming up at 1.000000 Gb/s for 5 seconds.
Applying 0.500000 Gbps of load.
A:
TX 116280 packets (0.116280 MPPS), 59709780 bytes (0.500004 Gbps)
RX 118864 packets (0.118864 MPPS), 61217340 bytes (0.512561 Gbps)
Loss: 0 ingress drop + -2584 packets lost (-2.222222%)
B:
TX 116280 packets (0.116280 MPPS), 59709780 bytes (0.500004 Gbps)
RX 118862 packets (0.118862 MPPS), 61215820 bytes (0.512548 Gbps)
Loss: 0 ingress drop + -2582 packets lost (-2.220502%)
Failed; 2 retries remaining.
Applying 0.500000 Gbps of load.
A:
TX 116280 packets (0.116280 MPPS), 59709780 bytes (0.500004 Gbps)
RX 116486 packets (0.116486 MPPS), 59990001 bytes (0.502285 Gbps)
Loss: 0 ingress drop + -206 packets lost (-0.177159%)
B:
TX 116280 packets (0.116280 MPPS), 59709780 bytes (0.500004 Gbps)
RX 116484 packets (0.116484 MPPS), 59989260 bytes (0.502279 Gbps)
Loss: 0 ingress drop + -204 packets lost (-0.175439%)
Failed; 1 retries remaining.
...
I attributed this to measurement inaccuracy, but it could also be a bug somewhere I suppose? My thinking is that even if we read all stats off the NIC counters (which I propose) there can be discrepancies due to say aliasing of the timers that sync NIC stats. \o/
s.lost_packets = s.txpackets - s.rxpackets - s.rxdrop | ||
s.lost_percent = s.lost_packets / s.txpackets * 100 | ||
s.lost_packets = (s.txpackets - s.rxpackets) - s.rxdrop | ||
s.lost_percent = (s.txpackets - s.rxpackets) / s.txpackets * 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIU this is a presentation issue. Right now the load tester will print separately the % of packets dropped by the NIC at ingress, and the % that were lost elsewhere. Your change wants to sum the % dropped at ingress to the % lost elsewhere, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I the goal here is to print a definitive lost %. With this change it prints
Loss: #ingress_drop ingress drop + #link_drop packets lost (<total%>)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before, the percentage only included the #link_drop packets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah i see. yes definitely, good change!
Do not succeed if txpackets - txdrop == rxpackets
Do not exclude rxdrop from drop percentage
This also adjusts the success test to be satisfied with
rxpackets >= txpackets
instead of the originalrxpackets == txpackets
.While having a great time with
snabb loadtest
, there are a few things that stood out to me which are possibly worth discussing:Maybe all stats should be read directly off the NIC? i.e., an edge case is if you use a pcap file with packets < 60 bytes, then loadtest will not correctly report actual load applied. Also, since loadtest uses
nic:rxdrop()
for drop stats already I feel like mixing those with link stats introduces some unnecessary error due to the buffering characteristics of links. On the other handRateLimitedRepeater
would then also need to be aware of the padding of small packets, andfind-limit
would then have to compare the "real" txgbps with what it tried to apply to reject cases where the TX NIC is the bottleneck, or rather assert thatrxgbps >= applied_gbps
.Negative loss rates due to imperfect measurements are weird to look at. We could instead lie and pretend the loss rate is 0 and ignore any superfluous packets (i.e.,
rxpackets = math.min(rxpackets, txpackets)
). On the other hand it describes a real phenomenon that users of loadtest possibly should be subjected to?WDYT? Cc @dpino @xray7224 @wingo