loadtest find-limit: fix some edge cases and bugs #1357

eugeneia · 2018-06-18T11:15:09Z

- Do not succeed if txpackets - txdrop == rxpackets
- Do not exclude rxdrop from drop percentage

This also adjusts the success test to be satisfied with rxpackets >= txpackets instead of the original rxpackets == txpackets.

While having a great time with snabb loadtest, there are a few things that stood out to me which are possibly worth discussing:

- Maybe all stats should be read directly off the NIC? For example, if you use a pcap file with packets < 60 bytes, loadtest will not correctly report the actual load applied. Also, since loadtest already uses nic:rxdrop() for drop stats, I feel that mixing those with link stats introduces unnecessary error due to the buffering characteristics of links. On the other hand, RateLimitedRepeater would then also need to be aware of the padding of small packets, and find-limit would have to compare the "real" txgbps with what it tried to apply in order to reject cases where the TX NIC is the bottleneck (or rather, assert that rxgbps >= applied_gbps).
- Negative loss rates due to imperfect measurements are weird to look at. We could instead lie, pretend the loss rate is 0, and ignore any superfluous packets (i.e., rxpackets = math.min(rxpackets, txpackets)). On the other hand, a negative rate describes a real phenomenon that users of loadtest possibly should be exposed to?
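A minimal sketch of the clamping idea from the second point, assuming only the two packet counters are available. The helper name `loss_stats` and its `clamp` flag are hypothetical and are not part of find-limit.lua:

```lua
-- Hypothetical helper (not from find-limit.lua): compute loss figures
-- from packet counters, optionally clamping rxpackets so that the
-- reported loss can never be negative.
local function loss_stats (txpackets, rxpackets, clamp)
   if clamp then
      -- Ignore superfluous packets, as suggested in the description.
      rxpackets = math.min(rxpackets, txpackets)
   end
   local lost_packets = txpackets - rxpackets
   local lost_percent = lost_packets / txpackets * 100
   return lost_packets, lost_percent
end

-- More packets received than sent: unclamped loss is negative,
-- clamped loss is zero.
print(loss_stats(116280, 118864, false))
print(loss_stats(116280, 118864, true))
```

The trade-off is the one raised above: clamping makes the report easier to read, but hides the fact that the counters disagreed.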

WDYT? Cc @dpino @xray7224 @wingo

lukego · 2018-07-30T10:39:05Z

Any feedback on this @dpino @xray7224?

dpino

At first sight LGTM, but I prefer to wait for @xray7224's review before merging it.

If I understand correctly, the patch relaxes the condition that checks for a successful state. Before, if more packets were received than sent, the app reported no success. However, I think receiving more packets than were sent is possible in some scenarios (for instance when receiving garbage control packets).

About whether to use the NIC registers or the link stats, I don't have an opinion.

wingo · 2018-08-16T09:22:58Z

src/program/loadtest/find-limit/find-limit.lua

-         success = (diff.rxpackets == diff.txpackets and diff.rxdrop == 0) and success
+         success = (diff.rxpackets >= diff.txpackets)
+            and diff.rxdrop == 0 and diff.txdrop == 0
+            and success


Shouldn't it still be diff.rxpackets == diff.txpackets ?

eugeneia · 2018-08-22

What does diff.rxpackets > diff.txpackets tell us? Should it signify failure? In my experience this does happen (I have so far attributed it to measurement inaccuracy, due to queueing and whatnot), and I feel like it does not imply failure.

In my mind, the only definitive predicate we have is that

diff.rxpackets < diff.txpackets → failure

so the predicate for success should be the inverse of that, i.e.

diff.rxpackets >= diff.txpackets → success

I might be missing something but there goes my train of thought.

wingo · 2018-08-22

Hm, for two boxes that are directly linked together, I expect rxpackets == txpackets as the sole success condition. That's what I see when I run tests; I never see rxpackets > txpackets. When would that be OK?

eugeneia · 2018-08-23

I experience different behavior of snabb loadtest on master, e.g. here on an Intel i350:

dyser$ sudo ./snabb loadtest find-limit -b 1e9 ./program/snabbnfv/test_fixtures/pcap/http_google.pcap A B 22:00.1 ./program/snabbnfv/test_fixtures/pcap/http_google.pcap B A 23:00.1
Warming up at 1.000000 Gb/s for 5 seconds.
Applying 0.500000 Gbps of load.
  A:
    TX 116280 packets (0.116280 MPPS), 59709780 bytes (0.500004 Gbps)
    RX 118864 packets (0.118864 MPPS), 61217340 bytes (0.512561 Gbps)
    Loss: 0 ingress drop + -2584 packets lost (-2.222222%)
  B:
    TX 116280 packets (0.116280 MPPS), 59709780 bytes (0.500004 Gbps)
    RX 118862 packets (0.118862 MPPS), 61215820 bytes (0.512548 Gbps)
    Loss: 0 ingress drop + -2582 packets lost (-2.220502%)
Failed; 2 retries remaining.
Applying 0.500000 Gbps of load.
  A:
    TX 116280 packets (0.116280 MPPS), 59709780 bytes (0.500004 Gbps)
    RX 116486 packets (0.116486 MPPS), 59990001 bytes (0.502285 Gbps)
    Loss: 0 ingress drop + -206 packets lost (-0.177159%)
  B:
    TX 116280 packets (0.116280 MPPS), 59709780 bytes (0.500004 Gbps)
    RX 116484 packets (0.116484 MPPS), 59989260 bytes (0.502279 Gbps)
    Loss: 0 ingress drop + -204 packets lost (-0.175439%)
Failed; 1 retries remaining.
...

I attributed this to measurement inaccuracy, but I suppose it could also be a bug somewhere? My thinking is that even if we read all stats off the NIC counters (which I propose), there can be discrepancies due to, say, aliasing of the timers that sync NIC stats. \o/
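To make the two candidate predicates from this thread concrete, here is a minimal sketch. The function names are illustrative, `diff` is assumed to be a plain table with the counter fields used in the patch, and the `and success` accumulation across streams from the real code is left out:

```lua
-- Strict predicate, as on master: every sent packet must be received,
-- no more and no fewer, with no ingress drops.
local function success_strict (diff)
   return diff.rxpackets == diff.txpackets and diff.rxdrop == 0
end

-- Relaxed predicate, as in this patch: surplus received packets are
-- tolerated, but any drop on either side counts as failure.
local function success_relaxed (diff)
   return diff.rxpackets >= diff.txpackets
      and diff.rxdrop == 0 and diff.txdrop == 0
end

-- Counters of stream A from the first attempt in the log above.
-- txdrop is not shown in that output and is assumed to be 0 here.
local diff = { txpackets = 116280, rxpackets = 118864, rxdrop = 0, txdrop = 0 }
print(success_strict(diff))   -- false
print(success_relaxed(diff))  -- true
```

Under the strict predicate the run is retried, as in the log; under the relaxed one the surplus packets are treated as measurement noise.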

wingo · 2018-08-16T09:24:33Z

src/program/loadtest/find-limit/find-limit.lua

-            s.lost_packets = s.txpackets - s.rxpackets - s.rxdrop
-            s.lost_percent = s.lost_packets / s.txpackets * 100
+            s.lost_packets = (s.txpackets - s.rxpackets) - s.rxdrop
+            s.lost_percent = (s.txpackets - s.rxpackets) / s.txpackets * 100


AFAIU this is a presentation issue. Right now the load tester prints separately the % of packets dropped by the NIC at ingress and the % that were lost elsewhere. Your change wants to add the % dropped at ingress to the % lost elsewhere, right?

eugeneia · 2018-08-22

Right, the goal here is to print a definitive lost %. With this change it prints

Loss: #ingress_drop ingress drop + #link_drop packets lost (<total%>)

Before, the percentage only included the #link_drop packets.

wingo · 2018-08-22

ah i see. yes definitely, good change!
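The difference between the two percentage formulas in the diff above can be sketched like this. The function names and the counter values are made up for illustration, and rxpackets is assumed not to include packets dropped at ingress:

```lua
-- Before the patch: ingress drops are subtracted first, so the
-- percentage covers only packets lost elsewhere.
local function lost_percent_before (s)
   local lost_packets = s.txpackets - s.rxpackets - s.rxdrop
   return lost_packets / s.txpackets * 100
end

-- After the patch: every sent packet that was not received counts,
-- whether it was dropped at ingress or lost elsewhere.
local function lost_percent_after (s)
   return (s.txpackets - s.rxpackets) / s.txpackets * 100
end

-- Made-up counters: 1000 sent, 900 received, and 60 of the 100
-- missing packets were dropped at NIC ingress.
local s = { txpackets = 1000, rxpackets = 900, rxdrop = 60 }
print(string.format("before: %.1f%%  after: %.1f%%",
                    lost_percent_before(s), lost_percent_after(s)))
-- prints "before: 4.0%  after: 10.0%"
```

In both versions the printed lost-packet count stays at 40 for these counters; only the percentage changes.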

eugeneia force-pushed the loadtest-fixes branch from 76f4916 to 55f4052 · June 18, 2018 11:15

loadtest find-limit: fix some edge cases and bugs

283d6a8

- Do not succeed if txpackets - txdrop == rxpackets
- Do not exclude rxdrop from drop percentage

eugeneia force-pushed the loadtest-fixes branch from 55f4052 to 283d6a8 · June 18, 2018 11:17

dpino approved these changes Aug 7, 2018

wingo reviewed Aug 16, 2018

Merge branch 'master' into loadtest-fixes

3826bcc

eugeneia added the merged label Jul 18, 2019

eugeneia added a commit to eugeneia/snabb that referenced this pull request Jul 18, 2019

Merge PR snabbco#1357 (loadtest find-limit fixes) into max-next

a5247de

eugeneia mentioned this pull request Aug 8, 2019

Merge max-next into next #1439

Merged

eugeneia merged commit 3826bcc into snabbco:master Nov 8, 2019
