
loadtest find-limit: fix some edge cases and bugs #1357

Merged
merged 2 commits into snabbco:master on Nov 8, 2019

Conversation

eugeneia (Member)

  • Do not succeed if txpackets - txdrop == rxpackets

  • Do not exclude rxdrop from drop percentage

This also adjusts the success test to be satisfied with rxpackets >= txpackets instead of the original rxpackets == txpackets.

While having a great time with snabb loadtest, a few things stood out to me that are possibly worth discussing:

  • Maybe all stats should be read directly off the NIC? One edge case: if you use a pcap file with packets < 60 bytes, loadtest will not correctly report the actual load applied. Also, since loadtest already uses nic:rxdrop() for drop stats, mixing those with link stats introduces some unnecessary error due to the buffering characteristics of links. On the other hand, RateLimitedRepeater would then also need to be aware of the padding of small packets, and find-limit would have to compare the "real" txgbps with what it tried to apply in order to reject cases where the TX NIC is the bottleneck, or rather assert that rxgbps >= applied_gbps.

  • Negative loss rates due to imperfect measurements are weird to look at. We could instead lie, pretend the loss rate is 0, and ignore any superfluous packets (i.e., rxpackets = math.min(rxpackets, txpackets)). On the other hand, it describes a real phenomenon that users of loadtest perhaps should be exposed to?

WDYT? Cc @dpino @xray7224 @wingo
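As an aside, the clamping idea from the second bullet above could be sketched like this. This is an illustrative Python sketch (loadtest itself is Lua, and `loss_percent` is a hypothetical name), using the A-side TX/RX figures from the i350 run quoted later in the thread:

```python
def loss_percent(txpackets, rxpackets, clamp=False):
    """Percentage of transmitted packets that were lost."""
    if clamp:
        # Pretend superfluous packets were never received, so the
        # reported loss rate can never go negative.
        rxpackets = min(rxpackets, txpackets)
    return (txpackets - rxpackets) / txpackets * 100

# A-side figures from the i350 run: TX 116280, RX 118864.
print(loss_percent(116280, 118864))              # negative, about -2.22
print(loss_percent(116280, 118864, clamp=True))  # clamped to 0.0
```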

Commits:
 - Do not succeed if txpackets - txdrop == rxpackets
 - Do not exclude rxdrop from drop percentage
@lukego (Member) commented Jul 30, 2018

Any feedback on this @dpino @xray7224?

@dpino (Contributor) left a comment

At first sight LGTM, but I prefer to wait for @xray7224's review before merging it.

If I understand correctly, what the patch does is relax the condition that checks for a successful state. Before, if more packets were received than sent, the app reported failure. However, I think this is possible in some scenarios (for instance, when receiving garbage control packets).

About whether to use the NIC registers or the links stats, I don't have an opinion.

-success = (diff.rxpackets == diff.txpackets and diff.rxdrop == 0) and success
+success = (diff.rxpackets >= diff.txpackets)
+          and diff.rxdrop == 0 and diff.txdrop == 0
+          and success
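Translated into Python for a quick sanity check (a hypothetical helper, not the Lua itself; the accumulating `and success` from the Lua hunk is left out), the relaxed predicate behaves like this:

```python
def run_success(diff):
    # RX may exceed TX (measurement jitter is tolerated), but any
    # ingress (rxdrop) or egress (txdrop) drop means failure.
    return (diff["rxpackets"] >= diff["txpackets"]
            and diff["rxdrop"] == 0
            and diff["txdrop"] == 0)

# rxpackets > txpackets no longer counts as failure...
assert run_success({"txpackets": 100, "rxpackets": 102,
                    "rxdrop": 0, "txdrop": 0})
# ...while txpackets - txdrop == rxpackets now correctly fails.
assert not run_success({"txpackets": 100, "rxpackets": 98,
                        "rxdrop": 0, "txdrop": 2})
```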
Contributor

Shouldn't it still be diff.rxpackets == diff.txpackets?

Member Author

What does diff.rxpackets > diff.txpackets tell us? Should it signify failure? In my experience this does happen (I have so far attributed it to measurement inaccuracy, due to queueing and whatnot), and I feel it does not imply failure.

In my mind, the only definitive predicate we have is that

  • diff.rxpackets < diff.txpackets → failure

so the predicate for success should be the inverse of that, i.e.

  • diff.rxpackets >= diff.txpackets → success

I might be missing something but there goes my train of thought.

Contributor

Hmm, for two boxes that are directly linked together, I expect rxpackets == txpackets as the sole success condition. That's what I see when I run tests; I never see rxpackets > txpackets. When would that be OK?

Member Author

I see different behavior from snabb loadtest on master, e.g. here on an Intel i350:

dyser$ sudo ./snabb loadtest find-limit -b 1e9 ./program/snabbnfv/test_fixtures/pcap/http_google.pcap A B 22:00.1 ./program/snabbnfv/test_fixtures/pcap/http_google.pcap B A 23:00.1
Warming up at 1.000000 Gb/s for 5 seconds.
Applying 0.500000 Gbps of load.
  A:
    TX 116280 packets (0.116280 MPPS), 59709780 bytes (0.500004 Gbps)
    RX 118864 packets (0.118864 MPPS), 61217340 bytes (0.512561 Gbps)
    Loss: 0 ingress drop + -2584 packets lost (-2.222222%)
  B:
    TX 116280 packets (0.116280 MPPS), 59709780 bytes (0.500004 Gbps)
    RX 118862 packets (0.118862 MPPS), 61215820 bytes (0.512548 Gbps)
    Loss: 0 ingress drop + -2582 packets lost (-2.220502%)
Failed; 2 retries remaining.
Applying 0.500000 Gbps of load.
  A:
    TX 116280 packets (0.116280 MPPS), 59709780 bytes (0.500004 Gbps)
    RX 116486 packets (0.116486 MPPS), 59990001 bytes (0.502285 Gbps)
    Loss: 0 ingress drop + -206 packets lost (-0.177159%)
  B:
    TX 116280 packets (0.116280 MPPS), 59709780 bytes (0.500004 Gbps)
    RX 116484 packets (0.116484 MPPS), 59989260 bytes (0.502279 Gbps)
    Loss: 0 ingress drop + -204 packets lost (-0.175439%)
Failed; 1 retries remaining.
...

I attributed this to measurement inaccuracy, but it could also be a bug somewhere, I suppose? My thinking is that even if we read all stats off the NIC counters (which I propose), there can be discrepancies due to, say, aliasing of the timers that sync NIC stats. \o/

-s.lost_packets = s.txpackets - s.rxpackets - s.rxdrop
-s.lost_percent = s.lost_packets / s.txpackets * 100
+s.lost_packets = (s.txpackets - s.rxpackets) - s.rxdrop
+s.lost_percent = (s.txpackets - s.rxpackets) / s.txpackets * 100
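Checking the changed arithmetic against the A-side numbers from the run above (a Python sketch with a hypothetical helper name, not the Lua itself):

```python
def loss_stats(txpackets, rxpackets, rxdrop):
    # After this change, the percentage is based on txpackets - rxpackets
    # alone, so ingress drops (rxdrop) are no longer excluded from it.
    lost_packets = (txpackets - rxpackets) - rxdrop
    lost_percent = (txpackets - rxpackets) / txpackets * 100
    return lost_packets, lost_percent

# A-side figures from the i350 run: TX 116280, RX 118864, 0 ingress drop.
lost, pct = loss_stats(txpackets=116280, rxpackets=118864, rxdrop=0)
print(lost, pct)  # -2584 packets, about -2.222222 percent
```

This reproduces the "-2584 packets lost (-2.222222%)" line from the transcript.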
Contributor

AFAIU this is a presentation issue. Right now the load tester separately prints the % of packets dropped by the NIC at ingress and the % that were lost elsewhere. Your change sums the % dropped at ingress with the % lost elsewhere, right?

Member Author

Right, the goal here is to print a definitive lost %. With this change it prints

Loss: #ingress_drop ingress drop + #link_drop packets lost (<total%>)

Member Author

Before, the percentage only included the #link_drop packets.

Contributor

Ah, I see. Yes, definitely a good change!

eugeneia added a commit to eugeneia/snabb that referenced this pull request Jul 18, 2019
@eugeneia eugeneia merged commit 3826bcc into snabbco:master Nov 8, 2019