-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PPing core improvements #32
PPing core improvements #32
Conversation
7d7c8fe
to
0a80d22
Compare
Right, so I'm afraid I won't have time to properly review this before the holidays, sorry about that :( Will circle back to it in January :) |
No worries, I've gone on vacation myself now. Merry christmas and happy new year! |
0a80d22
to
732e606
Compare
732e606
to
811ad5b
Compare
Hmm, seems Github didn't pick up on the first bit of this being merged. Could you please do a manual rebase onto master and re-push this? |
811ad5b
to
4a8d9fd
Compare
Hmm strange, maybe because this series of commits was missing the merge-commit? Anyways, have rebased on the new merged master, so should hopefully be fine now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the comments are relatively minor, I guess. My main overall comment is that you still have a tendency to lump unrelated things together in single commits, which makes reviewing harder than it needs to be. When splitting things into commits, try to think of it this way: each commit should be self-contained in that it doesn't require any context other than the code changes themselves and its commit message. You should generally be making the commits as small as possible, but not so small that they lack context. It's fine to do partial refactorings, though, as long as you call out upcoming changes ("change X in preparation for later doing Y"). In general, think of it as minimising the amount of stuff the reviewer has to keep in their head at once (which is also helpful when writing commit messages; they are there to help the reviewer, and yourself when you later go back and run git blame
to figure out why some code is the way it is).
I'm also not a huge fan of the "issues with this commit" items at the bottom of each commit message. I view them as a bit of a cop-out: the issues were obviously not important enough to fix, or you would have done that already. And the commit message should be describing the change you are making, not all the other stuff you thought about while making that change :) Tracking TODO items is better done in a TODO file, or an issue tracker (and feel free to use the issue tracker in this repository, BTW). If you do use a TODO file, add a separate commit at the end of your series updating that; this also helps you scope out the series (or pull request, as it were). Like "This series did X, Y and Z, and now I'm updating the TODO list with future items A, B and C that I will tackle in a subsequent series).
Oh, and the above comments are mostly for future reference - you don't need to restructure the commits in this series! And yeah, splitting changes up in readable chunks can be a lot of tedious work, but it gets easier with practice; you'll learn to think in terms of the changes and how you will describe them, which can also improve the code (since you'll sometimes think about and fix issues while you're doing such a split).
pping/pping_kern.c
Outdated
BPF_NOEXIST) == 0) { | ||
new_flow = true; | ||
|
||
if (fe.event_info.event != FLOW_EVENT_OPENING) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit dodgy; why are you keeping an (uninitialised!) flow even structure in the global function state and filling it in piece-mal, instead of just allocating it in the block where you need it? That seems like it's prone to bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So after cursing my old self any trying to figure out what I was thinking at the time, I believe that I initially did this in some attempt to optimize the code by only filling in the fields that where needed when they were needed. With the commit that made PPing "symmetrical" (doing both timestamping and matching on both ingress and egress) I essentially gave up on this idea, but for some reason kept passing this struct around between functions (even if the function itself filled all the members, meaning it could just as well allocate it itself).
I have addressed this in the commit that makes PPing symmetrical, so it is now only allocated and filled inside functions which directly push it to the perf-buffer (as the only reason I ever fill it is so I can push it on the perf-buffer). I've opted to not go back and try and fix it before this commit however, as I rather not spend energy rewriting code that is overwritten by later changes in this PR anyways (and it was also a problem which existed since before this PR).
pping/pping_kern.c
Outdated
|
||
f_state = bpf_map_lookup_elem(&flow_state, &p_id.flow); | ||
if (!f_state) // Creation failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
State creation failures should probably be notified somehow at least occasionally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, ocassionally emit a warning to stderr about failing to track flow due to flow-state map being full? Definitly need to make sure that it goes out with a limited frequency though, so that not every single packet triggers a warning message which would tank performance in a period where PPing is likely already under a heavy load.
Ideally I guess one would perhaps get one message per flow which failed to create a new state, but I don't really see a way to track that I've already emitted a warning for a particular flow without keeping some state for that flow...but the reason for emitting this warning to begin with is that we're out of space for tracking flows.
So simply some time-based limit? Although I'm not really sure what frequency would make sense here. 1000 warnings/sec? 100 warnings/sec? 10 warnings/sec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once a second would probably be fine, at least in interactive mode... You could just keep a timestamp in a global variable and have logic like:
if (!f_state) { // creation failed
if (last_warn_time < now - WARN_INTERVAL) {
emit_warning(); // send a perf event of a special type, for instance
last_warn_time = now;
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have added a new commit at the end that should address this. In addition to sending a warning if it fails to create a flow state entry (due to the flow state map being full), it also sends a warning in case it fails to create a timestamp entry.
Both are capped to only emitting one warning per second (currently hard-coded, but could of course be made configurable in case you believe it's something the user might want to change), but tracked separately for the flow state and timestamp map.
pping/pping.c
Outdated
@@ -106,6 +103,7 @@ static const struct option long_options[] = { | |||
{ "cleanup-interval", required_argument, NULL, 'c' }, // Map cleaning interval in s, 0 to disable | |||
{ "format", required_argument, NULL, 'F' }, // Which format to output in (standard/json/ppviz) | |||
{ "ingress-hook", required_argument, NULL, 'I' }, // Use tc or XDP as ingress hook | |||
{ "localfilt-off", no_argument, NULL, 'l' }, // Disable local filtering (will start to report "internal" RTTs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
localfilt-off
is a terrible name :)
How about --include-local
instead? Could this be -l
as the shortopt...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Think I first called it just localfilt, but figured that was a very misleadning name as the option turns it off not on, so my quick-fix was to just append off to the end =) Wanted to keep -l to be consistent with Kathie's PPing, but guess that's still fairly reasonable with --local-filt
(which is also a much more descriptive name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See I ended up writing --local-filt
in my previous comment, but of course meant to change it to your suggested --include-local
which I've now also done.
pping/pping.h
Outdated
@@ -135,4 +138,16 @@ union pping_event { | |||
struct flow_event flow_event; | |||
}; | |||
|
|||
/* | |||
* Copies the src to dest, but swapping place on saddr and daddr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments should explain the "why" of a piece of code, not just reiterate the "what" (for that you can just read the code). This function in itself is obvious enough from the name, but why is it needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have changed the comment to instead try to motivate what the function is useful for.
@@ -25,6 +25,8 @@ | |||
#define AF_INET6 10 | |||
#define MAX_TCP_OPTIONS 10 | |||
|
|||
#define IPV6_FLOWINFO_MASK __cpu_to_be32(0x0FFFFFFF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't really know. Pretty much just copied how you filled in the bpf_fib_lookup
in sample/bpf/xdp_fwd_kern.c and then you used this thing so I pretty much followed blindly. Should probably take some time figuring out how bpf_fib_lookup
struct is actually used like, but remeber that I thought the struct seemed rather complicated and was unsure which fields needed to be set and with what.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good idea ;)
It's commented in the struct, to a certain extent: https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/bpf.h#L6121
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have added a comment that this is for getting the flow label + traffic class from the IPv6 header which apparently is used in the lookup.
} | ||
|
||
lookup.family = p_info->pid.flow.ipv; | ||
lookup.l4_protocol = p_info->pid.flow.proto; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure you should be filling this if you're not filling in the ports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea, as above, I pretty much just copy-pasted the one example I found of how this thing was used, and there the l4 protocol is filled in while the ports are set to 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not sure about this one honestly, both examples of it that I've found (xdp_fwd_kern and test_tc_neigh_fib) does it like that. Guess there might be forwarding rules based solely on the protocol, i.e. forward TCP traffic but not UDP traffic, without knowledge of the ports. So have left it like this for now.
Question is if it really matters for our use-case though, which is to find out if the packet is going to the same machine that PPing is running on or not. Perhaps it makes more sense to only provide it with IP addresses, as I guess the protocol shouldn't really affect if the local machine is the intended destination of the packet or not.
pping/pping_kern.c
Outdated
@@ -517,10 +648,11 @@ int pping_tc_egress(struct __sk_buff *skb) | |||
.data_end = (void *)(long)skb->data_end, | |||
.pkt_len = skb->len, | |||
.nh = { .pos = pctx.data }, | |||
.ifindex = skb->ingress_ifindex, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting an ifindex
member to the value of ingress_ifindex
in an egress hook should make alarm bells go off ;)
AFAICT you're only using this for the fib lookup, so just leave it unset here (and rename the member to ingress_ifindex
as well, for the avoidance of doubt)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
pping/pping_kern.c
Outdated
{ | ||
struct packet_info p_info = { 0 }; | ||
struct flow_state *f_state; | ||
struct flow_event fe; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though you moved the code around, the comment about having this shared uninitialised variable on the stack still applies :)
pping/pping_kern.c
Outdated
fe->flow_event_type = FLOW_EVENT_OPENING; | ||
fe->reason = f_state->opening_reason; | ||
fe->source = EVENT_SOURCE_PKT_DEST; | ||
fe->reserved = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to illustrate my point about the flow events, the above is basically the same as the full initialiser, so there's no reason to pass around pointers to struct flow_event between the different functions. Just create it on the stack where you need it :)
pping/pping_kern.c
Outdated
if (p_id->flow.proto == IPPROTO_TCP) | ||
err = parse_tcp_identifier(ctx, &saddr->port, &daddr->port, fei, | ||
&p_id->identifier); | ||
else if (p_id->flow.proto == IPPROTO_ICMPV6 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect ICMP packets to only be included if the user asked for it? I.e., there should be a command line option to switch between parsing different packet types (with an 'all' option for all supported flow types I suppose). This also gets you past the problem with the ppviz output not including any protocol information: You could just refuse to output that format if the user selects anything other than TCP :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, while having it optional definitly makes sense, I would assume the "all" would be the default, like it would be for ex pcap (i.e. you can add filters for different protocols if you want, but by default it captures everything it can).
But are you suggesting we add a flag for each supported protocol the user wants to track (plus a flag for all), and then assume TCP if user specifies no flags? Or should TCP always be included by default and we instead have a flag if you explicitly want to exclude TCP, whereas for all the other protocols the flag is used to include it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the first makes most sense. A flag for each protocol, a flag for all, and if nothing is specified for the protocol part use TCP as default. But if one or several protocols are explicitly specified this overrides the default and then you use what is specified. The default should be what you expect to be most commonly used I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any strong opinions on what the default should be, just that it should be configurable :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have gone ahead and added separate flags for enabling TCP (--tcp
/-T
), ICMP (--icmp
/-C
) and a shorthand for all supported protocols (--all
/-A
). If no flag is specified the user is notified that it assumes (only) TCP.
Thanks for the extensive feedback, will try to fix most of it tomorrow. As for "issues with this commit" part, I don't intend for them to be cop outs, but rather I want to highlight things that I think might be potential issues. Genereally I don't consider them critical (in that case I would hopefully have fixed them), but by making them very visibel my hope is essentially that you might disagree and find that it completely breaks the application, in which case we can have a discussion about how to fix it ASAP. I also want to note them down while I still remeber them. That said, I realize that the commit messages might not be the best place for that. While I actually have a TODO list I've been pretty bad at updating it, and generally view it more as a list of additional features rather than for tracking potential issues. I also have a list of potential concurrency issues in the README (which was initially spread out as a bunch of comments in the code), so maybe it would make sense to consolidate all that in a dedicated document, or add a separate section in the TODO list with them? While the issue tracker might also be a good way to do that, I have a feeling that it should mainly be used for issues in the current master, not for potential issues within a PR which may have been fixed by the time they're merged? |
Hmm, if the issues you point out are meant for review, I think it's better to include them in the pull request description (which would be the cover letter in an email submission). If they're in the commit, they're a bit ambiguous as well ("issues with this commit" - does that mean they will be fixed with a subsequent commit?). And if they're longer-term things, put them in the TODO or somewhere similar. And yes, if you already have fixed queued up in a subsequent pull request, there's no reason to open an issue. But then that is what the text in the PR description should say. If they're longer term issues, they can be tracked in issues just fine - and if you're not sure, put them in the PR description and ask :) |
The echoed TCP timestamp (TSecr) is only valid if the ACK flag is set. So make sure to only attempt to match on ACK packets. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Allow pping to passivly monitor RTT for ICMP echo request/reply flows. Use the echo identifier as ports, and echo sequence as packet identifier. Additionally, add protocol to standard output format in order to be able to distinguish between TCP and ICMP flows. The ppviz format does not include protocol, making it impossible to distinguish between TCP and ICMP traffic. Will add warning if ppviz format is used together with ICMP traffic in the future. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
4a8d9fd
to
3ff729d
Compare
Hi again @tohojo. Think I've addressed most of your comments now (other than the main one about grouping unrelated changes together in the commits, as I've not split up any of the existing commits...). I have however removed the "issues with this commit" parts from the commit messages. Most of the commits before "timestamping + matching on both ingress and egress" are largely the same, just that I've added an additional commit there which adds command line flags for which protocols PPing should track. Have done some additional refactoring in the ingress+egress commit to make a separate function for sending flow-events (which allocate their own flow-event structs rather than passing the same one around) and some other minor fixes and cleanup. The "consider flow opened on reply" commit has also undergone a few changes to be more consistent with the changes made in ingress+egress commit. Finally, I've added a couple of commits which adds warnings if case PPing fails to create map entries, as well as a separate commit where I add section on "potential problems" to the TODO document. There I've summarized the various issues were previously part of the commit messages. Finally, some of the other notes that were part of the commit messages before but would probably fit better in the PR description are:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits on the mode selection, but otherwise I think we should get this merged so you can make progress on other items :)
Add command-line flags for each protocol that pping should attempt to parse and report RTTs for (currently -T/--tcp and -C/--icmp). If no protocol is specified assume TCP. To clarify this, output a message before start on how ePPing has been configured (stating output format, tracked protocols and which interface to run on). Additionally, as the ppviz format was only designed for TCP it does not have any field for which protocol an entry belongs to. Therefore, emit a warning in case the user selects the ppviz format with anything other than TCP. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Only push flow events for opening/closing flows if the creation/deletion of the flow-state was successful (as indicated by the bpf_map_*_elem() return value). This should avoid outputting several flow creation/deletion messages in case multiple instances are trying to create/delete a flow concurrently, as could theoretically occur previously. Also set the last_timestamp value before creating a new flow, to avoid a race condition where the userspace cleanup might incorrectly determine that a flow is old before the last_timestamp value can be set. Explicitly skip the rate-limit for the first packet of a new flow to avoid it failing the rate-limit. This also fixes an issue where the first packet of a new flow would previously fail the rate-limit if the rate-limit was higher than current time uptime (CLOCK_MONOTONIC). Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add an option (-R, --rtt-rate) to adapt the rate sampling based on the RTT of the flow. The sampling rate will be C * RTT, where C is a configurable constant (ex 1.0 to get one sample every RTT), and RTT is either the current minimum (default) or smoothed RTT of the flow (chosen via the -t or --rtt-type option). The smoothed RTT (sRTT) is updated for each calculated RTT, and is calculated in a similar manner to srtt in the kernel's TCP stack. The sRTT is a moving average of all RTTs, and is calculated according to the formula: srtt = 7/8 * prev_srtt + 1/8 * rtt To allow the user to pass a non-integer C (ex 0.1 to get 10 RTT samples for every RTT-period), fixed-point arithmetic has been used in the eBPF programs (due to lack of support for floats). The maximum value for C has been limited to 10000 in order for it to be unlikely that the C * RTT calculation will overflow (with C = 10000, overflow will only occur if RTT > 28 seconds). Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Perform both timestamping and matching on both ingress and egress hooks. This makes it more similar to Kathie's pping, allowing the tool to capture RTTs in both directions when deployed on just a single interface. Like Kathie's pping, by default filter out RTTs for packets going to the local machine (will only include local processing delays). This behavior can be disabled by passing the -l/--include-local option. As packets that are timestamped on ingress and matched on egress will include the local machines processing delay, add the "match_on_egress" member to the JSON output that can be used to differentiate between RTTs that include the local processing delay, and those which don't. Finally, report the source and destination addresses from the perspective of the reply packet, rather than the timestamped packet, to be consistent with Kathie's pping. Overall, refactor large parts of pping_kern to allow both timestamping and matching, as well as updating both the flow and reverse flow and handle flow-events related to them, in one go. Also update README to reflect changes. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Wait with sending a flow open message until a reply has been seen for the flow. Likewise, only emit a flow closing event if the flow has first been opened (that is, a reply has been seen). This introduces potential (but unlikely) concurrency issues for flow opening/closing messages which are further described in the README. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Refactor code for how events are handled in the user space application. Preparation for adding an additional event type which should not be handled by the normal functions for printing RTT and flow events. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Send a warning notifying the user that PPing failed to create a flow/timestamp entry due to the corresponding map being full. To avoid sending a warning for every packet, only emit warnings every WARN_MAP_FULL_INTERVAL (which is currently hard-coded to 1s). Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Collect potential issues under a new section in the TODO list. These are issues I generally don't think are that severe, but may still be useful to note down and keep in mind. Move the section on potential concurrency issues from README to the new section in the TODO-list. Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
a042d04
to
b8215e7
Compare
Adds RTT-based sampling, ICMP echo support and making ePPing work in both directions, i.e. doing both timestamping and matching on both ingress and egress.
Also contains a couple of commits with fixes to the BPF code.
Commits based PR#23 but slightly cleaned up.
This PR is based on top of PR#31. On request I could try and rebase it on top of master instead (will likely cause some conflicts), but I expect PR#31 to be merged first at which point I would have to rebase this to the new master which would effectively undo all the work in the first rebase.