Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PPing - Improve map cleanup #34

Merged
merged 6 commits into from
Mar 10, 2022

Conversation

simosund
Copy link
Contributor

These commits moves the cleanup logic for stale entires from userspace to a BPF iter program. So instead of userspace periodically looping through the maps clearing out old entries, it only triggers the BPF programs periodically, and the BPF program iterates through all the entries removing those which are considered old.

Also adds some slightly more complex logic for when an entry can be removed, for example ICMP flows and unopened TCP flows can be cleared faster than an open TCP flow, and timestamp entries with low RTTs can be cleared much faster than the static 10s limit.

All these commits were created with the changes from PR#23 in mind, more specifically with those that are now in PR#32. While the first two commits could potentially work without PR#32, the commit with more aggressive cleanup is based on the existance of ex. ICMP flows and "unopened" flows, and therefore makes little sense to merge before PR#32. I will therefore leave this as a draft for now.

@simosund
Copy link
Contributor Author

Now that PR#32 has been merged I will rebase this and probably do some minor touchups. Once I'm done with that I will remove the draf-status.

@simosund simosund force-pushed the pping_better_map_cleaning branch from b12cf97 to 4ae060e Compare February 17, 2022 14:09
@simosund
Copy link
Contributor Author

I have now rebased this PR on the current master, done a little cleanup based on comments from previous PRs, and also changed the a couple of things I thought was a bit clumsy in my initial PR.

For reference, the state of this PR before these changes can be found on this branch (which has not been rebased). The first major change from this old version is that I used to send the debug information from the BPF iter programs to the userspace process through an additional map that I would read right after I had triggered the BPF iter programs that updated them in userspace. I now instead treat the debug information as a new type and simply push them through the perf-buffer like all other evens, which requires much fewer changes in the userspace code.

The second major change from the old version is related to adding RTT-based timeouts to the timestamp entries. In the old version I added the current RTT to every timestamp entry to avoid doing a flow lookup for each timestamp entry in the BPF iter program. However, this adds additional information stored for each timestamp entry, and as the normal ePPing program does a flow lookup for every (valid) packet anyways, the overhead of doing a lookup for a few thousand entries every cleanup period (by default 1 second) is likely negligible. So I changed my mind on this and decided to leave the timestamp entries unchanged and simply do the lookup in the BPF-iter program instead.

@simosund simosund marked this pull request as ready for review February 17, 2022 14:40
@simosund simosund force-pushed the pping_better_map_cleaning branch from 4ae060e to 9821231 Compare February 17, 2022 14:55
Copy link
Member

@tohojo tohojo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a few nits on this one :)

pping/pping.c Outdated
}
linfo.map.map_fd = bpf_object__find_map_fd_by_name(obj, map_name);
if (linfo.map.map_fd < 0)
return ERR_PTR(linfo.map.map_fd);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're making a lot of assumptions about the inner workings of libbpf_get_error() here (and below). This is bound to break (its semantics are already changing upstream), so I don't think this is a good idea (and it also happens to be a bit ugly).

I'd suggest you change this function to return an int error message, and add a struct bpf_link **link parameter to put the output in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have changed as you suggested to simply returning an int error message and instead pass out the link reference through a parameter. Still use libbpf_get_error() to catch errors for functions that return pointers (as I assume that's what the function is for), but no longer attempt to return error codes masked as pointers myself.

While fixing I also noticed another bug which I fixed. The if (linfo.map.map_fd < 0) check would always fail as linfo.map.map_fd is a __u32 and thus cannot be negative. Therefore I now first save the result from bpf_object__find_map_fd_by_name() in an int which I use for the error check, before assigning it linfo. Alternatively I guess I could also just cast linfo.map.map_fd for the check, but then I have to assume that ints are always 32 bits.

@simosund simosund force-pushed the pping_better_map_cleaning branch from 9821231 to 54cf69f Compare March 8, 2022 09:57
Copy link
Member

@tohojo tohojo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better, but there's a bug; and spotted another issue while looking at the code again...

pping/pping.c Outdated
return (__u64)t.tv_sec * NS_PER_SECOND + (__u64)t.tv_nsec;
}
*link = bpf_program__attach_iter(prog, &iter_opts);
err = libbpf_get_error(link);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the wrong pointer (link, not *link). But rather than fixing that, it's better to use a local stack variable to store the return from bpf_program__attach_iter() and only write to *link after you've done the error check. That way the caller can be sure that nothing will be written to the pointer passed in unless the operation succeeds, which is standard practice for APIs like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good idea. Have changed this now.

pping/pping.c Outdated
fprintf(stderr,
"Failed attaching cleanup program to flow map: %s\n",
get_libbpf_strerror(err));
return err;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This leaks the link created above...

In fact I don't think you ever destroy the links properly? There should be some calls to bpf_link__destroy() somewhere on shutdown...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is indeed the case. I haven't really fully grasped what these links even are and guess I pretty much just assumed they would automatically be destroyed and cleaned up on program exit together with all the other bpf objects (programs, maps etc. that aren't pinned).

To fix this I will need to restructure the code a bit to expose the links outside of this function so that I can access them later.

However, while thinking about this I just noticed what I think should be a much bigger error in the same function (setup_periodical_map_cleaning()). I allocate the arguments (including these links) which will be used in a different thread on the stack... So the arguments my cleanup function (periodic_map_cleanup()) use in a separate thread may be overwritten by whatever other functions I call in main later. Either I'm just not thinking straight right now, or it's a miracle that this hasn't caused all sorts of funky errors already.

I should probably also handle my cleanup thread on shutdown, so I don't just pull out these links underneath it and hope it never has time to notice before the entire program terminates. So will go ahead and add a separate commit at the beginning of this PR that address these issues, then I can clean up the links in a sensible manner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So with other words, it's not really ready for re-review yet as I have not properly addressed your point on cleaning up the links. But before I do that I want this issue with the arguments for separate thread being on the stack of a non-main function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sounds good. Let me know when you want me to take another look...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function (setup_periodical_map_cleaning) should now properly destroy the links if fails. Furthermore, the links should be destroyed in the final teardown of the program (as long as they've been set up that is).

@simosund simosund force-pushed the pping_better_map_cleaning branch from 54cf69f to d081abc Compare March 8, 2022 18:29
simosund added 2 commits March 9, 2022 14:54
Allocate the clean_args on the stack of the main function rather than
the stack of setup_periodical_map_cleaning. The clean_args is used by
periodical_map_cleanup from a different thread, so allocating them on
stack for setup_periodical_map_cleaning which goes out of scope
directly after the thread is created opens up for errors where later
function calls may overwrite the arguments, causing unpredictable
behavior.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Explicitly stop the thread which performs periodical map cleanup on
shutdown, before attempting to free up any resources it might use.

For the cleanup order to make more sense, also setup the perf-buffer
before setting up the periodical map cleaning, so that the thread can
be stopped as the first part of the cleanup. This also matches better
with the next couple of commits where map cleaning debug information
will be pushed through the perf-buffer.

Finally, move the addition of the signalhandler earlier in the code to
eliminate a window where it was possible to terminate the program
without relevant cleanup code having a chance to run.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
@simosund simosund force-pushed the pping_better_map_cleaning branch from d081abc to 7bf5316 Compare March 9, 2022 18:30
To improve the performance of the map cleanup, switch from the
user-spaced loop to using BPF iterators. With BPF iterators, a BPF
program can be run on each element in the map, and can thus be done in
kernel-space. This should hopefully also avoid the issue the previous
userspace loop had with resetting in case an element was removed by
the BPF programs during the cleanup.

Due to removal of userspace logic for map cleanup, no longer provide
any debug information about how many entires there are in each map and
how many of them were removed by the garbage collection. This will be
added back in the next commit.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add some debug info to the periodical map cleanup process. Push debug
information through the events perf buffer by using newly added
map_clean_event.

The old user space map cleanup process had some simple debug
information that was lost when transitioning to using bpf_iter
instead. Therefore, add back similar (but more extensive) debug
information but now collected from the BPF-side. In addition to stats
on entries deleted by the cleanup process, also include stats on
entries deleted by ePPing itself due to matching (for timestamp
entries) or detecting FIN/RST (for flow entries)

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Add conditions that allows removing old flow and timestamp entries
sooner.

For flow map, have added conditions that allow unopened flows and ICMP
flows to be removed earlier than open TCP flows (currently both set to
30 sec instead of 300 sec).

For timestamp entries, allow them to be removed if they're more than
TIMESTAMP_RTT_LIFETIME (currently 8) times higher than the flow's
sRTT.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
Also add description of three potential issues introduced by the
changes to cleanup process.

Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>
@simosund simosund force-pushed the pping_better_map_cleaning branch from 7bf5316 to e7201c9 Compare March 10, 2022 09:00
@simosund
Copy link
Contributor Author

Hi again @tohojo, I've now added two commits at the beginning of this PR that fix some errors related to the setup and teardown of the map cleaning in the userspace program. The later commits are largely the same, but builds on top of these commits to properly destroy the links as you pointed out in your comment.

Copy link
Member

@tohojo tohojo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, just one small comment / question, otherwise LGTM.

For future reference, though, if you find yourself listing lots of changes done in a single commit (like in the second commit in this series), that is a hint that the commit is not really a single self-contained unit and you might want to split each change into a separate commit. Look for things like "additionally" and "finally" in the commit message to detect his :)

@tohojo tohojo merged commit a6e4dd8 into xdp-project:master Mar 10, 2022
@simosund simosund deleted the pping_better_map_cleaning branch March 22, 2022 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants