feat: Support Linux stap v3 probes #340
I'll respond in more detail but: really excited that you've taken this up and we are definitely interested in adding this functionality. As you say, it's not a top priority for Oxide but we'll carve out some time as we can to check this out.
Yes. We would love it if users of this crate could produce useful probes on Linux. This is--in part--why we eschewed the name
I think something in the eBPF universe would make sense? Something that a typical user might employ. Do you have suggestions?
My inclination would be to keep as much of the specific code in the stap3.rs file (and other implementation-specific files such as no-linker.rs) as appropriate.
I think stap3 makes as much sense as anything. Can you walk through the considerations?
I see: so the stap implementation always performs a load from that mutable static? This seems... nuts? Can I possibly be understanding this properly??
Yes... I think. @bnaecker ?
Are there other crates people use for Linux / eBPF / Stap / etc?
Good progress. Is there a good overview of the mechanism? Would it be appropriate to include an overview of how this thing works, or links out? I can't recall what we have in linker.rs and no-linker.rs but certainly that would be warranted there as well.
Yes, this is mostly the case. We do consider that probe to ultimately have a 32-bit, unsigned integer type, not a pointer. When we emit the probe macro itself, we try to borrow the input, but we then dereference the borrow anyway, passing a copy. For these native integer types, it doesn't matter so much since everything fits in a register. We take more care with strings to pass a pointer, but we also need to null-terminate the data anyway, which necessitates another copy. Hope that answers your questions, but happy to dig more into the details if needed!
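To illustrate the two cases described above, here is a minimal sketch in plain Rust. The function names `integer_probe_arg` and `string_probe_arg` are hypothetical and are not part of the usdt crate; the sketch only shows the copy semantics (a dereferenced borrow for integers, a fresh NUL-terminated buffer for strings), not the code the macros actually emit.

```rust
use std::ffi::{CString, NulError};

/// Hypothetical stand-in for an integer probe argument: the caller passes a
/// borrow, but the value is dereferenced and handed on as a copy.
fn integer_probe_arg(value: &u32) -> u32 {
    *value
}

/// Hypothetical stand-in for a string probe argument: the bytes are copied
/// into a new NUL-terminated buffer whose pointer could be given to a tracer.
/// Returning an error for interior NUL bytes is a choice made for this
/// sketch, not a statement about how the crate handles them.
fn string_probe_arg(value: &str) -> Result<CString, NulError> {
    CString::new(value)
}
```

In this sketch, `integer_probe_arg(&x)` yields a `u32` that fits in a register, while `string_probe_arg("hello")` allocates a six-byte buffer ending in a NUL byte; the pointer from `CString::as_ptr` stays valid only while the `CString` is alive.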
Thanks for taking this on, we're certainly excited about adding a new platform!
Most of the code looks pretty reasonable to me. I have a few questions about documentation and other small suggestions, but the big thing is testing. You asked about the "does it work" test -- we should definitely try to replicate that on Linux. That test works by calling out to the dtrace command-line utility, and checking that the probes actually fire with the expected data. Is there a similar CLI tool for SystemTap?
I will have to look into this a bit. I'd like to use just readelf but it might be that this will need to be SystemTap. The reason is that if no equivalent filesystem-based API for registering the probes can be found, then the Linux-equivalent test needs to be written in an entirely different way, where it depends on its own executable path... Though actually now that I think about it, that's probably not too bad? Anyway, I'll look into this.
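A minimal sketch of what a readelf-based check could look like, under stated assumptions: it assumes `readelf -n` is installed and that it prints each SystemTap note with lines starting with `Provider:` and `Name:`. The helper names `parse_stapsdt_probes` and `probes_in_current_exe` are hypothetical, and this only confirms that probe notes exist in the binary, not that the probes fire.

```rust
use std::io;
use std::process::Command;

/// Extract (provider, name) pairs from the text printed by `readelf -n`.
/// Assumes each probe note contains a "Provider:" line followed by a
/// "Name:" line; all other lines are ignored.
fn parse_stapsdt_probes(readelf_output: &str) -> Vec<(String, String)> {
    let mut probes = Vec::new();
    let mut provider: Option<String> = None;
    for line in readelf_output.lines().map(str::trim) {
        if let Some(rest) = line.strip_prefix("Provider:") {
            provider = Some(rest.trim().to_string());
        } else if let Some(rest) = line.strip_prefix("Name:") {
            // Pair the name with the most recently seen provider.
            if let Some(p) = provider.take() {
                probes.push((p, rest.trim().to_string()));
            }
        }
    }
    probes
}

/// Run `readelf -n` on the currently running executable and list its probes.
fn probes_in_current_exe() -> io::Result<Vec<(String, String)>> {
    let exe = std::env::current_exe()?;
    let output = Command::new("readelf").arg("-n").arg(exe).output()?;
    Ok(parse_stapsdt_probes(&String::from_utf8_lossy(&output.stdout)))
}
```

A test could then assert that the list returned by `probes_in_current_exe` contains the provider and probe names it declared. Keeping the parsing separate from the process invocation lets the parser be tested without readelf present.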
So I've been using this official document and this header-only implementation as my sources.
Personally: I think I'm leaning into
It does seem a bit nutty, doesn't it? But this is my understanding of the STAPSDT probes. The documentation mentions:
A counter in the instruction stream sounds like a bad idea, so this basically has to be in the data. And of course it can't be on the stack so either the heap or statics it is. Between those two I know I'd pick statics any day. The header-only implementation seems to agree. Another Rust impl actually directly uses a
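To make the cost being discussed concrete, here is a minimal sketch of a semaphore-guarded probe site. The names `MY_PROBE_SEMAPHORE`, `my_probe_is_enabled`, and `fire_if_enabled` are hypothetical, and a plain static stands in for the counter that a real STAPSDT probe would expose to tracers through its ELF note; this is not the code generated by this PR.

```rust
use std::sync::atomic::{AtomicU16, Ordering};

// Hypothetical per-probe semaphore. In a real STAPSDT probe the counter's
// address would be recorded in the probe's note so that a tracer can
// increment it while attached; here it is an ordinary static.
static MY_PROBE_SEMAPHORE: AtomicU16 = AtomicU16::new(0);

/// The is-enabled check: a load from static data on every call, rather than
/// a patched instruction in the code stream.
#[inline]
fn my_probe_is_enabled() -> bool {
    MY_PROBE_SEMAPHORE.load(Ordering::Relaxed) != 0
}

/// Stand-in for argument preparation that is worth skipping when no tracer
/// is attached.
fn expensive_argument() -> u64 {
    (1..=10u64).product()
}

/// Prepare the argument only when the probe is enabled. A real probe site
/// would pass the value to the tracer instead of returning it.
fn fire_if_enabled() -> Option<u64> {
    if my_probe_is_enabled() {
        Some(expensive_argument())
    } else {
        None
    }
}
```

With the semaphore at zero, `fire_if_enabled` costs one load and one branch; the question raised in this thread is whether callers should be able to skip even that when their arguments are cheap.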
There are at least sonde and probe. Sonde builds a C library that exposes functions that call the actual probes, and calls those functions through FFI from Rust. This is a perfectly working solution and probably gets compiled down to reasonable stuff with LTO or at least PGO, but I'm not a big fan of having to cross the FFI boundary there. Probe generates STAPSDT probes directly the same way usdt does, but the argument handling at least seems to be pretty bare-bones.
Thank you for the comments! I've addressed them now.
I was unable to find an easy answer to this: For one, the test is taking advantage of the Of course, even if such a method to register USDTs exists and is shared across all consumers on Linux (SystemTap, perf, eBPF, ...), I'd still need to figure out the data format and do the writing, which I expect would be a fair bit of work as well. Given that e.g. Brendan Gregg's Linux perf USDT notes starts with calling As such, I implemented the
Reminder: I'm waiting for review on this, specifically concerning the review fixes, whether going forward with readelf makes sense to you, and how specializing the tests for Linux should be done.
Please excuse the drive-by comment from someone trying to understand the USDT ecosystem.
Oracle's DTrace for Linux does not support the SystemTap format (i.e. the
At least from my point of view, none of that. Thanks for the comment.
I'm not sure, but that is probably exactly correct. Linux as usual is then plagued by two competing standards :)
This PR makes the crate capable of generating SystemTap's version 3 probes, also sometimes known as SDTs (no relation to DTrace's USDTs, obviously!). This is very nice on Linux systems that don't have any easy way of installing or building DTrace (such as I-use-ArchLinux-by-the-way). This works on all three of the mechanisms for converting D probe definitions into Rust. It also automatically generates the stap versions of isenabled probes for each probe.
The open questions in this PR as I see them are:
test_does_it_work and all tests similar to it do not work, as those rely on DTrace. On a generic Linux something similar might be doable with perhaps readelf, or perf, or one of the thousand different eBPF toolsets. What would make the most sense?
stap3.rs file.
stap3? systemtap? ebpf? linux?
isenabled probes make sense here? As I understand it, with DTrace USDTs the isenabled boolean actually lives in the instruction stream, whereas for SDTs the isenabled boolean is read from a mutable static. That sounds like it has a fairly real effect on runtime. If that is the case then it sounds like one would want to give the programmer a way to skip the isenabled probe when it isn't needed.
&u32 gets translated into a probe taking a u32 by value? See this test.
I understand that Oxide doesn't really have skin in the game for expanding the scope of this crate to support non-DTrace probes. I'm hoping that you'd still find this potentially worth upstreaming (once the open questions and any issues you have with this PR are resolved): This crate seems to me to be unconditionally the superior way to inject USDTs into Rust code, and it would be great to spread the joy of tracing wider into the Rust ecosystem.
Hoping to hear from you all. Cheers.