feat: Support Linux stap v3 probes #340

Draft
wants to merge 8 commits into
base: master

@aapoalas aapoalas commented Nov 30, 2024

This PR makes the crate capable of generating SystemTap's version 3 probes, also sometimes known as SDTs (no relation to DTrace's USDTs, obviously!). This is very nice on Linux systems that don't have any easy way of installing or building DTrace (such as I-use-ArchLinux-by-the-way). This works on all three of the mechanisms for converting D probe definitions into Rust. It also automatically generates the stap versions of isenabled probes for each probe.

The open questions in this PR as I see them are:

  1. Is there any interest in upstreaming this in the first place?
  2. Currently test_does_it_work and all tests similar to it do not work, as those rely on DTrace to work. On a generic Linux something similar might be doable with perhaps readelf, or perf, or one of the thousand different eBPF toolsets. What would make the most sense?
  3. The methods added for generating GNU assembler argument format are fairly ugly, and I'm not at all sure if they really belong on the DataType enums or if they should go into the stap3.rs file.
  4. What should I call the usdt-impl? stap3? systemtap? ebpf? linux?
  5. Do the isenabled probes make sense here? As I understand it, with DTrace USDTs the isenabled boolean actually lives in the instruction stream, whereas for SDTs the isenabled boolean is read from a mutable static. That sounds like it has a fairly real effect on runtime. If that is the case, then it sounds like one would want to give the programmer a way to skip the isenabled probe when it isn't needed.
  6. Is it intended that a Rust probe provider function defined as &u32 gets translated into a probe taking a u32 by value? See this test.

I understand that Oxide doesn't really have skin in the game for expanding the scope of this crate to support non-DTrace probes. I'm hoping that you'd still find this potentially worth upstreaming (once the open questions and any issues you have with this PR are resolved): This crate seems to me to be unconditionally the superior way to inject USDTs into Rust code, and it would be great to spread the joy of tracing wider into the Rust ecosystem.

Hoping to hear from you all. Cheers.

@ahl
Collaborator

ahl commented Nov 30, 2024

I'll respond in more detail but: really excited that you've taken this up and we are definitely interested in adding this functionality. As you say, it's not a top priority for Oxide but we'll carve out some time as we can to check this out

@ahl
Collaborator

ahl commented Dec 5, 2024

  1. Is there any interest in upstreaming this in the first place?

Yes. We would love it if users of this crate could produce useful probes on Linux. This is--in part--why we eschewed the name dtrace and chose usdt instead.

  2. Currently test_does_it_work and all tests similar to it do not work, as those rely on DTrace to work. On a generic Linux something similar might be doable with perhaps readelf, or perf, or one of the thousand different eBPF toolsets. What would make the most sense?

I think something in the eBPF universe would make sense? Something that a typical user might employ. Do you have suggestions?

  3. The methods added for generating GNU assembler argument format are fairly ugly, and I'm not at all sure if they really belong on the DataType enums or if they should go into the stap3.rs file.

My inclination would be to keep as much of the specific code in the stap3.rs file (and other implementation-specific files such as no-linker.rs) as appropriate.
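
For reference, the stapsdt note stores each argument as `N@OP`, where the absolute value of `N` is the size in bytes, a negative `N` marks a signed value, and `OP` is an assembler operand. A minimal sketch of keeping that formatting in an implementation-specific module might look like the following. `IntType`, `stapsdt_size`, `stapsdt_arg`, and `stapsdt_args` are hypothetical names for illustration, not the crate's actual API, and the operand strings are assumed to be chosen elsewhere.

```rust
/// Hypothetical stand-in for the integer cases of the parser's data type enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum IntType {
    U8,
    I8,
    U16,
    I16,
    U32,
    I32,
    U64,
    I64,
}

impl IntType {
    /// The `N` of `N@OP`: size in bytes, negated for signed types.
    fn stapsdt_size(self) -> i8 {
        match self {
            IntType::U8 => 1,
            IntType::I8 => -1,
            IntType::U16 => 2,
            IntType::I16 => -2,
            IntType::U32 => 4,
            IntType::I32 => -4,
            IntType::U64 => 8,
            IntType::I64 => -8,
        }
    }

    /// Formats one argument descriptor for an already-chosen operand string.
    fn stapsdt_arg(self, operand: &str) -> String {
        format!("{}@{}", self.stapsdt_size(), operand)
    }
}

/// Joins all argument descriptors with spaces, as they appear in the note.
fn stapsdt_args(args: &[(IntType, &str)]) -> String {
    args.iter()
        .map(|(ty, operand)| ty.stapsdt_arg(operand))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Because the helpers only need the type's size and signedness, they could live next to the stapsdt code rather than on the shared enum, at the cost of a small mapping step.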

  4. What should I call the usdt-impl? stap3? systemtap? ebpf? linux?

I think stap3 makes as much sense as anything. Can you walk through considerations?

  5. Do the isenabled probes make sense here? As I understand it, with DTrace USDTs the isenabled boolean actually lives in the instruction stream, whereas for SDTs the isenabled boolean is read from a mutable static. That sounds like it has a fairly real effect on runtime. If that is the case, then it sounds like one would want to give the programmer a way to skip the isenabled probe when it isn't needed.

I see: so the stap implementation always performs a load from that mutable static? This seems... nuts? Can I possibly be understanding this properly??

  6. Is it intended that a Rust probe provider function defined as &u32 gets translated into a probe taking a u32 by value? See this test.

Yes... I think. @bnaecker ?

I understand that Oxide doesn't really have skin in the game for expanding the scope of this crate to support non-DTrace probes. I'm hoping that you'd still find this potentially worth upstreaming (once the open questions and any issues you have with this PR are resolved): This crate seems to me to be unconditionally the superior way to inject USDTs into Rust code, and it would be great to spread the joy of tracing wider into the Rust ecosystem.

Are there other crates people use for Linux / eBPF / Stap / etc?

Collaborator

@ahl ahl left a comment

good progress. is there a good overview of the mechanism? Would it be appropriate to include an overview of how this thing works or links out? I can't recall what we have in linker.rs and no-linker.rs but certainly that would be warranted there as well.

@bnaecker
Collaborator

bnaecker commented Dec 6, 2024

Is it intended that a Rust probe provider function defined as &u32 gets translated into a probe taking a u32 by value? See this test.

Yes... I think. @bnaecker ?

Yes, this is mostly the case. We do consider that probe to ultimately have a 32-bit, unsigned integer type, not a pointer. When we emit the probe macro itself, we try to borrow the input, but we then dereference the borrow anyway, passing a copy. For these native integer types, it doesn't matter so much since everything fits in a register. We take more care with strings to pass a pointer, but we also need to null-terminate the data anyway, which necessitates another copy.
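
The borrow-then-copy behaviour described above can be sketched roughly as follows. This is an illustration only: `u32_probe_arg` and `str_probe_arg` are hypothetical helpers, not the code the probe macro actually emits, and truncating at an interior NUL is an assumption of this sketch.

```rust
use std::borrow::Borrow;
use std::ffi::CString;

/// Accepts either `u32` or `&u32` and always hands the probe a copy.
/// The value fits in a register, so no pointer is passed.
fn u32_probe_arg<T: Borrow<u32>>(input: T) -> u32 {
    *<T as Borrow<u32>>::borrow(&input)
}

/// Strings are passed by pointer, but the data must be NUL-terminated,
/// so a fresh copy is made. Assumption: cut at the first interior NUL.
fn str_probe_arg(input: &str) -> CString {
    let end = input.find('\0').unwrap_or(input.len());
    CString::new(&input[..end]).expect("no interior NUL remains after truncation")
}
```

With helpers like these, a provider function declared as taking `&u32` and one declared as taking `u32` would end up producing the same 32-bit unsigned probe argument.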

Hope that answers your questions, but happy to dig more into the details if needed!

Collaborator

@bnaecker bnaecker left a comment

Thanks for taking this on, we're certainly excited about adding a new platform!

Most of the code looks pretty reasonable to me. I have a few questions about documentation and other small suggestions, but the big thing is testing. You asked about the "does it work" test -- we should definitely try to replicate that on Linux. That test works by calling out to the dtrace command-line utility, and checking that the probes actually fire with the expected data. Is there a similar CLI tool for SystemTap?

@aapoalas
Author

aapoalas commented Dec 6, 2024

  2. I think something in the eBPF universe would make sense? Something that a typical user might employ. Do you have suggestions?

I will have to look into this a bit. I'd like to use just readelf but it might be that this will need to be SystemTap. The reason being that does_it_work relies on the register_probes method working. On Linux what one would usually do is use some CLI command to scan probes out of an executable. Also, as is tradition, at least I am not aware of there being a singular registry of USDT probes and instead each program might well have its own. If that is the case, then my guess would be that SystemTap is the one most likely to have a /dev/dtrace/helper equivalent somewhere.

If no equivalent filesystem-based API for registering the probes can be found, then the Linux-equivalent test needs to be written in an entirely different way where it depends on its own executable path... Though actually now that I think about it, that's probably not too bad? Anyway, I'll look into this.

  4. I think stap3 makes as much sense as anything. Can you walk through considerations?

So I've been using this official document and this header-only implementation as my sources. stap3 was my first idea based on the official documentation since it mentions "Only version 3 is described here." And of course "stap" comes from "SystemTap". Thinking on this now, I might prefer changing this to stapsdt so that it better matches readelf output (NT_STAPSDT): The version number is a minor detail that I doubt is very widely thought about, given that the version 3 feature request was closed as done in 2010.

systemtap would of course be more self-explanatory and have better SEO, for the little that is worth. But, this might also lead people astray into thinking that this won't work with eBPF.

ebpf is a "jump on the bandwagon" kind of name. Most Linux tracing seems to prefer eBPF nowadays (when not using plain perf) and this would show the crate to be part of that movement. But, this of course also leads astray by excluding SystemTap and perf.

linux would probably be a misleading catch-all. I'd have to dig into the dtrace4linux code to find out what USDT format they use, but I assume there the format is DTrace with minor changes to make the ELF sections work on Linux. If that is the case, then this would obviously be misleading as the correct format for Linux would depend on whether you're using DTrace or perf/SystemTap/eBPF. If dtrace4linux (and Oracle's DTrace on Linux) all use the SystemTap format then this would of course be a perfect option.

Personally: I think I'm leaning toward stapsdt. (And I renamed the files to this.) It's... not self-explanatory, but it doesn't particularly need to be. And it doesn't lead astray in any way.

  5. I see: so the stap implementation always performs a load from that mutable static? This seems... nuts? Can I possibly be understanding this properly??

It does seem a bit nutty, doesn't it? But this is my understanding of the STAPSDT probes. The documentation mentions:

Semaphores are treated as a counter; your tool should increment the semaphore to enable it, and decrement the semaphore when finished.

A counter in the instruction stream sounds like a bad idea, so this basically has to be in the data. And of course it can't be on the stack so either the heap or statics it is. Between those two I know I'd pick statics any day. The header-only implementation seems to agree. Another Rust impl actually directly uses a static mut per call-site, though this causes problems.
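
To make the cost concrete, here is a minimal sketch of a semaphore-guarded probe site. It is a simulation, not the generated code: `MY_PROBE_SEMAPHORE`, `my_probe_enabled`, and `fire_my_probe` are hypothetical names, the 16-bit counter width is assumed from the header-only implementation, and the placement of the static in a dedicated ELF section (which a real tracer needs in order to find and modify it from outside the process) is left out.

```rust
use std::sync::atomic::{AtomicU16, Ordering};

/// Hypothetical per-probe semaphore. A tracer increments it to enable
/// the probe and decrements it again when finished.
static MY_PROBE_SEMAPHORE: AtomicU16 = AtomicU16::new(0);

/// The "isenabled" check: one load from static memory on every call.
fn my_probe_enabled() -> bool {
    MY_PROBE_SEMAPHORE.load(Ordering::Relaxed) != 0
}

/// Builds the (possibly expensive) argument only when a tracer is attached.
/// Returns the argument that would have been passed to the probe.
fn fire_my_probe<F: FnOnce() -> u64>(make_arg: F) -> Option<u64> {
    if my_probe_enabled() {
        Some(make_arg())
    } else {
        None
    }
}
```

The load is cheap but not free, which is why an opt-out for probes whose arguments are trivial to compute could be worthwhile.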

Are there other crates people use for Linux / eBPF / Stap / etc?

There are at least sonde and probe. Sonde builds a C library that exposes functions that call the actual probes, and calls those functions through FFI from Rust. This is a perfectly working solution and probably gets compiled down to reasonable stuff with LTO or at least PGO, but I'm not a big fan of having to cross the FFI boundary there. Probe generates STAPSDT probes directly the same way usdt does, but the argument handling at least seems to be pretty bare-bones.

Author

@aapoalas aapoalas left a comment

Thank you for the comments! I've addressed them now.

@aapoalas
Author

aapoalas commented Dec 6, 2024

You asked about the "does it work" test -- we should definitely try to replicate that on Linux. That test works by calling out to the dtrace command-line utility, and checking that the probes actually fire with the expected data. Is there a similar CLI tool for SystemTap?

I was unable to find an easy answer to this: For one, the test is taking advantage of the register_probes function writing directly into DTrace's helper file to register the probes. I looked into Linux perf's perf buildid-cache command and it offers something similar, but rather than a single file it takes at least a few parameters. I also tried looking into SystemTap itself but was unable to find anything that would directly point towards such a registering API existing.

Of course, even if such a method to register USDTs exists and is shared across all consumers on Linux (SystemTap, perf, eBPF, ...), I'd still need to figure out the data format and do the writing, which I expect would be a fair bit of work as well. Given that e.g. Brendan Gregg's Linux perf USDT notes start with calling perf buildid-cache, I doubt there is really a feasible way to have the program itself register its own probes the way register_probes does.

As such, I implemented the does_it_work test using readelf -n current_exe. It wasn't too bad to do. I'll need to add cfg's to run the test on Linux and skip it otherwise. Right now I just commented out the original test. And then I need to fix all the other tests that do something similar.
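
A rough sketch of the readelf-based approach, for readers following along. This is not the test from the PR: `ProbeNote`, `parse_stapsdt_notes`, and `current_exe_probes` are hypothetical names, and the parser assumes the `Provider:` / `Name:` lines that GNU readelf prints for stapsdt notes.

```rust
use std::io;
use std::process::Command;

/// One probe as listed in the `.note.stapsdt` section.
#[derive(Debug, PartialEq, Eq)]
struct ProbeNote {
    provider: String,
    name: String,
}

/// Extracts provider/name pairs from `readelf -n` output.
/// Assumes each note prints a `Provider:` line followed by a `Name:` line.
fn parse_stapsdt_notes(readelf_output: &str) -> Vec<ProbeNote> {
    let mut notes = Vec::new();
    let mut provider: Option<String> = None;
    for line in readelf_output.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("Provider:") {
            provider = Some(rest.trim().to_string());
        } else if let Some(rest) = line.strip_prefix("Name:") {
            // Only record a name that follows a provider line.
            if let Some(p) = provider.take() {
                notes.push(ProbeNote {
                    provider: p,
                    name: rest.trim().to_string(),
                });
            }
        }
    }
    notes
}

/// Runs `readelf -n` on the running executable and parses the result.
/// Requires `readelf` on the PATH; intended for Linux only.
fn current_exe_probes() -> io::Result<Vec<ProbeNote>> {
    let exe = std::env::current_exe()?;
    let output = Command::new("readelf").arg("-n").arg(&exe).output()?;
    Ok(parse_stapsdt_notes(&String::from_utf8_lossy(&output.stdout)))
}
```

A test built on this would call `current_exe_probes()` and assert that the expected provider and probe names are present. Note that this only checks that the probes were emitted into the binary, not that they fire with the expected data.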

@aapoalas
Author

aapoalas commented Jan 1, 2025

Reminder: I'm waiting for review on this, specifically on the review fixes, on whether going forward with readelf makes sense to you, and on how specializing the tests for Linux should be done.

@andreimatei

Please excuse the drive-by comment from someone trying to understand the USDT ecosystem.

If dtrace4linux (and Oracle's DTrace on Linux) all use the SystemTap format then this would of course be a perfect option.

Oracle's DTrace for Linux does not support the SystemTap format (i.e. the .note.stapsdt ELF section), does it? Instead, it supports the DTrace Object Format, right?

@aapoalas
Author

Please excuse the drive-by comment from someone trying to understand the USDT ecosystem.

At least from my point of view, none of that. Thanks for the comment.

Oracle's DTrace for Linux does not support the SystemTap format (i.e. the .note.stapsdt ELF section), does it? Instead, it supports the DTrace Object Format, right?

I'm not sure, but that is probably exactly correct. Linux as usual is then plagued by two competing standards :)

4 participants