feat: Support Linux stap v3 probes #340

aapoalas · 2024-11-30T20:35:15Z

This PR makes the crate capable of generating SystemTap's version 3 probes, also sometimes known as SDTs (no relation to DTrace's USDTs, obviously!). This is very nice on Linux systems that don't have any easy way of installing or building DTrace (such as I-use-ArchLinux-by-the-way). It works with all three of the mechanisms for converting D probe definitions into Rust. It also automatically generates the stap version of an isenabled probe for each probe.

The open questions in this PR as I see them are:

Is there any interest in upstreaming this in the first place?
Currently test_does_it_work and all tests similar to it do not work, as those rely on DTrace to work. On a generic Linux something similar might be doable with perhaps readelf, or perf, or one of the thousand different eBPF toolsets. What would make the most sense?
The methods added for generating GNU assembler argument format are fairly ugly, and I'm not at all sure if they really belong on the DataType enums or if they should go into the stap3.rs file.
What should I call the usdt-impl? stap3? systemtap? ebpf? linux?
Do the isenabled probes make sense here? As I understand it, with DTrace USDTs the isenabled boolean actually lives in the instruction stream, whereas for SDTs the isenabled boolean is read from a mutable static. That sounds like it has a fairly real effect on runtime. If that is the case, then it sounds like one would want to give the programmer a way to skip the isenabled probe when it isn't needed.
Is it intended that a Rust probe provider function defined as &u32 gets translated into a probe taking a u32 by value? See this test.

I understand that Oxide doesn't really have skin in the game for expanding the scope of this crate to support non-DTrace probes. I'm hoping that you'd still find this potentially worth upstreaming (once the open questions and any issues you have with this PR are resolved): This crate seems to me to be unconditionally the superior way to inject USDTs into Rust code, and it would be great to spread the joy of tracing wider into the Rust ecosystem.

Hoping to hear from you all. Cheers.

ahl · 2024-11-30T23:30:45Z

I'll respond in more detail but: really excited that you've taken this up and we are definitely interested in adding this functionality. As you say, it's not a top priority for Oxide, but we'll carve out some time as we can to check this out.

ahl · 2024-12-05T23:36:47Z

Is there any interest in upstreaming this in the first place?

Yes. We would love it if users of this crate could produce useful probes on Linux. This is--in part--why we eschewed the name dtrace and chose usdt instead.

Currently test_does_it_work and all tests similar to it do not work, as those rely on DTrace to work. On a generic Linux something similar might be doable with perhaps readelf, or perf, or one of the thousand different eBPF toolsets. What would make the most sense?

I think something in the eBPF universe would make sense? Something that a typical user might employ. Do you have suggestions?

The methods added for generating GNU assembler argument format are fairly ugly, and I'm not at all sure if they really belong on the DataType enums or if they should go into the stap3.rs file.

My inclination would be to keep as much of the specific code in the stap3.rs file (and other implementation-specific files such as no-linker.rs) as appropriate.
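For readers unfamiliar with the argument format under discussion: each stapsdt probe argument is described to the tracer as `size@operand`, where the size is in bytes and is negative for signed values. A minimal sketch of what such a helper could look like follows. The names `IntType`, `stapsdt_size` and `stapsdt_arg` are hypothetical and are not taken from this PR.

```rust
/// Hypothetical stand-in for the integer variants of the crate's data types.
#[derive(Clone, Copy)]
enum IntType {
    U8,
    I8,
    U16,
    I16,
    U32,
    I32,
    U64,
    I64,
}

impl IntType {
    /// Size in bytes as used in a stapsdt argument descriptor:
    /// positive for unsigned types, negative for signed types.
    fn stapsdt_size(self) -> i8 {
        match self {
            IntType::U8 => 1,
            IntType::I8 => -1,
            IntType::U16 => 2,
            IntType::I16 => -2,
            IntType::U32 => 4,
            IntType::I32 => -4,
            IntType::U64 => 8,
            IntType::I64 => -8,
        }
    }
}

/// Formats one argument descriptor, e.g. `-4@%esi`.
/// `operand` is whatever assembler operand holds the value (an assumption
/// here; in real inline assembly it would be a template placeholder).
fn stapsdt_arg(ty: IntType, operand: &str) -> String {
    format!("{}@{}", ty.stapsdt_size(), operand)
}
```

Keeping a helper like this next to the stapsdt implementation, rather than on the shared data type enum, would be in line with the inclination stated above.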

What should I call the usdt-impl? stap3? systemtap? ebpf? linux?

I think stap3 makes as much sense as anything. Can you walk through considerations?

Do the isenabled probes make sense here? As I understand it, with DTrace USDTs the isenabled boolean actually lives in the instruction stream, whereas for SDTs the isenabled boolean is read from a mutable static. That sounds like it has a fairly real effect on runtime. If that is the case, then it sounds like one would want to give the programmer a way to skip the isenabled probe when it isn't needed.

I see: so the stap implementation always performs a load from that mutable static? This seems.. nuts? Can I possibly be understanding this properly??

Is it intended that a Rust probe provider function defined as &u32 gets translated into a probe taking a u32 by value? See this test.

Yes... I think. @bnaecker ?

I understand that Oxide doesn't really have skin in the game for expanding the scope of this crate to support non-DTrace probes. I'm hoping that you'd still find this potentially worth upstreaming (once the open questions and any issues you have with this PR are resolved): This crate seems to me to be unconditionally the superior way to inject USDTs into Rust code, and it would be great to spread the joy of tracing wider into the Rust ecosystem.

Are there other crates people use for Linux / eBPF / Stap / etc?

ahl

Good progress. Is there a good overview of the mechanism? Would it be appropriate to include an overview of how this thing works, or to link out to one? I can't recall what we have in linker.rs and no-linker.rs, but certainly that would be warranted there as well.

dtrace-parser/src/lib.rs

usdt-impl/src/stap3.rs

bnaecker · 2024-12-06T01:01:32Z

Is it intended that a Rust probe provider function defined as &u32 gets translated into a probe taking a u32 by value? See this test.

Yes... I think. @bnaecker ?

Yes, this is mostly the case. We do consider that probe to ultimately have a 32-bit, unsigned integer type, not a pointer. When we emit the probe macro itself, we try to borrow the input, but we then dereference the borrow anyway, passing a copy. For these native integer types, it doesn't matter so much since everything fits in a register. We take more care with strings to pass a pointer, but we also need to null-terminate the data anyway, which necessitates another copy.
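The behaviour described here can be illustrated with a small sketch. This is not the crate's generated code; `integer_arg` and `string_arg` are hypothetical helpers, and truncating at an interior NUL is an assumption made only so the example stays total.

```rust
use std::borrow::Borrow;
use std::ffi::CString;

/// Accepts either `u32` or `&u32` and hands back a plain copy, mirroring
/// the "borrow, then dereference" behaviour described above.
fn integer_arg<T: Borrow<u32>>(arg: T) -> u32 {
    *arg.borrow()
}

/// Strings are passed by pointer, but the probe consumer expects a
/// NUL-terminated C string, so the data has to be copied into a new buffer.
/// Assumption: anything after an interior NUL byte is dropped.
fn string_arg(arg: &str) -> CString {
    let bytes = arg.as_bytes();
    let end = bytes.iter().position(|&b| b == 0).unwrap_or(bytes.len());
    CString::new(&bytes[..end]).expect("slice contains no NUL bytes")
}
```

With these helpers, `integer_arg(5u32)` and `integer_arg(&5u32)` both yield a `u32` by value, which is why the probe ends up typed as a 32-bit unsigned integer rather than a pointer.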

Hope that answers your questions, but happy to dig more into the details if needed!

bnaecker

Thanks for taking this on, we're certainly excited about adding a new platform!

Most of the code looks pretty reasonable to me. I have a few questions about documentation and other small suggestions, but the big thing is testing. You asked about the "does it work" test -- we should definitely try to replicate that on Linux. That test works by calling out to the dtrace command-line utility, and checking that the probes actually fire with the expected data. Is there a similar CLI tool for SystemTap?

usdt-impl/src/empty.rs

usdt-impl/src/common.rs

dtrace-parser/src/lib.rs

usdt-impl/src/stap3.rs

aapoalas · 2024-12-06T08:23:34Z

I think something in the eBPF universe would make sense? Something that a typical user might employ. Do you have suggestions?

I will have to look into this a bit. I'd like to use just readelf, but it might be that this will need to be SystemTap. The reason is that does_it_work relies on the register_probes method working. On Linux, what one would usually do is use some CLI command to scan probes out of an executable. Also, as is tradition on Linux, I am not aware of there being a single registry of USDT probes; instead each program might well have its own. If that is the case, then my guess would be that SystemTap is the one most likely to have a /dev/dtrace/helper equivalent somewhere.

If no equivalent filesystem-based API for registering the probes can be found, then the Linux-equivalent test needs to be written in an entirely different way, where it depends on its own executable path... Though actually, now that I think about it, that's probably not too bad? Anyway, I'll look into this.

I think stap3 makes as much sense as anything. Can you walk through considerations?

So I've been using this official document and this header-only implementation as my sources. stap3 was my first idea based on the official documentation since it mentions "Only version 3 is described here." And of course "stap" comes from "SystemTap". Thinking on this now, I might prefer changing this to stapsdt so that it better matches readelf output (NT_STAPSDT): The version number is a minor detail that I doubt is very widely thought about, given that the version 3 feature request was closed as done in 2010.

systemtap would of course be more self-explanatory and have better SEO, for the little that is worth. But, this might also lead people astray into thinking that this won't work with eBPF.

ebpf is a "jump on the bandwagon" kind of name. Most Linux tracing seems to prefer eBPF nowadays (when not using plain perf) and this would show the crate to be part of that movement. But, this of course also leads astray by excluding SystemTap and perf.

linux would probably be a misleading catch-all. I'd have to dig into the dtrace4linux code to find out what USDT format they use, but I assume there the format is DTrace with minor changes to make the ELF sections work on Linux. If that is the case, then this would obviously be misleading as the correct format for Linux would depend on whether you're using DTrace or perf/SystemTap/eBPF. If dtrace4linux (and Oracle's DTrace on Linux) all use the SystemTap format then this would of course be a perfect option.

Personally: I think I'm leaning towards stapsdt. (And I renamed the files to this.) It's... not self-explanatory, but it doesn't particularly need to be. And it doesn't lead astray in any way.

I see: so the stap implementation always performs a load from that mutable static? This seems.. nuts? Can I possibly be understanding this properly??

It does seem a bit nutty, doesn't it? But this is my understanding of the STAPSDT probes. The documentation mentions:

Semaphores are treated as a counter; your tool should increment the semaphore to enable it, and decrement the semaphore when finished.

A counter in the instruction stream sounds like a bad idea, so this basically has to be in the data. And of course it can't be on the stack so either the heap or statics it is. Between those two I know I'd pick statics any day. The header-only implementation seems to agree. Another Rust impl actually directly uses a static mut per call-site, though this causes problems.
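To make the runtime cost concrete, here is a minimal sketch of the semaphore pattern as I understand it. It is not the code from this PR: `MY_PROBE_SEMAPHORE`, `probe_is_enabled`, `expensive_argument` and `fire_probe_if_enabled` are hypothetical names, and an `AtomicU16` stands in for the 16-bit counter that a real tracer would increment from outside the process.

```rust
use std::sync::atomic::{AtomicU16, Ordering};

/// Stand-in for a per-probe semaphore. In a real stapsdt binary this would
/// be a 16-bit counter in the data segment whose address is recorded in the
/// probe's ELF note, so that a tracing tool can increment and decrement it.
static MY_PROBE_SEMAPHORE: AtomicU16 = AtomicU16::new(0);

/// The "isenabled" check: one load from a static on every call.
fn probe_is_enabled() -> bool {
    MY_PROBE_SEMAPHORE.load(Ordering::Relaxed) != 0
}

/// Placeholder for argument preparation that is only worth doing when
/// someone is actually tracing.
fn expensive_argument() -> u64 {
    (1..=10u64).product()
}

/// Returns the prepared argument if the probe would have fired.
/// (The real probe site, a `nop` plus an ELF note, is omitted here.)
fn fire_probe_if_enabled() -> Option<u64> {
    if probe_is_enabled() {
        Some(expensive_argument())
    } else {
        None
    }
}
```

The load happens on every call whether or not a tracer is attached, which is the cost being questioned above; skipping the check would trade that load for always evaluating the arguments.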

Are there other crates people use for Linux / eBPF / Stap / etc?

There are at least sonde and probe. Sonde builds a C library that exposes functions that call the actual probes, and calls those functions through FFI from Rust. This is a perfectly working solution and probably gets compiled down to reasonable stuff with LTO or at least PGO, but I'm not a big fan of having to cross the FFI boundary there. Probe generates STAPSDT probes directly the same way usdt does, but the argument handling at least seems to be pretty bare-bones.

aapoalas

Thank you for the comments! I've addressed them now.

usdt-impl/src/stap3.rs

usdt-impl/src/empty.rs

usdt-impl/src/common.rs

dtrace-parser/src/lib.rs

aapoalas · 2024-12-06T14:51:48Z

You asked about the "does it work" test -- we should definitely try to replicate that on Linux. That test works by calling out to the dtrace command-line utility, and checking that the probes actually fire with the expected data. Is there a similar CLI tool for SystemTap?

I was unable to find an easy answer to this: For one, the test is taking advantage of the register_probes function writing directly into DTrace's helper file to register the probes. I looked into Linux perf's perf buildid-cache command and it has something similar but it's not a single file but has at least a few parameters. I also tried looking into SystemTap itself but was unable to find anything that would directly point towards such a registering API existing.

Of course, even if such a method to register USDTs exists and is shared across all consumers on Linux (SystemTap, perf, eBPF, ...), I'd still need to figure out the data format and do the writing, which I expect would be a fair bit of work as well. Given that e.g. Brendan Gregg's Linux perf USDT notes start with calling perf buildid-cache, I doubt there is really a feasible way to have the program itself register its own probes the way register_probes does.

As such, I implemented the does_it_work test using readelf -n current_exe. It wasn't too bad to do. I'll need to add cfg's to run the test on Linux and skip it otherwise. Right now I just commented out the original test. And then I need to fix all the other tests that do something similar.
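A rough sketch of the readelf-based approach, for illustration only. This is not the test from the PR: `parse_stapsdt_probes` and `probes_in_current_exe` are hypothetical names, and the parser assumes the `Provider:` / `Name:` lines that `readelf -n` prints for stapsdt notes.

```rust
use std::process::Command;

/// Extracts (provider, name) pairs from `readelf -n` output, relying on
/// each stapsdt note printing a `Provider:` line followed by a `Name:` line.
fn parse_stapsdt_probes(readelf_output: &str) -> Vec<(String, String)> {
    let mut probes = Vec::new();
    let mut provider: Option<String> = None;
    for line in readelf_output.lines() {
        let line = line.trim();
        if let Some(p) = line.strip_prefix("Provider:") {
            provider = Some(p.trim().to_string());
        } else if let Some(n) = line.strip_prefix("Name:") {
            // Only pair a name with a provider seen just before it.
            if let Some(p) = provider.take() {
                probes.push((p, n.trim().to_string()));
            }
        }
    }
    probes
}

/// Runs `readelf -n` on the currently running executable and lists the
/// probes found. Requires `readelf` to be on the PATH.
fn probes_in_current_exe() -> std::io::Result<Vec<(String, String)>> {
    let exe = std::env::current_exe()?;
    let output = Command::new("readelf").arg("-n").arg(&exe).output()?;
    Ok(parse_stapsdt_probes(&String::from_utf8_lossy(&output.stdout)))
}
```

A test could then assert that the expected provider and probe name appear in the result of `probes_in_current_exe()`. Note that this only checks that the probe is present in the binary, not that it fires with the expected data the way the DTrace-based test does.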

aapoalas · 2025-01-01T20:45:40Z

Reminder: I'm waiting for review on this, specifically concerning the review fixes, whether going forward with readelf makes sense to you, and how specializing the tests for Linux should be done.

andreimatei · 2025-01-11T21:50:33Z

Please excuse the drive-by comment from someone trying to understand the USDT ecosystem.

If dtrace4linux (and Oracle's DTrace on Linux) all use the SystemTap format then this would of course be a perfect option.

Oracle's DTrace for Linux does not support the SystemTap format (i.e. the .note.stapsdt ELF section), does it? Instead, it supports the DTrace Object Format, right?

aapoalas · 2025-01-11T22:48:32Z

Please excuse the drive-by comment from someone trying to understand the USDT ecosystem.

At least from my point of view, none of that. Thanks for the comment.

Oracle's DTrace for Linux does not support the SystemTap format (i.e. the .note.stapsdt ELF section), does it? Instead, it supports the DTrace Object Format, right?

I'm not sure, but that is probably exactly correct. Linux as usual is then plagued by two competing standards :)

aapoalas added 2 commits November 30, 2024 22:21

feat: Support Linux stap v3 probes

3a22675

Split isenabled and main probe from one another

72bcd36

aapoalas added 2 commits December 1, 2024 01:45

Fix entirely wrong calling convention

7fae0e6

Re-combine semaphore and USDT definition

d6dea1b

ahl reviewed Dec 5, 2024

bnaecker reviewed Dec 6, 2024

aapoalas added 3 commits December 6, 2024 12:16

Improve comments, move GNU Assembler helpers to its own module

04b7451

Revert unnecessary change

acb1c84

Revert unnecessary change

253972f

aapoalas commented Dec 6, 2024

readelf based does-it-work test

c1fb92e
