Add SymbolIndex and kallsyms helpers #388

brenns10 · 2024-03-18T21:21:54Z

This is the next step of the DWARFless debugging / CTF feature, #176. Unlike the symbol finder, there aren't too many questions of API, instead we're just implementing the API. This PR adds:

The SymbolIndex type, which holds an immutable list of symbols and indexes them for efficient name and address based lookup. Some interesting items here:
- It uses a hash for name lookup, and an interesting binary search approach for address lookup, in order to support duplicated names and overlapping address ranges.
- I chose SymbolIndex over SymbolTable since I didn't want to overload the common usage of that term applying to ELF symbol tables. That said, SymbolIndex is a bit strange so I'm open to suggestions.
Two C helpers (load_proc_kallsyms() and load_builtin_kallsyms()) enumerate all kallsyms symbols and return a SymbolIndex object. The first uses /proc/kallsyms and the second uses information from VMCOREINFO to find the kallsyms tables.
- The proc version lets you specify a filename, in case you happen to have a copy of /proc/kallsyms from a vmcore.
- /proc/kallsyms includes module symbols, while the built-in vmlinux kallsyms of course does not. I've included an option to parse these symbols, however by default, we do not. This is because you can actually get the ELF symbol objects, which have more information, directly from the struct module.
A Python helper (load_module_kallsyms()) for loading module symbols from the kernel image and creating a symbol index from that. This does require debuginfo for vmlinux.

I also of course have tests for the above.

Below are two issues you'll probably notice and want to discuss:

The guess_long_names() function works around torvalds/linux@73bbb94466fd ("kallsyms: support "big" kernel symbols") which was merged in 6.1. I didn't catch it immediately so I'm not sure whether there's any use in sending a commit to include a version number for kallsyms... Especially since I believe it's unlikely to be backported, but I could be wrong. It has certainly not been backported to stable kernels.
The final commit of the branch adds an optimization where we can mark symbols as having a lifetime. Symbols with static lifetimes are not freed by drgn_symbol_destroy(). This enables the SymbolIndex to provides symbols with zero copying, even all the way out to the Python layer! However, I'm not certain it's worth the complexity it adds. That's why I kept the change as a second commit; it's easy to drop if you don't like it. I haven't measured any performance impact and admittedly, I don't know if it would be noticeable.

brenns10 · 2024-03-18T23:40:24Z

Alright, sorry for the failure (it was an old commit changing how ELF symbols are stored in struct mod_kallsyms). I had been hoping for a clean bill of health on the first try :D
But it's ready now, excited to take the new symbol finder API out for a spin :D

brenns10 · 2024-06-14T17:54:25Z

These latest pushes rebase on the current main branch, with the updated symbol finder API.

osandov

I've only looked at the first two commits so far, so feel free to ignore these comments until I'm done or address them now, up to you. I'll continue reviewing tomorrow.

libdrgn/symbol.c

libdrgn/python/symbol_index.c

libdrgn/symbol.c

osandov

Okay, I finally made it through the whole series. Good stuff! I can't claim that I fully understood the kallsyms format this time around, but I at least tried to look for coding errors.

The guts of the implementation look good, but the drgn.helpers.linux.kallsyms interface is a bit overwhelming. There are lots of similar-looking helpers that I wouldn't know how to make use as an end user. I wonder if we can pare it down a bit. What are the most important use cases that you're targeting?

tests/linux_kernel/helpers/test_kallsyms.py

libdrgn/kallsyms.c

osandov · 2024-08-15T20:13:26Z

libdrgn/kallsyms.c

+		} else {
+			err = drgn_program_read_memory(prog, addresses,
+						loc->kallsyms_addresses,
+						kr->num_syms * sizeof(uint32_t),


Is this supposed to be sizeof(uint64_t)?

Yes, what a catch. I'm not sure how I didn't notice the broken addresses for these.

I'm able to test:

!BASE_RELATIVE && !ABSOLUTE_PERCPU - this means kallsyms addresses are just an array of unsigned long. This is not the default, but it's possible to build a normal x86_64 kernel with this. This was affected by this issue and will be fixed.

BASE_RELATIVE && ABSOLUTE_PERCPU - this means kallsyms addresses are 32-bit with a base address, and they have special handling for negative values for percpu region. This is the default for all x86_64 kernels.

BASE_RELATIVE && !ABSOLUTE_PERCPU - same as the previous, but no handling of negatives for the percpu region. Most x86_64 kernels fail to build because they can't encode percpu addresses. But this is the default for aarch64 kernels and so it gets lots of testing too.

There is no !BASE_RELATIVE && ABSOLUTE_PERCPU - or maybe there is but it doesn't result in a functional difference, so there's nothing to test.

libdrgn/kallsyms.c

osandov · 2024-08-15T20:19:09Z

libdrgn/kallsyms.c

+				}
+			} else {
+				if (kind_ret)
+					*kind_ret = *token_ptr;


Can we return without ever setting this, leaving the caller with uninitialized garbage?

Yes, I think that's worth guarding against. I think the easiest place to do it is actually in drgn_load_builtin_kallsyms(), where we can simply check whether the string builder length is zero and raise an error.

There's a lot of stuff in the kallsyms tables that, if fuzzed, could cause these parsers to behave weirdly or maybe crash. I'm not certain how paranoid we want to be here. At this point, if the correct set of symbols are provided, with real values, and we've managed to read the token and name table successfully (without faulting or mallocing a buffer that's way too large), then it seems likely that the data will match the kallsyms format. So in practical terms, these sorts of things shouldn't happen. What are your opinions here? I can add paranoia to this, but it does feel like it will take rather unclear code and make it even worse...

brenns10 · 2024-08-15T20:57:28Z

Thanks for the review! I'll take a look through it in detail and try to get an update ASAP, rebased on main as well. It looks like you made a lot of good catches, so thanks for that.

The guts of the implementation look good, but the drgn.helpers.linux.kallsyms interface is a bit overwhelming. There are lots of similar-looking helpers that I wouldn't know how to make use as an end user. I wonder if we can pare it down a bit. What are the most important use cases that you're targeting?

Yeah, I have even confused myself... I've occasionally imported from there and not known which function I want to use. So it does need to be simplified.

The primary use cases I can think of are:

The full non-DWARF approach where you use kallsyms, CTF || BTF, and ORC or frame pointers for unwinding. In this case, you need to do a delicate dance: first load vmlinux kallsyms, then the types (you need the symbol table to find the BTF, though CTF doesn't require the symbol table first), then the module kallsyms (since module kallsyms requires vmlinux types). In the final CTF branch, there's a simple drgn.helpers.linux.ctf:load_ctf() function that just does everything for you in the right order, so most of the time nobody would need to think about it.
The partial DWARF approach, where you have vmlinux DWARF loaded, but you didn't extract all the modules. This actually happens all the time for us internally, because we typically extract only the debuginfo we need. In that case, it's very nice to be able to add a symbol finder for the kernel modules, so that symbolization works as expected without loading the module debuginfo.

That would mean at a minimum, we'd really just need load_vmlinux_kallsyms() and load_module_kallsyms(). I don't think the load_builtin_kallsyms() and load_proc_kallsyms() functions are really necessary to be part of the API (though keeping them as private/internal functions would be nice, because I could see myself wanting to use them hackily).

The module_kallsyms() function is exceptionally poorly named. I included this as a public function because I could imagine a world where a user may have some kernel modules' DWARF debuginfo loaded, but they want to load the other modules' symbols. They could use this function to get that subset of symbols. Or, alternatively, I could see it being useful for a user to simply ask: what symbols are part of this kernel module? But to sum up, I don't have a single great use case for this function. I wouldn't be disappointed to scrap it from the API, or at a minimum, it should be renamed to something clearer (e.g. get_module_symbols()).

In commit c8ff872 ("Support systems without qsort_r"), usage of qsort_r was eliminated because it is a glibc extension. There was discussion of creating a utility function that implements qsort_r(), but the approach described uses thread local variables, so it is not actually reentrant, and it was dropped to avoid confusion. However, upcoming commits will also prefer a comparator function which takes an argument, and they also won't require a reentrant implementation. Add this helper in with a name that shouldn't spark confusion: qsort_arg(). Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>

This will be reused in an upcoming commit. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>

The Symbol Finder API gives us the ability to register a dynamic callback for symbol lookup. However, many common use cases are satisfied by a simple static list of symbols. Correct and efficient lookup in this simple case is rather tricky. Implement a new type, SymbolIndex, which can take a list of symbols and index them for efficient lookup by name or address. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>

When properly configured, Linux contains its own symbol table within kernel memory at runtime. It is exposed as the /proc/kallsyms file, which is the easiest way to consume it, for live kernels. However, with recent changes to the Linux kernel in 6.0, necessary symbols are exposed within VMCOREINFO that allow us to interpret the data structures inside kernel memory, without needing debuginfo. This allows us to write helpers that can load kallsyms on vmcores, or on live systems. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>

Add Python helpers which load module kallsyms and return a symbol index for them. Unlike the /proc/kallsyms and built-in kallsyms, these are quite easy to handle using regular Python & drgn code, so implement them as Python helpers. There are (at least) two use cases for these helpers: 1. After loading CTF and built-in vmlinux kallsyms, support for module kallsyms is still necessary. 2. Sometimes, people only extract vmlinux DWARF debuginfo. Adding module symbols can allow stack traces and other symbolization to work even without module debuginfo. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>

The SymbolIndex structure owns the memory for symbols and names, and once added to the Program, it cannot be removed. Making copies of any of these symbols is purely a waste: we should be able to treat them as static. So add a lifetime and allow the SymbolIndex to specify static, avoiding unnecessary copies and frees. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>

brenns10 · 2024-08-21T17:16:40Z

Ok after a few botched pushes I think I've addressed everything here. I've also rebased on main to drop my drgn.helpers.linux.module changes which conflicted with #411. I did leave open the comments with more substantial replies in case you want to continue those discussions, though I think I addressed those too. Thanks!

brenns10 mentioned this pull request Mar 18, 2024

DWARFless Debugging #176

Open

5 tasks

brenns10 force-pushed the kallsyms_finder branch from 44f9b17 to acd7534 Compare March 18, 2024 23:00

brenns10 mentioned this pull request May 20, 2024

Allow naming and configuring order of type, object, and symbol finders #393

Merged

brenns10 force-pushed the kallsyms_finder branch 2 times, most recently from 6c8cb73 to 429053f Compare June 14, 2024 17:53

osandov reviewed Jul 30, 2024

View reviewed changes

osandov requested changes Aug 15, 2024

View reviewed changes

brenns10 added 4 commits August 20, 2024 17:51

python: Add helper for returning list of Symbols

0901e20

This will be reused in an upcoming commit. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>

brenns10 force-pushed the kallsyms_finder branch 2 times, most recently from 5f1155f to e268fbd Compare August 21, 2024 00:59

brenns10 added 2 commits August 20, 2024 19:45

brenns10 force-pushed the kallsyms_finder branch from e268fbd to 8a9c865 Compare August 21, 2024 02:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add SymbolIndex and kallsyms helpers #388

Add SymbolIndex and kallsyms helpers #388

brenns10 commented Mar 18, 2024

brenns10 commented Mar 18, 2024

brenns10 commented Jun 14, 2024

osandov left a comment

osandov left a comment

osandov Aug 15, 2024

brenns10 Aug 20, 2024

osandov Aug 15, 2024

brenns10 Aug 20, 2024

brenns10 commented Aug 15, 2024

brenns10 commented Aug 21, 2024

Add SymbolIndex and kallsyms helpers #388

Are you sure you want to change the base?

Add SymbolIndex and kallsyms helpers #388

Conversation

brenns10 commented Mar 18, 2024

brenns10 commented Mar 18, 2024

brenns10 commented Jun 14, 2024

osandov left a comment

Choose a reason for hiding this comment

osandov left a comment

Choose a reason for hiding this comment

osandov Aug 15, 2024

Choose a reason for hiding this comment

brenns10 Aug 20, 2024

Choose a reason for hiding this comment

osandov Aug 15, 2024

Choose a reason for hiding this comment

brenns10 Aug 20, 2024

Choose a reason for hiding this comment

brenns10 commented Aug 15, 2024

brenns10 commented Aug 21, 2024