-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SymbolIndex and kallsyms helpers #388
base: main
Are you sure you want to change the base?
Conversation
44f9b17
to
acd7534
Compare
Alright, sorry for the failure (it was an old commit changing how ELF symbols are stored in |
6c8cb73
to
429053f
Compare
These latest pushes rebase on the current main branch, with the updated symbol finder API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've only looked at the first two commits so far, so feel free to ignore these comments until I'm done or address them now, up to you. I'll continue reviewing tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I finally made it through the whole series. Good stuff! I can't claim that I fully understood the kallsyms format this time around, but I at least tried to look for coding errors.
The guts of the implementation look good, but the drgn.helpers.linux.kallsyms
interface is a bit overwhelming. There are lots of similar-looking helpers that I wouldn't know how to make use as an end user. I wonder if we can pare it down a bit. What are the most important use cases that you're targeting?
libdrgn/kallsyms.c
Outdated
} else { | ||
err = drgn_program_read_memory(prog, addresses, | ||
loc->kallsyms_addresses, | ||
kr->num_syms * sizeof(uint32_t), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to be sizeof(uint64_t)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, what a catch. I'm not sure how I didn't notice the broken addresses for these.
I'm able to test:
!BASE_RELATIVE && !ABSOLUTE_PERCPU
- this means kallsyms addresses are just an array of unsigned long. This is not the default, but it's possible to build a normal x86_64 kernel with this. This was affected by this issue and will be fixed.BASE_RELATIVE && ABSOLUTE_PERCPU
- this means kallsyms addresses are 32-bit with a base address, and they have special handling for negative values for percpu region. This is the default for all x86_64 kernels.BASE_RELATIVE && !ABSOLUTE_PERCPU
- same as the previous, but no handling of negatives for the percpu region. Most x86_64 kernels fail to build because they can't encode percpu addresses. But this is the default for aarch64 kernels and so it gets lots of testing too.
There is no !BASE_RELATIVE && ABSOLUTE_PERCPU
- or maybe there is but it doesn't result in a functional difference, so there's nothing to test.
libdrgn/kallsyms.c
Outdated
} | ||
} else { | ||
if (kind_ret) | ||
*kind_ret = *token_ptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we return without ever setting this, leaving the caller with uninitialized garbage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that's worth guarding against. I think the easiest place to do it is actually in drgn_load_builtin_kallsyms()
, where we can simply check whether the string builder length is zero and raise an error.
There's a lot of stuff in the kallsyms tables that, if fuzzed, could cause these parsers to behave weirdly or maybe crash. I'm not certain how paranoid we want to be here. At this point, if the correct set of symbols are provided, with real values, and we've managed to read the token and name table successfully (without faulting or mallocing a buffer that's way too large), then it seems likely that the data will match the kallsyms format. So in practical terms, these sorts of things shouldn't happen. What are your opinions here? I can add paranoia to this, but it does feel like it will take rather unclear code and make it even worse...
Thanks for the review! I'll take a look through it in detail and try to get an update ASAP, rebased on main as well. It looks like you made a lot of good catches, so thanks for that.
Yeah, I have even confused myself... I've occasionally imported from there and not known which function I want to use. So it does need to be simplified. The primary use cases I can think of are:
That would mean at a minimum, we'd really just need The |
In commit c8ff872 ("Support systems without qsort_r"), usage of qsort_r was eliminated because it is a glibc extension. There was discussion of creating a utility function that implements qsort_r(), but the approach described uses thread local variables, so it is not actually reentrant, and it was dropped to avoid confusion. However, upcoming commits will also prefer a comparator function which takes an argument, and they also won't require a reentrant implementation. Add this helper in with a name that shouldn't spark confusion: qsort_arg(). Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
This will be reused in an upcoming commit. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
The Symbol Finder API gives us the ability to register a dynamic callback for symbol lookup. However, many common use cases are satisfied by a simple static list of symbols. Correct and efficient lookup in this simple case is rather tricky. Implement a new type, SymbolIndex, which can take a list of symbols and index them for efficient lookup by name or address. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
When properly configured, Linux contains its own symbol table within kernel memory at runtime. It is exposed as the /proc/kallsyms file, which is the easiest way to consume it, for live kernels. However, with recent changes to the Linux kernel in 6.0, necessary symbols are exposed within VMCOREINFO that allow us to interpret the data structures inside kernel memory, without needing debuginfo. This allows us to write helpers that can load kallsyms on vmcores, or on live systems. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
5f1155f
to
e268fbd
Compare
Add Python helpers which load module kallsyms and return a symbol index for them. Unlike the /proc/kallsyms and built-in kallsyms, these are quite easy to handle using regular Python & drgn code, so implement them as Python helpers. There are (at least) two use cases for these helpers: 1. After loading CTF and built-in vmlinux kallsyms, support for module kallsyms is still necessary. 2. Sometimes, people only extract vmlinux DWARF debuginfo. Adding module symbols can allow stack traces and other symbolization to work even without module debuginfo. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
The SymbolIndex structure owns the memory for symbols and names, and once added to the Program, it cannot be removed. Making copies of any of these symbols is purely a waste: we should be able to treat them as static. So add a lifetime and allow the SymbolIndex to specify static, avoiding unnecessary copies and frees. Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
e268fbd
to
8a9c865
Compare
Ok after a few botched pushes I think I've addressed everything here. I've also rebased on |
This is the next step of the DWARFless debugging / CTF feature, #176. Unlike the symbol finder, there aren't too many questions of API, instead we're just implementing the API. This PR adds:
SymbolIndex
type, which holds an immutable list of symbols and indexes them for efficient name and address based lookup. Some interesting items here:SymbolIndex
overSymbolTable
since I didn't want to overload the common usage of that term applying to ELF symbol tables. That said,SymbolIndex
is a bit strange so I'm open to suggestions.load_proc_kallsyms()
andload_builtin_kallsyms()
) enumerate all kallsyms symbols and return aSymbolIndex
object. The first uses/proc/kallsyms
and the second uses information fromVMCOREINFO
to find the kallsyms tables./proc/kallsyms
from a vmcore./proc/kallsyms
includes module symbols, while the built-in vmlinux kallsyms of course does not. I've included an option to parse these symbols, however by default, we do not. This is because you can actually get the ELF symbol objects, which have more information, directly from thestruct module
.load_module_kallsyms()
) for loading module symbols from the kernel image and creating a symbol index from that. This does require debuginfo for vmlinux.I also of course have tests for the above.
Below are two issues you'll probably notice and want to discuss:
guess_long_names()
function works around torvalds/linux@73bbb94466fd ("kallsyms: support "big" kernel symbols") which was merged in 6.1. I didn't catch it immediately so I'm not sure whether there's any use in sending a commit to include a version number for kallsyms... Especially since I believe it's unlikely to be backported, but I could be wrong. It has certainly not been backported to stable kernels.drgn_symbol_destroy()
. This enables theSymbolIndex
to provides symbols with zero copying, even all the way out to the Python layer! However, I'm not certain it's worth the complexity it adds. That's why I kept the change as a second commit; it's easy to drop if you don't like it. I haven't measured any performance impact and admittedly, I don't know if it would be noticeable.