-
Notifications
You must be signed in to change notification settings - Fork 11.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Padding between debug_str_offsets is not supported by DWARF verifier #81558
Comments
@llvm/issue-subscribers-debuginfo Author: Felipe de Azevedo Piovezan (felipepiovezan)
If we run the verifier with --all on `llvm/test/DebugInfo/X86/dwarfdump-str-offsets.s`,
it will claim the input is invalid because it adds padding between debug_str_offsets:
Now, if we interpret the DWARF spec literally, it doesn't talk about padding. Dwarfdump, on the other hand, takes a different approach. It calls So one utility acknowledges (and prints!) those gaps because it uses an indirect way |
The current behavior of --verify is that it only verifies debug_info, debug_abbrev and debug_names. This seems fairly arbitrary and might have been unintentional, as originally the absence of any section flags implied "all". This patch changes the behavior so that the verifier now verifies everything by default. It revealed two tests that had potentially invalid DWARF: 1. dwarfdump-str-offsets.s is adding padding between two debug_str_offset contributions. The standard does not explicitly allow this behavior. See issue llvm#81558 2. dwarf5-macro.test uses a checked-in binary that has invalid debug_str_offsets. One of its entries points to the _middle_ of the string section: error: .debug_str_offsets: contribution 0x0: index 0x4: invalid string offset *0x18 == 0x455D, is neither zero nor immediately following a null character If we look at the closest offset to 0x455D in debug_str: ``` 0x0000454e: "__SLONG32_TYPE int" ``` 0x455D points to "int".
@llvm/issue-subscribers-tools-llvm-dwarfdump Author: Felipe de Azevedo Piovezan (felipepiovezan)
If we run the verifier with --all on `llvm/test/DebugInfo/X86/dwarfdump-str-offsets.s`,
it will claim the input is invalid because it adds padding between debug_str_offsets:
Now, if we interpret the DWARF spec literally, it doesn't talk about padding. Dwarfdump, on the other hand, takes a different approach. It calls So one utility acknowledges (and prints!) those gaps because it uses an indirect way |
Eh - there's been some chatter in the DWARF committee (@pogo59 ) about documenting whether sections can contain gaps, or padding, and how to implement that padding - and Paul's mostly been advocating, I tihnk, for saying that you can't have arbitrary gaps in sections and you should use encoding-valid representations of padding (exactly what is is different in different sections). Historically, padding has been allowed, somewhat incidentally (because you only need to read the .debug_info (and debug_aranges, and debug_pubnames maybe) sections directly, and everything else is read indirectly) - sections read indirectly naturally support unspecified gaps/padding/garbage because there's no reason for a reader to read those bytes unless it knows some secret contract that suggests it should read it. Binutils dwarfdump has been pretty good about not dumping these gaps - llvm-dwarfdump less so, I think it does read some sections end-to-end even when they are only needed to be accessed indirectly. I'm surprised we do the right thing with debug_str_offsets - maybe due to the DWARFv4 DWARFv5 encoding changes in debug_str_offsets.dwo, not sure... shrug I don't think it's super important either way, and I'm not sure my opinion here is the popular one, probably not, I'm fine with not allowing gaps - wouldn't especially mind if llvm-dwarfdump dumped the debug_str_offsets section without consulting an units, just parsing table after table. |
Thank you for the background, that helps a lot!
Yup, this makes sense, I guess it is only the verifier / dump tools that are special in that way.
No strong preferences either way from me too, but it would be nice to hear if there are any good use cases for padding. |
Drive-by comment -- something similar came up a long time ago to do with padding in .debug_loc, recorded now at #42635. The tl;dr is that GNU ld will dead-strip code and install holes in .debug_loc, that readelf/dwarfdump can cope with, but the LLVM tools can't. I think the conclusion at the time was that it's just not a section you can read without also reading .debug_info. Less relevant to this issue as DWARF5 uses .debug_loclists instead, but this is certainly an existing category of limitation of the LLVM tools. |
The current behavior of --verify is that it only verifies debug_info, debug_abbrev and debug_names. This seems fairly arbitrary and might have been unintentional, as originally the absence of any section flags implied "all". This patch changes the behavior so that the verifier now verifies everything by default. It revealed two tests that had potentially invalid DWARF: 1. dwarfdump-str-offsets.s is adding padding between two debug_str_offset contributions. The standard does not explicitly allow this behavior. See issue #81558 2. dwarf5-macro.test uses a checked-in binary that has invalid debug_str_offsets. One of its entries points to the _middle_ of the string section: error: .debug_str_offsets: contribution 0x0: index 0x4: invalid string offset *0x18 == 0x455D, is neither zero nor immediately following a null character If we look at the closest offset to 0x455D in debug_str: ``` 0x0000454e: "__SLONG32_TYPE int" ``` 0x455D points to "int".
If we run the verifier with --all on
llvm/test/DebugInfo/X86/dwarfdump-str-offsets.s
,it will claim the input is invalid because it adds padding between debug_str_offsets:
Now, if we interpret the DWARF spec literally, it doesn't talk about padding.
It really makes it sound like offset_of_dwarf_version + length must point to
the next contribution. Indeed, this is what the verifier expects (and why this
test fails to verify).
Dwarfdump, on the other hand, takes a different approach. It calls
collectContributionData(Units)
, which scans all compile units, extracts theirDW_AT_str_offsets_base and uses those to build a list of "this is where the
different debug_str_offset contributions start". It even "subtracts" the header
size from those to find the headers.
So one utility acknowledges (and prints!) those gaps because it uses an indirect way
to find where they are, but the other rejects them. How to reconcile these?
The text was updated successfully, but these errors were encountered: