Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DWARFDump] Make --verify handle all sections by default #81559

Merged
merged 2 commits into from
Feb 13, 2024

Conversation

felipepiovezan
Copy link
Contributor

The current behavior of --verify is that it only verifies debug_info, debug_abbrev and debug_names. This seems fairly arbitrary and might have been unintentional, as originally the absence of any section flags implied "all".

This patch changes the behavior so that the verifier now verifies everything by default. It revealed two tests that had potentially invalid DWARF:

  1. dwarfdump-str-offsets.s is adding padding between two debug_str_offset contributions. The standard does not explicitly allow this behavior. See issue Padding between debug_str_offsets is not supported by DWARF verifier #81558

  2. dwarf5-macro.test uses a checked-in binary that has invalid debug_str_offsets. One of its entries points to the middle of the string section:

error: .debug_str_offsets: contribution 0x0: index 0x4: invalid string offset *0x18 == 0x455D, is neither zero nor immediately following a null character

If we look at the closest offset to 0x455D in debug_str:

0x0000454e: "__SLONG32_TYPE int"

0x455D points to "int".

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 13, 2024

@llvm/pr-subscribers-debuginfo

Author: Felipe de Azevedo Piovezan (felipepiovezan)

Changes

The current behavior of --verify is that it only verifies debug_info, debug_abbrev and debug_names. This seems fairly arbitrary and might have been unintentional, as originally the absence of any section flags implied "all".

This patch changes the behavior so that the verifier now verifies everything by default. It revealed two tests that had potentially invalid DWARF:

  1. dwarfdump-str-offsets.s is adding padding between two debug_str_offset contributions. The standard does not explicitly allow this behavior. See issue Padding between debug_str_offsets is not supported by DWARF verifier #81558

  2. dwarf5-macro.test uses a checked-in binary that has invalid debug_str_offsets. One of its entries points to the middle of the string section:

error: .debug_str_offsets: contribution 0x0: index 0x4: invalid string offset *0x18 == 0x455D, is neither zero nor immediately following a null character

If we look at the closest offset to 0x455D in debug_str:

0x0000454e: "__SLONG32_TYPE int"

0x455D points to "int".


Full diff: https://github.com/llvm/llvm-project/pull/81559.diff

5 Files Affected:

  • (modified) llvm/test/DebugInfo/X86/dwarfdump-str-offsets.s (+4-1)
  • (modified) llvm/test/DebugInfo/X86/skeleton-unit-verify.s (+2)
  • (modified) llvm/test/tools/llvm-dwarfdump/X86/verify_file_encoding.yaml (+2)
  • (modified) llvm/test/tools/llvm-dwarfutil/ELF/X86/dwarf5-macro.test (+5-2)
  • (modified) llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp (+2-1)
diff --git a/llvm/test/DebugInfo/X86/dwarfdump-str-offsets.s b/llvm/test/DebugInfo/X86/dwarfdump-str-offsets.s
index 1725813aac7707..66dfb5f83acb3e 100644
--- a/llvm/test/DebugInfo/X86/dwarfdump-str-offsets.s
+++ b/llvm/test/DebugInfo/X86/dwarfdump-str-offsets.s
@@ -1,6 +1,9 @@
 # RUN: llvm-mc -triple x86_64-unknown-linux %s -filetype=obj -o %t.o
 # RUN: llvm-dwarfdump -v %t.o 2> %t.err | FileCheck --check-prefixes=COMMON,SPLIT,OFFSETS %s
-# RUN: llvm-dwarfdump -verify %t.o | FileCheck --check-prefix=VERIFY %s
+
+# FIXME: the verifier does not accept padding between debug-str-offset
+# sections, which this test uses.
+# RUN: llvm-dwarfdump -verify --debug-info %t.o | FileCheck --check-prefix=VERIFY %s
 # RUN: llvm-dwarfdump -debug-str-offsets %t.o | FileCheck --check-prefix=OFFSETS %s
 # 
 # Check that we don't report an error on a non-existent range list table.
diff --git a/llvm/test/DebugInfo/X86/skeleton-unit-verify.s b/llvm/test/DebugInfo/X86/skeleton-unit-verify.s
index 92a3df486da39d..6aaac18169b604 100644
--- a/llvm/test/DebugInfo/X86/skeleton-unit-verify.s
+++ b/llvm/test/DebugInfo/X86/skeleton-unit-verify.s
@@ -11,6 +11,8 @@
 # CHECK-NEXT: DW_TAG_skeleton_unit
 # CHECK-NEXT: error: Skeleton compilation unit has children.
 # CHECK-NEXT: Verifying dwo Units...
+# CHECK-NEXT: Verifying .debug_line...
+# CHECK-NEXT: Verifying .debug_str_offsets...
 # CHECK-NEXT: Errors detected.
 
         .section .debug_abbrev,"",@progbits
diff --git a/llvm/test/tools/llvm-dwarfdump/X86/verify_file_encoding.yaml b/llvm/test/tools/llvm-dwarfdump/X86/verify_file_encoding.yaml
index fe31436e9f6e35..4afb7758214414 100644
--- a/llvm/test/tools/llvm-dwarfdump/X86/verify_file_encoding.yaml
+++ b/llvm/test/tools/llvm-dwarfdump/X86/verify_file_encoding.yaml
@@ -51,6 +51,8 @@
 # CHECK-NEXT:               DW_AT_call_file   [DW_FORM_sdata] (4)
 # CHECK-NEXT:               DW_AT_call_line   [DW_FORM_sdata] (5){{[[:space:]]}}
 # CHECK-NEXT: Verifying dwo Units...
+# CHECK-NEXT: Verifying .debug_line...
+# CHECK-NEXT: Verifying .debug_str_offsets...
 # CHECK-NEXT: error: Aggregated error counts:
 # CHECK-NEXT: error: Invalid encoding in DW_AT_decl_file occurred 4 time(s).
 # CHECK-NEXT: error: Invalid file index in DW_AT_call_line occurred 1 time(s).
diff --git a/llvm/test/tools/llvm-dwarfutil/ELF/X86/dwarf5-macro.test b/llvm/test/tools/llvm-dwarfutil/ELF/X86/dwarf5-macro.test
index 518244a01ab5fa..267530ab417e1a 100644
--- a/llvm/test/tools/llvm-dwarfutil/ELF/X86/dwarf5-macro.test
+++ b/llvm/test/tools/llvm-dwarfutil/ELF/X86/dwarf5-macro.test
@@ -45,12 +45,15 @@
 
 ## Check that macro table preserved during simple copying.
 #
+# FIXME: the input of this test is itself invalid w.r.t. debug_str_offsets,
+# which also causes the next two calls to --verify to fail, so we only very
+# debug_info on those.
 #RUN: llvm-dwarfutil --no-garbage-collection %p/Inputs/dwarf5-macro.out %t1
-#RUN: llvm-dwarfdump -verify %t1 | FileCheck %s
+#RUN: llvm-dwarfdump -verify --debug-info %t1 | FileCheck %s
 #RUN: llvm-dwarfdump -a  %t1 | FileCheck %s --check-prefix=MACRO
 
 #RUN: llvm-dwarfutil --linker parallel --no-garbage-collection %p/Inputs/dwarf5-macro.out %t1
-#RUN: llvm-dwarfdump -verify %t1 | FileCheck %s
+#RUN: llvm-dwarfdump -verify %t1 --debug-info | FileCheck %s
 #RUN: llvm-dwarfdump -a  %t1 | FileCheck %s --check-prefix=MACRO
 
 ## Check that macro table preserved during updating accelerator tables.
diff --git a/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp b/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
index 8cdd84bcc867cb..f745c0a51ac882 100644
--- a/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
+++ b/llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
@@ -866,7 +866,8 @@ int main(int argc, char **argv) {
     if (Verbose)
       DumpType = DIDT_All;
     else
-      DumpType = DIDT_DebugInfo;
+      // If no options are passed, verify everything but dump only debug_info.
+      DumpType = Verify ? DIDT_All : DIDT_DebugInfo;
   }
 
   // Unless dumping a specific DIE, default to --show-children.

The current behavior of --verify is that it only verifies debug_info,
debug_abbrev and debug_names. This seems fairly arbitrary and might have been
unintentional, as originally the absence of any section flags implied "all".

This patch changes the behavior so that the verifier now verifies everything by
default. It revealed two tests that had potentially invalid DWARF:

1. dwarfdump-str-offsets.s is adding padding between two debug_str_offset
contributions. The standard does not explicitly allow this behavior. See issue
llvm#81558

2. dwarf5-macro.test uses a checked-in binary that has invalid
debug_str_offsets. One of its entries points to the _middle_ of the string
section:

error: .debug_str_offsets: contribution 0x0: index 0x4: invalid string offset
*0x18 == 0x455D, is neither zero nor immediately following a null character

If we look at the closest offset to 0x455D in debug_str:

```
0x0000454e: "__SLONG32_TYPE int"
```

0x455D points to "int".
@dwblaikie
Copy link
Collaborator

0x0000454e: "__SLONG32_TYPE int"

0x455D points to "int".

It's possible that this is valid DWARF and the verifier is being overly pedantic. This would be a tail merging operation - I don't think LLVM does this, but it's possible that other producers do?

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -866,7 +866,8 @@ int main(int argc, char **argv) {
if (Verbose)
DumpType = DIDT_All;
else
DumpType = DIDT_DebugInfo;
// If no options are passed, verify everything but dump only debug_info.
Copy link
Member

@JDevlieghere JDevlieghere Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this comment is slightly ambiguous. I think you can sidestep it by using an else if so it looks symmetrical with the verbose dumping.

if (Verbose)
  DumpType = DIDT_All;
else if(Verify)
  DumpType = DIDT_All;
else 
  DumpType = DIDT_DebugInfo;

or maybe even simpler:

if (Verbose || Verify)
  DumpType = DIDT_All;
else 
  DumpType = DIDT_DebugInfo;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, good point! I like your second suggestion.

@felipepiovezan
Copy link
Contributor Author

felipepiovezan commented Feb 13, 2024

It's possible that this is valid DWARF and the verifier is being overly pedantic. This would be a tail merging operation - I don't think LLVM does this, but it's possible that other producers do?

The spec doesn't say anything, only that strps must point to some offset inside the string table. If we interpret this literally, it would be valid DWARF indeed. I see no reason to disallow that 🤔

Looking into the commit history and... you're the one who implemented this check! :) You actually point this out in the commit description (5e74b2e):

This assumes no suffix merging - if anyone implements that, then this
checking should just be removed for the most part (we could still check
the offsets are within the bounds of .debug_str[.dwo], but nothing more
- any offset in the range would be valid, the offsets wouldn't have to
land at the start of a string)

It curious because the affected test -- if we are to believe the comments there -- was generated by Clang... I tried to generate its binary inputs again, but found a crash in Clang's handling of debug macros, so I decided to put it off for now and just disable the verifier for that check; this should be fine, the test is not focused on debug strings.

@felipepiovezan felipepiovezan merged commit 5296149 into llvm:main Feb 13, 2024
5 of 6 checks passed
@felipepiovezan felipepiovezan deleted the felipe/verify-all-by-default branch February 13, 2024 21:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants