Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

debug_names incorrect parent due to collision between CU and TU #93886

Closed
dwblaikie opened this issue May 30, 2024 · 5 comments · Fixed by #95039 or #95339
Closed

debug_names incorrect parent due to collision between CU and TU #93886

dwblaikie opened this issue May 30, 2024 · 5 comments · Fixed by #95039 or #95339
Assignees

Comments

@dwblaikie
Copy link
Collaborator

from #91808 (comment)

namespace A {
namespace B {
struct C { };
struct D { };
}  // namespace B
}  // namespace A
void f1(A::B::C, A::B::D) { }
$ clang++-tot names.cpp -g2 -O0 -gpubnames -c -fdebug-types-section && llvm-dwarfdump-tot -debug-names names.o | grep "Name \|Entry \|Tag: \|DW_IDX\|String: "
      String: 0x000000a0 "_Z2f1N1A1B1CENS0_1DE"
      Entry @ 0xe5 {
        Tag: DW_TAG_subprogram
        DW_IDX_die_offset: 0x00000023
        DW_IDX_parent: <parent not indexed>
      String: 0x000000b5 "A"
      Entry @ 0xeb {
        Tag: DW_TAG_namespace
        DW_IDX_type_unit: 0x00
        DW_IDX_die_offset: 0x00000023
        DW_IDX_parent: <parent not indexed>
      Entry @ 0xf1 {
        Tag: DW_TAG_namespace
        DW_IDX_type_unit: 0x01
        DW_IDX_die_offset: 0x00000023
        DW_IDX_parent: <parent not indexed>
      Entry @ 0xf7 {
        Tag: DW_TAG_namespace
        DW_IDX_die_offset: 0x00000044
        DW_IDX_parent: <parent not indexed>
      String: 0x000000b7 "B"
      Entry @ 0x103 {
        Tag: DW_TAG_namespace
        DW_IDX_type_unit: 0x00
        DW_IDX_die_offset: 0x00000025
        DW_IDX_parent: Entry @ 0xe5
      Entry @ 0x10d {
        Tag: DW_TAG_namespace
        DW_IDX_type_unit: 0x01
        DW_IDX_die_offset: 0x00000025
        DW_IDX_parent: Entry @ 0xf1
      Entry @ 0x117 {
        Tag: DW_TAG_namespace
        DW_IDX_die_offset: 0x00000046
        DW_IDX_parent: Entry @ 0xf7

So the second and third "B" entries match up with the appropriate "A" entries, but the first jumps over to refer to a different DIE in the CU that has the same offset.

This is because the UniqueID on a unit is only unique within that type of unit (it's unique across CUs and, separately, unique across TUs) - so the "DieOffsetAndUnitID" is not a globally unique identifier for an entry - it'd need the type of unit in there too to completely uniquify it.

Adding this patch is enough to expose the issue more directly:

diff --git a/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp b/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
index 5b679fd3b9f9..cc8b8d1881ed 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
@@ -615,8 +615,10 @@ Dwarf5AccelTableWriter::Dwarf5AccelTableWriter(
 
   for (auto &Bucket : Contents.getBuckets())
     for (auto *Hash : Bucket)
-      for (auto *Value : Hash->getValues<DWARF5AccelTableData *>())
-        IndexedOffsets.insert(Value->getDieOffsetAndUnitID());
+      for (auto *Value : Hash->getValues<DWARF5AccelTableData *>()) {
+        auto Inserted = IndexedOffsets.insert(Value->getDieOffsetAndUnitID()).second;
+        assert(Inserted);
+      }
 
   populateAbbrevsMap();
 }

The two different units (one CU, one TU) with the same unit ID end up colliding in the IndexedOffsets set - and only one ends up in there, and so then the loop here:

for (OffsetAndUnitID Offset : IndexedOffsets)
  DIEOffsetToAccelEntryLabel.insert({Offset, Asm->createTempSymbol("")});

Only inserts the copy once, oh... and /this/ code:

if (EmittedAccelEntrySymbols.insert(EntrySymbol).second)
  Asm->OutStreamer->emitLabel(EntrySymbol);

Silently skips emitting the label even though it matches more than one entry unintentionally, because it can match more than one entry /intentionally/ (if there's multiple entries for exactly the same entity, but known by different names (like the mangled name and unmangled name)).

So, yeah.

Hmm, I guess at least the type unit number probably isn't gapless - there are cases where we create a type unit and then potentially throw it away (see the TypeUnitsUnderConstruction stuff) - but we don't reuse the type unit number.

So, maybe it's possible to use a single numbering for CUs and TUs, gaps and all?

The only thing I was worried about was that some code might be using the numbering to index into an array of units at some point, but if that's not the case - great! Probably more intuitive that they be totally unique numbers.

@llvmbot
Copy link
Collaborator

llvmbot commented May 30, 2024

@llvm/issue-subscribers-debuginfo

Author: David Blaikie (dwblaikie)

from https://github.com//pull/91808#issuecomment-2138480664
namespace A {
namespace B {
struct C { };
struct D { };
}  // namespace B
}  // namespace A
void f1(A::B::C, A::B::D) { }
$ clang++-tot names.cpp -g2 -O0 -gpubnames -c -fdebug-types-section &amp;&amp; llvm-dwarfdump-tot -debug-names names.o | grep "Name \|Entry \|Tag: \|DW_IDX\|String: "
      String: 0x000000a0 "_Z2f1N1A1B1CENS0_1DE"
      Entry @ 0xe5 {
        Tag: DW_TAG_subprogram
        DW_IDX_die_offset: 0x00000023
        DW_IDX_parent: &lt;parent not indexed&gt;
      String: 0x000000b5 "A"
      Entry @ 0xeb {
        Tag: DW_TAG_namespace
        DW_IDX_type_unit: 0x00
        DW_IDX_die_offset: 0x00000023
        DW_IDX_parent: &lt;parent not indexed&gt;
      Entry @ 0xf1 {
        Tag: DW_TAG_namespace
        DW_IDX_type_unit: 0x01
        DW_IDX_die_offset: 0x00000023
        DW_IDX_parent: &lt;parent not indexed&gt;
      Entry @ 0xf7 {
        Tag: DW_TAG_namespace
        DW_IDX_die_offset: 0x00000044
        DW_IDX_parent: &lt;parent not indexed&gt;
      String: 0x000000b7 "B"
      Entry @ 0x103 {
        Tag: DW_TAG_namespace
        DW_IDX_type_unit: 0x00
        DW_IDX_die_offset: 0x00000025
        DW_IDX_parent: Entry @ 0xe5
      Entry @ 0x10d {
        Tag: DW_TAG_namespace
        DW_IDX_type_unit: 0x01
        DW_IDX_die_offset: 0x00000025
        DW_IDX_parent: Entry @ 0xf1
      Entry @ 0x117 {
        Tag: DW_TAG_namespace
        DW_IDX_die_offset: 0x00000046
        DW_IDX_parent: Entry @ 0xf7

So the second and third "B" entries match up with the appropriate "A" entries, but the first jumps over to refer to a different DIE in the CU that has the same offset.

This is because the UniqueID on a unit is only unique within that type of unit (it's unique across CUs and, separately, unique across TUs) - so the "DieOffsetAndUnitID" is not a globally unique identifier for an entry - it'd need the type of unit in there too to completely uniquify it.

Adding this patch is enough to expose the issue more directly:

diff --git a/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp b/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
index 5b679fd3b9f9..cc8b8d1881ed 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
@@ -615,8 +615,10 @@ Dwarf5AccelTableWriter::Dwarf5AccelTableWriter(
 
   for (auto &amp;Bucket : Contents.getBuckets())
     for (auto *Hash : Bucket)
-      for (auto *Value : Hash-&gt;getValues&lt;DWARF5AccelTableData *&gt;())
-        IndexedOffsets.insert(Value-&gt;getDieOffsetAndUnitID());
+      for (auto *Value : Hash-&gt;getValues&lt;DWARF5AccelTableData *&gt;()) {
+        auto Inserted = IndexedOffsets.insert(Value-&gt;getDieOffsetAndUnitID()).second;
+        assert(Inserted);
+      }
 
   populateAbbrevsMap();
 }

The two different units (one CU, one TU) with the same unit ID end up colliding in the IndexedOffsets set - and only one ends up in there, and so then the loop here:

for (OffsetAndUnitID Offset : IndexedOffsets)
  DIEOffsetToAccelEntryLabel.insert({Offset, Asm-&gt;createTempSymbol("")});

Only inserts the copy once, oh... and /this/ code:

if (EmittedAccelEntrySymbols.insert(EntrySymbol).second)
  Asm-&gt;OutStreamer-&gt;emitLabel(EntrySymbol);

Silently skips emitting the label even though it matches more than one entry unintentionally, because it can match more than one entry /intentionally/ (if there's multiple entries for exactly the same entity, but known by different names (like the mangled name and unmangled name)).

So, yeah.

Hmm, I guess at least the type unit number probably isn't gapless - there are cases where we create a type unit and then potentially throw it away (see the TypeUnitsUnderConstruction stuff) - but we don't reuse the type unit number.

So, maybe it's possible to use a single numbering for CUs and TUs, gaps and all?

The only thing I was worried about was that some code might be using the numbering to index into an array of units at some point, but if that's not the case - great! Probably more intuitive that they be totally unique numbers.

@dwblaikie
Copy link
Collaborator Author

@ayermolo @felipepiovezan

@ayermolo
Copy link
Contributor

ayermolo commented Jun 4, 2024

I can take a look later this week unless @felipepiovezan already plans to do it.

@felipepiovezan
Copy link
Contributor

I can take a look later this week unless @felipepiovezan already plans to do it.

I would appreciate the help :) I have some stuff that I need to look at before going on PTO, so it would be tricky to look into this in a reasonable timeframe

@ayermolo
Copy link
Contributor

ayermolo commented Jun 5, 2024

OK, will take a look Friday or Monday. Also busy with internal stuff.

@ayermolo ayermolo self-assigned this Jun 10, 2024
ayermolo added a commit to ayermolo/llvm-project that referenced this issue Jun 10, 2024
This fixes llvm#93886. The UnitID is not
unique between CUs and TUs. This led to DW_IDX_parent to point ot an entry for a
DIE in CU if it had the same relative offset as TU die.

Added a IsTU to the hash for parent chain.
ayermolo added a commit that referenced this issue Jun 12, 2024
This fixes #93886. The UnitID
is not
unique between CUs and TUs. This led to DW_IDX_parent to point ot an
entry for a
DIE in a CU if it had the same relative offset as a TU die.

Added a IsTU to the hash for parent chain.
ayermolo added a commit to ayermolo/llvm-project that referenced this issue Jun 13, 2024
This fixes llvm#93886. The UnitID is not
unique between CUs and TUs. This led to DW_IDX_parent to point ot an entry for a
DIE in CU if it had the same relative offset as TU die.

Added a IsTU to the hash for parent chain.
ayermolo added a commit to ayermolo/llvm-project that referenced this issue Jun 13, 2024
This fixes llvm#93886. The UnitID is not
unique between CUs and TUs. This led to DW_IDX_parent to point ot an entry for a
DIE in CU if it had the same relative offset as TU die.

Added a IsTU to the hash for parent chain.
ayermolo added a commit that referenced this issue Jun 14, 2024
This fixes #93886. The UnitID
is not
unique between CUs and TUs. This led to DW_IDX_parent to point ot an
entry for a
DIE in CU if it had the same relative offset as TU die.

Added a IsTU to the hash for parent chain.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants