Skip to content

Commit

Permalink
[AsmPrinter][DebugNames] Implement DW_IDX_parent entries (llvm#77457)
Browse files Browse the repository at this point in the history
This implements the ideas discussed in [1].

To summarize, this commit changes AsmPrinter so that it outputs
DW_IDX_parent information for debug_name entries. It will enable
debuggers to speed up queries for fully qualified types (based on a
DWARFDeclContext) significantly, as debuggers will no longer need to
parse the entire CU in order to inspect the parent chain of a DIE.
Instead, a debugger can simply take the parent DIE offset from the
accelerator table and peek at its name in the debug_info/debug_str
sections.

The implementation uses two types of DW_FORM for the DW_IDX_parent
attribute:

1. DW_FORM_ref4, which points to the accelerator table entry for the
parent.
2. DW_FORM_flag_present, when the entry has a parent that is not in the
table (that is, the parent doesn't have a name, or isn't allowed to be
in the table as per the DWARF spec). This is space-efficient, since it
takes 0 bytes.

The implementation works by:

1. Changing how abbreviations are encoded (so that they encode which
form, if
any, was used to encode IDX_Parent)
2. Creating an MCLabel per accelerator table entry, so that they may be
referred by IDX_parent references.

When all patches related to this are merged, we are able to show that
evaluating an expression such as:

```
lldb --batch -o 'b CodeGenFunction::GenerateCode' -o run -o 'expr Fn' -- \
  clang++ -c -g test.cpp -o /dev/null
```

is far faster: from ~5000 ms to ~1500ms.

Building llvm-project + clang with and without this patch, and looking
at its impact on object file size:

```
ls -la $(find build_stage2_Debug_idx_parent_assert_dwarf5 -name \*.cpp.o) | awk '{s+=$5}  END {printf "%\047d\n", s}'
11,507,327,592

-la $(find build_stage2_Debug_no_idx_parent_assert_dwarf5 -name \*.cpp.o) | awk '{s+=$5}  END {printf "%\047d\n", s}'
11,436,446,616
```

That is, an increase of 0.62% in total object file size.

Looking only at debug_names:

```
$stage1_build/bin/llvm-objdump --section-headers $(find build_stage2_Debug_idx_parent_assert_dwarf5 -name \*.cpp.o) | grep __debug_names | awk '{s+="0x"$3}  END {printf "%\047d\n", s}'
440,772,348

$stage1_build/bin/llvm-objdump --section-headers $(find build_stage2_Debug_no_idx_parent_assert_dwarf5 -name \*.cpp.o) | grep __debug_names | awk '{s+="0x"$3}  END {printf "%\047d\n", s}'
369,867,920
```

That is an increase of 19%.

DWARF Linkers need to be changed in order to support this. This commit
already brings support to "base" linker, but it does not attempt to
modify the parallel linker. Accelerator entries refer to the
corresponding DIE offset, and this patch also requires the parent DIE
offset -- it's not clear how the parallel linker can access this. It may
be obvious to someone familiar with it, but it would be nice to get help
from its authors.

[1]:
https://discourse.llvm.org/t/rfc-improve-dwarf-5-debug-names-type-lookup-parsing-speed/74151/

(cherry picked from commit b667783)
  • Loading branch information
felipepiovezan committed Feb 2, 2024
1 parent 90234a9 commit 1af2ff6
Show file tree
Hide file tree
Showing 12 changed files with 356 additions and 58 deletions.
53 changes: 48 additions & 5 deletions llvm/include/llvm/CodeGen/AccelTable.h
Original file line number Diff line number Diff line change
Expand Up @@ -255,6 +255,20 @@ class AppleAccelTableData : public AccelTableData {
static uint32_t hash(StringRef Buffer) { return djbHash(Buffer); }
};

/// Helper class to identify an entry in DWARF5AccelTable based on their DIE
/// offset and UnitID.
struct OffsetAndUnitID : std::pair<uint64_t, uint32_t> {
using Base = std::pair<uint64_t, uint32_t>;
OffsetAndUnitID(Base B) : Base(B) {}

OffsetAndUnitID(uint64_t Offset, uint32_t UnitID) : Base(Offset, UnitID) {}
uint64_t offset() const { return first; };
uint32_t unitID() const { return second; };
};

template <>
struct DenseMapInfo<OffsetAndUnitID> : DenseMapInfo<OffsetAndUnitID::Base> {};

/// The Data class implementation for DWARF v5 accelerator table. Unlike the
/// Apple Data classes, this class is just a DIE wrapper, and does not know to
/// serialize itself. The complete serialization logic is in the
Expand All @@ -270,9 +284,12 @@ class DWARF5AccelTableData : public AccelTableData {

DWARF5AccelTableData(const DIE &Die, const uint32_t UnitID,
const bool IsTU = false);
DWARF5AccelTableData(const uint64_t DieOffset, const unsigned DieTag,
const unsigned UnitID, const bool IsTU = false)
: OffsetVal(DieOffset), DieTag(DieTag), UnitID(UnitID), IsTU(IsTU) {}
DWARF5AccelTableData(const uint64_t DieOffset,
const std::optional<uint64_t> DefiningParentOffset,
const unsigned DieTag, const unsigned UnitID,
const bool IsTU = false)
: OffsetVal(DieOffset), ParentOffset(DefiningParentOffset),
DieTag(DieTag), UnitID(UnitID), IsTU(IsTU) {}

#ifndef NDEBUG
void print(raw_ostream &OS) const override;
Expand All @@ -282,19 +299,44 @@ class DWARF5AccelTableData : public AccelTableData {
assert(isNormalized() && "Accessing DIE Offset before normalizing.");
return std::get<uint64_t>(OffsetVal);
}

OffsetAndUnitID getDieOffsetAndUnitID() const {
return {getDieOffset(), UnitID};
}

unsigned getDieTag() const { return DieTag; }
unsigned getUnitID() const { return UnitID; }
bool isTU() const { return IsTU; }
void normalizeDIEToOffset() {
assert(!isNormalized() && "Accessing offset after normalizing.");
OffsetVal = std::get<const DIE *>(OffsetVal)->getOffset();
const DIE *Entry = std::get<const DIE *>(OffsetVal);
ParentOffset = getDefiningParentDieOffset(*Entry);
OffsetVal = Entry->getOffset();
}
bool isNormalized() const {
return std::holds_alternative<uint64_t>(OffsetVal);
}

std::optional<uint64_t> getParentDieOffset() const {
if (auto OffsetAndId = getParentDieOffsetAndUnitID())
return OffsetAndId->offset();
return {};
}

std::optional<OffsetAndUnitID> getParentDieOffsetAndUnitID() const {
assert(isNormalized() && "Accessing DIE Offset before normalizing.");
if (!ParentOffset)
return std::nullopt;
return OffsetAndUnitID(*ParentOffset, getUnitID());
}

/// If `Die` has a non-null parent and the parent is not a declaration,
/// return its offset.
static std::optional<uint64_t> getDefiningParentDieOffset(const DIE &Die);

protected:
std::variant<const DIE *, uint64_t> OffsetVal;
std::optional<uint64_t> ParentOffset;
uint32_t DieTag : 16;
uint32_t UnitID : 15;
uint32_t IsTU : 1;
Expand Down Expand Up @@ -341,7 +383,8 @@ class DWARF5AccelTable : public AccelTable<DWARF5AccelTableData> {
void addTypeEntries(DWARF5AccelTable &Table) {
for (auto &Entry : Table.getEntries()) {
for (auto *Data : Entry.second.getValues<DWARF5AccelTableData *>()) {
addName(Entry.second.Name, Data->getDieOffset(), Data->getDieTag(),
addName(Entry.second.Name, Data->getDieOffset(),
Data->getParentDieOffset(), Data->getDieTag(),
Data->getUnitID(), true);
}
}
Expand Down
127 changes: 113 additions & 14 deletions llvm/lib/CodeGen/AsmPrinter/AccelTable.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
#include "llvm/CodeGen/AccelTable.h"
#include "DwarfCompileUnit.h"
#include "DwarfUnit.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/Twine.h"
#include "llvm/BinaryFormat/Dwarf.h"
Expand Down Expand Up @@ -207,7 +208,7 @@ class Dwarf5AccelTableWriter : public AccelTableWriter {
};

Header Header;
DenseMap<uint32_t, SmallVector<DWARF5AccelTableData::AttributeEncoding, 2>>
DenseMap<uint32_t, SmallVector<DWARF5AccelTableData::AttributeEncoding, 3>>
Abbreviations;
ArrayRef<std::variant<MCSymbol *, uint64_t>> CompUnits;
ArrayRef<std::variant<MCSymbol *, uint64_t>> TypeUnits;
Expand All @@ -220,6 +221,8 @@ class Dwarf5AccelTableWriter : public AccelTableWriter {
MCSymbol *EntryPool = Asm->createTempSymbol("names_entries");
// Indicates if this module is built with Split Dwarf enabled.
bool IsSplitDwarf = false;
/// Stores the DIE offsets which are indexed by this table.
DenseSet<OffsetAndUnitID> IndexedOffsets;

void populateAbbrevsMap();

Expand All @@ -228,8 +231,11 @@ class Dwarf5AccelTableWriter : public AccelTableWriter {
void emitBuckets() const;
void emitStringOffsets() const;
void emitAbbrevs() const;
void emitEntry(const DWARF5AccelTableData &Entry) const;
void emitData() const;
void emitEntry(
const DWARF5AccelTableData &Entry,
const DenseMap<OffsetAndUnitID, MCSymbol *> &DIEOffsetToAccelEntryLabel,
DenseSet<MCSymbol *> &EmittedAccelEntrySymbols) const;
void emitData();

public:
Dwarf5AccelTableWriter(
Expand Down Expand Up @@ -395,36 +401,90 @@ void Dwarf5AccelTableWriter::Header::emit(Dwarf5AccelTableWriter &Ctx) {
Asm->OutStreamer->emitBytes({AugmentationString, AugmentationStringSize});
}

static uint32_t constexpr LowerBitSize = dwarf::DW_IDX_type_hash;
std::optional<uint64_t>
DWARF5AccelTableData::getDefiningParentDieOffset(const DIE &Die) {
if (auto *Parent = Die.getParent();
Parent && !Parent->findAttribute(dwarf::Attribute::DW_AT_declaration))
return Parent->getOffset();
return {};
}

enum IdxParentEncoding : uint8_t {
NoIndexedParent = 0, /// Parent information present but parent isn't indexed.
Ref4 = 1, /// Parent information present and parent is indexed.
NoParent = 2, /// Parent information missing.
};

static uint32_t constexpr NumBitsIdxParent = 2;

uint8_t encodeIdxParent(const std::optional<dwarf::Form> MaybeParentForm) {
if (!MaybeParentForm)
return NoParent;
switch (*MaybeParentForm) {
case dwarf::Form::DW_FORM_flag_present:
return NoIndexedParent;
case dwarf::Form::DW_FORM_ref4:
return Ref4;
default:
// This is not crashing on bad input: we should only reach this if the
// internal compiler logic is faulty; see getFormForIdxParent.
llvm_unreachable("Bad form for IDX_parent");
}
}

static uint32_t constexpr ParentBitOffset = dwarf::DW_IDX_type_hash;
static uint32_t constexpr TagBitOffset = ParentBitOffset + NumBitsIdxParent;
static uint32_t getTagFromAbbreviationTag(const uint32_t AbbrvTag) {
return AbbrvTag >> LowerBitSize;
return AbbrvTag >> TagBitOffset;
}

/// Constructs a unique AbbrevTag that captures what a DIE accesses.
/// Using this tag we can emit a unique abbreviation for each DIE.
static uint32_t constructAbbreviationTag(
const unsigned Tag,
const std::optional<DWARF5AccelTable::UnitIndexAndEncoding> &EntryRet) {
const std::optional<DWARF5AccelTable::UnitIndexAndEncoding> &EntryRet,
std::optional<dwarf::Form> MaybeParentForm) {
uint32_t AbbrvTag = 0;
if (EntryRet)
AbbrvTag |= 1 << EntryRet->Encoding.Index;
AbbrvTag |= 1 << dwarf::DW_IDX_die_offset;
AbbrvTag |= Tag << LowerBitSize;
AbbrvTag |= 1 << dwarf::DW_IDX_parent;
AbbrvTag |= encodeIdxParent(MaybeParentForm) << ParentBitOffset;
AbbrvTag |= Tag << TagBitOffset;
return AbbrvTag;
}

static std::optional<dwarf::Form>
getFormForIdxParent(const DenseSet<OffsetAndUnitID> &IndexedOffsets,
std::optional<OffsetAndUnitID> ParentOffset) {
// No parent information
if (!ParentOffset)
return std::nullopt;
// Parent is indexed by this table.
if (IndexedOffsets.contains(*ParentOffset))
return dwarf::Form::DW_FORM_ref4;
// Parent is not indexed by this table.
return dwarf::Form::DW_FORM_flag_present;
}

void Dwarf5AccelTableWriter::populateAbbrevsMap() {
for (auto &Bucket : Contents.getBuckets()) {
for (auto *Hash : Bucket) {
for (auto *Value : Hash->getValues<DWARF5AccelTableData *>()) {
std::optional<DWARF5AccelTable::UnitIndexAndEncoding> EntryRet =
getIndexForEntry(*Value);
unsigned Tag = Value->getDieTag();
uint32_t AbbrvTag = constructAbbreviationTag(Tag, EntryRet);
std::optional<dwarf::Form> MaybeParentForm = getFormForIdxParent(
IndexedOffsets, Value->getParentDieOffsetAndUnitID());
uint32_t AbbrvTag =
constructAbbreviationTag(Tag, EntryRet, MaybeParentForm);
if (Abbreviations.count(AbbrvTag) == 0) {
SmallVector<DWARF5AccelTableData::AttributeEncoding, 2> UA;
SmallVector<DWARF5AccelTableData::AttributeEncoding, 3> UA;
if (EntryRet)
UA.push_back(EntryRet->Encoding);
UA.push_back({dwarf::DW_IDX_die_offset, dwarf::DW_FORM_ref4});
if (MaybeParentForm)
UA.push_back({dwarf::DW_IDX_parent, *MaybeParentForm});
Abbreviations.try_emplace(AbbrvTag, UA);
}
}
Expand Down Expand Up @@ -496,15 +556,34 @@ void Dwarf5AccelTableWriter::emitAbbrevs() const {
}

void Dwarf5AccelTableWriter::emitEntry(
const DWARF5AccelTableData &Entry) const {
const DWARF5AccelTableData &Entry,
const DenseMap<OffsetAndUnitID, MCSymbol *> &DIEOffsetToAccelEntryLabel,
DenseSet<MCSymbol *> &EmittedAccelEntrySymbols) const {
std::optional<DWARF5AccelTable::UnitIndexAndEncoding> EntryRet =
getIndexForEntry(Entry);
uint32_t AbbrvTag = constructAbbreviationTag(Entry.getDieTag(), EntryRet);
std::optional<OffsetAndUnitID> MaybeParentOffset =
Entry.getParentDieOffsetAndUnitID();
std::optional<dwarf::Form> MaybeParentForm =
getFormForIdxParent(IndexedOffsets, MaybeParentOffset);
uint32_t AbbrvTag =
constructAbbreviationTag(Entry.getDieTag(), EntryRet, MaybeParentForm);
auto AbbrevIt = Abbreviations.find(AbbrvTag);
assert(AbbrevIt != Abbreviations.end() &&
"Why wasn't this abbrev generated?");
assert(getTagFromAbbreviationTag(AbbrevIt->first) == Entry.getDieTag() &&
"Invalid Tag");

auto EntrySymbolIt =
DIEOffsetToAccelEntryLabel.find(Entry.getDieOffsetAndUnitID());
assert(EntrySymbolIt != DIEOffsetToAccelEntryLabel.end());
MCSymbol *EntrySymbol = EntrySymbolIt->getSecond();

// Emit the label for this Entry, so that IDX_parents may refer to it.
// Note: a DIE may have multiple accelerator Entries; this check avoids
// creating/emitting multiple labels for the same DIE.
if (EmittedAccelEntrySymbols.insert(EntrySymbol).second)
Asm->OutStreamer->emitLabel(EntrySymbol);

Asm->emitULEB128(AbbrevIt->first, "Abbreviation code");

for (const auto &AttrEnc : AbbrevIt->second) {
Expand All @@ -520,20 +599,34 @@ void Dwarf5AccelTableWriter::emitEntry(
assert(AttrEnc.Form == dwarf::DW_FORM_ref4);
Asm->emitInt32(Entry.getDieOffset());
break;
case dwarf::DW_IDX_parent: {
if (AttrEnc.Form == dwarf::Form::DW_FORM_flag_present)
break;
auto ParentSymbolIt = DIEOffsetToAccelEntryLabel.find(*MaybeParentOffset);
assert(ParentSymbolIt != DIEOffsetToAccelEntryLabel.end());
Asm->emitLabelDifference(ParentSymbolIt->getSecond(), EntryPool, 4);
break;
}
default:
llvm_unreachable("Unexpected index attribute!");
}
}
}

void Dwarf5AccelTableWriter::emitData() const {
void Dwarf5AccelTableWriter::emitData() {
DenseMap<OffsetAndUnitID, MCSymbol *> DIEOffsetToAccelEntryLabel;

for (OffsetAndUnitID Offset : IndexedOffsets)
DIEOffsetToAccelEntryLabel.insert({Offset, Asm->createTempSymbol("")});

Asm->OutStreamer->emitLabel(EntryPool);
DenseSet<MCSymbol *> EmittedAccelEntrySymbols;
for (auto &Bucket : Contents.getBuckets()) {
for (auto *Hash : Bucket) {
// Remember to emit the label for our offset.
Asm->OutStreamer->emitLabel(Hash->Sym);
for (const auto *Value : Hash->Values)
emitEntry(*static_cast<const DWARF5AccelTableData *>(Value));
for (const auto *Value : Hash->getValues<DWARF5AccelTableData *>())
emitEntry(*Value, DIEOffsetToAccelEntryLabel, EmittedAccelEntrySymbols);
Asm->OutStreamer->AddComment("End of list: " + Hash->Name.getString());
Asm->emitInt8(0);
}
Expand All @@ -555,6 +648,12 @@ Dwarf5AccelTableWriter::Dwarf5AccelTableWriter(
CompUnits(CompUnits), TypeUnits(TypeUnits),
getIndexForEntry(std::move(getIndexForEntry)),
IsSplitDwarf(IsSplitDwarf) {

for (auto &Bucket : Contents.getBuckets())
for (auto *Hash : Bucket)
for (auto *Value : Hash->getValues<DWARF5AccelTableData *>())
IndexedOffsets.insert(Value->getDieOffsetAndUnitID());

populateAbbrevsMap();
}

Expand Down
18 changes: 12 additions & 6 deletions llvm/lib/DWARFLinker/Classic/DWARFLinker.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2240,14 +2240,20 @@ void DWARFLinker::emitAcceleratorEntriesForUnit(CompileUnit &Unit) {
} break;
case AccelTableKind::DebugNames: {
for (const auto &Namespace : Unit.getNamespaces())
DebugNames.addName(Namespace.Name, Namespace.Die->getOffset(),
Namespace.Die->getTag(), Unit.getUniqueID());
DebugNames.addName(
Namespace.Name, Namespace.Die->getOffset(),
DWARF5AccelTableData::getDefiningParentDieOffset(*Namespace.Die),
Namespace.Die->getTag(), Unit.getUniqueID());
for (const auto &Pubname : Unit.getPubnames())
DebugNames.addName(Pubname.Name, Pubname.Die->getOffset(),
Pubname.Die->getTag(), Unit.getUniqueID());
DebugNames.addName(
Pubname.Name, Pubname.Die->getOffset(),
DWARF5AccelTableData::getDefiningParentDieOffset(*Pubname.Die),
Pubname.Die->getTag(), Unit.getUniqueID());
for (const auto &Pubtype : Unit.getPubtypes())
DebugNames.addName(Pubtype.Name, Pubtype.Die->getOffset(),
Pubtype.Die->getTag(), Unit.getUniqueID());
DebugNames.addName(
Pubtype.Name, Pubtype.Die->getOffset(),
DWARF5AccelTableData::getDefiningParentDieOffset(*Pubtype.Die),
Pubtype.Die->getTag(), Unit.getUniqueID());
} break;
}
}
Expand Down
3 changes: 2 additions & 1 deletion llvm/lib/DWARFLinker/Parallel/DWARFLinkerImpl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1358,7 +1358,8 @@ void DWARFLinkerImpl::emitDWARFv5DebugNamesSection(const Triple &TargetTriple) {
case DwarfUnit::AccelType::Namespace:
case DwarfUnit::AccelType::Type: {
DebugNames->addName(*DebugStrStrings.getExistingEntry(Info.String),
Info.OutOffset, Info.Tag, CU->getUniqueID());
Info.OutOffset, std::nullopt /*ParentDIEOffset*/,
Info.Tag, CU->getUniqueID());
} break;

default:
Expand Down
16 changes: 15 additions & 1 deletion llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
//===----------------------------------------------------------------------===//
#include "llvm/DebugInfo/DWARF/DWARFVerifier.h"
#include "llvm/ADT/IntervalMap.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallSet.h"
#include "llvm/BinaryFormat/Dwarf.h"
#include "llvm/DebugInfo/DWARF/DWARFAbbreviationDeclaration.h"
Expand Down Expand Up @@ -1272,6 +1273,20 @@ unsigned DWARFVerifier::verifyNameIndexAttribute(
NI.getUnitOffset(), Abbr.Code, AttrEnc.Form, dwarf::DW_FORM_data8);
return 1;
}
return 0;
}

if (AttrEnc.Index == dwarf::DW_IDX_parent) {
constexpr static auto AllowedForms = {dwarf::Form::DW_FORM_flag_present,
dwarf::Form::DW_FORM_ref4};
if (!is_contained(AllowedForms, AttrEnc.Form)) {
error() << formatv("NameIndex @ {0:x}: Abbreviation {1:x}: DW_IDX_parent "
"uses an unexpected form {2} (should be "
"DW_FORM_ref4 or DW_FORM_flag_present).\n",
NI.getUnitOffset(), Abbr.Code, AttrEnc.Form);
return 1;
}
return 0;
}

// A list of known index attributes and their expected form classes.
Expand All @@ -1286,7 +1301,6 @@ unsigned DWARFVerifier::verifyNameIndexAttribute(
{dwarf::DW_IDX_compile_unit, DWARFFormValue::FC_Constant, {"constant"}},
{dwarf::DW_IDX_type_unit, DWARFFormValue::FC_Constant, {"constant"}},
{dwarf::DW_IDX_die_offset, DWARFFormValue::FC_Reference, {"reference"}},
{dwarf::DW_IDX_parent, DWARFFormValue::FC_Constant, {"constant"}},
};

ArrayRef<FormClassTable> TableRef(Table);
Expand Down
Loading

0 comments on commit 1af2ff6

Please sign in to comment.