Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DebugInfo][RemoveDIs] Add documentation for updating code to handle debug records #93562

Merged
merged 2 commits into from
May 30, 2024

Conversation

SLTozer
Copy link
Contributor

@SLTozer SLTozer commented May 28, 2024

Although the patch that enables debug records by default has been temporarily reverted, it will (eventually) be reverted and everyone's code will be subjected to the new debug info format. Although this is broadly a good thing, it is important that the documentation has enough information to guide users through the update; this patch adds what should hopefully be enough detail for most users to either find the answers, or find out how to find those answers.

@llvmbot
Copy link
Collaborator

llvmbot commented May 28, 2024

@llvm/pr-subscribers-debuginfo

Author: Stephen Tozer (SLTozer)

Changes

Although the patch that enables debug records by default has been temporarily reverted, it will (eventually) be reverted and everyone's code will be subjected to the new debug info format. Although this is broadly a good thing, it is important that the documentation has enough information to guide users through the update; this patch adds what should hopefully be enough detail for most users to either find the answers, or find out how to find those answers.


Full diff: https://github.com/llvm/llvm-project/pull/93562.diff

1 Files Affected:

  • (modified) llvm/docs/RemoveDIsDebugInfo.md (+200-13)
diff --git a/llvm/docs/RemoveDIsDebugInfo.md b/llvm/docs/RemoveDIsDebugInfo.md
index 9e50a2a604aa6..96b2f2560a343 100644
--- a/llvm/docs/RemoveDIsDebugInfo.md
+++ b/llvm/docs/RemoveDIsDebugInfo.md
@@ -24,12 +24,16 @@ The debug records are not instructions, do not appear in the instruction list, a
 
 # Great, what do I need to do!
 
-Approximately nothing -- we've already instrumented all of LLVM to handle these new records ("`DbgRecords`") and behave identically to past LLVM behaviour. We plan on turning this on by default some time soon, with IR converted to the intrinsic form of debug info at terminals (textual IR, bitcode) for a short while, before then changing the textual IR and bitcode formats.
+Very little -- we've already instrumented all of LLVM to handle these new records ("`DbgRecords`") and behave identically to past LLVM behaviour. This is currently being turned on by default, so that `DbgRecords` will be used by default in memory, IR, and bitcode.
+
+## API Changes
 
 There are two significant changes to be aware of. Firstly, we're adding a single bit of debug relevant data to the `BasicBlock::iterator` class (it's so that we can determine whether ranges intend on including debug info at the beginning of a block or not). That means when writing passes that insert LLVM IR instructions, you need to identify positions with `BasicBlock::iterator` rather than just a bare `Instruction *`. Most of the time this means that after identifying where you intend on inserting something, you must also call `getIterator` on the instruction position -- however when inserting at the start of a block you _must_ use `getFirstInsertionPt`, `getFirstNonPHIIt` or `begin` and use that iterator to insert, rather than just fetching a pointer to the first instruction.
 
 The second matter is that if you transfer sequences of instructions from one place to another manually, i.e. repeatedly using `moveBefore` where you might have used `splice`, then you should instead use the method `moveBeforePreserving`. `moveBeforePreserving` will transfer debug info records with the instruction they're attached to. This is something that happens automatically today -- if you use `moveBefore` on every element of an instruction sequence, then debug intrinsics will be moved in the normal course of your code, but we lose this behaviour with non-instruction debug info.
 
+For a more in-depth overview of how to update existing code to support debug records, see [the guide below](#how-to-update-existing-code).
+
 # C-API changes
 
 All the functions that have been added are temporary and will be deprecated in the future. The intention is that they'll help downstream projects adapt during the transition period.
@@ -58,13 +62,9 @@ LLVMDIBuilderInsertDbgValueBefore  # Same as above.
 LLVMDIBuilderInsertDbgValueAtEnd   # Same as above.
 ```
 
-# Anything else?
-
-Not really, but here's an "old vs new" comparison of how to do certain things and quickstart for how this "new" debug info is structured.
-
-## Skipping debug records, ignoring debug-uses of Values, stably counting instructions...
+# The new "Debug Record" model
 
-This will all happen transparently without needing to think about it!
+Below is a brief overview of the new representation that replaces debug intrinsics; for an instructive guide on updating old code, see [here](#how-to-update-existing-code).
 
 ## What exactly have you replaced debug intrinsics with?
 
@@ -106,21 +106,204 @@ Not shown are the links from DbgRecord to other parts of the `Value`/`Metadata`
 
 The various kinds of debug intrinsic (value, declare, assign, label) are all stored in `DbgRecord` subclasses, with a "RecordKind" field distinguishing `DbgLabelRecord`s from `DbgVariableRecord`s, and a `LocationType` field in the `DbgVariableRecord` class further disambiguating the various debug variable intrinsics it can represent.
 
-## Finding debug info records
+# How to update existing code
+
+Any existing code that interacts with debug intrinsics in some way will need to be updated to interact with debug records in the same way. A few quick rules to keep in mind when updating code:
+
+- Debug records will not be seen when iterating over instructions; to find the debug records that appear immediately before an instruction, you'll need to iterate over `Instruction::getDbgRecordRange()`.
+- Debug records have interfaces that are identical to those of debug intrinsics, meaning that any code that operates on debug intrinsics can be trivially applied to debug records as well. The exceptions for this are Instruction or CallInst methods that don't logically apply to debug records, and `isa/cast/dyn_cast` methods, are replaced by methods on the `DbgRecord` class itself.
+- Debug records cannot appear in a module that also contains debug intrinsics; the two are mutually exclusive. As debug records are the future format, handling records correctly should be prioritized in new code.
+- Until support for intrinsics is no longer present, a valid hotfix for code that only handles debug intrinsics and is non-trivial to update is to convert the module to the intrinsic format using `Module::setIsNewDbgInfoFormat`, and convert it back afterwards.
+  - This can also be performed within a lexical scope for a module or an individual function using the class `ScopedDbgInfoFormatSetter`:
+  ```
+  void handleModule(Module &M) {
+    {
+      ScopedDbgInfoFormatSetter FormatSetter(M, false);
+      handleModuleWithDebugIntrinsics(M);
+    }
+    // Module returns to previous debug info format after exiting the above block.
+  }
+  ```
+
+Below is a rough guide on how existing code that currently supports debug intrinsics can be updated to support debug records.
+
+## Creating debug records
+
+Debug records will automatically be created by the `DIBuilder` class when the new format is enabled. As with instructions, it is also possible to call `DbgRecord::clone` to create an unattached copy of an existing record.
+
+## Skipping debug records, ignoring debug-uses of Values, stably counting instructions...
+
+This will all happen transparently without needing to think about it!
+
+```
+for (Instruction &I : BB) {
+  // Old: Skips debug intrinsics
+  if (isa<DbgInfoIntrinsic>(&I))
+    continue;
+  // New: No extra code needed, debug records are skipped by default.
+  ...
+}
+```
+
+## Finding debug records
 
 Utilities such as `findDbgUsers` and the like now have an optional argument that will return the set of `DbgVariableRecord` records that refer to a `Value`. You should be able to treat them the same as intrinsics.
 
-## Examining debug info records at positions
+```
+// Old:
+  SmallVector<DbgVariableIntrinsic *> DbgUsers;
+  findDbgUsers(DbgUsers, V);
+  for (auto *DVI : DbgUsers) {
+    if (DVI->getParent() != BB)
+      DVI->replaceVariableLocationOp(V, New);
+  }
+// New:
+  SmallVector<DbgVariableIntrinsic *> DbgUsers;
+  SmallVector<DbgVariableRecord *> DVRUsers;
+  findDbgUsers(DbgUsers, V, &DVRUsers);
+  for (auto *DVI : DbgUsers)
+    if (DVI->getParent() != BB)
+      DVI->replaceVariableLocationOp(V, New);
+  for (auto *DVR : DVRUsers)
+    if (DVR->getParent() != BB)
+      DVR->replaceVariableLocationOp(V, New);
+```
+
+## Examining debug records at positions
 
 Call `Instruction::getDbgRecordRange()` to get the range of `DbgRecord` objects that are attached to an instruction.
 
-## Moving around, deleting
+```
+for (Instruction &I : BB) {
+  // Old: Uses a data member of a debug intrinsic, and then skips to the next
+  // instruction.
+  if (DbgInfoIntrinsic *DII = dyn_cast<DbgInfoIntrinsic>(&I)) {
+    recordDebugLocation(DII->getDebugLoc());
+    continue;
+  }
+  // New: Iterates over the debug records that appear before `I`, and treats
+  // them identically to the intrinsic block above.
+  // NB: This should always appear at the top of the for-loop, so that we
+  // process the debug records preceding `I` before `I` itself.
+  for (DbgRecord &DR = I.getDbgRecordRange()) {
+    recordDebugLocation(DR.getDebugLoc());
+  }
+  processInstruction(I);
+}
+```
+
+This can also be passed through the function `filterDbgVars` to specifically
+iterate over DbgVariableRecords, which are more commonly used.
 
-You can use `DbgRecord::removeFromParent` to unlink a `DbgRecord` from it's marker, and then `BasicBlock::insertDbgRecordBefore` or `BasicBlock::insertDbgRecordAfter` to re-insert the `DbgRecord` somewhere else. You cannot insert a `DbgRecord` at an arbitary point in a list of `DbgRecord`s (if you're doing this with `dbg.value`s then it's unlikely to be correct).
+```
+for (Instruction &I : BB) {
+  // Old: If `I` is a DbgVariableIntrinsic we record the variable, and apply
+  // extra logic if it is an `llvm.dbg.declare`.
+  if (DbgVariableIntrinsic *DVI = dyn_cast<DbgVariableIntrinsic>(&I)) {
+    recordVariable(DVI->getVariable());
+    if (DbgDeclareInst *DDI = dyn_cast<DbgDeclareInst>(DVI))
+      recordDeclareAddress(DDI->getAddress());
+    continue;
+  }
+  // New: `filterDbgVars` is used to iterate over only DbgVariableRecords.
+  for (DbgVariableRecord &DVR = filterDbgVars(I.getDbgRecordRange())) {
+    recordVariable(DVR.getVariable());
+    // Debug variable records are not cast to subclasses; simply call the
+    // appropriate `isDbgX()` check, and use the methods as normal.
+    if (DVR.isDbgDeclare())
+      recordDeclareAddress(DVR.getAddress());
+  }
+  // ...
+}
+```
+
+## Processing individual debug records
 
-Erase `DbgRecord`s by calling `eraseFromParent` or `deleteInstr` if it's already been removed.
+In most cases, any code that operates on debug intrinsics can be extracted to a template function or auto lambda (if it is not already in one) that can be applied to both debug intrinsics and debug records - though keep in mind the main exception that isa/cast/dyn_cast do not apply to DbgVariableRecord types.
 
-## What about dangling `DbgRecord`s?
+```
+// Old: Function that operates on debug variable intrinsics in a BasicBlock, and
+// collects llvm.dbg.declares.
+void processDbgInfoInBlock(BasicBlock &BB,
+                           SmallVectorImpl<DbgDeclareInst*> &DeclareIntrinsics) {
+  for (Instruction &I : BB) {
+    if (DbgVariableIntrinsic *DVI = dyn_cast<DbgVariableIntrinsic>(&I)) {
+      processVariableValue(DebugVariable(DVI), DVI->getValue());
+      if (DbgDeclareInst *DDI = dyn_cast<DbgDeclareInst>(DVI))
+        Declares.push_back(DDI);
+      else if (!isa<Constant>(DVI->getValue()))
+        DVI->setKillLocation();
+    }
+  }
+}
+
+// New: Template function is used to deduplicate handling of intrinsics and
+// records.
+// An overloaded function is also used to handle isa/cast/dyn_cast operations
+// for intrinsics and records, since those functions cannot be directly applied
+// to DbgRecords.
+DbgDeclareInst *DynCastToDeclare(DbgVariableIntrinsic *DVI) {
+  return dyn_cast<DbgDeclareInst>(DVI);
+}
+DbgVariableRecord *DynCastToDeclare(DbgVariableRecord *DVR) {
+  return DVR->isDbgDeclare() ? DVR : nullptr;
+}
+
+template<typename DbgVarTy, DbgDeclTy>
+void processDbgVariable(DbgVarTy *DbgVar,
+                       SmallVectorImpl<DbgDeclTy*> &Declares) {
+    processVariableValue(DebugVariable(DbgVar), DbgVar->getValue());
+    if (DbgDeclTy *DbgDeclare = DynCastToDeclare(DbgVar))
+      Declares.push_back(DbgDeclare);
+    else if (!isa<Constant>(DbgVar->getValue()))
+      DbgVar->setKillLocation();
+};
+
+void processDbgInfoInBlock(BasicBlock &BB,
+                           SmallVectorImpl<DbgDeclareInst*> &DeclareIntrinsics,
+                           SmallVectorImpl<DbgVariableRecord*> &DeclareRecords) {
+  for (Instruction &I : BB) {
+    if (DbgVariableIntrinsic *DVI = dyn_cast<DbgVariableIntrinsic>(&I))
+      processDbgVariable(DVI, DeclareIntrinsics);
+    for (DbgVariableRecord *DVR : filterDbgVars(I.getDbgRecordRange()))
+      processDbgVariable(DVR, DeclareRecords);
+  }
+}
+```
+
+## Moving and deleting debug records
+
+You can use `DbgRecord::removeFromParent` to unlink a `DbgRecord` from it's marker, and then `BasicBlock::insertDbgRecordBefore` or `BasicBlock::insertDbgRecordAfter` to re-insert the `DbgRecord` somewhere else. You cannot insert a `DbgRecord` at an arbitary point in a list of `DbgRecord`s (if you're doing this with `llvm.dbg.value`s then it's unlikely to be correct).
+
+Erase `DbgRecord`s by calling `eraseFromParent`.
+
+```
+// Old: Move a debug intrinsic to the start of the block, and delete all other intrinsics for the same variable in the block.
+void moveDbgIntrinsicToStart(DbgVariableIntrinsic *DVI) {
+  BasicBlock *ParentBB = DVI->getParent();
+  DVI->removeFromParent();
+  for (Instruction &I : ParentBB) {
+    if (auto *BlockDVI = dyn_cast<DbgVariableIntrinsic>(&I))
+      if (BlockDVI->getVariable() == DVI->getVariable())
+        BlockDVI->eraseFromParent();
+  }
+  DVI->insertBefore(ParentBB->getFirstInsertionPt());
+}
+
+// New: Perform the same operation, but for a debug record.
+void moveDbgRecordToStart(DbgVariableRecord *DVR) {
+  BasicBlock *ParentBB = DVR->getParent();
+  DVR->removeFromParent();
+  for (Instruction &I : ParentBB) {
+    for (auto &BlockDVR : filterDbgVars(I.getDbgRecordRange()))
+      if (BlockDVR->getVariable() == DVR->getVariable())
+        BlockDVR->eraseFromParent();
+  }
+  DVR->insertBefore(ParentBB->getFirstInsertionPt());
+}
+```
+
+## What about dangling debug records?
 
 If you have a block like so:
 
@@ -134,3 +317,7 @@ If you have a block like so:
 your optimisation pass may wish to erase the terminator and then do something to the block. This is easy to do when debug info is kept in instructions, but with `DbgRecord`s there is no trailing instruction to attach the variable information to in the block above, once the terminator is erased. For such degenerate blocks, `DbgRecord`s are stored temporarily in a map in `LLVMContext`, and are re-inserted when a terminator is reinserted to the block or other instruction inserted at `end()`.
 
 This can technically lead to trouble in the vanishingly rare scenario where an optimisation pass erases a terminator and then decides to erase the whole block. (We recommend not doing that).
+
+## Anything else?
+
+The above guide does not comprehensively cover every pattern that could apply to debug intrinsics; as mentioned at the [start of the guide](#how-to-update-existing-code), you can temporarily convert the target module from debug records to intrinsics as a stopgap measure. Most operations that can be performed on debug intrinsics have exact equivalents for debug records, but if you encounter any exceptions, reading the class docs (linked [here](#what-exactly-have-you-replaced-debug-intrinsics-with)) may give some insight, there may be examples in the existing codebase, and you can always ask for help on the [forums](https://discourse.llvm.org/tag/debuginfo).
\ No newline at end of file

@SLTozer SLTozer requested a review from pogo59 May 28, 2024 14:59
@jryans
Copy link
Member

jryans commented May 30, 2024

What's the status of #91725? That seems to update some of the same text (at least the summary paragraph at the top). Maybe it was forgotten? Perhaps best to merge that first and then rebase this to avoid conflicts.

@SLTozer
Copy link
Contributor Author

SLTozer commented May 30, 2024

What's the status of #91725? That seems to update some of the same text (at least the summary paragraph at the top). Maybe it was forgotten? Perhaps best to merge that first and then rebase this to avoid conflicts.

That patch has been deliberately left hanging until the corresponding textual IR update has landed/is about to land, just in case any change to the approach becomes necessary before then. It's more likely that this patch will land first, because this patch exists to assist with #89799, which is currently reverted and waiting for a reduced reproducer, and the IR patch is blocked on that patch. So in short, it's just a dependency tree situation; this change should really have been made before landing the aforementioned PR in the first place.

Copy link
Member

@jryans jryans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, thanks for working on this! 😄

Any existing code that interacts with debug intrinsics in some way will need to be updated to interact with debug records in the same way. A few quick rules to keep in mind when updating code:

- Debug records will not be seen when iterating over instructions; to find the debug records that appear immediately before an instruction, you'll need to iterate over `Instruction::getDbgRecordRange()`.
- Debug records have interfaces that are identical to those of debug intrinsics, meaning that any code that operates on debug intrinsics can be trivially applied to debug records as well. The exceptions for this are Instruction or CallInst methods that don't logically apply to debug records, and `isa/cast/dyn_cast` methods, are replaced by methods on the `DbgRecord` class itself.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Debug records have interfaces that are identical to those of debug intrinsics, meaning that any code that operates on debug intrinsics can be trivially applied to debug records as well. The exceptions for this are Instruction or CallInst methods that don't logically apply to debug records, and `isa/cast/dyn_cast` methods, are replaced by methods on the `DbgRecord` class itself.
- Debug records have interfaces that are identical to those of debug intrinsics, meaning that any code that operates on debug intrinsics can be trivially applied to debug records as well. The exceptions for this are `Instruction` or `CallInst` methods that don't logically apply to debug records, and `isa`/`cast`/`dyn_cast` methods, are replaced by methods on the `DbgRecord` class itself.


Debug records will automatically be created by the `DIBuilder` class when the new format is enabled. As with instructions, it is also possible to call `DbgRecord::clone` to create an unattached copy of an existing record.

## Skipping debug records, ignoring debug-uses of Values, stably counting instructions...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## Skipping debug records, ignoring debug-uses of Values, stably counting instructions...
## Skipping debug records, ignoring debug-uses of `Value`s, stably counting instructions, etc.


Erase `DbgRecord`s by calling `eraseFromParent` or `deleteInstr` if it's already been removed.
In most cases, any code that operates on debug intrinsics can be extracted to a template function or auto lambda (if it is not already in one) that can be applied to both debug intrinsics and debug records - though keep in mind the main exception that isa/cast/dyn_cast do not apply to DbgVariableRecord types.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In most cases, any code that operates on debug intrinsics can be extracted to a template function or auto lambda (if it is not already in one) that can be applied to both debug intrinsics and debug records - though keep in mind the main exception that isa/cast/dyn_cast do not apply to DbgVariableRecord types.
In most cases, any code that operates on debug intrinsics can be extracted to a template function or auto lambda (if it is not already in one) that can be applied to both debug intrinsics and debug records - though keep in mind the main exception that `isa`/`cast`/`dyn_cast` do not apply to `DbgVariableRecord` types.

@SLTozer SLTozer merged commit a8e03ae into llvm:main May 30, 2024
4 of 7 checks passed
@dwblaikie
Copy link
Collaborator

That patch has been deliberately left hanging until the #91724 has landed/is about to land, just in case any change to the approach becomes necessary before then.

Given that that patch only changes the default behavior - if someone wanted to get ahead of that default switch they'd want the documentation now rather than later. And if things change, we can update the documentation - might be worth submitting this now so it's not sort of floating out here all branch-like. But I don't feel super strongly about it.

@SLTozer
Copy link
Contributor Author

SLTozer commented May 30, 2024

Given that that patch only changes the default behavior - if someone wanted to get ahead of that default switch they'd want the documentation now rather than later.

That's a reasonable argument - I'd just assumed that nobody was going to be proactively applying these changes, but it's easier to do so if the information is available sooner I suppose!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants