Skip to content

[dsymutil] Fix line table sequence mapping for stmt_sequence attributes #143656

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

alx32
Copy link
Contributor

@alx32 alx32 commented Jun 11, 2025

This commit resolves an issue in dsymutil where DW_AT_LLVM_stmt_sequence attributes were incorrectly patched during the .debug_line table rewrite. This led to invalid offsets being written to the DWARF, severing the crucial link between a function's DIE and its specific line entries.

The root cause was the patching logic's sole reliance on the DWARFDebugLine::LineTable parser's LT->Sequences output. This parser does not reliably identify every sequence start, as it was not designed for the current DW_AT_LLVM_stmt_sequence model where each function has its own distinct sequence, which may not start with a new address range.

This patch implements a more resilient, hybrid mapping strategy. It first populates a map with the sequences the parser correctly identifies. It then adds to this map by manually reconstructing all sequence boundaries, identifying the start of the table and every row following an end_sequence flag. These boundaries are correlated with a sorted list of all DW_AT_LLVM_stmt_sequence attributes, ensuring a complete and accurate mapping from original to new offsets, even with partial matches.

The existing stmt-seq-macho.test actually exhibited this issue - so we can just add an additional check to ensure the issue is fixed.

@alx32 alx32 force-pushed the 03_fix_dwarf_linker_stmt_seq branch from 3f6c523 to 61f0636 Compare June 11, 2025 05:58
@llvm llvm deleted a comment from github-actions bot Jun 11, 2025
@llvm llvm deleted a comment from llvmbot Jun 11, 2025
@alx32
Copy link
Contributor Author

alx32 commented Jun 13, 2025

This issue needs further investigation. This seems to solve the issue on smaller code samples, but there are issues on larger codebases. Need to find a comprehensive fix.

@alx32 alx32 closed this Jun 13, 2025
DataCorrupted added a commit that referenced this pull request Jul 30, 2025
…49618)

Second attempt to fix
https://discourse.llvm.org/t/rfc-new-dwarf-attribute-for-symbolication-of-merged-functions/79434/29?u=alx32

(First attempt: #143656)

Context: the sequence offset to row index we parsed may not be complete.
And we need to add manual matching to it.

#143656 attempts to do trivial
1:1 matching, however, sometimes they don't line up perfectly, as shown
below:
 
 ```
// While SeqOffToOrigRow parsed from CU could be the ground truth,
          // e.g.
          //
          // SeqOff     Row
          // 0x08        9
          // 0x14       15
          //
// The StmtAttrs and SeqStartRows may not match perfectly, e.g.
          //
          // StmtAttrs  SeqStartRows
          // 0x04        3
          // 0x08        5
          // 0x10        9
          // 0x12       11
          // 0x14       15
          //
// In this case, we don't want to assign 5 to 0x08, since we know 0x08
// maps to 9. If we do a dummy 1:1 mapping 0x10 will be mapped to 9
// which is incorrect. The expected behavior is ignore 5, realign the
          // table based on the result from the line table:
          //
          // StmtAttrs  SeqStartRows
          // 0x04        3
          //   --        5
          // 0x08        9 <- LineTableMapping ground truth
          // 0x10       11
          // 0x12       --
          // 0x14       15 <- LineTableMapping ground truth
 ```
 
In this case, we need to use the mapping we read from the line table as
a ground truth and organize them properly to prevent duplicated
offset/missing offset.
 
 Test:
 
 Updated the test case

---------

Signed-off-by: Peter Rong <PeterRong@meta.com>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 30, 2025
…offsets (#149618)

Second attempt to fix
https://discourse.llvm.org/t/rfc-new-dwarf-attribute-for-symbolication-of-merged-functions/79434/29?u=alx32

(First attempt: llvm/llvm-project#143656)

Context: the sequence offset to row index we parsed may not be complete.
And we need to add manual matching to it.

llvm/llvm-project#143656 attempts to do trivial
1:1 matching, however, sometimes they don't line up perfectly, as shown
below:

 ```
// While SeqOffToOrigRow parsed from CU could be the ground truth,
          // e.g.
          //
          // SeqOff     Row
          // 0x08        9
          // 0x14       15
          //
// The StmtAttrs and SeqStartRows may not match perfectly, e.g.
          //
          // StmtAttrs  SeqStartRows
          // 0x04        3
          // 0x08        5
          // 0x10        9
          // 0x12       11
          // 0x14       15
          //
// In this case, we don't want to assign 5 to 0x08, since we know 0x08
// maps to 9. If we do a dummy 1:1 mapping 0x10 will be mapped to 9
// which is incorrect. The expected behavior is ignore 5, realign the
          // table based on the result from the line table:
          //
          // StmtAttrs  SeqStartRows
          // 0x04        3
          //   --        5
          // 0x08        9 <- LineTableMapping ground truth
          // 0x10       11
          // 0x12       --
          // 0x14       15 <- LineTableMapping ground truth
 ```

In this case, we need to use the mapping we read from the line table as
a ground truth and organize them properly to prevent duplicated
offset/missing offset.

 Test:

 Updated the test case

---------

Signed-off-by: Peter Rong <PeterRong@meta.com>
DataCorrupted added a commit that referenced this pull request Jul 31, 2025
…1427)

Reverts the [revert](#151424)
and fixed some typos.

Original PR description:

Second attempt to fix
https://discourse.llvm.org/t/rfc-new-dwarf-attribute-for-symbolication-of-merged-functions/79434/29?u=alx32

(First attempt: #143656)

Context: the sequence offset to row index we parsed may not be complete.
And we need to add manual matching to it.

#143656 attempts to do trivial
1:1 matching, however, sometimes they don't line up perfectly, as shown
below:

// While SeqOffToOrigRow parsed from CU could be the ground truth,
         // e.g.
         //
         // SeqOff     Row
         // 0x08        9
         // 0x14       15
         //
         // The StmtAttrs and SeqStartRows may not match perfectly, e.g.
         //
         // StmtAttrs  SeqStartRows
         // 0x04        3
         // 0x08        5
         // 0x10        9
         // 0x12       11
         // 0x14       15
         //
// In this case, we don't want to assign 5 to 0x08, since we know 0x08
// maps to 9. If we do a dummy 1:1 mapping 0x10 will be mapped to 9
// which is incorrect. The expected behavior is ignore 5, realign the
         // table based on the result from the line table:
         //
         // StmtAttrs  SeqStartRows
         // 0x04        3
         //   --        5
         // 0x08        9 <- LineTableMapping ground truth
         // 0x10       11
         // 0x12       --
         // 0x14       15 <- LineTableMapping ground truth
In this case, we need to use the mapping we read from the line table as
a ground truth and organize them properly to prevent duplicated
offset/missing offset.

Test:

Updated the test case

---------

Signed-off-by: Peter Rong <PeterRong@meta.com>
Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 31, 2025
…offset (#151427)

Reverts the [revert](llvm/llvm-project#151424)
and fixed some typos.

Original PR description:

Second attempt to fix
https://discourse.llvm.org/t/rfc-new-dwarf-attribute-for-symbolication-of-merged-functions/79434/29?u=alx32

(First attempt: llvm/llvm-project#143656)

Context: the sequence offset to row index we parsed may not be complete.
And we need to add manual matching to it.

llvm/llvm-project#143656 attempts to do trivial
1:1 matching, however, sometimes they don't line up perfectly, as shown
below:

// While SeqOffToOrigRow parsed from CU could be the ground truth,
         // e.g.
         //
         // SeqOff     Row
         // 0x08        9
         // 0x14       15
         //
         // The StmtAttrs and SeqStartRows may not match perfectly, e.g.
         //
         // StmtAttrs  SeqStartRows
         // 0x04        3
         // 0x08        5
         // 0x10        9
         // 0x12       11
         // 0x14       15
         //
// In this case, we don't want to assign 5 to 0x08, since we know 0x08
// maps to 9. If we do a dummy 1:1 mapping 0x10 will be mapped to 9
// which is incorrect. The expected behavior is ignore 5, realign the
         // table based on the result from the line table:
         //
         // StmtAttrs  SeqStartRows
         // 0x04        3
         //   --        5
         // 0x08        9 <- LineTableMapping ground truth
         // 0x10       11
         // 0x12       --
         // 0x14       15 <- LineTableMapping ground truth
In this case, we need to use the mapping we read from the line table as
a ground truth and organize them properly to prevent duplicated
offset/missing offset.

Test:

Updated the test case

---------

Signed-off-by: Peter Rong <PeterRong@meta.com>
Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants