debuginfo: Add support for split-debuginfo on platforms that allow it #34651

Closed
michaelwoerister opened this issue Jul 4, 2016 · 32 comments
Labels
A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) A-incr-comp Area: Incremental compilation C-feature-request Category: A feature request, i.e: not implemented / a PR. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@michaelwoerister
Member

DWARF v5 will standardize something sometimes called "Split DWARF" or "debuginfo fission". The gist of it is this: debuginfo can be very large (gigabytes) and can contain lots of relocations, so the linker spends a lot of time copying and relocating it into the final executable. "Split DWARF" is an approach that basically allows skipping this linker step: since debuginfo is already located in the individual object files generated during compilation, why not just "link" to the debuginfo in there and let the debugger do any relocations on demand? This can potentially mean a drastic reduction of compile times for builds with debuginfo.

LLVM already supports this on Linux, as far as I know, and it might be a good fit for incremental compilation where the linker could easily become the most time-consuming step (although that might be nixed by using gold's incremental linking feature).

It might also be interesting for providing pre-built debuginfo separately via *.dwp files.

@michaelwoerister michaelwoerister added A-debuginfo Area: Debugging information in compiled programs (DWARF, PDB, etc.) A-incr-comp Area: Incremental compilation labels Jul 4, 2016
@nagisa
Member

nagisa commented Jul 4, 2016

Technically it is already possible to have “split” DWARF debug info, even without v5, which you can load into gdb.

The only problem is generating it separately in the first place, because currently the only way to get split debug info, to the best of my knowledge, is to use objcopy --only-keep-debug and then strip the original library/executable.

@michaelwoerister
Member Author

The only problem is generating it separately in the first place, because currently the only way to get split debug info, to the best of my knowledge, is to use objcopy --only-keep-debug and then strip the original library/executable.

How does gdb know where to find the debuginfo in that case? With the DWARF 5 approach, the 'skeleton' debuginfo entries store the path to the object files.

@nagisa
Member

nagisa commented Jul 4, 2016

You use a command like add-symbol-file. Sure, it's not the most seamless experience, but, like I said, it at least works.

@tromey
Contributor

tromey commented Jul 6, 2016

How does gdb know where to find the debuginfo in that case?

There are two ways, but the better of the two is the build-id feature. See the docs. This is what all the distros use.
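As a rough illustration of the build-id lookup mentioned above: GDB searches its debug directory for a file named after the binary's build ID, using the first two hex digits as a subdirectory. The sketch below only computes that path. The function name `build_id_debug_path` is hypothetical, and the default directory `/usr/lib/debug` is an assumption that varies by distro and GDB configuration.

```shell
#!/bin/sh
# Map a build ID (hex string) to the separate debug file path that GDB
# looks for: <debug-dir>/.build-id/<first 2 hex digits>/<remaining digits>.debug
# build_id_debug_path is a hypothetical helper; /usr/lib/debug is an assumed default.
build_id_debug_path() {
    id=$1
    dir=${2:-/usr/lib/debug}
    prefix=$(printf '%s' "$id" | cut -c1-2)   # first two hex digits
    rest=$(printf '%s' "$id" | cut -c3-)      # everything after them
    printf '%s/.build-id/%s/%s.debug\n' "$dir" "$prefix" "$rest"
}

build_id_debug_path abcdef1234567890
# prints /usr/lib/debug/.build-id/ab/cdef1234567890.debug
```

The build ID of an existing binary can be read from its notes section with `readelf -n <binary>`.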

However, this form of splitting is very different from Fission. Fission trades off link time speed for some debugger performance (I believe, I haven't used it in anger). Build-id is more about being able to split off the debuginfo for separate packaging -- and, contra Fission, it slows down the development cycle, as the splitting is a separate step.

It's not always appropriate to use Fission so this would have to be a compile-time flag of some kind.

@michaelwoerister
Member Author

@tromey Thanks for the background info! Would you agree that Fission with .dwp packages would be the cleaner way of distributing debuginfo separately? I know, this feature just wasn't available before, so distros had to do it differently. But now that it is there (in some places at least :)), it seems like the superior approach.

It's not always appropriate to use Fission so this would have to be a compile-time flag of some kind.

Yes, that's how I envisioned it.

@tromey
Contributor

tromey commented Jul 6, 2016

Would you agree that Fission with .dwp packages would be the cleaner way of distributing debuginfo separately?

They are maybe mildly cleaner (but maybe not); but the approach currently taken by the distros has the advantage that all the tools already work with it; whereas nobody did the work to make everything use Fission. Also the distros, IME, were more space-sensitive and less link-time-sensitive than developers; leading to dwz instead, which is somewhat tied into the build-id and packaging schemes.

@michaelwoerister
Member Author

@tromey That is good to know. Thanks!

@alexcrichton
Member

Adding a cc from #47240 to this as well. OSX supports this by default but we're actually going out of our way to undo the compile time win by running dsymutil by default. It looks like we don't need to though and instead need to ensure that we leave the object files without deleting them. That's probably also relevant for this where we'll need to avoid deleting some intermediate artifacts!

@rocallahan

LLVM already supports this on Linux, as far as I know,

To be clear: there are two flavours of split-DWARF: the pre-standard GNU flavour and the DWARF5 standardized flavour, and they are different. (Par for the course due to DWARF's broken development model :-(.) LLVM/gcc/gdb support the former but not the latter and probably there is not yet any tool support for the DWARF5 standard flavour. See http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2017-February/004325.html. So your choice is between using the GNU flavour or waiting some unknown length of time for the standard flavour to be usable.

Using split-DWARF would create inconvenience for some users. E.g. we move Rust binaries around between machines/containers and sometimes need to debug binaries on machines other than where they were built. Split-DWARF would mean we have to move around the object files as well for debugging to work, which would be less efficient than just not splitting. So we'd want split-DWARF to be an option we could turn off (and linking performance without split-DWARF would continue to matter for us).

@briansmith
Contributor

Split-DWARF would mean we have to move around the object files as well for debugging to work

Really? Is there nothing like Microsoft's PDB for Linux?

@rocallahan

You mean a way to link the debuginfo from all your object files into a single large debuginfo file for the binary?

I don't know if such a tool exists, but even if it did you'd lose most of the performance benefits described at the top of this issue, plus it would still be less convenient than having the debuginfo packed into the binary itself.

@briansmith
Contributor

You mean a way to link the debuginfo from all your object files into a single large debuginfo file for the binary?

Yes

I don't know if such a tool exists, but even if it did you'd lose most of the performance benefits described at the top of this issue, plus it would still be less convenient than having the debuginfo packed into the binary itself.

Linking all the object files into a single large debuginfo file would only have to be done when cutting releases, not during day-to-day development. Basically one would publish the big debuginfo file as a release artifact and probably build some automation for retrieving it to assist with debugging release versions of the software.

@rocallahan

Linking all the object files into a single large debuginfo only would have to be done when cutting releases

For you perhaps, but in #34651 (comment) I explained that we frequently need to move binaries around and debug them.

@briansmith
Contributor

For you perhaps, but in #34651 (comment) I explained that we frequently need to move binaries around and debug them.

Right, I think different users have different goals. My use case is handled well by split dwarf because I care about build speed first for day-to-day builds and I want a PDB-like thing for making small final releases that can still be debugged well, even if those final releases take a long time to build.

@rocallahan

I want a PDB-like thing for making small final releases that can still be debugged well, even if those final releases take a long time to build.

Your final release situation is already served well by making a binary with full debuginfo and then moving the debuginfo out into an external debuginfo file. So split-DWARF is only needed to improve your build times.

@michaelwoerister
Member Author

Yes, this would be a performance optimization first and foremost. And it would probably not be the default setting. That being said, there are no concrete plans for implementing this yet.

@luser
Contributor

luser commented Mar 20, 2018

I don't know about the DWARF5 flavor, but with the GNU flavor there's a dwp tool that functions like dsymutil to generate a file.dwp containing the linked debug info from a -gsplit-dwarf build:
https://gcc.gnu.org/wiki/DebugFissionDWP

@luser
Contributor

luser commented Mar 20, 2018

Note that @alexcrichton landed a patch to rustc to allow it to stop running dsymutil after every build (because it's slow):
#47784

and he's planning on making this the default behavior in cargo to get faster rebuilds on Mac (#47240). If that change lands then Mac builds will be functionally equivalent to split-DWARF builds on Linux.

@alexcrichton
Member

@luser ah, unfortunately the plan to turn it on by default was shot down when it was realized that line numbers disappeared from RUST_BACKTRACE=1

@luser
Contributor

luser commented Mar 20, 2018

Bummer! Presumably we'd hit similar issues with split-DWARF on Linux. We need to lock fitzgen in a room and make him finish unwind-rs. 😉

@jonhoo
Contributor

jonhoo commented Oct 22, 2019

@alexcrichton This might be a long shot, but has anything changed lately that might unblock this change?

@alexcrichton
Member

I believe the current state of this can roughly be summarized as:

  • This isn't implemented by default anywhere
    • On OSX you can pass -Zrun-dsymutil=no to simulate what compile times would be like
    • On Linux we have not implemented the requisite support to use fission with DWARF
    • I don't think anyone's looked into Windows MSVC or MinGW
  • Switching on by default at least needs to preserve debugger backtraces and RUST_BACKTRACE=1 backtraces by default
    • For the latter we've switched to the backtrace crate where development can more easily happen. Notably the gimli-symbolize feature is pretty mature now, and development can likely happen on that feature to see what it would take to get RUST_BACKTRACE to support this

Getting all that done I believe is a bare minimum for even considering turning this on by default, but just adding an option could likely be stabilized much sooner. We could likely add an option with a Linux implementation and stabilize the OSX functionality under that option (and probably do some cursory Windows investigation too)

@memoryruins
Contributor

Does #73441 help toward this goal? Though I now see that rust-lang/backtrace-rs#287 is currently an open issue.

@Trass3r

Trass3r commented Jul 22, 2020

I don't know about the DWARF5 flavor, but with the GNU flavor there's a dwp tool that functions like dsymutil to generate a file.dwp containing the linked debug info from a -gsplit-dwarf build:
https://gcc.gnu.org/wiki/DebugFissionDWP

It still doesn't support DWARF5 but gcc's split dwarf works just fine, also in gdb.
For reference, this is similar to running mspdbcmf on a /DEBUG:FASTLINK pdb on Windows.

bors added a commit to rust-lang-ci/rust that referenced this issue Dec 16, 2020
…nagisa

cg_llvm: split dwarf support

cc rust-lang#34651

This PR adds initial support for Split DWARF to rustc, based on the implementation in Clang.

##### Current Status
This PR currently has functioning split-dwarf: running rustc with `-Zsplit-dwarf=split` when compiling a binary will produce a `dwp` alongside the binary, which contains the linked dwarf objects.

```shell-session
$ rustc -Cdebuginfo=2 -Zsplit-dwarf=split -C save-temps ./foo.rs
$ ls foo*
foo
foo.belfx9afw9cmv8.rcgu.dwo
foo.belfx9afw9cmv8.rcgu.o
foo.foo.7rcbfp3g-cgu.0.rcgu.dwo
foo.foo.7rcbfp3g-cgu.0.rcgu.o
foo.foo.7rcbfp3g-cgu.1.rcgu.dwo
foo.foo.7rcbfp3g-cgu.1.rcgu.o
foo.foo.7rcbfp3g-cgu.2.rcgu.dwo
foo.foo.7rcbfp3g-cgu.2.rcgu.o
foo.foo.7rcbfp3g-cgu.3.rcgu.dwo
foo.foo.7rcbfp3g-cgu.3.rcgu.o
foo.foo.7rcbfp3g-cgu.4.rcgu.dwo
foo.foo.7rcbfp3g-cgu.4.rcgu.o
foo.foo.7rcbfp3g-cgu.5.rcgu.dwo
foo.foo.7rcbfp3g-cgu.5.rcgu.o
foo.foo.7rcbfp3g-cgu.6.rcgu.dwo
foo.foo.7rcbfp3g-cgu.6.rcgu.o
foo.foo.7rcbfp3g-cgu.7.rcgu.dwo
foo.foo.7rcbfp3g-cgu.7.rcgu.o
foo.dwp
foo.rs
$ readelf -wi foo.foo.7rcbfp3g-cgu.0.rcgu.o
# ...
  Compilation Unit @ offset 0x90:
   Length:        0x2c (32-bit)
   Version:       4
   Abbrev Offset: 0x5b
   Pointer Size:  8
 <0><9b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <9c>   DW_AT_stmt_list   : 0xe8
    <a0>   DW_AT_comp_dir    : (indirect string, offset: 0x13b): /home/david/Projects/rust/rust0
    <a4>   DW_AT_GNU_dwo_name: (indirect string, offset: 0x15b): foo.foo.7rcbfp3g-cgu.0.rcgu.dwo
    <a8>   DW_AT_GNU_dwo_id  : 0x357472a2b032d7b9
    <b0>   DW_AT_low_pc      : 0x0
    <b8>   DW_AT_ranges      : 0x40
    <bc>   DW_AT_GNU_addr_base: 0x0
# ...
```

##### To-Do
I've opened this PR as a draft to get feedback and work out how we'd expect rustc to work when Split DWARF is requested. It might be easier to read the PR commit-by-commit.

- [ ] Add error when Split DWARF is requested on platforms where it doesn't make sense.
- [x] Determine whether or not there should be a single `dwo` output from rustc, or one per codegen-unit as exists currently.
- [x] Add tests.
- [x] Fix `single` mode - currently single mode doesn't change the invocation of `addPassesToEmitFile`, which is correct, but it also needs to change the split dwarf path provided to `createCompileUnit` and `createTargetMachine` so that it's just the final binary (currently it is still a non-existent `dwo` file).

r? `@nagisa`
cc `@michaelwoerister` `@eddyb` `@alexcrichton` `@rust-lang/wg-incr-comp`
@alexcrichton
Member

For reference, #79570 is attempting to stabilize this option for macOS and provides a path for stabilization for the recent split-dwarf support from #77117

@nbraud

nbraud commented Sep 19, 2021

@alexcrichton What are the blockers for stabilizing -Csplit-debuginfo on Linux?

@davidtwco
Member

@alexcrichton What are the blockers for stabilizing -Csplit-debuginfo on Linux?

My understanding is that #81024 is the primary blocker, it’s on my todo list and I hope to get to it soon.

@Logarithmus
Contributor

@davidtwco thanks for your hard work, but where is the stabilization tracking issue? Also, can I currently try it on the latest nightly?

@bjorn3
Member

bjorn3 commented Jan 10, 2022

You can try it on nightly using -Csplit-debuginfo.

@davidtwco
Member

You'll also need -Zunstable-options for Split DWARF.

@ayosec
Contributor

ayosec commented Sep 20, 2022

For those looking for an alternative to -Csplit-debuginfo on Linux, objcopy can be used to keep the debugging symbols in another file:

Example:

cp /usr/bin/alacritty .


# Keep symbols in 'alacritty.debug', and compress
# the data.
objcopy                       \
    --compress-debug-sections \
    --only-keep-debug         \
    alacritty                 \
    alacritty.debug


# Remove debugging symbols from 'alacritty' file,
# and link both files, so GDB can load symbols
# from 'alacritty.debug'.
objcopy                                 \
    --strip-debug                       \
    --add-gnu-debuglink=alacritty.debug \
    alacritty

The total size is much smaller, even including the .debug file, since the debug data is now compressed.

7.3M alacritty
7.6M alacritty.debug
 39M /usr/bin/alacritty

GDB uses both files to get symbols:

$ gdb ./alacritty
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
[…]
Reading symbols from ./alacritty...
Reading symbols from /…/alacritty.debug...

@Noratrieb Noratrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023
@wesleywiser
Member

Visited during compiler team tracking issue triage. Since #79570 and #98051 stabilized -Csplit-debuginfo, I believe this issue can now be closed.
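For anyone landing here later, a minimal sketch of trying the stabilized flag. It assumes a `rustc` on `PATH` that is recent enough to accept `-C split-debuginfo` for the chosen value on the current platform; the helper name `build_split`, the file name `hello.rs`, and the temporary directory are illustrative only. The flag takes `off`, `packed`, or `unpacked`.

```shell
#!/bin/sh
# Build a tiny program with a given split-debuginfo mode and list the outputs.
# build_split and hello.rs are hypothetical names used only for this sketch.
build_split() {
    mode=$1   # one of: off, packed, unpacked
    case $mode in
        off|packed|unpacked) ;;
        *) echo "unknown split-debuginfo mode: $mode" >&2; return 2 ;;
    esac
    if ! command -v rustc >/dev/null 2>&1; then
        echo "rustc not found, skipping build" >&2
        return 0
    fi
    workdir=$(mktemp -d) || return 1
    printf 'fn main() { println!("hello"); }\n' > "$workdir/hello.rs"
    # Compile in a scratch directory, then show which files rustc left behind.
    (cd "$workdir" && rustc -C debuginfo=2 -C split-debuginfo="$mode" hello.rs && ls)
}

build_split unpacked
```

On Linux, `unpacked` should leave `.dwo` files next to the binary and `packed` should produce a single `.dwp`. On macOS, `packed` corresponds to running dsymutil. Which values are accepted depends on the platform and rustc version.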
