Split debug info for emscripten #9871
Comments
Split-dwarf will still add some amount of information to the wasm file. I was proposing at WebAssembly/debugging#1 to have the entire DWARF data moved to an external file. The benefit would be that the wasm file contains no debug info at all; it is easier for tools if they do not have to deal with multiple sections in different files.
I agree that that's preferable; the downside would just be that we'd be more different from other platforms. It's probably worth looking more at why the ELF solution splits the debug info the way it does, and how it works on other platforms which have done external DWARF such as Apple and HP.
There's also still the question of whether to have entirely separate files or embed the info in the object files.
Just to emphasize, being forced to serve wasm symbol files over a network is a showstopper. Meaningful applications have symbol files running to gigabytes of data. I don't have a preference whether there should be two binaries, a stripped and a full one, or whether to do split-dwarf. Both can be achieved in post-processing though, so I don't even think there is a true upside to either.
I agree with @pfaffe here. The everything-in-one-binary approach is only going to work for reasonable applications if we can load the I do however also see the benefit of having everything in one binary. That's going to make a lot of steps in the pipeline a lot easier.
Indeed, that's why I filed this issue in the first place.
Let's also bear in mind the eventual use case where the debugger isn't running as a native application but is somehow factored into a standardized debugger module or language component integrated into devtools. It's not exactly clear yet what the consequences of that would be (reduced memory or other resources available? debug info might actually be served on the side by the server? maybe there will be similar filesystem APIs and nothing is much different?) but it is worth keeping in mind for the longer term.
We can already use the stripped+full-debug-in-one-binary workflow without any extra tool support; putting the stripped binary on the server and loading the full binary in the debugger would be analogous to loading a full native binary in the debugger and then attaching to a process running the stripped binary. The difference is that stripping would basically be mandatory even for local testing of a large app, which isn't the case for native, and that adds extra friction. So it would be nice if we could make it easier.
I looked into this a bit more late yesterday, using a static debug build of clang. My hope was that the skeleton debug info that gets left behind in the main executable when using The monolithic debug clang is a ~1.3G binary, about 1.1G of which is debug info:
Debug info for split dwarf:
... in those sections, now only 165M, on the order of the text size. Not fantastic but probably workable for local debugging. However it now also has pubtypes/pubnames sections (accelerated access tables), which are not present by default in the monolithic build:
This is another 685M, for a total of 850M, which isn't that much less than the original, and probably back into showstopper territory. It's not obvious to me why the split version would need the tables when the monolithic version doesn't (perhaps because searching a bunch of different files on name lookup is much worse than searching through a single one?), nor why the tables are almost as big as the debug info itself.
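The size comparison above can be written out as a small calculation. This is only a restatement of the figures quoted in the comment; treating "1.3G" and "1.1G" as 1300 MB and 1100 MB is an assumption made for the arithmetic, and the variable names are illustrative.

```python
# Figures quoted in the comment above, in megabytes.
# Assumption: "1.3G" and "1.1G" are rounded to 1300 MB and 1100 MB.
monolithic_binary = 1300
monolithic_debug = 1100
split_debug_sections = 165   # skeleton debug sections left in the executable
split_pub_tables = 685       # pubnames/pubtypes tables

# Debug-related bytes that remain in the split-dwarf executable.
split_total = split_debug_sections + split_pub_tables

# Everything that is not debug info in the monolithic build.
non_debug = monolithic_binary - monolithic_debug

print("split total (MB):", split_total)
print("share of monolithic debug info:", round(split_total / monolithic_debug, 2))
print("skeleton vs non-debug size (MB):", split_debug_sections, "vs", non_debug)
```

Without the pubnames/pubtypes tables, the remaining debug sections would be roughly 15% of the monolithic debug info; with them, the figure is closer to three quarters.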
Any progress on this? The way we are currently prototyping this for DevTools is: Given that we already know that we will need to deal with applications that have gigabytes of debug data, I think it makes sense to focus on the separate cc @hashseed
@bmeurer, in your diagram I assume you meant to write I started looking into implemented
I think And yes, I think we probably want to use wasm containers, unless there's some compelling reason not to.
+cc @paolosevMSFT
#10568 implemented basic splitting.
I've started working on support for gsplit-dwarf for wasm: https://reviews.llvm.org/D85685
Since split-dwarf is currently implemented, I'm going to close this issue. We can open new ones for bugs or future features (for example, since we support split-dwarf and will probably use it in some form, we should also support DWP files.)
Filed #13251
Here's an overview of debug-info splitting options on ELF platforms and how we might apply them to wasm. It doesn't yet discuss how any of this would be implemented.
Currently LLVM supports outputting debug info in wasm object files, in the traditional GNU manner where all debug info is in the object file in several sections such as `.debug_info`, and linked into the executable in the same sections. For deployment, binaries can be built optimized but with debug info, and then stripped; a copy of the full binary can then be archived, and used to symbolize or debug the stripped binary.

This has the advantage that it uses the minimal number of files, the simplest compile+link flow (every build system can handle it), and the debugger needs only one file to debug the binary. It has the disadvantage that the linker must merge all the debug info on every incremental build (slowing the link), and the resulting executable is very large. This latter disadvantage is especially important for wasm, because (even when just debugging) the binary must be sent over a network (even if a fast one) and loaded into the VM (which is much more expensive than a Linux loader which just needs to mmap the sections).
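To make the "build with debug info, then strip, and archive the full copy" flow concrete, here is a minimal Python sketch of the stripping step for a wasm binary. It assumes that the debug info lives in custom sections (section id 0) whose names start with `.debug_`, and it relies only on the standard wasm binary layout (8-byte header, then sections encoded as id, LEB128 size, payload). The function names are illustrative and this is not how any particular toolchain implements stripping.

```python
WASM_MAGIC = b"\0asm"
WASM_VERSION = b"\x01\x00\x00\x00"


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 integer; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def custom_section(name, payload):
    """Build a custom section (id 0) with the given name and payload."""
    name_bytes = name.encode("utf-8")
    body = encode_uleb128(len(name_bytes)) + name_bytes + payload
    return b"\x00" + encode_uleb128(len(body)) + body


def strip_debug_sections(module):
    """Return a copy of the module without custom sections named .debug_*."""
    if module[:4] != WASM_MAGIC or module[4:8] != WASM_VERSION:
        raise ValueError("not a wasm version 1 binary module")
    out = bytearray(module[:8])
    pos = 8
    while pos < len(module):
        start = pos
        section_id = module[pos]
        size, payload_start = read_uleb128(module, pos + 1)
        end = payload_start + size
        keep = True
        if section_id == 0:  # custom section: payload begins with its name
            name_len, name_start = read_uleb128(module, payload_start)
            name = module[name_start:name_start + name_len].decode("utf-8")
            keep = not name.startswith(".debug_")
        if keep:
            out += module[start:end]
        pos = end
    return bytes(out)


# Toy module: header plus two custom sections with placeholder payloads.
header = WASM_MAGIC + WASM_VERSION
full = (header
        + custom_section(".debug_info", b"\x01\x02\x03")
        + custom_section("name", b"\x04"))
stripped = strip_debug_sections(full)
print(len(full), len(stripped))
```

In this flow `full` is what would be archived for the debugger, and `stripped` is what would be served and loaded into the VM.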
GCC and LLVM for ELF targets support "split-dwarf" mode, using the `-gsplit-dwarf` flag (a good overview is here), which splits most of the debug info out from each object into a separate `dwo` file. In that case the object file has a much smaller `.debug_info` section and its dwo file has a large `.debug_info.dwo` section with most of the info. The debugger must then look up any required dwo file on demand when debugging. This mitigates both of the aforementioned disadvantages of traditional debug info. One disadvantage of this approach is that all of the dwo files must be available to debug the binary, and moving them around is annoying (many files, need to preserve directory hierarchy and pathnames). A second disadvantage is that there are now 2 output files for each C file, which makes things more challenging for build systems.

The `dwp` tool can be used to combine all of the debug info in `dwo` files into a single `.dwp` file that goes alongside the executable, which mitigates that first disadvantage, at the cost of invoking another tool at link time. Clang has a decent solution for the second issue in the form of a second split-dwarf variant, `-gsplit-dwarf=single`. This splits the debug info into the same sections as the other split-dwarf variant but puts the `.debug_info.dwo` section in the object file instead of a separate dwo file. The linker only links the `.debug_info` sections into the final executable (not the `.debug_info.dwo` sections), keeping the advantages of splitting without the extra build system pain. `dwp` works the same way.

In all these cases, there is still a small amount of debug info in the final binary, which is unfortunate but I don't know of a better way in the ELF world.
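The linker behaviour described for `-gsplit-dwarf=single` can be modelled in a few lines of Python. This is only a sketch of the idea that sections whose names end in `.dwo` stay behind in the object files while the rest are merged into the executable; the function name, the object contents, and all sizes are made up for illustration.

```python
def link_debug_sections(objects):
    """Model which debug sections reach the executable in split-dwarf mode.

    `objects` is a list of dicts mapping section name to size in bytes.
    Returns (linked, left_behind): the summed sizes of sections that are
    merged into the executable and of the .dwo sections that are not.
    """
    linked = {}
    left_behind = {}
    for sections in objects:
        for name, size in sections.items():
            # Sections with a .dwo suffix are skipped by the linker.
            target = left_behind if name.endswith(".dwo") else linked
            target[name] = target.get(name, 0) + size
    return linked, left_behind


# Hypothetical object files; sizes are invented for the example.
objects = [
    {".debug_info": 120, ".debug_info.dwo": 4000, ".debug_str.dwo": 900},
    {".debug_info": 80, ".debug_info.dwo": 2500, ".debug_line": 300},
]
linked, left_behind = link_debug_sections(objects)
print("in executable:", linked)
print("left in objects (input to dwp):", left_behind)
```

The `left_behind` side is what a debugger would have to find on demand, either in the individual object files or in a combined `.dwp` file.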
It sounds to me like using `-gsplit-dwarf=single` is a good goal to shoot for. It allows small wasm binaries (which is the thing we need most), without making things harder on build systems than they need to be. It would require the linker and debugger to support split dwarf and would lead to a slightly more complex optimal deployment method (extra link steps and/or storing/distributing/serving .o/.dwo/.dwp files) but it seems worth it.

Thoughts?
@sbc @yurydelendik @azakai @pfaffe @bmeurer