Symbol stripping in NativeAOT to reduce binary size #69847

Closed
am11 opened this issue May 26, 2022 · 9 comments · Fixed by #70233
Labels
area-NativeAOT-coreclr size-reduction Issues impacting final app size primary for size sensitive workloads

@am11
Member

am11 commented May 26, 2022

Repro:

# bash on linux-x64

$ dotnet7p5 new console -n nativeapp1
$ dotnet7p5 publish nativeapp1 --use-current-runtime -p:PublishAot=true -c Release -o artifacts

# check the app size (in bytes)
$ stat -c%s artifacts/nativeapp1
17962760

# extract symbols into a .dbg file, strip unneeded symbols from the binary, and link the .dbg file to the binary
# see https://github.com/dotnet/runtime/blob/5d3288d/eng/native/functions.cmake#L374
$ objcopy --only-keep-debug artifacts/nativeapp1 artifacts/nativeapp1.dbg
$ objcopy --strip-unneeded artifacts/nativeapp1
$ objcopy --add-gnu-debuglink=artifacts/nativeapp1.dbg artifacts/nativeapp1

# check the size again
$ stat -c%s artifacts/nativeapp1
5895664

# size of dbg
$ stat -c%s artifacts/nativeapp1.dbg
12070608

# test if debug symbols are read by the debugger
$ gdb artifacts/nativeapp1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from artifacts/nativeapp1...
Reading symbols from /home/am11/projects/artifacts/nativeapp1.dbg...
(gdb) 

Extracting symbols (in a separate .dbg file) reduced the hello world binary size by 67%. We should consider doing this by default.
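The 67% figure can be reproduced from the two `stat` readings above. This is a minimal sketch using integer shell arithmetic; the function name `size_reduction_percent` is illustrative and not part of any tool mentioned in this issue.

```shell
# Percentage by which a file shrank, rounded down to a whole percent.
# $1 = size before (bytes), $2 = size after (bytes).
size_reduction_percent() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

# Sizes reported by `stat -c%s` before and after stripping in the repro.
size_reduction_percent 17962760 5895664   # prints 67
```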

@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label May 26, 2022
@am11 am11 added size-reduction Issues impacting final app size primary for size sensitive workloads area-NativeAOT-coreclr labels May 26, 2022
@ghost

ghost commented May 26, 2022

Tagging subscribers to 'size-reduction': @eerhardt, @SamMonoRT, @marek-safar
See info in area-owners.md if you want to be subscribed.

@MichalStrehovsky
Member

MichalStrehovsky commented May 28, 2022

For NativeAOT, we try to follow platform conventions whenever reasonable. AFAIK debug information embedded in the executable is the platform convention on Unix-like systems. Is that not the case?

FWIW, we document this on https://aka.ms/OptimizeNativeAOT (bottom of the doc).

@am11
Member Author

am11 commented May 28, 2022

Is that not the case?

I am not sure whether there is a particular Unix-wide convention, but most binaries I have found on my Ubuntu 20.04 box are stripped:

# number of stripped binaries in /usr/bin
$ find /usr/bin -exec file -L {} \; | grep stripped | grep -v "not stripped" | wc -l
1697

# number of non-stripped binaries in /usr/bin
$ find /usr/bin -exec file -L {} \; | grep "not stripped" | wc -l
6

Also, all .NET binaries are stripped on linux, including apphost / singlefilehost, so the apps published with corehost are also stripped.
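The two counts above can also be gathered in a single pass. The following is a sketch only: the helper name `count_stripped` is made up for illustration, and it assumes the usual `file` output, where ELF descriptions end in either "stripped" or "not stripped".

```shell
# Reads `file` output on stdin and prints both counts in one pass.
count_stripped() {
    awk '
        /not stripped/ { not_stripped++; next }   # test the longer phrase first
        /stripped/     { stripped++ }
        END { printf "stripped=%d not_stripped=%d\n", stripped, not_stripped }
    '
}
```

It could be fed with `find /usr/bin -exec file -L {} + | count_stripped`, which scans the same paths as the commands above but starts `file` once per batch of paths instead of once per path.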

@MichalStrehovsky
Copy link
Member

I mean the default settings for the compiler that produces the executable. Maybe what we want is an easier way to opt in?

(I'm not a Unix person myself, so I don't have an opinion besides "follow the platform convention").

@am11
Member Author

am11 commented May 28, 2022

gcc/clang defaults usually favor Unix legacy. For example, a.out is the default binary name, a convention from the pre-ELF / pre-System V era of the 1970s; the default output type is an executable; and PIC/PIE are not the default (but are highly recommended). We do produce PIC by default, with no opting in or out, and therefore don't follow the defaults of the compiler toolchain.

From the dotnet publish viewpoint, it would also make sense to align PublishAot's behavior with PublishSingleFile, which produces a stripped binary. One key difference would be that the .dbg file is produced next to the binary in the case of NativeAOT (singlefilehost's native symbols are normally fetched from the server when SOS is installed). This way, opt-in won't be necessary.

Opt-out is also unnecessary for this IMHO. Non-stripped binaries are generally not distributed. Folks who really want embedded symbols can use tools like eu-unstrip to reverse this effect.

Note: user is not missing anything, all symbols are there, but in a separate "fat symbol file" (symbol file is rarely needed in production environment).
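As a rough illustration of reversing the split with eu-unstrip, the sketch below recombines a stripped binary with its .dbg file. It assumes elfutils is installed; the function name `unstrip_binary` and the output path in the usage note are hypothetical.

```shell
# Recombine a stripped binary with its separate debug file.
# $1 = stripped binary, $2 = debug file, $3 = output path.
unstrip_binary() {
    if [ ! -f "$1" ] || [ ! -f "$2" ]; then
        echo "unstrip_binary: missing input file" >&2
        return 1
    fi
    if ! command -v eu-unstrip >/dev/null 2>&1; then
        echo "unstrip_binary: eu-unstrip (elfutils) not found" >&2
        return 1
    fi
    # eu-unstrip takes the stripped file first, then the debug file.
    eu-unstrip -o "$3" "$1" "$2"
}
```

For the repro above, the call would look like `unstrip_binary artifacts/nativeapp1 artifacts/nativeapp1.dbg artifacts/nativeapp1.full`.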

@MichalStrehovsky
Member

So cargo build produces unstripped executables for Rust. One has to add extra settings to Cargo.toml to have cargo strip for you (added recently in rust-lang/cargo#8246; before that, one had to pass extra ldflags).

I would still prefer to align with rustc/clang/gcc. Rustc doesn't have 50 years of legacy and they still chose unstripped to be the default.

Are there any examples of toolchains that strip by default? PublishSingleFile doesn't count because it doesn't actually generate an executable (it glues managed assemblies at the end of a preexisting executable - the symbols would be meaningless for the glued part).

@am11
Member Author

am11 commented May 30, 2022

Makes sense. I think making it optional and wiring it to an MSBuild property like <StripSymbols>true/false would be enough.

Are there any examples of toolchains that strip by default?

I tested with nexe (Node.js native); it also produces an unstripped binary.
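Until a build property exists, the opt-in idea above can be approximated as a post-publish shell step. This is only a sketch: `STRIP_SYMBOLS` is a hypothetical environment variable standing in for the proposed MSBuild property, and `maybe_strip_symbols` is an illustrative name. The objcopy calls are the same three from the repro.

```shell
# Opt-in symbol stripping for a published binary.
# $1 = path to the binary. Symbols stay embedded unless STRIP_SYMBOLS=true.
maybe_strip_symbols() {
    if [ "${STRIP_SYMBOLS:-false}" != "true" ]; then
        echo "leaving symbols embedded in $1"
        return 0
    fi
    # Save debug info to a side file, strip the binary, then link the two.
    objcopy --only-keep-debug "$1" "$1.dbg" &&
    objcopy --strip-unneeded "$1" &&
    objcopy --add-gnu-debuglink="$1.dbg" "$1" &&
    echo "stripped $1, symbols saved to $1.dbg"
}
```

Running `STRIP_SYMBOLS=true maybe_strip_symbols artifacts/nativeapp1` would apply the stripping, while the default leaves the binary untouched, matching the unstripped default of the toolchains discussed above.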

@MichalStrehovsky MichalStrehovsky added this to the 7.0.0 milestone May 31, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label May 31, 2022
@jkotas
Member

jkotas commented Jun 3, 2022

@am11 Would you be interested in contributing the build targets for this, to make it easy to opt in to symbol stripping?

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 4, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 6, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jul 6, 2022