reduce binary size #17

Closed
junr03 opened this issue Apr 11, 2019 · 30 comments · Fixed by #182

@junr03
Member

junr03 commented Apr 11, 2019

In order for us to be able to ship Envoy on the client we need to optimize for binary size.

We will need to reduce Envoy Mobile’s contribution to final application size (compressed) to < 5MB to be able to ship in a production mobile application.
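
Since the target is stated for the compressed size, it can help to measure that directly rather than the on-disk size. A minimal sketch, assuming the app package compresses the library with something close to zlib's DEFLATE; the function names and the sample bytes are illustrative and not part of Envoy Mobile:

```python
import zlib


def compressed_size(data: bytes, level: int = 9) -> int:
    """Return the zlib (DEFLATE) compressed size of `data` in bytes."""
    return len(zlib.compress(data, level))


def within_budget(data: bytes, budget_bytes: int = 5 * 1024 * 1024) -> bool:
    """Check the compressed size against the < 5MB target from this issue."""
    return compressed_size(data) < budget_bytes


if __name__ == "__main__":
    # Stand-in for the bytes of a real binary (assumption: read from disk in practice).
    sample = b"\x00" * (1024 * 1024)
    print(compressed_size(sample), within_budget(sample))
```

In practice the input would be the bytes of the stripped library; real code compresses far less well than the zero-filled sample used here.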

junr03 transferred this issue from another repository May 2, 2019
junr03 added the perf label May 2, 2019
junr03 added this to the v0.2 "Primo" milestone May 2, 2019
@junr03
Member Author

junr03 commented May 6, 2019

From an initial cursory investigation of Cronet, the library size is in the single-digit MB range. Not suggesting anything necessarily, just putting a point of reference out there.

@tonya11en
Member

I hacked up a basic binary just to see the effect of linking in library/common/main_interface.h:

#include "main_interface.h"

using namespace std;

int main() {
  return 0;
}

Creating the binary took this patch:

diff --git a/library/common/BUILD b/library/common/BUILD
index cbf681c..e2162af 100644
--- a/library/common/BUILD
+++ b/library/common/BUILD
@@ -1,6 +1,6 @@
 licenses(["notice"])  # Apache 2
 
-load("@envoy//bazel:envoy_build_system.bzl", "envoy_cc_library", "envoy_package")
+load("@envoy//bazel:envoy_build_system.bzl", "envoy_cc_library", "envoy_package", "envoy_cc_binary")
 
 envoy_package()
 
@@ -11,3 +11,10 @@ envoy_cc_library(
     repository = "@envoy",
     deps = ["@envoy//source/exe:envoy_main_common_lib"]
 )
+
+envoy_cc_binary(
+    name = "tony_test",
+    srcs = ["tony_test.cc"],
+    repository = "@envoy",
+    deps = [":envoy_main_interface_lib"]
+)

Just compiling a no-op program that exits immediately yields an 8.0K binary:

>> ls -lh a.out 
-rwxrwxr-x 1 tallen tallen 8.0K May  6 17:42 a.out

However, including the main_interface.h file does make it gain some weight:

>> ls -lh bazel-bin/library/common/tony_test
-r-xr-xr-x 1 tallen tallen 48M May  6 17:27 bazel-bin/library/common/tony_test

Stripping any unnecessary symbols cuts the size roughly in half:

 tallen@rathma  /tmp 
(Mon May  6 17:46:08 PDT 2019)
>> strip --strip-unneeded tony_test
 tallen@rathma  /tmp 
(Mon May  6 17:46:14 PDT 2019)
>> ls -lh tony_test
-rwxrwxrwx 1 tallen tallen 25M May  6 17:46 tony_test

Bloaty output on the stripped binary:

     VM SIZE                          FILE SIZE
 --------------                    --------------
  55.4%  13.8Mi .text               13.8Mi  55.7%
  16.6%  4.14Mi .eh_frame           4.14Mi  16.7%
  14.7%  3.66Mi .rodata             3.66Mi  14.7%
   4.5%  1.13Mi .rela.dyn           1.13Mi   4.6%
   3.9%   996Ki .eh_frame_hdr        996Ki   3.9%
   2.2%   553Ki .gcc_except_table    553Ki   2.2%
   1.3%   321Ki .data.rel.ro         321Ki   1.3%
   0.5%   121Ki .bss                     0   0.0%
   0.4%   101Ki .data                101Ki   0.4%
   0.3%  67.4Ki .data.rel.ro.local  71.6Ki   0.3%
   0.2%  44.2Ki .dynstr             44.2Ki   0.2%
   0.1%  18.0Ki .dynsym             18.0Ki   0.1%
   0.0%  6.40Ki .rela.plt           6.40Ki   0.0%
   0.0%  5.25Ki [19 Others]         5.89Ki   0.0%
   0.0%  4.28Ki .plt                4.28Ki   0.0%
   0.0%  4.21Ki .tbss                    0   0.0%
   0.0%  3.44Ki .gnu.hash           3.44Ki   0.0%
   0.0%  3.27Ki .init_array         3.27Ki   0.0%
   0.0%     624 [ELF Headers]       2.92Ki   0.0%
   0.0%       0 [Unmapped]          2.38Ki   0.0%
   0.0%  2.16Ki .got.plt            2.16Ki   0.0%
 100.0%  25.0Mi TOTAL               24.9Mi 100.0%
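
One way to read the table above is to add up the sections that exist for stack unwinding and C++ exception handling (.eh_frame, .eh_frame_hdr and .gcc_except_table). A small sketch of that arithmetic, using file sizes copied from the table; the helper name `parse_size` is made up for this example:

```python
import re

UNITS = {"": 1, "Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def parse_size(text: str) -> float:
    """Convert a bloaty size such as '4.14Mi', '996Ki' or '624' to bytes."""
    match = re.fullmatch(r"([0-9.]+)(Ki|Mi|Gi)?", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2) or ""]


# File sizes copied from the stripped-binary table above.
FILE_SIZES = {
    ".text": "13.8Mi",
    ".eh_frame": "4.14Mi",
    ".rodata": "3.66Mi",
    ".eh_frame_hdr": "996Ki",
    ".gcc_except_table": "553Ki",
}
TOTAL = parse_size("24.9Mi")

UNWIND_SECTIONS = (".eh_frame", ".eh_frame_hdr", ".gcc_except_table")
unwind_bytes = sum(parse_size(FILE_SIZES[name]) for name in UNWIND_SECTIONS)
unwind_share = unwind_bytes / TOTAL

print(f"{unwind_bytes / 1024 ** 2:.2f}Mi, {unwind_share:.1%} of the file")
```

By this count, unwind and exception data make up somewhat more than a fifth of the stripped file, which is consistent with the exception table being flagged later in the thread.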

More analysis to come...

@tonya11en
Member

Compiling with -Os results in a 19MB binary:

>> ls -lh bazel-bin/library/common/tony_test
-r-xr-xr-x 1 tallen tallen 19M May  6 18:23 bazel-bin/library/common/tony_test

Stripping this binary takes off another 6MB:

 ✘ tallen@rathma  /tmp 
(Mon May  6 18:25:50 PDT 2019)
>> sudo strip -S --strip-unneeded --remove-section=.note.gnu.gold-version --remove-section=.comment --remove-section=.note --remove-section=.note.gnu.build-id --remove-section=.note.ABI-tag tony_test
 tallen@rathma  /tmp 
(Mon May  6 18:25:52 PDT 2019)
>> ls -lh tony_test
-r-xr-xr-x 1 tallen tallen 13M May  6 18:25 tony_test

Next step is to remove unused objects from the binary (from unneeded includes).

@mattklein123
Member

mattklein123 commented May 7, 2019

@tonya11en surprised it is this big. Can you confirm we are compiling without any extensions as well as compiling out hot restart, google gRPC, etc.?

@tonya11en
Member

tonya11en commented May 7, 2019

We have an extensions file for this repo (envoy_build_config/extensions_build_config.bzl) that contains the bare minimum:

EXTENSIONS = {
    "envoy.filters.http.router":                        "//source/extensions/filters/http/router:config",
    "envoy.filters.network.http_connection_manager":    "//source/extensions/filters/network/http_connection_manager:config",
}
WINDOWS_EXTENSIONS = {}

There's also an Envoy extensions file in envoy/source/extensions/extensions_build_config.bzl which I attempted to modify. After cutting out all of the non-essential extensions from there, it maintained the same 19MB binary size (this is with the compile options in the previous post). Performing the strip drops the binary down to 13MB just as we saw before, so I'm pretty sure it's using the extensions file in this repo.

@tonya11en
Member

Looks like running the bazel build with --copt="-Os" doesn't propagate to all subprojects? When running with bazel build -s ..., I'm seeing that protobuf is getting built with -O2:

SUBCOMMAND: # @com_google_protobuf//:protobuf_lite [action 'Compiling external/com_google_protobuf/src/google/protobuf/stubs/statusor.cc [for host]']
(cd /home/tallen/.cache/bazel/_bazel_tallen/38c86e9e1dc24b71a3a025aaed04eb1a/execroot/__main__ && \
  exec env - \
    PATH=/home/tallen/.local/bin:/home/tallen/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/rh/git19/root/usr/bin:/opt/local/bin:/opt/local/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:/usr/texbin:/home/tallen/gopath/bin:/home/tallen/bin \
    PWD=/proc/self/cwd \
  /home/tallen/.cache/bazel/_bazel_tallen/38c86e9e1dc24b71a3a025aaed04eb1a/external/local_config_cc/extra_tools/envoy_cc_wrapper -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++0x' -MD -MF bazel-out/host/bin/external/com_google_protobuf/_objs/protobuf_lite/statusor.d '-frandom-seed=bazel-out/host/bin/external/com_google_protobuf/_objs/protobuf_lite/statusor.o' -iquote external/com_google_protobuf -iquote bazel-out/host/genfiles/external/com_google_protobuf -iquote bazel-out/host/bin/external/com_google_protobuf -isystem external/com_google_protobuf/src -isystem bazel-out/host/genfiles/external/com_google_protobuf/src -isystem bazel-out/host/bin/external/com_google_protobuf/src -g0 -g0 -DHAVE_PTHREAD -DHAVE_ZLIB -Wall -Woverloaded-virtual -Wno-sign-compare -Wno-unused-function -Wno-write-strings -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c external/com_google_protobuf/src/google/protobuf/stubs/statusor.cc -o bazel-out/host/bin/external/com_google_protobuf/_objs/protobuf_lite/statusor.o)

However, Envoy and BoringSSL are being built with -Os as expected.

Envoy:

SUBCOMMAND: # @envoy_api//envoy/api/v2/core:grpc_service_cc [action 'Compiling external/envoy_api/envoy/api/v2/core/grpc_service.pb.cc']
(cd /home/tallen/.cache/bazel/_bazel_tallen/38c86e9e1dc24b71a3a025aaed04eb1a/execroot/__main__ && \
  exec env - \
    PATH=/home/tallen/.local/bin:/home/tallen/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/rh/git19/root/usr/bin:/opt/local/bin:/opt/local/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:/usr/texbin:/home/tallen/gopath/bin:/home/tallen/bin \
    PWD=/proc/self/cwd \
  /home/tallen/.cache/bazel/_bazel_tallen/38c86e9e1dc24b71a3a025aaed04eb1a/external/local_config_cc/extra_tools/envoy_cc_wrapper -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer '-std=c++0x' -MD -MF bazel-out/k8-fastbuild/bin/external/envoy_api/envoy/api/v2/core/_objs/grpc_service_cc/grpc_service.pb.pic.d '-frandom-seed=bazel-out/k8-fastbuild/bin/external/envoy_api/envoy/api/v2/core/_objs/grpc_service_cc/grpc_service.pb.pic.o' -fPIC -iquote external/envoy_api -iquote bazel-out/k8-fastbuild/genfiles/external/envoy_api -iquote bazel-out/k8-fastbuild/bin/external/envoy_api -iquote external/com_github_gogo_protobuf -iquote bazel-out/k8-fastbuild/genfiles/external/com_github_gogo_protobuf -iquote bazel-out/k8-fastbuild/bin/external/com_github_gogo_protobuf -iquote external/com_google_protobuf -iquote bazel-out/k8-fastbuild/genfiles/external/com_google_protobuf -iquote bazel-out/k8-fastbuild/bin/external/com_google_protobuf -iquote external/googleapis -iquote bazel-out/k8-fastbuild/genfiles/external/googleapis -iquote bazel-out/k8-fastbuild/bin/external/googleapis -iquote external/com_lyft_protoc_gen_validate -iquote bazel-out/k8-fastbuild/genfiles/external/com_lyft_protoc_gen_validate -iquote bazel-out/k8-fastbuild/bin/external/com_lyft_protoc_gen_validate -isystem external/com_google_protobuf/src -isystem bazel-out/k8-fastbuild/genfiles/external/com_google_protobuf/src -isystem bazel-out/k8-fastbuild/bin/external/com_google_protobuf/src -isystem bazel-out/k8-fastbuild/genfiles/external/envoy/bazel/foreign_cc/zlib/include -Os -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c bazel-out/k8-fastbuild/genfiles/external/envoy_api/envoy/api/v2/core/grpc_service.pb.cc -o bazel-out/k8-fastbuild/bin/external/envoy_api/envoy/api/v2/core/_objs/grpc_service_cc/grpc_service.pb.pic.o)

BoringSSL:

SUBCOMMAND: # @boringssl//:crypto [action 'Compiling external/boringssl/src/crypto/asn1/asn1_par.c']                                                                                                                                           
(cd /home/tallen/.cache/bazel/_bazel_tallen/38c86e9e1dc24b71a3a025aaed04eb1a/execroot/__main__ && \
  exec env - \
    PATH=/home/tallen/.local/bin:/home/tallen/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/rh/git19/root/usr/bin:/opt/local/bin:/opt/local/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:/usr/texbin:/home/tallen/gopath/bin:/home/tallen/bin \
    PWD=/proc/self/cwd \
  /home/tallen/.cache/bazel/_bazel_tallen/38c86e9e1dc24b71a3a025aaed04eb1a/external/local_config_cc/extra_tools/envoy_cc_wrapper -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -MD -MF bazel-out/k8-fastbuild/bin/external/boringssl/_objs/crypto/asn1_par.pic.d '-frandom-seed=bazel-out/k8-fastbuild/bin/external/boringssl/_objs/crypto/asn1_par.pic.o' -fPIC -iquote external/boringssl -iquote bazel-out/k8-fastbuild/genfiles/external/boringssl -iquote bazel-out/k8-fastbuild/bin/external/boringssl -isystem external/boringssl/src/include -isystem bazel-out/k8-fastbuild/genfiles/external/boringssl/src/include -isystem bazel-out/k8-fastbuild/bin/external/boringssl/src/include -Os -Wa,--noexecstack '-D_XOPEN_SOURCE=700' -Wall -Werror '-Wformat=2' -Wsign-compare -Wmissing-field-initializers -Wwrite-strings -Wshadow -fno-common '-std=c11' -Wmissing-prototypes -Wold-style-definition -Wstrict-prototypes -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c external/boringssl/src/crypto/asn1/asn1_par.c -o bazel-out/k8-fastbuild/bin/external/boringssl/_objs/crypto/asn1_par.pic.o)
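
Checking these long command lines by eye is error-prone, so here is a rough sketch that scans `bazel build -s` output and reports the last -O flag for each action. The `[for host]` suffix on the protobuf action above suggests it belongs to Bazel's host configuration (tools that run during the build), which `--copt` does not affect; Bazel has a separate `--host_copt` flag for that. The sample log below is an abbreviated, hypothetical stand-in for the real output, and the function name is made up for this example:

```python
import re

HEADER = re.compile(r"^SUBCOMMAND: # (?P<target>\S+) \[action '(?P<action>.*)'\]")
# Matches -O, -O0..-O3, -Os, -Og, -Oz and -Ofast as standalone arguments.
OPT_FLAG = re.compile(r"(?<!\S)-O(?:[0-3sgz]|fast)?(?!\S)")


def optimization_levels(log):
    """Yield (target, is_host, last -O flag or None) for each SUBCOMMAND."""
    for block in re.split(r"(?m)^(?=SUBCOMMAND: )", log):
        header = HEADER.match(block)
        if header is None:
            continue
        flags = OPT_FLAG.findall(block)
        # GCC and Clang use the last -O option when several are given.
        effective = flags[-1] if flags else None
        yield header.group("target"), "[for host]" in header.group("action"), effective


# Abbreviated, hypothetical stand-in for real `bazel build -s` output.
SAMPLE = """\
SUBCOMMAND: # @com_google_protobuf//:protobuf_lite [action 'Compiling statusor.cc [for host]']
(cd /tmp/execroot && \\
  gcc -g0 -O2 -DNDEBUG -c statusor.cc -o statusor.o)
SUBCOMMAND: # @boringssl//:crypto [action 'Compiling asn1_par.c']
(cd /tmp/execroot && \\
  gcc -fPIC -Os -c asn1_par.c -o asn1_par.o)
"""

for target, is_host, flag in optimization_levels(SAMPLE):
    print(target, "host" if is_host else "target", flag)
```

Separating host actions from target actions this way makes it easier to see whether a dependency that ends up in the final binary is really missing -Os, or whether only a build-time tool is.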

@tonya11en
Member

Made some changes to Envoy OSS to generate a size-optimized binary. Looks like it works. Here are the differences from the same Envoy SHA:

tallen@envoy-buildbox:~$ ls -lh
total 57M
drwxrwxr-x 23 tallen tallen 4.0K May 15 21:50 envoy
-rwxr-xr-x  1 tallen tallen  41M May 15 21:47 envoy.stripped.release
-rwxr-xr-x  1 tallen tallen  16M May 15 21:51 envoy.stripped.sizeopt

This is a decent starting point for further analysis and measurement of effects of various changes. Here's a list of where I'd like to go next:

  • Measure effect of removing hot restart and tcmalloc.
  • Add an option to Envoy that lets us compile out debug/trace logs.

@mattklein123
Member

@tonya11en awesome work. Yes, please compile out all the things we are compiling out in mobile as a base point (please sync with @junr03 and @goaway on this). I would also love to see a per-object file breakdown of size.

@tonya11en
Member

Learned this morning that setting -Os in the copts propagates this down to dependencies. That approach shaves off another 1MB:

build_sizeopt_stripped:
total 13728
-rwxr-xr-x 1 root root 14057072 May 16 20:34 envoy

Once envoyproxy/envoy#6960 gets merged in, that'll be a good starting point to begin breaking down the space usage of the library per-object as Matt mentioned.

@tonya11en
Member

envoy_obj_dump.txt

Uploading granular objdump of the size-optimized envoy lib for future analysis.

@mattklein123
Member

@tonya11en any thoughts on how to:

  1. demangle the symbols so they are easy to read
  2. Maybe actually group the symbols into a per-file/per-directory roll up for easier analysis?

cc @Reflejo who I know has done stuff like this in the past and can probably advise. @Reflejo I wonder if any of the scripts you wrote for LBS would be useful here?

@tonya11en
Member

Looks like I can use bloaty to get the per-file breakdown. Building everything with the symbols now. I'll give an update today with the result.

@tonya11en
Member

tonya11en commented Jun 3, 2019

Attaching a by-file breakdown of the binary size contributions for an envoy lib built at Envoy commit 4bb3fbf70d36ecd5d1a1747d7d99478e4c3ecd22. This should be the size breakdown with the binary stripped of symbols.

I was also able to generate a breakdown of inlined code, but the text file is >70MB, so I will not attach.

I'll follow up with some initial analysis later today or tomorrow morning.

bloaty_cu.txt

@mattklein123
Member

Thanks @tonya11en this is awesome. Is it possible to also generate a report filtered by directory? I think that would be useful just to look at entire things we might be able to get rid of.
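
A per-directory view can be derived from per-file numbers with a small roll-up. The sketch below assumes the per-file sizes have already been loaded into a dict keyed by relative path; the paths and byte counts shown are illustrative placeholders, not measurements from this thread:

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict


def rollup_by_directory(sizes: Dict[str, int], depth: int) -> Dict[str, int]:
    """Sum per-file sizes under the first `depth` directory components."""
    totals = defaultdict(int)
    for path, size in sizes.items():
        parts = PurePosixPath(path).parts
        # Never include the file name itself; shallow files use their own directory.
        key = "/".join(parts[: min(depth, len(parts) - 1)]) or "."
        totals[key] += size
    # Largest directories first.
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# Placeholder data for illustration only.
sizes = {
    "external/envoy/source/common/upstream/cluster_manager_impl.cc": 100,
    "external/envoy/source/common/router/router.cc": 50,
    "external/envoy/source/server/server.cc": 30,
    "external/boringssl/src/crypto/asn1/asn1_par.c": 20,
}

for directory, total in rollup_by_directory(sizes, depth=4).items():
    print(f"{total:8d}  {directory}")
```

Varying `depth` gives coarser or finer groupings, which can help when looking for whole subsystems to drop.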

@mattklein123
Member

(Also, could you document/commit somewhere how to generate this data)

@tonya11en
Member

Documenting this process in https://github.com/lyft/envoy-edge-dev/pull/95. I will transfer the document over to the envoy-mobile repo once the organization of the code becomes apparent.

@junr03
Member Author

junr03 commented Jun 19, 2019

I picked up where @tonya11en left off.

I ran bloaty on the same binary he was creating, with a custom bloaty data source:

custom_data_source: {
  name: "bloaty_package"
  base_data_source: "compileunits"

  rewrite: {
    pattern: "^(external/envoy/source/)(\\w+/)(\\w+)"
    replacement: "envoy \\2"
  }

  rewrite: {
    pattern: "^(external/)(\\w+/)"
    replacement: "\\2"
  }

  rewrite: {
    pattern: "([.pb.cc | .pb.validate.cc])$"
    replacement: "compiled protos"
  }
}
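
A note on the third rewrite rule: in regular-expression syntax, square brackets form a character class, so the pattern as written matches any path whose last character is one of the listed characters, not only paths ending in .pb.cc or .pb.validate.cc. This may be why non-proto files such as library/common/main_interface.cc show up under "compiled protos" in the output further down. The sketch below uses Python's `re` module rather than the RE2 engine that bloaty uses, on the assumption that character classes behave the same in both; `BY_SUFFIX` is a hypothetical alternative, not what was run for the results in this thread:

```python
import re

# Third rewrite pattern exactly as written in the config above.
AS_WRITTEN = re.compile(r"([.pb.cc | .pb.validate.cc])$")
# Hypothetical alternative that spells out the two suffixes literally.
BY_SUFFIX = re.compile(r"(\.pb\.cc|\.pb\.validate\.cc)$")

paths = [
    "bazel-out/android-arm64-v8a-dbg/bin/external/envoy_api/envoy/api/v2/route/route.pb.validate.cc",
    "library/common/main_interface.cc",
    "/usr/local/google/buildbot/src/android/ndk-release-r20/external/libcxx/src/locale.cpp",
    "foo/bar.h",  # made-up path whose last character is not in the class
]

for path in paths:
    print(bool(AS_WRITTEN.search(path)), bool(BY_SUFFIX.search(path)), path)
```

Only the generated proto source matches the suffix form, while the character-class form also matches the other .cc and .cpp paths.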

This starts allowing us to aggregate size by file. The command I ran was:

bloaty -c envoy.bloaty --debug-file=test_tony_06_19_10\:00 -d bloaty_package,compileunits test_tony_06_19_10\:00.stripped > test_initial_06_19.out

The binaries were created with:

bazel build //library/common:test_binary.stripped --config=tinybuild

and

bazel build //library/common:test_binary --config=tinybuild -c dbg --strip=never

Results: test_initial_06_19.txt

Coarse sections result:

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  33.6%  7.19Mi  48.8%  7.19Mi .text
  23.0%  4.93Mi   0.0%       0 .strtab
  17.6%  3.76Mi  25.5%  3.76Mi .rodata
   8.6%  1.85Mi   0.0%       0 .symtab
   6.8%  1.46Mi   9.9%  1.46Mi .gcc_except_table
   6.5%  1.38Mi   9.4%  1.38Mi .eh_frame
   1.8%   391Ki   2.6%   390Ki .data.rel.ro
   1.6%   342Ki   2.3%   342Ki .eh_frame_hdr
   0.0%       0   0.7%   110Ki .bss
   0.2%  48.9Ki   0.3%  48.9Ki .got
   0.1%  27.0Ki   0.2%  27.0Ki .data
   0.0%  8.46Ki   0.0%  5.00Ki [20 Others]
   0.0%  6.40Ki   0.0%  6.40Ki .dynsym
   0.0%  6.12Ki   0.0%  6.12Ki .rela.plt
   0.0%       0   0.0%  4.24Ki .tbss
   0.0%  4.09Ki   0.0%  4.09Ki .plt
   0.0%  3.19Ki   0.0%  3.19Ki [LOAD #2 [R]]
   0.0%  3.17Ki   0.0%       0 .init_array
   0.0%  3.09Ki   0.0%     792 [ELF Headers]
   0.0%  3.02Ki   0.0%  3.02Ki .dynstr
   0.0%  1.04Ki   0.0%  2.93Ki [LOAD #4 [RW]]
 100.0%  21.4Mi 100.0%  14.7Mi TOTAL

Interesting observations:

  1. BoringSSL and protobuf together take up ~3.5Mi, so even when we squeeze what we can out of the Envoy codebase, this will be a determining factor for the size. I discussed this with @goaway, and we might want to look at dynamically linking against the version compiled into the parent app. More discussion is needed on this point.
  2. The largest contributors to size are the compiled protos, both the normal ones and the .validate ones generated by protoc-gen-validate. Most of these protos are not needed. I had thought that they would be compiled out if they were not included by any of the source files used. This is a good avenue for exploration.
  3. As far as I can see, only the necessary extensions are getting compiled in, so that is good.
  4. Note that the exception table takes up a non-negligible amount. However, this is not representative of what we are going to put in the app, given that I did not compile against ARM. I will do that to get an actual sense of the real exception table size.

Next steps:

  1. Look into compiling out logging. I will hack something quick, get data, and then work to upstream it.
  2. Look into the protos.
  3. Compile against ARM.

@junr03
Member Author

junr03 commented Jun 21, 2019

06/20 update

What was done

  1. I would have liked to compile just the C++ code with arm64 as the target architecture. However, based on my own research, reaching out to several members of the Envoy community, and talking to @keith, building that kind of cross-compile support is "where dragons lie". I might try compiling natively at some point, just not now.
  2. Given (1), I moved on to compiling the artifacts for iOS and Android, since those rules already have cross-compilation toolchains available to them. I will break down what I did on each platform.

iOS

I built the Envoy.framework with the following command:
bazel build //:ios_dist --config=ios -c dbg --strip=never --ios_multi_cpus=arm64

Using this, I can get size information for the constituent object files with:

mkdir /tmp/foo
cd /tmp/foo
ar x path/to/Envoy.framework
du -sh * | gsort -h
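
For machines without gsort, roughly the same listing can be produced with a few lines of Python. This is only a sketch: it reports the apparent file size (st_size) rather than the disk usage that du reports, and the function name is made up for this example:

```python
import os


def largest_files(directory, limit=20):
    """Return (size_in_bytes, name) pairs for regular files, largest first."""
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_file(follow_symlinks=False):
                size = entry.stat(follow_symlinks=False).st_size
                entries.append((size, entry.name))
    return sorted(entries, reverse=True)[:limit]


if __name__ == "__main__":
    # Lists the current directory; pass the extraction directory in practice.
    for size, name in largest_files("."):
        print(f"{size / 1024:10.1f} KiB  {name}")
```

Pointing `largest_files` at the directory where the archive was extracted gives the largest object files first.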

However, when I tried to get symbols from the object files, I was not able to do so, e.g.:

12:29:45  /tmp/dbg_framework $ bloaty -d compileunits listener_manager_impl.o
bloaty: missing debug info
12:30:00  /tmp/dbg_framework $ dsymutil listener_manager_impl.o
warning: no debug symbols in executable (-arch x86_64)

although as far as I can tell the object file does have debug symbols:

/tmp/dbg_framework $ bloaty listener_manager_impl.o
    FILE SIZE        VM SIZE
 --------------  --------------
  44.0%  14.1Mi  45.6%  14.1Mi ,__debug_str
  24.6%  7.87Mi  25.5%  7.87Mi ,__debug_pubtypes
  20.8%  6.66Mi  21.6%  6.66Mi ,__debug_pubnames
   4.2%  1.35Mi   4.4%  1.35Mi ,__debug_info
   2.5%   834Ki   0.0%       0 String Table
   1.2%   393Ki   1.2%   393Ki ,__text
   0.6%   196Ki   0.6%   196Ki ,__eh_frame
   0.6%   182Ki   0.0%       0 [Unmapped]
   0.5%   177Ki   0.6%   177Ki ,__debug_line
   0.5%   155Ki   0.5%   155Ki ,__compact_unwind
   0.3%  87.7Ki   0.0%       0 Symbol Table
   0.0%  13.0Ki   0.0%  13.0Ki ,__const
   0.0%  7.88Ki   0.0%  7.88Ki ,__gcc_except_tab
   0.0%  7.12Ki   0.0%  7.12Ki ,__cstring
   0.0%  4.98Ki   0.0%  4.98Ki ,__debug_ranges
   0.0%  4.70Ki   0.0%  4.70Ki ,__debug_abbrev
   0.0%  2.19Ki   0.0%       0 [Mach-O Headers]
   0.0%  1.56Ki   0.0%  1.56Ki ,__debug_loc
   0.0%     289   0.0%     279 [10 Others]
   0.0%     288   0.0%     288 ,__StaticInit
   0.0%       0   0.0%     136 ,__bss
 100.0%  32.0Mi 100.0%  30.9Mi TOTAL

At this point I stopped working on iOS as having the debug symbols is important to actually see what is going on.

@keith did mention I could try using --apple_generate_dsym. I will try that tomorrow.

Open questions:
  1. Can I get debug symbols?
  2. Can I get bloat information for the entirety of the library rather than per .o file?

Android

I moved on to Android, where I built the envoy.aar with bazel build //:android_dist --fat_apk_cpu=arm64-v8a --strip=never -c dbg and extracted the libenvoy_jni.so file. I was able to use all of bloaty's features.

I also created an optimized stripped envoy.aar with bazel build //:android_dist --fat_apk_cpu=arm64-v8a --strip=never -c opt

The bloaty output is:

junr03@thinkpad:~/Desktop$ bloaty opt_stripped.so 
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  29.8%  8.71Mi  41.9%  8.71Mi .text
  18.7%  5.47Mi   0.0%       0 .strtab
  14.6%  4.26Mi  20.5%  4.26Mi .dynstr
  10.3%  3.01Mi   0.0%       0 .symtab
   4.5%  1.33Mi   6.4%  1.33Mi .rela.dyn
   4.5%  1.31Mi   6.3%  1.31Mi .eh_frame
   4.0%  1.17Mi   5.6%  1.17Mi .dynsym
   3.6%  1.04Mi   5.0%  1.04Mi .rodata
   2.0%   587Ki   2.8%   587Ki .gcc_except_table
   1.6%   470Ki   2.2%   470Ki .data.rel.ro
   1.3%   399Ki   1.9%   399Ki .hash
   1.3%   376Ki   1.8%   376Ki .gnu.hash
   1.1%   328Ki   1.5%   328Ki .eh_frame_hdr
   0.9%   276Ki   1.3%   276Ki .rela.plt
   0.6%   184Ki   0.9%   184Ki .plt
   0.3%  99.8Ki   0.5%  99.8Ki .gnu.version
   0.0%       0   0.4%  92.2Ki .bss
   0.3%  92.1Ki   0.4%  92.1Ki .got.plt
   0.3%  83.8Ki   0.4%  81.3Ki [13 Others]
   0.1%  37.3Ki   0.3%  61.0Ki [LOAD #3 [RW]]
   0.2%  57.7Ki   0.0%       0 [Unmapped]
 100.0%  29.2Mi 100.0%  20.8Mi TOTAL

Given that baseline, I went in and deleted all logging (with ifdefs, and commenting some stuff out). The diff between the two libenvoy_jni.so files was:

junr03@thinkpad:~/Desktop$ bloaty no_log.so -- opt_orig.so 
    FILE SIZE        VM SIZE    
 --------------  -------------- 
   +71% +40.9Ki  [ = ]       0 [Unmapped]
  +2.5%    +952  +5.1% +3.12Ki [LOAD #3 [RW]]
  -0.5%    -272  -0.5%    -272 .got
  -0.4%    -376  -0.4%    -376 .gnu.version
  -0.0%    -224  -0.1%    -416 [5 Others]
  -0.2%    -940  -0.2%    -940 .gnu.hash
  -3.5%    -952  -3.5%    -952 .data
  -0.4% -1.23Ki  -0.4% -1.23Ki .eh_frame_hdr
  -0.4% -1.47Ki  -0.4% -1.47Ki .hash
  -1.9% -1.75Ki  -1.9% -1.75Ki .got.plt
  -1.9% -3.50Ki  -1.9% -3.50Ki .plt
  -0.3% -3.82Ki  -0.3% -3.82Ki .rela.dyn
  -0.4% -4.41Ki  -0.4% -4.41Ki .dynsym
  -1.9% -5.25Ki  -1.9% -5.25Ki .rela.plt
  -0.8% -10.2Ki  -0.8% -10.2Ki .eh_frame
  -0.5% -22.4Ki  -0.5% -22.4Ki .dynstr
  -0.4% -23.0Ki  [ = ]       0 .strtab
  -6.4% -68.2Ki  -6.4% -68.2Ki .rodata
  -2.4% -73.1Ki  [ = ]       0 .symtab
 -19.7%  -115Ki -19.7%  -115Ki .gcc_except_table
  -2.8%  -251Ki  -2.8%  -251Ki .text
  -1.8%  -546Ki  -2.3%  -489Ki TOTAL

So ~500k reduction by removing logging.

To address a question that @mattklein123 asked before: using the dbg libenvoy_jni.so, we can see which source files are largest in the Envoy codebase:

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  32.9%   125Mi  14.1%  7.95Mi envoy common/
      57.6%  72.2Mi  42.4%  3.37Mi [162 Others]
       5.3%  6.63Mi   8.0%   649Ki external/envoy/source/common/upstream/cluster_manager_impl.cc
       4.2%  5.20Mi   4.4%   362Ki external/envoy/source/common/access_log/access_log_formatter.cc
       3.9%  4.88Mi   5.6%   458Ki external/envoy/source/common/upstream/subset_lb.cc
       3.3%  4.17Mi   4.0%   324Ki external/envoy/source/common/upstream/upstream_impl.cc
       3.2%  3.98Mi   7.7%   627Ki external/envoy/source/common/stats/thread_local_store.cc
       2.7%  3.39Mi   2.9%   237Ki external/envoy/source/common/router/header_formatter.cc
       2.5%  3.14Mi   3.2%   262Ki external/envoy/source/common/router/config_impl.cc
       2.0%  2.46Mi   2.2%   181Ki external/envoy/source/common/http/conn_manager_impl.cc
       1.9%  2.32Mi   2.2%   181Ki external/envoy/source/common/router/router.cc
       1.8%  2.31Mi   2.8%   228Ki external/envoy/source/common/router/scoped_rds.cc
       1.4%  1.80Mi   1.7%   139Ki external/envoy/source/common/router/rds_impl.cc
       1.4%  1.74Mi   1.5%   119Ki external/envoy/source/common/upstream/health_checker_base_impl.cc
       1.2%  1.46Mi   3.3%   270Ki external/envoy/source/common/json/json_loader.cc
       1.2%  1.44Mi   2.0%   162Ki external/envoy/source/common/secret/secret_manager_impl.cc
       1.1%  1.39Mi   1.1%  89.8Ki external/envoy/source/common/http/async_client_impl.cc
       1.1%  1.39Mi   0.8%  63.6Ki external/envoy/source/common/upstream/health_discovery_service.cc
       1.1%  1.38Mi   1.3%   105Ki external/envoy/source/common/network/connection_impl.cc
       1.1%  1.35Mi   1.6%   127Ki external/envoy/source/common/upstream/health_checker_impl.cc
       1.1%  1.33Mi   1.2%  99.7Ki external/envoy/source/common/upstream/load_balancer_impl.cc
       1.1%  1.32Mi   0.0%      95 external/envoy/source/common/common/version_linkstamp.cc
  22.4%  85.1Mi  19.3%  10.9Mi compiled protos
      57.7%  49.1Mi  53.8%  5.87Mi [305 Others]
       6.0%  5.11Mi   6.4%   720Ki bazel-out/android-arm64-v8a-dbg/bin/external/envoy_api/envoy/api/v2/route/route.pb.validate.cc
       3.0%  2.58Mi   2.9%   326Ki bazel-out/android-arm64-v8a-dbg/bin/external/envoy_api/envoy/config/filter/network/http_connection_manager/v2/http_connection_manager.pb.validate.cc
       2.6%  2.24Mi   2.6%   295Ki bazel-out/android-arm64-v8a-dbg/bin/external/envoy_api/envoy/admin/v2alpha/config_dump.pb.validate.cc
       2.6%  2.21Mi   2.6%   292Ki bazel-out/android-arm64-v8a-dbg/bin/external/envoy_api/envoy/config/bootstrap/v2/bootstrap.pb.validate.cc
       2.5%  2.15Mi   3.2%   357Ki bazel-out/android-arm64-v8a-dbg/bin/external/envoy_api/envoy/api/v2/route/route.pb.cc
       2.4%  2.07Mi   2.4%   270Ki bazel-out/android-arm64-v8a-dbg/bin/external/envoy_api/envoy/config/filter/accesslog/v2/accesslog.pb.validate.cc
       2.4%  2.06Mi   2.5%   278Ki bazel-out/android-arm64-v8a-dbg/bin/external/envoy_api/envoy/api/v2/cds.pb.validate.cc
       2.2%  1.86Mi   2.4%   264Ki /usr/local/google/buildbot/src/android/ndk-release-r20/external/libcxx/src/locale.cpp
       2.0%  1.73Mi   2.1%   234Ki bazel-out/android-arm64-v8a-dbg/bin/external/envoy_api/envoy/api/v2/core/base.pb.validate.cc
       2.0%  1.67Mi   1.9%   212Ki bazel-out/android-arm64-v8a-dbg/bin/external/envoy_api/envoy/api/v2/core/grpc_service.pb.validate.cc
       1.9%  1.65Mi   1.9%   217Ki bazel-out/android-arm64-v8a-dbg/bin/external/envoy/source/server/hot_restart.pb.validate.cc
       1.8%  1.56Mi   1.9%   211Ki bazel-out/android-arm64-v8a-dbg/bin/external/envoy_api/envoy/api/v2/auth/cert.pb.validate.cc
       1.5%  1.31Mi   1.5%   163Ki bazel-out/android-arm64-v8a-dbg/bin/external/envoy_api/envoy/service/discovery/v2/hds.pb.validate.cc
       1.5%  1.24Mi   1.4%   159Ki bazel-out/android-arm64-v8a-dbg/bin/external/envoy_api/envoy/config/trace/v2/trace.pb.validate.cc
       1.4%  1.16Mi   2.9%   319Ki library/common/main_interface.cc
       1.4%  1.16Mi   1.4%   153Ki bazel-out/android-arm64-v8a-dbg/bin/external/envoy_api/envoy/api/v2/core/health_check.pb.validate.cc
       1.3%  1.09Mi   1.2%   138Ki bazel-out/android-arm64-v8a-dbg/bin/external/envoy_api/envoy/config/metrics/v2/stats.pb.validate.cc
       1.3%  1.09Mi   2.7%   303Ki bazel-out/android-arm64-v8a-dbg/bin/external/com_envoyproxy_protoc_gen_validate/validate/validate.pb.cc
       1.3%  1.09Mi   1.2%   129Ki bazel-out/android-arm64-v8a-dbg/bin/external/envoy_api/envoy/config/filter/network/redis_proxy/v2/redis_proxy.pb.validate.cc
       1.2%  1005Ki   1.1%   120Ki bazel-out/android-arm64-v8a-dbg/bin/external/envoy_api/envoy/data/core/v2alpha/health_check_event.pb.validate.cc
  21.5%  81.6Mi   5.7%  3.22Mi envoy server/
      54.4%  44.4Mi  35.0%  1.13Mi external/envoy/source/server/filter_chain_manager_impl.cc
      17.6%  14.3Mi  11.1%   366Ki external/envoy/source/server/listener_manager_impl.cc
       7.2%  5.89Mi  19.4%   639Ki external/envoy/source/server/http/admin.cc
       4.4%  3.57Mi  11.1%   367Ki external/envoy/source/server/server.cc
       3.0%  2.41Mi   4.6%   151Ki external/envoy/source/server/config_validation/server.cc
       2.5%  2.06Mi   4.7%   155Ki external/envoy/source/server/overload_manager_impl.cc
       1.3%  1.03Mi   0.9%  31.3Ki external/envoy/source/server/config_validation/cluster_manager.cc
       1.2%   973Ki   4.3%   143Ki external/envoy/source/server/options_impl.cc
       1.1%   940Ki   2.3%  75.5Ki external/envoy/source/server/connection_handler_impl.cc
       1.1%   917Ki   1.7%  57.0Ki external/envoy/source/server/worker_impl.cc
       0.9%   793Ki   0.1%  2.91Ki [3 Others]
       0.8%   691Ki   1.6%  51.8Ki external/envoy/source/server/guarddog_impl.cc
       0.8%   656Ki   0.7%  22.1Ki external/envoy/source/server/lds_api.cc
       0.8%   637Ki   1.1%  34.9Ki external/envoy/source/server/configuration_impl.cc
       0.6%   465Ki   0.5%  17.1Ki external/envoy/source/server/drain_manager_impl.cc
       0.5%   459Ki   0.1%  2.51Ki external/envoy/source/server/config_validation/dispatcher.cc
       0.4%   355Ki   0.2%  6.86Ki external/envoy/source/server/http/config_tracker_impl.cc
       0.4%   353Ki   0.2%  6.06Ki external/envoy/source/server/watchdog_impl.cc
       0.4%   311Ki   0.2%  7.71Ki external/envoy/source/server/config_validation/api.cc
       0.4%   298Ki   0.1%  2.33Ki external/envoy/source/server/config_validation/dns.cc
       0.3%   277Ki   0.1%  1.86Ki external/envoy/source/server/config_validation/admin.cc
   2.4%  9.31Mi   1.3%   730Ki envoy extensions/
      31.1%  2.89Mi  35.7%   260Ki external/envoy/source/extensions/filters/network/http_connection_manager/config.cc
      14.2%  1.32Mi  18.4%   134Ki external/envoy/source/extensions/transport_sockets/tls/context_config_impl.cc
      12.9%  1.20Mi  12.8%  93.2Ki external/envoy/source/extensions/filters/http/router/config.cc
      10.5%  1002Ki  11.8%  86.3Ki external/envoy/source/extensions/transport_sockets/tls/context_impl.cc
       8.0%   760Ki   7.1%  51.8Ki external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc
       6.9%   656Ki   6.1%  44.8Ki external/envoy/source/extensions/transport_sockets/raw_buffer/config.cc
       5.4%   515Ki   3.6%  26.7Ki external/envoy/source/extensions/transport_sockets/tls/context_manager_impl.cc
       4.9%   468Ki   3.5%  25.3Ki external/envoy/source/extensions/transport_sockets/tls/config.cc
       3.1%   299Ki   0.8%  5.62Ki external/envoy/source/extensions/transport_sockets/tls/utility.cc
       3.0%   284Ki   0.2%  1.67Ki external/envoy/source/extensions/access_loggers/file/file_access_log_impl.cc
   0.3%  1.25Mi   0.2%  93.7Ki envoy exe/
      80.0%  1022Ki  99.0%  92.8Ki external/envoy/source/exe/main_common.cc
      20.0%   255Ki   1.0%     920 external/envoy/source/exe/process_wide.cc
 100.0%   380Mi 100.0%  56.4Mi TOTAL
Open Questions:
  1. I can't stamp the libenvoy_jni.so file because, if I understand correctly, one can only stamp binaries and not object files. However, without the stamp, bloaty does not allow me to extract debug symbols from the dbg .so and map them to the opt .so.

@junr03
Member Author

junr03 commented Jun 21, 2019

@keith @Reflejo @mattklein123 @goaway let me know what you think of the progress above. Also, please comment if you have other ideas about pieces of data to look at, or next steps to take. This is the first time I am running an investigation of this sort (beyond very small toy projects), so your prior expertise is very valuable.

At this point, using the Android data compiled against arm (although we have symbol resolution only in the -c dbg output), I think we probably have to make some calls about which larger parts of Envoy we can start compiling out.

@Reflejo
Contributor

Reflejo commented Jun 21, 2019

@junr03 A few quick observations:

  • I don't think compiling the framework is going to be valuable for this analysis. I didn't try it, but I assume that the static library size is on the order of 1GB, and it's not very meaningful since it has effectively no code stripping at all and includes all transitive dependencies.

  • One thing I did try is to just get that test_binary linked, and it's around 19M. One thing you can try is to pass -Wl,-dead_strip to the linker (you can get the original linker command by removing the binary and running bazel with -s); that will reduce the size to 13M.

  • The compressed size of that stripped binary (x86_64 btw) is 3.4M

  • Didn't do any deep analysis but just quickly checking __cstring, there is a bunch of json stuff, coming from, for example here.

$ otool -l bazel-out/darwin-fastbuild/bin/library/common/test_binary.dead
Section
  sectname __cstring
   segname __TEXT
      addr 0x00000001004bf0a0
      size 0x0000000000048f45
    offset 4976800
     align 2^4 (16)
(...)
$ dd if=bazel-out/darwin-fastbuild/bin/library/common/test_binary.dead bs=1 skip=4976800 count=`rax2 0x48f45` of=whatev

This is not a big deal because deflate compression on this section will be fairly effective. But this is obviously low-hanging fruit; I think there are ~300K of these (on my stripped version of the binary).

  • There are also assertions there, are we compiling this with NDEBUG?

  • On the stripped binary there is also at least 1MB of exception tables. I don't know how many of them are exceptions we recover from vs we terminate. So we might investigate adding the nothrow specifier on methods so the except table is not created (not sure how we'll do this in an elegant way only for mobile, nor how we use exceptions, just throwing this here as a thought)

  • __const is also fairly big (~800K). Quickly checking this, I think protos are the biggest offenders here (you can also check this with the same otool -l / dd approach for that section).
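
The otool -l / dd inspection described above can also be scripted. The following is a minimal sketch, assuming the section's file offset and size have already been read from the otool -l output; the function names `read_section` and `cstring_summary` are hypothetical and not part of any tool mentioned in this thread.

```python
def read_section(path, offset, size):
    """Return `size` bytes starting at `offset` (like dd bs=1 skip=... count=...)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)


def cstring_summary(section_bytes, top=5):
    """Split a __cstring-style blob on NUL bytes and summarize its contents."""
    # Empty pieces come from padding or consecutive NULs; ignore them.
    strings = [s for s in section_bytes.split(b"\0") if s]
    # Count one extra byte per string for its NUL terminator.
    total = sum(len(s) + 1 for s in strings)
    largest = sorted(strings, key=len, reverse=True)[:top]
    return {"count": len(strings), "bytes": total, "largest": largest}
```

With the values from the otool output above, a call would look like `cstring_summary(read_section("test_binary.dead", 4976800, 0x48f45))`; the same two helpers apply to __const with that section's own offset and size.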

I will try to do a deeper analysis but please don't block on me. Hope this gives you more ideas where to look.

@mattklein123
Member

Thanks @junr03 and @Reflejo. A couple of things from me:

  1. We shouldn't bother with anything other than -c opt --copt -Os. -c dbg isn't optimized. @junr03 can we make sure we only look at that from now on?
  2. Just to make sure we are speaking the same language at all times, can we report both uncompressed size after final linking and stripping, as well as compressed size? That will give us a good place to start.
  3. Then yeah we can start actually looking at figuring out where size is being spent.

@junr03 my advice is to actually just schedule 30-45 minutes with @Reflejo as I think the time will be well worth it. I can sit with you also for a bit once you do that, but I mainly want to make sure we have the right tooling and outputs in place to begin measurement. I don't think it's actually worth looking at where the space is being spent until we do that. Thank you!

@mattklein123
Member

mattklein123 commented Jun 21, 2019

(You should be able to get full symbols on top of the opt build via --copt -ggdb3 also). It's possible that the tool you are using might have to be modified in some way to get the data that we need, but it should all be there. This is where I think @Reflejo can help pretty quickly. Anyway, again, let's make sure we have the baseline methodology in place before we do anything else (strip logging, etc.). This is roughly:

  1. Compile size optimized with symbols across all deps
  2. Size of binary after linking/stripping
  3. Compressed size of binary

@junr03
Member Author

junr03 commented Jun 21, 2019

Thanks for the comments @Reflejo and @mattklein123

I do agree with you that the most representative way to look at the binary is by using the test_binary target I have described in the issue. My current trouble there is that I can only compile against x86. Further I do think that a good last signal in the process for any change we make should be seeing the delta in size between final app binary sizes (both in ios and android).

So I think the basic methodology we should use is (paraphrasing Matt's proposal and adding my own thoughts):

  1. Compile test_binary against arm64 size optimized with symbols
  2. Strip the binary, measure size
  3. Compress the binary, measure size
  4. Compile demo app (ios and android), measure size.

Then make changes and repeat, measuring deltas.
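
The strip/compress/measure loop above could be scripted along these lines. This is a minimal sketch, assuming the binary has already been built and stripped; zlib's deflate stands in for whatever compression the final app package applies, so the compressed numbers are only an approximation. The function names are hypothetical.

```python
import zlib


def measure(path, level=9):
    """Return (uncompressed_bytes, deflate_compressed_bytes) for a file."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data), len(zlib.compress(data, level))


def delta(before, after):
    """Per-metric change between two (uncompressed, compressed) measurements."""
    return tuple(a - b for b, a in zip(before, after))
```

Running `measure` on the stripped binary before and after a change, and passing both results to `delta`, gives the pair of numbers to report for each iteration.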

Questions for you:

  1. Do you think this is an acceptable methodology? If it is I can establish all the tooling necessary to do so.
  2. Related to 1 I do think it is important for us to be able to compile test_binary against arm64. This is the only part of our testing steps that is not done, but I can invest in that today.
  3. Both you and Matt have not mentioned measuring deltas in the final app bloat, but isn't that a final representation of any change that we make? I.e., we should see how we are moving the needle on this front with changes we embark on.

Let me know what you think. Once I have this down and can iterate efficiently, we can pair for a bit to follow up on your ideas (e.g. the exception idea). I can schedule some time for Monday @Reflejo.

@mattklein123
Member

Do you think this is an acceptable methodology? If it is I can establish all the tooling necessary to do so.

In general yes, but we need to also include tooling to actually see where the size is being spent. I would not worry about making any changes right now. Once we start making changes, yes, we need to definitely measure the deltas and keep track of the current size in CI so we don't regress without knowing it.

Related to 1 I do think it is important for us to be able to compile test_binary against arm64. This is the only part of our testing steps that is not done, but I can invest in that today.

Sure, this would be optimal, but I thought there was some issue here. I'm also not sure why we can't effectively just do this with the iOS/Android targets but I don't know the details. Can we time box getting this working and circle back? I do know that many people in the community are compiling Envoy for ARM, but I think they do this on an ARM host vs. cross-compile, but I'm not sure. cc @moderation.

Both you and Matt have not mentioned measuring deltas in the final app bloat, but isn't that a final representation of any change that we make? I.e., we should see how we are moving the needle on this front with changes we embark on.

Yes, but let's not worry about changes right now. Let's get the methodology down on how to compile and measure what we have, and present a tabular view of where the size is being spent.
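
The idea of tracking the current size in CI so that regressions do not go unnoticed could look roughly like the check below. This is a sketch only: the baseline value, the budget, and the function name `check_size` are hypothetical, and how a real CI job would store the baseline is left open.

```python
def check_size(name, current, baseline, budget=0):
    """Compare a measured size against a recorded baseline.

    Returns (ok, message); ok is False when growth exceeds the budget (bytes).
    """
    growth = current - baseline
    ok = growth <= budget
    status = "OK" if ok else "REGRESSION"
    message = (
        f"{name}: {current} bytes "
        f"({growth:+d} vs baseline, budget {budget:+d}) {status}"
    )
    return ok, message
```

A CI step could print the message and exit non-zero when `ok` is False, once for the stripped size and once for the compressed size.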

@junr03
Member Author

junr03 commented Jun 21, 2019

need to also include tooling

Right now the tool that has given me the most information is bloaty. However, part of the time I will set up with @Reflejo is to learn what his toolkit is for investigations like this. I will document it so that it is repeatable in the future.

Can we time box getting this working and circle back?

Yep, that is what I am doing right now. Won't extend further than this afternoon. I chatted with @moderation yesterday and they are compiling on native hosts. I chatted with @keith about cross compiling via bazel on x86 and he advised that was a rough path. The last path I am exploring today is Docker, which arguably supports multi-arch out of the box.

@mattklein123
Member

If it really comes down to it just buy a Raspberry Pi or similar to do the investigation on, but it seems like that shouldn't be needed.

@mattklein123
Member

Also, I don't know what bloaty is, but all of the data should be available via objdump and related tools, and can be scripted as needed.
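
As one illustration of scripting this with standard tools, the sketch below parses text in the shape printed by GNU `nm -S --size-sort` (address, size, type, name, with hex sizes) and totals the bytes per symbol type. It assumes every relevant line carries a size column; the function names are hypothetical, and any sample input is illustrative rather than real Envoy output.

```python
from collections import defaultdict


def parse_nm_sizes(text):
    """Parse lines of the form '<addr> <size> <type> <name>' (hex size)."""
    symbols = []
    for line in text.splitlines():
        # Split at most 3 times so demangled names keep their spaces.
        parts = line.split(None, 3)
        if len(parts) != 4:
            continue  # e.g. undefined symbols, which have no address or size
        _addr, size, kind, name = parts
        if len(kind) != 1:
            continue  # symbol type is a single letter; otherwise not a sized line
        try:
            symbols.append((int(size, 16), kind, name))
        except ValueError:
            continue
    return symbols


def size_by_type(symbols):
    """Total size in bytes per nm symbol type (T, t, D, ...)."""
    totals = defaultdict(int)
    for size, kind, _name in symbols:
        totals[kind] += size
    return dict(totals)
```

Sorting the parsed tuples in descending order gives a quick "largest symbols" table, and grouping names by prefix would give a per-namespace view similar to what bloaty reports.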

@moderation

Hopefully @junr03's investigation into cross compilation pays off, but if not, I'm using a Pine64 RockPro64 SOC - https://www.pine64.org/rockpro64/. 4G board is $80. You'll want the big heat sink. Has enough memory and cores to compile stuff like Bazel, Envoy, CockroachDB etc.

@costinm

costinm commented Jun 22, 2019

Not sure if it helps - the binary I built about 1 year ago - https://github.com/costinm/istio-build/releases - is 5M for android/arm (stripped). Maybe we can compare.

Looking at the arm build for android - it is a bit smaller (#153)

@kastiglione
Contributor

kastiglione commented Jun 24, 2019

envoy_cc_library() declares alwayslink = 1:

https://github.com/envoyproxy/envoy/blob/ad57ed8511b636869afb3eef3c21b52890d71890/bazel/envoy_library.bzl#L64

I'm not sure how many of the libraries truly need to be force loaded, but this blocks the -dead_strip optimization of ld64 (iOS). Rerunning the link command with all -force_load flags removed, except libenvoy_main_interface_lib, reduced the test_binary size by 2mb (down to ~10mb from ~12mb, both using -dead_strip). So there's an upper limit of 2mb of reduction by using alwayslink only where needed.

junr03 added a commit that referenced this issue Jun 27, 2019
Signed-off-by: Jose Nino jnino@lyft.com

Description: This PR concludes binary size investigation slated for issue #17. The three deliverables of this PR are:

1. Developer documentation that solidifies the building and analysis platform used for binary size analysis.
2. A list of issues for next steps in binary size reduction under the perf/size label.
3. A final baseline size for the binary:

As of https://github.com/lyft/envoy-mobile/tree/f17caebcfce09ec5dcda905dc8418fea4d382da7
The test_binary_size_size as built by the toolchain against the architecture described (arm64 with clang and lld)
compiles to a stripped size of 8.9mb and a compressed size of 3mb.

Additionally #181 will add CI jobs to add size regression analysis on every PR.

Risk Level: low - add new bazel target and docs.
Docs Changes: added developer documentation.
Fixes #17