Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Breakpoints never hit in native code when using LLD #885

Closed
steveprentice opened this issue Jan 10, 2019 · 19 comments
Closed

[BUG] Breakpoints never hit in native code when using LLD #885

steveprentice opened this issue Jan 10, 2019 · 19 comments
Assignees
Labels
Milestone

Comments

@steveprentice
Copy link

Description

I wasn't sure if this should be submitted here or against Android Studio, but decided to start here since the NDK changelog is asking for people to start testing with LLD. Let me know if you need me to resubmit against Android Studio.

  1. Create new Native C++ project.
  2. Modify CMakeLists.txt to use LLD.
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -fuse-ld=lld")
  1. Set breakpoint inside Java_com_sample_lld_MainActivity_stringFromJNI()
  2. Debug application. Breakpoint should trigger but doesn't.

Reverting back to the gold linker solves the issue.

Sample app: https://github.com/steveprentice/lld-breakpoint

Environment Details

Android Studio 3.3 RC3
NDK r18b, r19 beta2 and latest canary 5221128.
macOS

@DanAlbert
Copy link
Member

@stephenhines: how do you want to handle lldb bugs?

@stephenhines
Copy link
Collaborator

This is an LLD bug, right? Or am I misunderstanding things?

@enh
Copy link
Contributor

enh commented Jan 10, 2019

until someone tries with gdb i'm not sure we can tell, since the submitter is using lld and lldb. (the fact that lldb and gold work does suggest it might be lld rather than lldb that's at fault though.)

@DanAlbert
Copy link
Member

Could also be lld doing something valid but different from gold that lldb doesn't expect. Fortunately it's all for @stephenhines either way, so same question :)

@chih-hung
Copy link

Did this happen on macOS?
Can it be reproduced on Linux or Windows host?
It would be the best for me to try with Linux command line.

@steveprentice
Copy link
Author

steveprentice commented Jan 14, 2019

@chih-hung:

Environment Details

Android Studio 3.3 RC3
NDK r18b, r19 beta2 and latest canary 5221128.
macOS

I have not tried with other environments.

@myliojordank
Copy link

@chih-hung:
Also looks to be happening for me on Windows:
Android Studio 3.3.2
NDK r19b
Windows 10 64-bit

@steveprentice
Copy link
Author

@chih-hung, this is still an issue with NDK 20. The NDK release notes are encouraging LLD usage / testing, but this issue prevents broader adoption beyond. Any update on progress or potential workarounds?

@DanAlbert DanAlbert added this to the external dependency milestone Jul 2, 2019
@DanAlbert
Copy link
Member

Need to figure out whether this is an lldb or lld issue. Using ndk-gdb with Studio is a hassle, but you can set up symlinks to make it happy. We have an example of this here: https://android.googlesource.com/platform/ndk/+/refs/heads/master/samples/NdkGdbSample/.

@stephenhines @chih-hung: could one of you give that a shot? Hopefully you can just switch that sample to lld and repro the issue with lldb, and then switch to ndk-gdb to confirm. If not, can start copying code from the repro case until it repros.

(@steveprentice if that's something you have time to verify we'd greatly appreciate it, otherwise we'll have a look when we have the time)

@rprichard
Copy link
Collaborator

I was able to confirm it. When I use -fuse-ld=bfd, Android Studio's image list reports using an unstripped intermediate solib, but when I use -fuse-ld=lld, image list reports a stripped solib pointing into the lldb module cache.

i.e. bfd:

...
[223] D373CEDC-B135-E0C5-8D45-B5E1E50206AE                    /usr/local/google/home/rprichard/.lldb/module_cache/remote-android/.cache/D373CEDC-B135-E0C5-8D45-B5E1E50206AE/libllvm-glnext.so 
[224] C123C627-0000-0000-0000-000000000000                    /usr/local/google/home/rprichard/.lldb/module_cache/remote-android/.cache/C123C627-0000-0000-0000-000000000000/libGLESv1_CM_adreno.so 
[225] 9C2B97D1-4D15-8EA9-3023-358531C97125-859F365A                    /usr/local/google/home/rprichard/AndroidStudioProjects/TestLldAndLldb/app/build/intermediates/cmake/debug/obj/arm64-v8a/libnative-lib.so 
[226] 9086DEA9-0000-0000-0000-000000000000                    /usr/local/google/home/rprichard/.lldb/module_cache/remote-android/.cache/9086DEA9-0000-0000-0000-000000000000/eglSubDriverAndroid.so 

lld:

...
[223] D373CEDC-B135-E0C5-8D45-B5E1E50206AE                    /usr/local/google/home/rprichard/.lldb/module_cache/remote-android/.cache/D373CEDC-B135-E0C5-8D45-B5E1E50206AE/libllvm-glnext.so 
[224] 85BC840D-0000-0000-0000-000000000000                    /usr/local/google/home/rprichard/.lldb/module_cache/remote-android/.cache/85BC840D-0000-0000-0000-000000000000/libnative-lib.so 
[225] C123C627-0000-0000-0000-000000000000                    /usr/local/google/home/rprichard/.lldb/module_cache/remote-android/.cache/C123C627-0000-0000-0000-000000000000/libGLESv1_CM_adreno.so 
[226] 9086DEA9-0000-0000-0000-000000000000                    /usr/local/google/home/rprichard/.lldb/module_cache/remote-android/.cache/9086DEA9-0000-0000-0000-000000000000/eglSubDriverAndroid.so 
...

The intermediate solib (app/build/intermediates/cmake/debug/obj/arm64-v8a/libnative-lib.so) has different build IDs with bfd versus lld:

$ file use-bfd.so 
use-bfd.so: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, BuildID[sha1]=9c2b97d14d158ea93023358531c97125859f365a, with debug_info, not stripped

$ file use-lld.so 
use-lld.so: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, BuildID[xxHash]=086ed6b255edf06d, with debug_info, not stripped

The file in the lldb module cache is stripped:

$ file /usr/local/google/home/rprichard/.lldb/module_cache/remote-android/.cache/85BC840D-0000-0000-0000-000000000000/libnative-lib.so 
/usr/local/google/home/rprichard/.lldb/module_cache/remote-android/.cache/85BC840D-0000-0000-0000-000000000000/libnative-lib.so: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, BuildID[xxHash]=086ed6b255edf06d, stripped

The lld build ID is shorter than the bfd build ID, and there's a disagreement. file reports 086ed6b255edf06d, but lldb is using 85BC840D. With bfd, the build ID reported by file and lldb match.

@rprichard
Copy link
Collaborator

The lldb in Studio appears to accept only 16- or 20-byte build IDs. lld defaults to an 8-byte hash as its build ID (https://reviews.llvm.org/D18091). When the 8-byte ID was added to lld, lldb was also patched to accept anything between 4 and 20 bytes (https://reviews.llvm.org/D18096). (Studio's lldb is a little out of date...)

I'm able to get lld debugging working by adding -Wl,--build-id=sha1 to the ldflags. Using NDK r20, I added two lines near the beginning of my CMakeLists.txt:

cmake_minimum_required(VERSION 3.4.1)

# -fno-experimental-isel works around https://github.com/android-ndk/ndk/issues/1004 in NDK r20.
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fno-experimental-isel")
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -fuse-ld=lld -Wl,--build-id=sha1")

About that linker argument: It looks like only the last --build-id[=type] matters, so a plain --build-id overrides an earlier explicit setting. --build-id=none turns the ID off. bfd, gold, and lld all recognize md5 and sha1 types. lld defaults to fast, its 8-byte hash. bfd defaults to sha1. gold defaults to tree, which is like sha1 but I think computes a hash-of-hashes in parallel. lld recognizes tree as an alias for sha1 (https://reviews.llvm.org/rL287119), because it always computes a hash-of-hashes (https://reviews.llvm.org/D26199).

I think we can replace the -Wl,--build-id in the NDK build scripts with -Wl,--build-id=sha1 until Studio has a newer lldb. I'll work on a CL.

@DanAlbert DanAlbert modified the milestones: external dependency, r21 Jul 3, 2019
@DanAlbert
Copy link
Member

Awesome, that means we can fix this for r21 and people can fix this up on r20 on their own if they need to.

We'll still want an updated lldb in Studio, but this at least unblocks lld.

@DanAlbert DanAlbert assigned rprichard and unassigned chih-hung Jul 3, 2019
@DanAlbert
Copy link
Member

(also, does gdb support the 8 byte hash, or will we need a similar fix for gdb?)

@rprichard
Copy link
Collaborator

Switching -Wl,--build-id to -Wl,--build-id=sha1 will slow the gold linker somewhat, because gold defaults to tree. Switching the NDK to -Wl,--build-id=tree instead avoids that problem but bfd warns about the unrecognized option. I measured gold link-time on a trivial 100MB executable:

char blob[100000000] = { 1 };
int main() {}

Using a Windows machine with 1000 iterations:

===> multitime results
1: /c/src/android-ndk-r20/toolchains/llvm/prebuilt/windows-x86_64/bin/aarch64-linux-android21-clang test.o -fuse-ld=gold -Wl,--build-id=sha1
            Mean                Std.Dev.    Min         Median      Max
real        0.801+/-0.0114      0.140       0.693       0.770       3.162
user        0.020+/-0.0014      0.017       0.000       0.015       0.091
sys         0.120+/-0.0023      0.028       0.046       0.122       0.217
===> multitime results
1: /c/src/android-ndk-r20/toolchains/llvm/prebuilt/windows-x86_64/bin/aarch64-linux-android21-clang test.o -fuse-ld=gold -Wl,--build-id=tree
            Mean                Std.Dev.    Min         Median      Max
real        0.694+/-0.0094      0.115       0.608       0.677       2.999
user        0.019+/-0.0013      0.016       0.000       0.015       0.076
sys         0.122+/-0.0023      0.028       0.030       0.123       0.232
===> multitime results
1: /c/src/android-ndk-r20/toolchains/llvm/prebuilt/windows-x86_64/bin/aarch64-linux-android21-clang test.o -fuse-ld=gold -Wl,--build-id=md5
            Mean                Std.Dev.    Min         Median      Max
real        0.668+/-0.0068      0.084       0.589       0.651       2.007
user        0.019+/-0.0013      0.016       0.000       0.015       0.092
sys         0.116+/-0.0022      0.027       0.030       0.108       0.201

Switching from tree to sha1 on this program increased link-time from 694ms to 801ms (+15%). I think a more realistic program would take longer to link (e.g. relocations), so the cost is probably more like 100ms/100MB in the output file. My guess is that the build ID calculation includes debug info.

I'm guessing this isn't enough of a regression to be a concern, but if it is, we could:

  • Use --build-id=sha1 for arm64 and --build-id=tree otherwise.
  • Search for -fuse-ld=lld and pass --build-id=sha1 just for it. We do that for Windows (to turn off threading), but that's a hack that should be going away soon.

@rprichard
Copy link
Collaborator

Running the same test on my Linux workstation shows a regression of 300ms to 377ms (+26%).

@rprichard
Copy link
Collaborator

Search for -fuse-ld=lld and pass --build-id=sha1 just for it. We do that for Windows (to turn off threading), but that's a hack that should be going away soon.

We only do that for ndk-build, though, not CMake, so that's probably not an option.

@DanAlbert
Copy link
Member

We only do that for ndk-build, though, not CMake, so that's probably not an option.

(in case you were wondering why, we can't do it reliably for CMake because our only chance to do anything in CMake is in the toolchain file, so we can't see anything that the user does in their CMakeLists.txt)

@rprichard
Copy link
Collaborator

(also, does gdb support the 8 byte hash, or will we need a similar fix for gdb?)

I verified that gdb 7.11 can find a library using its build ID if the build ID is 4, 8, or 16 bytes. AFAICT, gdb handles any non-zero build ID size. To test this, I copied an unstripped library to $OUT/symbols/.build-id/xx/yy..yy.debug file where xxyy..yy is the build ID in hex, then applied this gdbclient.py patch to make gdb download stripped libraries from the target:

diff --git a/scripts/gdbclient.py b/scripts/gdbclient.py
index e3a050330..f09d3b9a9 100755
--- a/scripts/gdbclient.py
+++ b/scripts/gdbclient.py
@@ -268,8 +268,9 @@ def generate_gdb_script(root, sysroot, binary_name, port, dalvik_gdb_script, sol
     gdb_commands = ""
     gdb_commands += "file '{}'\n".format(binary_name)
     gdb_commands += "directory '{}'\n".format(root)
-    gdb_commands += "set solib-absolute-prefix {}\n".format(sysroot)
-    gdb_commands += "set solib-search-path {}\n".format(solib_search_path)
+    gdb_commands += "set debug-file-directory {}\n".format(sysroot)
+    #gdb_commands += "set solib-absolute-prefix {}\n".format(sysroot)
+    #gdb_commands += "set solib-search-path {}\n".format(solib_search_path)
     if dalvik_gdb_script:
         gdb_commands += "source {}\n".format(dalvik_gdb_script)
 

FWIW, the platform defaults to 16-byte IDs (--build-id=md5).

@rprichard
Copy link
Collaborator

The fix is in aosp/master and should go into NDK r21:
https://android-review.googlesource.com/c/platform/ndk/+/1011392

Once Android Studio's LLDB is updated, I think we'll revert this change and let the linker pick its own build ID hash method. e.g. lld would compute an 8-byte xxHash ID.

We should probably also enable --build-id by default in the Clang driver when targeting Android.

disigma pushed a commit to wimal-build/ndk that referenced this issue Jul 9, 2019
lld defaults to an 8-byte hash (--build-id=fast), but Android Studio's
current version of lldb only recognizes 16- and 20-byte hashes
(--build-id=[md5,sha1,tree,uuid]). Specify --build-id=sha1 to explicitly
use a 20-byte hash for the build ID.

This setting can be removed once lldb is updated.

Bug: android/ndk#885
Test: build and debug NDK app in Android Studio
Test: run `file` on NDK libnative-lib.so
Change-Id: I2e5f2d37817c7855bd704eb05241eba4c1d02a52
IanusInferus added a commit to IanusInferus/typemake that referenced this issue Sep 11, 2019
disigma pushed a commit to wimal-build/ndk that referenced this issue Nov 8, 2019
Using sha1 instead of tree turned out to have a larger build impact
than the benchmarking suggested (10% regression in dolphin build
time).

For ndk-build, scan all the linker flags for -fuse-ld and check
APP_LD to determine if lld is being used, and if it is then fallback
to tree. Otherwise, use the linker default.

For CMake we don't know the user's chosen ldflags, so the best we can
do is work off of ANDROID_LD. If the user doesn't use ANDROID_LD and
instead uses -fuse-ld=lld explicitly, they won't be able to debug.

Test: inspected verbose build output for the following configurations:
    * No flags
    * APP_LD=lld
    * APP_LD=gold
    * APP_LD=lld APP_LDFLAGS=-fuse-ld=gold
    * APP_LD=gold APP_LDFLAGS=-fuse-ld=lld
    * For the first three variants, checked CMake with ANDROID_LD.
Bug: http://b/143411084
Bug: android/ndk#885

Change-Id: I62a7b6a296a61bc543c38d406bfb9f15ddb1c7fa
disigma pushed a commit to wimal-build/ndk that referenced this issue Nov 8, 2019
Using sha1 instead of tree turned out to have a larger build impact
than the benchmarking suggested (10% regression in dolphin build
time).

For ndk-build, scan all the linker flags for -fuse-ld and check
APP_LD to determine if lld is being used, and if it is then fallback
to tree. Otherwise, use the linker default.

For CMake we don't know the user's chosen ldflags, so the best we can
do is work off of ANDROID_LD. If the user doesn't use ANDROID_LD and
instead uses -fuse-ld=lld explicitly, they won't be able to debug.

Test: inspected verbose build output for the following configurations:
    * No flags
    * APP_LD=lld
    * APP_LD=gold
    * APP_LD=lld APP_LDFLAGS=-fuse-ld=gold
    * APP_LD=gold APP_LDFLAGS=-fuse-ld=lld
    * For the first three variants, checked CMake with ANDROID_LD.
Bug: http://b/143411084
Bug: android/ndk#885

Change-Id: I62a7b6a296a61bc543c38d406bfb9f15ddb1c7fa
(cherry picked from commit 8ebd5e0)
disigma pushed a commit to wimal-build/ndk that referenced this issue Nov 8, 2019
Using sha1 instead of tree turned out to have a larger build impact
than the benchmarking suggested (10% regression in dolphin build
time).

For ndk-build, scan all the linker flags for -fuse-ld and check
APP_LD to determine if lld is being used, and if it is then fallback
to tree. Otherwise, use the linker default.

For CMake we don't know the user's chosen ldflags, so the best we can
do is work off of ANDROID_LD. If the user doesn't use ANDROID_LD and
instead uses -fuse-ld=lld explicitly, they won't be able to debug.

Test: inspected verbose build output for the following configurations:
    * No flags
    * APP_LD=lld
    * APP_LD=gold
    * APP_LD=lld APP_LDFLAGS=-fuse-ld=gold
    * APP_LD=gold APP_LDFLAGS=-fuse-ld=lld
    * For the first three variants, checked CMake with ANDROID_LD.
Bug: http://b/143411084
Bug: android/ndk#885

Change-Id: I62a7b6a296a61bc543c38d406bfb9f15ddb1c7fa
(cherry picked from commit 8ebd5e0)
disigma pushed a commit to wimal-build/ndk that referenced this issue Nov 9, 2019
Using sha1 instead of tree turned out to have a larger build impact
than the benchmarking suggested (10% regression in dolphin build
time).

For ndk-build, scan all the linker flags for -fuse-ld and check
APP_LD to determine if lld is being used, and if it is then fallback
to tree. Otherwise, use the linker default.

For CMake we don't know the user's chosen ldflags, so the best we can
do is work off of ANDROID_LD. If the user doesn't use ANDROID_LD and
instead uses -fuse-ld=lld explicitly, they won't be able to debug.

Test: inspected verbose build output for the following configurations:
    * No flags
    * APP_LD=lld
    * APP_LD=gold
    * APP_LD=lld APP_LDFLAGS=-fuse-ld=gold
    * APP_LD=gold APP_LDFLAGS=-fuse-ld=lld
    * For the first three variants, checked CMake with ANDROID_LD.
Bug: http://b/143411084
Bug: android/ndk#885

Change-Id: I62a7b6a296a61bc543c38d406bfb9f15ddb1c7fa
(cherry picked from commit 8ebd5e0)
IanusInferus added a commit to IanusInferus/typemake that referenced this issue Nov 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants