debug rcdailey's zlib unwind issues #457

Closed
DanAlbert opened this issue Jul 14, 2017 · 74 comments

@DanAlbert
Member

@rcdailey

Forking from #230 (comment)

Not entirely clear what's going on just yet. It certainly looks like the unwinders are crossing the streams, but that shouldn't be happening with anything built with a modern NDK.

Could you check and make sure that all the _Unwind_* symbols in your libraries are hidden?

@DanAlbert DanAlbert self-assigned this Jul 14, 2017
@rcdailey

What indicates if those symbols are "hidden"? I can grep all my libs for that keyword... but not sure what to look for besides that.

@DanAlbert
Member Author

readelf -sW yourlib.so | grep _Unwind will tell you. readelf is included in the NDK under toolchains/arm-linux-androideabi-4.9/prebuilt/$OS/bin/arm-linux-androideabi-readelf (it's actually a cross-arch executable, there's just one per binutils install) in case you don't have it or are on Windows or something.

Here are some examples of what various symbols should look like:

extern "C" {
__attribute__((visibility("hidden"))) void myfunc_hidden() {}
static void myfunc_static() {}
extern void myfunc_extern();
void myfunc_public() { myfunc_extern(); }
}
$ readelf -sW foo/libs/armeabi-v7a/libfoo.so | grep myfunc_
     3: 000006ab     4 FUNC    GLOBAL DEFAULT   11 myfunc_public
     4: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND myfunc_extern

The column that says DEFAULT might instead say HIDDEN, and the column that says GLOBAL might instead say LOCAL. HIDDEN, LOCAL, or not present at all are all forms of hidden.
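The rule of thumb above can be checked mechanically. What follows is only a minimal sketch in Python, assuming the usual eight-column layout of `readelf -sW` symbol rows (Num, Value, Size, Type, Bind, Vis, Ndx, Name). The helper names `parse_symbol_row` and `classify` are made up for this illustration and are not part of the NDK.

```python
def parse_symbol_row(line):
    """Parse one symbol row of `readelf -sW` output; return None for any other line."""
    fields = line.split()
    # Assumed column order: Num: Value Size Type Bind Vis Ndx Name
    if len(fields) < 8:
        return None
    num = fields[0]
    if not (num.endswith(":") and num[:-1].isdigit()):
        return None  # header rows such as "Num: Value Size ..." end up here
    return {"bind": fields[4], "vis": fields[5], "ndx": fields[6], "name": fields[7]}


def classify(row):
    """Return 'undefined', 'hidden', or 'exported' for a parsed row."""
    if row["ndx"] == "UND":
        return "undefined"  # must be resolved from some other library at load time
    if row["bind"] == "LOCAL" or row["vis"] == "HIDDEN":
        return "hidden"
    return "exported"


# Sample rows copied from the readelf output shown above.
SAMPLE = [
    "     3: 000006ab     4 FUNC    GLOBAL DEFAULT   11 myfunc_public",
    "     4: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND myfunc_extern",
]

for line in SAMPLE:
    row = parse_symbol_row(line)
    if row is not None:
        print(row["name"], "->", classify(row))
```

A symbol that never shows up in the output at all (for example one that was stripped) is simply not seen by this sketch, which matches the "not present at all" case.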

@alexcohn

@rcdailey Please check: your APK should not contain libz.so packed in any form. The NDK build will use the system version of the library. The binary that ships with the NDK is intended to be used as a link stub only; I doubt it has ever been tested.

Now, if the system version is bad (which happens on private ROMs), there is an easy workaround. You can add the official sources to your project and use your version of zlib. The size of the library is quite small, and it uses no private APIs that could make the system version better.

Actually, there are no reasons why zlib should be packaged in NDK, except historical.

@rcdailey

rcdailey commented Jul 15, 2017

@alexcohn I am in fact packaging libz.so into my APK. I copy it from the NDK. It just makes sense to me to package the exact binary I linked against. I'll try without it though...

Do I still need to load the libz.so library in Java even if I don't package it?

EDIT: I also package libc++_shared.so with my APK (I copy that from the NDK as well). I assume that this is still required? I don't recall seeing STL libraries on the device.

@alexcohn

Well, this pretty much explains the problem that you encountered. Just like you don't package libc.so or libm.so with your APK, libz.so is part of the public NDK libraries list.

I agree that the distinction between libz.so and libc++_shared.so is not clear for users of NDK. In a parallel thread I have even suggested providing a mechanism for sharing libc++_shared.so across applications. It's under 1 MB, but the benefit would also be that any fixes (including security fixes) become available to everybody.

But as of today, you should package libc++_shared.so and should not package libz.so.

@stephenhines
Collaborator

@alexcohn: There really is no way to "share" libc++_shared.so though. It is intentionally meant to be unstable, so that the implementation of it can be improved over time (and also ABI bugs fixed, etc.). The real trouble here is that C++ was never intended to be used for dynamic shared libraries. In the C++ model, everything is meant to be compiled at the same time, with the same components, rather than having a path to do partial upgrades. It is great that partial updates can work for the vast majority of developer scenarios, but upgrades of the STL really will never be possible based on the way that the C++ standard works today.

@rcdailey

Thanks @alexcohn. Unfortunately I won't be able to retest my scenario until Monday when I'm in the office. However I'll let you know the results then!

@alexcohn

@stephenhines: Did I ever say sharing libc++ is easy? But never say never…

@rcdailey

@DanAlbert: I ran the readelf command and here is the output:

$ "E:\android\ndk_72\toolchains\arm-linux-androideabi-4.9\prebuilt\windows-x86_64\bin\arm-linux-androideabi-readelf.exe" -sW libzApp.so | grep _Unwind
     4: 00000000     0 FUNC    GLOBAL DEFAULT  UND _Unwind_Resume
2596963: 00000000     0 FUNC    GLOBAL DEFAULT  UND _Unwind_Resume

There are 2 and neither are hidden based on your description. If this is problematic, what is the next step? Should I run readelf on my static libs too or something, to narrow it down to a specific third party library?

@rcdailey

Also I removed libz.so from my APK and I'm loading it from /system/lib now on device, but I still get the segfault:

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'Android/ziosk/ziosk:4.2.2/JDQ39E/dev.bsp.BSP-6.3.3.14.1706091617:eng/test-keys'
Revision: '6'
pid: 4945, tid: 4958, name: ttm.zPayService  >>> com.ttm.zPayService <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 00000016
    r0 00000002  r1 5ee488d4  r2 00000000  r3 00000000
    r4 00000002  r5 5ee4892c  r6 5bd7104c  r7 5ee48a00
    r8 5bd71050  r9 5bd71054  sl 5bd71060  fp 00000000
    ip 5ec31a94  sp 5ee488c0  lr 401ea89c  pc 401ea604  cpsr 80000050
    d0  0000000000000000  d1  0000000000000000
    d2  0000000000000000  d3  0000000000000000
    d4  6620535043206568  d5  656d79617020726f
    d6  000000127420746e  d7  40322189374bc6a8
    d8  0000000000000000  d9  0000000000000000
    d10 0000000000000000  d11 0000000000000000
    d12 0000000000000000  d13 0000000000000000
    d14 0000000000000000  d15 0000000000000000
    d16 697a2f617461642f  d17 746e6f632f6b736f
    d18 6977206e69676562  d19 6873616c73206874
    d20 746f6e20646e6120  d21 74697720646e6520
    d22 49202e656e6f2068  d23 646e657070612073
    d24 0067006700670067  d25 0067006700670067
    d26 0067006700670067  d27 0067006700670067
    d28 0100010001000100  d29 0100010001000100
    d30 0000000100000001  d31 0000000100000001
    scr 60000010

backtrace:
    #00  pc 00010604  /system/lib/libz.so (__gnu_Unwind_Resume+8)
    #01  pc 00010898  /system/lib/libz.so (_Unwind_Resume+20)

stack:
         5ee48880  5ee48c34  
         5ee48884  00000000  
         5ee48888  400588c8  
         5ee4888c  5ee488e4  
         5ee48890  5ee488c0  
         5ee48894  5e96fa2f  /data/app-lib/com.ttm.zapp-1/libzPayService.so (boost::shared_ptr<boost::filesystem::filesystem_error::m_imp>::shared_ptr<boost::filesystem::filesystem_error::m_imp>(boost::filesystem::filesystem_error::m_imp*)+54)
         5ee48898  5ee48c34  
         5ee4889c  5ee48c34  
         5ee488a0  5ee48c34  
         5ee488a4  59de38d8  
         5ee488a8  5ee48c34  
         5ee488ac  5ee48c34  
         5ee488b0  5ee48c34  
         5ee488b4  5ee48970  
         5ee488b8  df0027ad  
         5ee488bc  00000000  
    #00  5ee488c0  00000000  
         5ee488c4  40058850  
         5ee488c8  5ee4892c  
         5ee488cc  401ea89c  /system/lib/libz.so (_Unwind_Resume+24)
    #01  5ee488d0  00000000  
         5ee488d4  00000000  
         5ee488d8  00000002  
         5ee488dc  5ee48970  
         5ee488e0  00000000  
         5ee488e4  5e96b80b  /data/app-lib/com.ttm.zapp-1/libzPayService.so ((anonymous namespace)::error(int, boost::filesystem::path const&, boost::system::error_code*, char const*)+222)
         5ee488e8  40058850  
         5ee488ec  5ee4892c  
         5ee488f0  5bd7104c  
         5ee488f4  5ee48a00  
         5ee488f8  5bd71050  
         5ee488fc  5bd71054  
         5ee48900  5bd71060  
         5ee48904  00000000  
         5ee48908  5ec31a94  /data/app-lib/com.ttm.zapp-1/libzPayService.so
         5ee4890c  5ee48918  

It's consistently coming from boost::filesystem but not sure why... never had this problem when I was using GNU STL + Clang.

@rcdailey

So I did a grep of all my prebuilt *.a and *.so files for armeabi-v7a. The last time these were built it was with GNU STL + GCC, I am currently using LLVM STL + Clang. I did not rebuild libraries that did not yield linker errors, since I assumed that would catch any ABI issues. But maybe there's more to it?

Some libs that are showing _Unwind_Resume in their binary data resulting from the grep:

  • libMagickCore-7.so
  • libMagickWand-7.so
  • libPowerVR.so
  • libogles2tools.a

If they show up here, would it cause problems? Do I need to rebuild them even if there are no linker issues?

@alexcohn

Maybe start with the filesystem_error you are receiving from boost? What is the root cause of it?

@DanAlbert
Member Author

I am in fact packaging libz.so into my APK. I copy it from the NDK. It just makes sense to me to package the exact binary I linked against.

Just like you don't package libc.so or libm.so with your APK, libz.so is part of the public NDK libraries list.

Exactly. The libz.so in the NDK is just a stub. Every function in it is void foo() {}. We maintain ABI compatibility for NDK libraries in the system, which is why it's safe to do this.

EDIT: I also package libc++_shared.so with my APK (I copy that from the NDK as well). I assume that this is still required? I don't recall seeing STL libraries on the device.

Yeah, this is correct. The C++ STLs are not ABI stable, so we ship real libraries in the NDK for you to package in your app.

I agree that the distinction between libz.so and libc++_shared.so is not clear for users of NDK.

You can determine which category a library falls into based on its location in the NDK. Anything in platforms/ or sysroot/ is a stub. If it's outside that directory (STLs are in sources/cxx-stl/...), it's a real library and should be shipped with your app.
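As a rough illustration of that rule, here is a small Python sketch that sorts a path (relative to the NDK root) into "stub" or "real library". The function name `is_stub` is hypothetical, and the `platforms/...` path in the usage lines is only an illustrative guess at where a stub might sit; the `sources/cxx-stl/...` path is the one that appears in the link command elsewhere in this thread.

```python
from pathlib import PurePosixPath

# Top-level NDK directories that, per the comment above, contain only stubs.
STUB_ROOTS = {"platforms", "sysroot"}


def is_stub(path_in_ndk):
    """True if a path relative to the NDK root lies under a stub directory."""
    parts = PurePosixPath(path_in_ndk).parts
    return bool(parts) and parts[0] in STUB_ROOTS


for candidate in (
    "platforms/android-15/arch-arm/usr/lib/libz.so",  # illustrative path
    "sources/cxx-stl/llvm-libc++/libs/armeabi-v7a/libc++_shared.so",
):
    verdict = "stub, do not package" if is_stub(candidate) else "real library, package it"
    print(candidate, "->", verdict)
```

This only encodes the directory convention described above for the NDK layout discussed in this thread; it does not inspect the library itself.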

I think I'm going to be rewriting the C++ Libraries doc soon to account for the fact that we're advocating for people to switch to libc++ starting with r16. I'll make sure I mention this.

So I did a grep of all my prebuilt *.a and *.so files for armeabi-v7a. The last time these were built it was with GNU STL + GCC, I am currently using LLVM STL + Clang.

This is definitely part of the problem. You can't reliably mix STLs in the same app (there are some ways to do it that aren't exactly correct if you're very careful, but it's best to just avoid it). Should also be noted that you'll need to use the shared version of the STL (you mention above that you're using c++_shared, so you're fine, but just a note that you shouldn't switch to c++_static). This is the case whenever you have more than one shared library (the actual conditions are a bit more complex, but that's a good rule of thumb).

Any chance those libraries were also built with an old version of the NDK? There were definitely some problems with the way we linked the unwinder until r12 or r13.

If rebuilding the world still has those unwind symbols left undefined or public, then you're probably being hit by #379.

I did not rebuild libraries that did not yield linker errors, since I assumed that would catch any ABI issues. But maybe there's more to it?

The linker can't catch everything, sadly.

#include <string>

struct mystruct {
    std::string s;
};

void foo(const mystruct&);

If something like the above is used in a library, everything will still link fine even if the two std::string types have different mangled names, because the mangled name of foo mentions only mystruct and not its members. If the library was built with gnustl and the caller was built with libc++, the std::string will have a different layout at each end of the call, and this will lead to some very strange bugs.
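To make the "different layout at each end of the call" failure concrete, here is a small Python sketch using ctypes. The two layouts are invented for illustration and are not the real gnustl or libc++ std::string layouts. It writes an object with one layout and reads the same bytes back through the other, assuming (as on mainstream platforms) that a pointer and a size_t have the same width.

```python
import ctypes


class StringLayoutA(ctypes.Structure):
    # Hypothetical layout: pointer to the characters, then the length.
    _fields_ = [("data", ctypes.c_void_p), ("size", ctypes.c_size_t)]


class StringLayoutB(ctypes.Structure):
    # Hypothetical layout: length, capacity, then the pointer.
    _fields_ = [
        ("size", ctypes.c_size_t),
        ("capacity", ctypes.c_size_t),
        ("data", ctypes.c_void_p),
    ]


# The "library" side writes a string object using layout A.
written = StringLayoutA(data=0x1000, size=5)

# The "caller" side reads the same bytes through layout B. The buffer is padded
# because layout B is larger, which is itself part of the mismatch.
raw = bytes(written).ljust(ctypes.sizeof(StringLayoutB), b"\0")
misread = StringLayoutB.from_buffer_copy(raw)

print("object sizes:", ctypes.sizeof(StringLayoutA), "vs", ctypes.sizeof(StringLayoutB))
print("caller sees length", misread.size, "and capacity", misread.capacity)
```

Nothing in this exchange fails at "link time": both sides agree on the name of the type, and only the interpretation of the bytes differs, so the caller ends up treating the pointer value as a length.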

@rcdailey

Thanks for all the great feedback so far. I think I'm going to head down the road of just making sure all our third party libs build together with my normal targets. However, as I'm working towards that goal, I run into new issues each time...

I have ImageMagick building with r15b using the same toolchain settings as my normal targets, and loading that shared lib now says it can't find "floor":

D ZActivity: Activity onCreate
D TTMApplication: Loading library: c
D dalvikvm: No JNI_OnLoad found in /system/lib/libc.so 0x41714cf8, skipping init
D TTMApplication: Loading library: c++_shared
D dalvikvm: Trying to load lib /data/app-lib/com.ttm.zapp-1/libc++_shared.so 0x41714cf8
D dalvikvm: Added shared lib /data/app-lib/com.ttm.zapp-1/libc++_shared.so 0x41714cf8
D dalvikvm: No JNI_OnLoad found in /data/app-lib/com.ttm.zapp-1/libc++_shared.so 0x41714cf8, skipping init
D TTMApplication: Loading library: z
D dalvikvm: No JNI_OnLoad found in /system/lib/libz.so 0x41714cf8, skipping init
D TTMApplication: Loading library: MagickCore
D dalvikvm: Trying to load lib /data/app-lib/com.ttm.zapp-1/libMagickCore.so 0x41714cf8
E dalvikvm: dlopen("/data/app-lib/com.ttm.zapp-1/libMagickCore.so") failed: Cannot load library: soinfo_relocate(linker.cpp:975): cannot locate symbol "floor" referenced by "libMagickCore.so"...
D AndroidRuntime: Shutting down VM
W dalvikvm: threadid=1: thread exiting with uncaught exception (group=0x4120f930)
E AndroidRuntime: FATAL EXCEPTION: main
E AndroidRuntime: java.lang.UnsatisfiedLinkError: Cannot load library: soinfo_relocate(linker.cpp:975): cannot locate symbol "floor" referenced by "libMagickCore.so"...

I tried explicitly doing loadLibrary("c") in Java to make libc.so load (not sure if this is necessary; does libc.so get loaded automatically somewhere?) but that didn't fix the issue with it not finding floor.

Any reason for this? Floor is provided by the C library so it should be finding it... I don't understand.

@enh
Contributor

enh commented Jul 18, 2017

floor (like most of <math.h>) is provided by libm, not libc.

@rcdailey

Ok so that means I do need to explicitly load libc and libm (which I assume is also under /system/lib). No standard system libraries seem to be loaded automatically for me, and I must do it through java?

@rcdailey

Looks like loading libm doesn't fix it...

D TTMApplication: Loading library: c
D dalvikvm: No JNI_OnLoad found in /system/lib/libc.so 0x41710888, skipping init
D TTMApplication: Loading library: m
D dalvikvm: No JNI_OnLoad found in /system/lib/libm.so 0x41710888, skipping init
D TTMApplication: Loading library: c++_shared
D dalvikvm: Trying to load lib /data/app-lib/com.ttm.zapp-2/libc++_shared.so 0x41710888
D dalvikvm: Added shared lib /data/app-lib/com.ttm.zapp-2/libc++_shared.so 0x41710888
D dalvikvm: No JNI_OnLoad found in /data/app-lib/com.ttm.zapp-2/libc++_shared.so 0x41710888, skipping init
D TTMApplication: Loading library: z
D dalvikvm: No JNI_OnLoad found in /system/lib/libz.so 0x41710888, skipping init
D TTMApplication: Loading library: MagickCore
D dalvikvm: Trying to load lib /data/app-lib/com.ttm.zapp-2/libMagickCore.so 0x41710888
E dalvikvm: dlopen("/data/app-lib/com.ttm.zapp-2/libMagickCore.so") failed: Cannot load library: soinfo_relocate(linker.cpp:975): cannot locate symbol "floor" referenced by "libMagickCore.so"...
D AndroidRuntime: Shutting down VM
W dalvikvm: threadid=1: thread exiting with uncaught exception (group=0x4120f930)
E AndroidRuntime: FATAL EXCEPTION: main
E AndroidRuntime: java.lang.UnsatisfiedLinkError: Cannot load library: soinfo_relocate(linker.cpp:975): cannot locate symbol "floor" referenced by "libMagickCore.so"...

@enh
Contributor

enh commented Jul 18, 2017

both libc and libm will already have been loaded by the zygote. what OS release and architecture is this on? did you pull the libm off the device and check it actually has a floor symbol?

@rcdailey

@enh

$ "E:\android\android-ndk-r15b\toolchains\arm-linux-androideabi-4.9\prebuilt\windows-x86_64\bin\arm-linux-androideabi-readelf.exe" -sW libm.so | grep floor
    26: 0000e358   352 FUNC    GLOBAL DEFAULT    7 floor
    80: 0000e4b8   196 FUNC    GLOBAL DEFAULT    7 floorf
   119: 0000e580   488 FUNC    GLOBAL DEFAULT    7 floorl

@DanAlbert
Member Author

That's not the real libm though. adb pull /system/lib/libm.so and check that one.

@rcdailey

@DanAlbert That's what I did, I used filezilla to copy it over but it's the same one. Sorry for the confusion.

@enh
Contributor

enh commented Jul 18, 2017

what version of Android is this?

@rcdailey

rcdailey commented Jul 18, 2017

Jelly Bean (API 17) is the actual device OS. I set my NDK minimum to API 15, though.

@DanAlbert
Member Author

My bad, I actually read that wrong (I just saw the long path into the NDK, but that was for readelf).

Could you readelf -sW libMagickCore.so | grep -w floor? I'm wondering if you're linking against a more modern libm and there's some symbol versioning stuff going on.

@rcdailey

rcdailey commented Jul 18, 2017

@DanAlbert

$ "E:\android\android-ndk-r15b\toolchains\arm-linux-androideabi-4.9\prebuilt\windows-x86_64\bin\arm-linux-androideabi-readelf.exe" -sW libMagickCore.so | grep floor
   115: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND floor
183616: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND floor

(sorry forgot the -w to grep; but ran it again with it and the results are the same)

@DanAlbert
Member Author

So much for that theory.

@DanAlbert
Member Author

Might give https://github.com/KeepSafe/ReLinker a try just to rule out any linker weirdness.

@rcdailey

I'll take a look at that. In the meantime here is one compile & the final *.so link command line output when I build with ninja -v. Not sure if it will help but it's something...

[325/326] E:\android\ndk_72\toolchains\llvm\prebuilt\windows-x86_64\bin\clang.exe --target=armv7-none-linux-androideabi --gcc-toolchain=E:/android/ndk_72/toolchains/arm-linux-androideabi-4.9/prebuilt/windows-x86_64 --sysroot=E:/android/ndk_72/sysroot -DANDROID -DMAGICKCORE_HDRI_ENABLE=0 -DMAGICKCORE_QUANTUM_DEPTH=8 -DMagickCore_EXPORTS -IE:/android/ndk_72/sources/android/cpufeatures -IE:/android/ndk_72/sources/android/native_app_glue -IE:/code/frontend/source/Core/ThirdParty/ImageMagick/source/jni/ImageMagick-7.0.5-2 -IE:/code/frontend/source/Core/ThirdParty/ImageMagick/source/jni/tiff-3.9.7/libtiff -IE:/code/frontend/source/Core/ThirdParty/ImageMagick/source/jni/jpeg-9b -IE:/code/frontend/source/Core/ThirdParty/libpng/source/jni -isystem E:/android/ndk_72/sysroot/usr/include -isystem E:/android/ndk_72/sysroot/usr/include/arm-linux-androideabi -march=armv7-a -mthumb -mfpu=vfpv3-d16 -mfloat-abi=softfp -funwind-tables -no-canonical-prefixes -D__ANDROID_API__=15 -fexceptions -O2 -g -DNDEBUG -fPIC   -Wno-inconsistent-missing-override -Wno-expansion-to-defined -MD -MT Core/ThirdParty/ImageMagick/source/CMakeFiles/MagickCore.dir/jni/ImageMagick-7.0.5-2/MagickCore/quantum-import.c.o -MF Core\ThirdParty\ImageMagick\source\CMakeFiles\MagickCore.dir\jni\ImageMagick-7.0.5-2\MagickCore\quantum-import.c.o.d -o Core/ThirdParty/ImageMagick/source/CMakeFiles/MagickCore.dir/jni/ImageMagick-7.0.5-2/MagickCore/quantum-import.c.o   -c E:/code/frontend/source/Core/ThirdParty/ImageMagick/source/jni/ImageMagick-7.0.5-2/MagickCore/quantum-import.c
[326/326] cmd.exe /C "cd . && E:\android\ndk_72\toolchains\llvm\prebuilt\windows-x86_64\bin\clang.exe --target=armv7-none-linux-androideabi --gcc-toolchain=E:/android/ndk_72/toolchains/arm-linux-androideabi-4.9/prebuilt/windows-x86_64 --sysroot=E:/android/ndk_72/platforms/android-15/arch-arm -fPIC -march=armv7-a -mthumb -mfpu=vfpv3-d16 -mfloat-abi=softfp -funwind-tables -no-canonical-prefixes -D__ANDROID_API__=15 -fexceptions -O2 -g -DNDEBUG  -Wl,--fix-cortex-a8 -u ANativeActivity_onCreate -shared -Wl,-soname,libMagickCore.so -o output\bin\libMagickCore.so @CMakeFiles/MagickCore.rsp  && cd ."

@rcdailey

@DanAlbert So I officially have all of my third party libs building in CMake in real time with my targets. There is nothing else left over that I'm linking that is pre-built. Should be no traces of GNU left. I still see _Unwind_Resume not hidden:

$ "E:\android\android-ndk-r15b\toolchains\arm-linux-androideabi-4.9\prebuilt\windows-x86_64\bin\arm-linux-androideabi-readelf.exe" -sW libzApp.so | grep _Unwind
     9: 00000000     0 FUNC    GLOBAL DEFAULT  UND _Unwind_Resume
2504819: 00000000     0 FUNC    GLOBAL DEFAULT  UND _Unwind_Resume

And my build command below (result of ninja -v):

[3/4] E:\android\ndk_72\toolchains\llvm\prebuilt\windows-x86_64\bin\clang++.exe --target=armv7-none-linux-androideabi --gcc-toolchain=E:/android/ndk_72/toolchains/arm-linux-androideabi-4.9/prebuilt/windows-x86_64 --sysroot=E:/android/ndk_72/sysroot  -DANDROID -DBETTER_ENUMS_STRICT_CONVERSION -DBOOST_ALL_NO_LIB=1 -DBOOST_ASIO_DISABLE_THREAD_KEYWORD_EXTENSION -DBOOST_BIND_NO_PLACEHOLDERS -DBOOST_FILESYSTEM_NO_DEPRECATED -DBOOST_SYSTEM_NO_DEPRECATED -DBOOST_THREAD_PROVIDES_EXECUTORS -DBOOST_THREAD_USES_CHRONO -DBOOST_THREAD_VERSION=4 -DBUILD_OGLES2 -DMAGICKCORE_HDRI_ENABLE=0 -DMAGICKCORE_QUANTUM_DEPTH=8 -DNOAUTOLINK_MAGICK -DOPENSSL_NO_ASM -DSTATIC_MAGICK -DZIOSK_ENABLE_ZPAY_DIAGNOSTICS -DZIOSK_MODULE_NAME=\"zApp\" -D_MAGICKLIB_ -DzApp_EXPORTS -IE:/code/frontend/source/Applications/zApp/Source -IE:/code/frontend/source/Core/UI/Source -ICore/UI/Source -IE:/code/frontend/source/Core/ThirdParty/PowerVR/sdk/Include -IE:/code/frontend/source/Core/ThirdParty/PowerVR/tools -IE:/code/frontend/source/Core/ThirdParty/PowerVR/tools/OGLES2 -isystem Core/ThirdParty/boost/source/boost/boost_1_64_0 -IE:/code/frontend/source/Core/ThirdParty/openssl/source/include -ICore/ThirdParty/openssl/source/include -IE:/code/frontend/source/Core/ThirdParty/sqlite/source -IE:/code/frontend/source/Core/ThirdParty/cereal/include -IE:/code/frontend/source/Core/ThirdParty/rapidxml/include -IE:/code/frontend/source/Core/ThirdParty/better-enums/include -IE:/code/frontend/source/Core/ThirdParty/libpng/source/jni -IE:/code/frontend/source/Core/ThirdParty/ImageMagick/source/jni/ImageMagick-7.0.5-2 -IE:/code/frontend/source/Core/ThirdParty/bsp/msr/include -IE:/android/ndk_72/sources/android/cpufeatures -IE:/android/ndk_72/sources/android/native_app_glue -IE:/code/frontend/source/Core/Barcode/Source -IE:/code/frontend/source/Core/ThirdParty/zxing/source/core/src -IE:/code/frontend/source/Applications/DynamicUI/Source -IE:/code/frontend/source/Core/WebServices/Source 
-IE:/code/frontend/source/Applications/OrderEntry/Source -IE:/code/frontend/source/Services/Source -IE:/code/frontend/source/Applications/zPayService/Interface/Source -IE:/code/frontend/source/Applications/PATT/Source -IE:/code/frontend/source/Applications/Loyalty/Source -IE:/code/frontend/source/Applications/Survey/Source -IE:/code/frontend/source/Applications/EmailClub/Source -IE:/code/frontend/source/Applications/SettingsManager/Source -IE:/code/frontend/source/Applications/MessagingModule/Source -IE:/code/frontend/source/Applications/ETM/Source -isystem E:/android/ndk_72/sources/cxx-stl/llvm-libc++/include -isystem E:/android/ndk_72/sources/android/support/include -isystem E:/android/ndk_72/sources/cxx-stl/llvm-libc++abi/include -isystem E:/android/ndk_72/sysroot/usr/include -isystem E:/android/ndk_72/sysroot/usr/include/arm-linux-androideabi -march=armv7-a -mthumb -mfpu=vfpv3-d16 -mfloat-abi=softfp -funwind-tables -no-canonical-prefixes -D__ANDROID_API__=15 -fexceptions -frtti -O2 -g -DNDEBUG -fPIC   -Wno-inconsistent-missing-override -Wno-expansion-to-defined -std=gnu++14 -MD -MT Applications/zApp/CMakeFiles/zApp.dir/Source/ZioskApp.cpp.o -MF Applications\zApp\CMakeFiles\zApp.dir\Source\ZioskApp.cpp.o.d -o Applications/zApp/CMakeFiles/zApp.dir/Source/ZioskApp.cpp.o -c E:/code/frontend/source/Applications/zApp/Source/ZioskApp.cpp
[4/4] cmd.exe /C "cd . && E:\android\ndk_72\toolchains\llvm\prebuilt\windows-x86_64\bin\clang++.exe --target=armv7-none-linux-androideabi --gcc-toolchain=E:/android/ndk_72/toolchains/arm-linux-androideabi-4.9/prebuilt/windows-x86_64 --sysroot=E:/android/ndk_72/platforms/android-15/arch-arm -fPIC -march=armv7-a -mthumb -mfpu=vfpv3-d16 -mfloat-abi=softfp -funwind-tables -no-canonical-prefixes -D__ANDROID_API__=15 -fexceptions -frtti -O2 -g -DNDEBUG  -Wl,--fix-cortex-a8 -u ANativeActivity_onCreate -Wl,--no-undefined -shared -Wl,-soname,libzApp.so -o output\bin\libzApp.so Applications/zApp/CMakeFiles/zApp.dir/Source/ZioskApp.cpp.o
output/lib/libUI.a output/lib/libBarcode.a output/lib/libDynamicUI.a output/lib/libOrderEntry.a output/lib/libPATT.a output/lib/libServices.a output/lib/libSurvey.a output/lib/libEmailClub.a output/lib/libSettingsManager.a output/lib/libLoyalty.a output/lib/libMessagingModule.a output/lib/libETM.a -landroid output/lib/libcpufeatures.a output/lib/libnative_app_glue.a -ljnigraphics -lm -llog -lEGL -lGLESv2 output/lib/libDynamicUI.a output/lib/libPATT.a output/lib/libLoyalty.a output/lib/libDynamicUI.a output/lib/libPATT.a output/lib/libLoyalty.a output/lib/libBarcode.a output/lib/libzxing.a output/lib/libOrderEntry.a output/lib/libServices.a output/lib/libzPayServiceInterface.a output/lib/libServices.a output/lib/libzPayServiceInterface.a output/lib/libWebServices.a output/lib/libUI.a output/lib/libPowerVR.a output/lib/libboost_context.a output/lib/libboost_date_time.a output/lib/libboost_filesystem.a output/lib/libboost_regex.a output/lib/libboost_signals.a output/lib/libboost_thread.a output/lib/libboost_chrono.a output/lib/libboost_system.a output/lib/libssl.a output/lib/libcrypto.a output/lib/libsqlite.a output/lib/libpng.a -lz output/bin/libMagickWand.so output/bin/libMagickCore.so output/lib/libcpufeatures.a -ldl output/lib/libnative_app_glue.a -landroid -ljnigraphics -lm -llog -lEGL -lGLESv2  "E:/android/ndk_72/sources/cxx-stl/llvm-libc++/libs/armeabi-v7a/libc++_shared.so" "E:/android/ndk_72/sources/cxx-stl/llvm-libc++/libs/armeabi-v7a/libandroid_support.a" && cd ."

Note that [3/4] is a build of a CPP file, and [4/4] is the link of libzApp.so.

I'm out of ideas...

@DanAlbert
Member Author

You're still using the upstream CMake support, right? I don't see any mention of -Wl,--exclude-libs,libgcc.a or -Wl,--exclude-libs,libunwind.a. Without this, the following (which is probably what happens by default under CMake):

$ clang++ foo.o -lbar -lgcc -o libfoo.so -shared

Will result in libfoo.so having undefined references to things in libgcc (like the unwind symbols) because the linker thinks it can get them from libbar.
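One way to spot this in a captured link line (for example from `ninja -v`) is to look for the two flags named above. The following is a minimal Python sketch, not something the NDK provides: `missing_exclude_libs` and `REQUIRED` are hypothetical names, and only the `-Wl,--exclude-libs,<archive>` spelling used in this comment is recognized.

```python
# Archives that the comment above says should be excluded from the exports.
REQUIRED = ("libgcc.a", "libunwind.a")


def missing_exclude_libs(link_command):
    """Return the archives from REQUIRED that the link command does not exclude."""
    excluded = set()
    for arg in link_command.split():
        parts = arg.split(",")
        # Only the -Wl,--exclude-libs,<archive> form is handled in this sketch.
        if parts[0] == "-Wl" and "--exclude-libs" in parts:
            idx = parts.index("--exclude-libs")
            excluded.update(parts[idx + 1:])
    return [lib for lib in REQUIRED if lib not in excluded]


bad = "clang++ foo.o -lbar -lgcc -o libfoo.so -shared"
good = bad + " -Wl,--exclude-libs,libgcc.a -Wl,--exclude-libs,libunwind.a"

print("missing in first command:", missing_exclude_libs(bad))
print("missing in second command:", missing_exclude_libs(good))
```

An empty result only says the flags are present on the command line; checking the built library with readelf remains the real confirmation.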

IMO, just switch to our cmake toolchain file. These sorts of problems are basically the whole reason we have our own.

@rcdailey

Does your toolchain file use CMake's modern Android NDK integration features (as documented here)? I want to avoid using "ghetto" toolchain files like taka-no-me's, which we had to use in the "old days".

@DanAlbert
Member Author

DanAlbert commented Jul 20, 2017

Ours basically is taka-no-me's, but given that your options seem to be "ghetto" and "broken", "ghetto" seems like a good choice.

We're working on integrating ours with the modern CMake approach, but the modern approach didn't exist when we created it and these things take time.

@rcdailey

Sorry, I didn't mean for my comment to come off as rude or as a complaint. What I'm trying to say is that I'd rather upstream CMake do everything your toolchain does. Brad has constantly told me that the design intent for toolchain files in CMake has been to be very simple things that do not do any system introspection. Taka-no-me's is a violation of that, and a symptom of a larger problem: needing better built-in support for the NDK.

Sure, I do want my stuff working but another goal of mine is to help contribute these to upstream CMake so that maybe eventually the NDK won't need to package a toolchain file. In the future, it would be great if the NDK developers could work with the CMake devs to help contribute these issues. Dan you have a lot of valuable knowledge that I have not been able to find on my own. And it's a huge waste of your expertise and time to have to answer the same questions over and over again (either by dealing with people like me, or having to code them explicitly into a toolchain file).

I'm willing to help (even if there's not too much I can do besides facilitate communication), but we really need to get that knowledge out of your brain & the toolchain file bundled with the NDK and get it into upstream CMake. Unless I'm misunderstanding some separation of concerns here, that seems to be the ideal long term solution.

@DanAlbert
Member Author

No worries, wasn't taken as such :) I'm well aware that ours isn't ideal right now. I just wanted to point out that although ours should be cleaned up, ours seems to work, whereas the clean implementation does not. If you want your build to work again ASAP, ours might be the better choice.

Brad has constantly told me that the design intent for toolchain files in CMake has been to be very simple things that do not do any system introspection.

Yeah, we've had this conversation with him too. He's got us pointed in the right direction, it's just going to take some time to get it done.

Essentially, what we'll have when this is done is that the built-in CMake pieces will hook into CMake modules that are shipped in the NDK. That keeps CMake out of the business of tracking a dozen variations for each NDK version and avoids the issue of CMake not working for NDK rN+1 until the next version of CMake is available.

It looks like we don't have a bug for this filed right now. I'll track down the WIP commits and file one. I'll get you, Brad, and the Studio engineer that was working on this CC'd.

@rcdailey

That's great news Dan. I'm glad that it's being worked on behind the scenes. Happy to help where I can.

For now I'll try to adopt the built in toolchain file. This seems to be the most scalable solution right now since as I upgrade the NDK I won't have to constantly mess with my own android-specific configuration. If nothing else I want to do it to see if it fixes the problems I'm seeing.

I'll let you know what happens.

@rcdailey

Using your toolchain file, this is what I get:

$ "E:\android\android-ndk-r15b\toolchains\arm-linux-androideabi-4.9\prebuilt\windows-x86_64\bin\arm-linux-androideabi-readelf.exe" -sW bin\libzApp.so | grep _Unwind
 48999: 00000000     0 FUNC    GLOBAL DEFAULT  UND __gnu_Unwind_Find_exidx
156348: 010c4521    92 FUNC    LOCAL  DEFAULT   11 _ZN12_GLOBAL__N_1L14unwindOneFrameEjP21_Unwind_Control_BlockP15_Unwind_Context
156358: 010c463d   296 FUNC    LOCAL  DEFAULT   11 _ZL13unwind_phase2P13unw_context_tP21_Unwind_Control_Blockb
156666: 010c3f4d   700 FUNC    LOCAL  HIDDEN    11 _Unwind_VRS_Interpret
156667: 010c43b1   360 FUNC    LOCAL  HIDDEN    11 _Unwind_VRS_Pop
156669: 010c4209   208 FUNC    LOCAL  HIDDEN    11 _Unwind_VRS_Get
156670: 010c42d9   216 FUNC    LOCAL  HIDDEN    11 _Unwind_VRS_Set
156672: 010c458d   176 FUNC    LOCAL  HIDDEN    11 _Unwind_RaiseException
156673: 010c4765     2 FUNC    LOCAL  HIDDEN    11 _Unwind_Complete
156674: 010c4769   164 FUNC    LOCAL  HIDDEN    11 _Unwind_Resume
156675: 010c480d    64 FUNC    LOCAL  HIDDEN    11 _Unwind_GetLanguageSpecificData
156676: 010c484d    64 FUNC    LOCAL  HIDDEN    11 _Unwind_GetRegionStart
156677: 010c488d    12 FUNC    LOCAL  HIDDEN    11 _Unwind_DeleteException
205802: 00000000     0 FUNC    GLOBAL DEFAULT  UND __gnu_Unwind_Find_exidx

Looking better (?) already... still need to run it on device.

@DanAlbert
Member Author

Yep! That's the expected output. The __gnu_Unwind_Find_exidx one is actually provided by libc, so that one should be undefined, unlike the others.
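For anyone who wants to automate that check, here is a minimal sketch in C++ that classifies one row of `readelf -sW` output. It assumes the eight-column layout shown above (Num, Value, Size, Type, Bind, Vis, Ndx, Name). `SymbolRow`, `parse_symbol_row`, and `is_exported` are hypothetical helper names, not part of the NDK or binutils.

```cpp
#include <sstream>
#include <string>

// One parsed row of `readelf -sW` output. The struct is a hypothetical
// helper; the field names follow readelf's column headers.
struct SymbolRow {
    std::string type;
    std::string bind;
    std::string vis;
    std::string ndx;
    std::string name;
};

// Parses a row such as
//   "156674: 010c4769   164 FUNC    LOCAL  HIDDEN    11 _Unwind_Resume".
// Returns false if the line does not contain the eight expected columns.
bool parse_symbol_row(const std::string& line, SymbolRow& row) {
    std::istringstream in(line);
    std::string num, value, size;  // Columns this check does not need.
    return static_cast<bool>(in >> num >> value >> size >> row.type >>
                             row.bind >> row.vis >> row.ndx >> row.name);
}

// True when the symbol is defined in this library and can be seen by other
// libraries. LOCAL or HIDDEN symbols, and undefined (UND) symbols that are
// only imported, all return false.
bool is_exported(const std::string& line) {
    SymbolRow row;
    if (!parse_symbol_row(line, row)) {
        return false;
    }
    const bool defined = row.ndx != "UND";
    const bool external_binding = row.bind == "GLOBAL" || row.bind == "WEAK";
    const bool visible = row.vis == "DEFAULT" || row.vis == "PROTECTED";
    return defined && external_binding && visible;
}
```

Feeding it each line that matches `_Unwind` and flagging any row where `is_exported` returns true would catch an unwinder symbol that leaked out of the library.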

@rcdailey

rcdailey commented Jul 20, 2017

Looks like the bundled toolchain file solves all my problems. Sorry I've been hard-headed about this; I know you told me it would a while back. Somehow we need to communicate in the CMake documentation that the built-in support isn't ready to use yet. It runs into numerous problems the minute you decide to use LLVM STL.

Now my only gripe is that backtraces in tombstones end where abort() is invoked by standard assert()... I was really hoping switching away from GNU STL would fix this, but doesn't seem so. Anyway that's unrelated to this issue and I don't want to start down a tangent...

Thanks for everything guys!

@enh
Contributor

enh commented Jul 20, 2017

(if you have problems unwinding on a modern release -- there's not much we can do about JellyBean -- please file a bug so our unwinding expert can take a look.)

@rcdailey

rcdailey commented Jul 27, 2017

I have one last question. Regarding the unwinding problems, I'm still seeing short backtraces in tombstones, which makes it very difficult to diagnose segfaults. It's also hard to tell whether this is a problem with the NDK, my code, the way I'm building libraries, or Android OS itself. Is there a way I can get in touch with some Android OS developers (this "unwinding expert" maybe) to ask further about this issue? @enh mentioned filing a bug, but where would I do this? Also, I'm not 100% sure yet that this is a bug at the OS level. I just need a good next step to take. This is not something I can figure out by myself, that's for sure. Thanks for everything so far, everyone.

@enh
Contributor

enh commented Jul 27, 2017

if you can create a reproducible test case of a bad unwind (on a recent release, since there's nothing we can do about the distant past), @cferris1000 would be interested to see it...

@rcdailey

Well in my case, I'm stuck on certain platforms because we manufacture and maintain our own ARM devices in-house. Those platforms being API 15, 17, and 22. If we could make custom changes to AOSP to fix the problem, since we maintain our own fork of Android itself, that's a feasible option if we can't get any fixes from Google upstream for older OS releases. If this is truly a problem with an older Android OS release, would it be feasible to get his help so we can fix it ourselves? I'm also not sure how to provide a reproducible test case, I'd have to discuss with him to know what I can do to help. Most of what I'm using is proprietary. Not sure the best way to communicate with him other than this github issue...

@rcdailey

rcdailey commented Jul 27, 2017

I found an internal issue from last year when I worked with our resident OS team to diagnose why back traces were not unwinding into our library code. At the time we were using GCC + GNU STL on NDK r10. Some quotes:

debugging backtraces generation I got the line that stops unwinding calls with comment:

// The first word is a place-relative pointer to a generic personality
// routine function. We don't support invoking such functions, so stop here.

A long time ago, when I googled "personality routines", I came up with: https://issuetracker.google.com/issues/36982950 (not sure if it's relevant; this is outside of my domain of expertise)

Another member of the OS team stated:

There are a few mentions of Gabi++ not handling exceptions properly. Is this the C++ library you are using in your native code? It looks like the options for better handling would be a different C++ library port, or modifying the one you are using.
http://mobilepearls.com/labs/native-android-api/ndk/docs/CPLUSPLUS-SUPPORT.html

Still a lot of uncertainty, but that's about all we were able to figure out. Maybe @cferris1000 has some insight we can use to make some changes on our side.

@enh
Contributor

enh commented Jul 27, 2017

you could rip out the unwinder and replace it with a newer one, but that would probably mean upgrading many other parts of the tree too! 15 and 17 are old enough that you're using libcorkscrew and don't have a decent STL. 22 will at least be libc++ and libunwind, but a pretty old libunwind. you might be able to backport the current version, but you should probably try running your crashing code on a new release first to see whether it's actually worthwhile. (we're actually moving on from libunwind to our own unwinder at this point, but that won't be the default until P.)

@rcdailey

So if 22 is libc++, what are 15 and 17? Are you talking about libc++_shared.so in the NDK? I do link against that and ship it with my APK, even for the older devices.

@enh
Contributor

enh commented Jul 27, 2017

i meant the platform STL. 15 and 17 were still stlport.

@rcdailey

Well I can still pick gnustl or c++stl with platform set to android-15. You're saying that the underlying implementation is borrowed from stlport? Meaning that libc++ was not actually used (because libc was not up to par until API 21)?

@DanAlbert
Member Author

No. For platform code (things in an AOSP tree) you're not using the NDK unless the module has LOCAL_SDK_VERSION set. For platform code on releases older than M, you don't have an STL by default. Prior to L, you could opt in to stlport (/system/lib/libstlport.so). Starting with L, you could opt in to that or libc++ (/system/lib/libc++.so). Starting with M, libc++ was the default. All of these are built as part of the system image and the binaries are not portable.

For the NDK you do not (cannot and should not) use those libraries because the ABI is not stable (/system/lib/libstlport.so doesn't even exist on a modern release), so you pack the STL with your application. This is where libc++_shared.so, libstlport_shared.so, and libgnustl_shared.so come in. You can use any of these to target any release.

@rcdailey

rcdailey commented Jul 27, 2017

Thanks for the information, Dan & enh. I'm going to leave the backtrace issues alone for now. I'm definitely getting better backtraces on API 22 devices, so I think it's an OS problem and out of scope for the NDK at this point. Thank you for helping me with those questions even though it was a distraction.

Back to NDK compatibility, I'm getting exceptions from different parts of boost that I wasn't seeing when I linked against GNU STL. Because of the lack of backtrace, I'm not able to deduce exactly why these things are causing segfaults. In one case, boost::filesystem reported a "Function not implemented" exception when using operator++ with directory_iterator:

boost::filesystem::directory_iterator::operator++: Function not implemented

The above was the string in the what() of the filesystem exception. For as difficult as boost code is to read, I was only able to determine that this seems to come from low-level code in boost used to interface with platform-specific APIs and libraries for filesystem operations:

      temp_ec = dir_itr_increment(it.m_imp->handle,
#       if defined(BOOST_POSIX_API)
        it.m_imp->buffer,
#       endif
        filename, file_stat, symlink_file_stat);

      if (temp_ec)  // happens if filesystem is corrupt, such as on a damaged optical disc
      {
        path error_path(it.m_imp->dir_entry.path().parent_path());  // fix ticket #5900
        it.m_imp.reset();
        if (ec == 0)
          BOOST_FILESYSTEM_THROW(
            filesystem_error("boost::filesystem::directory_iterator::operator++",
              error_path,
              error_code(BOOST_ERRNO, system_category())));
        ec->assign(BOOST_ERRNO, system_category());
        return;
      }

Above is from filesystem/src/operations.cpp in Boost 1.64.0.
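As a side note on reading that what() string: the text after the final colon is the message for the errno value wrapped in the error_code. "Function not implemented" is the usual message for ENOSYS, so the sketch below assumes BOOST_ERRNO held ENOSYS at the throw site, which the exception text suggests but nothing here confirms. It uses std::error_code with std::generic_category() from the C++11 standard library as a stand-in for the Boost.System types in the excerpt, and `error_from_errno` and `describe_errno` are hypothetical helper names.

```cpp
#include <cerrno>
#include <string>
#include <system_error>

// Wraps a raw errno value in a std::error_code. This is a standard-library
// stand-in (an assumption) for the error_code(BOOST_ERRNO, system_category())
// expression in the Boost excerpt.
std::error_code error_from_errno(int err) {
    return std::error_code(err, std::generic_category());
}

// Returns the message the C library associates with the errno value. The
// exact wording is libc-dependent, so callers should compare codes rather
// than strings.
std::string describe_errno(int err) {
    return error_from_errno(err).message();
}
```

Comparing against `std::errc::function_not_supported` (the portable name for ENOSYS) is more robust than matching the message text, for example `error_from_errno(ENOSYS) == std::errc::function_not_supported`.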

Secondly, a really common and consistent one I'm getting is related to boost::lexical_cast. When I use it to cast a string to bool, I get this in my tombstone:

backtrace:
    #00  pc fffffffc  <unknown>
    #01  pc 006e0965  /data/app-lib/com.ttm.zapp-2/libzApp.so (boost::exception_detail::refcount_ptr<boost::exception_detail::error_info_container>::release()+34)
    #02  pc 006e0907  /data/app-lib/com.ttm.zapp-2/libzApp.so (boost::exception_detail::refcount_ptr<boost::exception_detail::error_info_container>::~refcount_ptr()+16)

stack:
         60155968  60155a90  
         6015596c  5e9f3ac9  /data/app-lib/com.ttm.zapp-2/libzApp.so (ErrorLog::PrintLog(Severity, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, int, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)+416)
         60155970  00000063  
         60155974  580da728  
         60155978  580da728  
         6015597c  580da728  
         60155980  601559a0  
         60155984  5e9cf93f  /data/app-lib/com.ttm.zapp-2/libzApp.so (boost::exception_detail::refcount_ptr<boost::exception_detail::error_info_container>::adopt(boost::exception_detail::error_info_container*)+38)
         60155988  5cec4700  
         6015598c  580da728  
         60155990  00000000  
         60155994  580da728  
         60155998  00000000  
         6015599c  60155a00  
         601559a0  df0027ad  
         601559a4  00000000  
    #00  601559a8  601559c0  
         ........  ........
    #01  601559a8  601559c0  
         601559ac  60155a34  
         601559b0  60155a34  
         601559b4  60155a34  
         601559b8  601559d0  
         601559bc  5e9cf90b  /data/app-lib/com.ttm.zapp-2/libzApp.so (boost::exception_detail::refcount_ptr<boost::exception_detail::error_info_container>::~refcount_ptr()+20)
    #02  601559c0  0000000c  
         601559c4  60155a34  
         601559c8  60155a34  
         601559cc  60155a34  
         601559d0  601559e8  
         601559d4  5e9cf99d  /data/app-lib/com.ttm.zapp-2/libzApp.so (boost::exception::~exception()+36)
         601559d8  60155a88  
         601559dc  60155a30  
         601559e0  60155a30  
         601559e4  60155a30  
         601559e8  60155a00  
         601559ec  5e9cf3b7  /data/app-lib/com.ttm.zapp-2/libzApp.so (boost::exception_detail::error_info_injector<boost::bad_lexical_cast>::~error_info_injector()+26)
         601559f0  60155a30  
         601559f4  60155a24  
         601559f8  60155a24  
         601559fc  60155a24  

All of these issues give me the feeling that boost or the STL is using something that isn't supported on this older version of Android. Maybe it's something related to the system libs, although after running readelf and such, and with the -Wl,--no-undefined linker flag set, I'm not seeing any linker or missing-symbol issues at the build stage. I have no idea at this point.

Note that in these failure cases, my minimum platform is android-15 and I'm running on a device running API 17. When I take that same build (using x86 architecture instead of ARM, due to the different hardware) and run it on Android OS at API 22 (minimum still set to android-15), I do not get these errors/exceptions.

Any hints that could point me in the right direction here? To stay within scope of the conversation here, I at least want to rule out any NDK or build configuration problems (CMake bugs, NDK toolchain issues, etc). If this keeps up and there is no solution (worst case: OS is too old and we can't use LLVM STL), then I'll be forced to switch back to GNU STL for the older platforms. What's odd is the OS hasn't changed when I switch to GNU STL, that's the only reason I'm not able to feel completely confident this is an OS issue.

Again thanks for all the help so far, I really appreciate the continued support. I'm lost without you guys helping me out lol.

@cferris1000
Collaborator

The older devices don't support dwarf unwind information on arm, but newer devices do. So if something is using just dwarf for some functions, you would see this behavior.

@rcdailey

@cferris1000 is that something configurable on older devices? Or we're stuck? My hope is that even if it's not configurable, there are maybe some light code changes to android OS we can make to improve this.

@cferris1000
Collaborator

I don't think there is any easy way to do it, we switched to a completely different unwinder in newer versions, and isolated all of the unwind calls through a new library. You'd need to pull in external/libunwind and system/core/libbacktrace and then modify system/core/debuggerd to use the libbacktrace code. This has been done in the new OS versions, but that code is a lot different now and requires newer features to work.

Worse yet, that code might require a newer version of clang (and might not compile with gcc), so you are fighting an uphill battle.

@rcdailey

I was afraid of that... would it be fair to say then, that on older platforms, native application developers are screwed? Without on-device debugging, there's no way I've found to be able to properly diagnose segfaults due to the lack of backtraces in tombstones. I wish I could magically introduce some library in my application code to do this instead of the OS, but that doesn't seem feasible.

Thanks for the information.

@cferris1000
Collaborator

There is a plan to create an unwinder library in the NDK, so that app developers can use it everywhere to get good backtraces. There is no current eta on it though.

Otherwise, yes, some crashes on these older systems might not give you good backtraces.
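Until such a library exists, one in-app approach that is sometimes used is to walk the stack with `_Unwind_Backtrace` from `<unwind.h>`, which the toolchain's own unwinder provides. The following is only a sketch under that assumption: `FrameBuffer`, `collect_frame`, and `capture_backtrace` are hypothetical names, and whether this yields deeper traces than the tombstones on API 15/17 devices is not something this thread confirms. It relies on the same unwind tables, so frames without usable unwind information will still cut the trace short.

```cpp
#include <unwind.h>

#include <cstddef>
#include <cstdint>

namespace {

// Caller-provided storage for the program counters that get collected.
struct FrameBuffer {
    std::uintptr_t* frames;
    std::size_t capacity;
    std::size_t count;
};

// Called by _Unwind_Backtrace once per stack frame, innermost first.
_Unwind_Reason_Code collect_frame(struct _Unwind_Context* context, void* arg) {
    FrameBuffer* buffer = static_cast<FrameBuffer*>(arg);
    if (buffer->count >= buffer->capacity) {
        return _URC_END_OF_STACK;  // Buffer is full: stop walking.
    }
    const std::uintptr_t pc =
        static_cast<std::uintptr_t>(_Unwind_GetIP(context));
    if (pc != 0) {
        buffer->frames[buffer->count++] = pc;
    }
    return _URC_NO_REASON;  // Continue to the next frame.
}

}  // namespace

// Fills `frames` with up to `capacity` return addresses for the calling
// thread and returns how many were stored. Turning the addresses into
// library and symbol names (for example with dladdr) is left out here.
std::size_t capture_backtrace(std::uintptr_t* frames, std::size_t capacity) {
    FrameBuffer buffer = {frames, capacity, 0};
    _Unwind_Backtrace(collect_frame, &buffer);
    return buffer.count;
}
```

A caller would typically log the collected addresses and symbolize them offline against the unstripped library. Calling this from a signal handler carries its own risks, which this sketch does not address.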

@rcdailey

Per my post earlier, I'm suspecting I may have to switch back to GNU STL. Hopefully @DanAlbert can prove me wrong, but I'm getting too many strange runtime issues that I don't know how to deal with. Optimistically, if these are NDK problems I'd like to help get them resolved however I can.

BTW, thanks @cferris1000 for the information!

@webmaster128

Maybe start with the filesystem_error you are receiving from boost? What is the root cause of it?

This is a bug in Boost: https://svn.boost.org/trac10/ticket/13172 that can be fixed with https://gist.github.com/webmaster128/5912a70d100e9ef341df67b177c465d6

cc @rcdailey

@rcdailey

rcdailey commented Aug 21, 2017 via email
