-
Notifications
You must be signed in to change notification settings - Fork 270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
debug rcdailey's zlib unwind issues #457
Comments
What indicates if those symbols are "hidden"? I can grep all my libs for that keyword... but not sure what to look for besides that. |
Here are some examples of what various symbols should look like: extern "C" {
__attribute__((visibility("hidden"))) void myfunc_hidden() {}
static void myfunc_static() {}
extern void myfunc_extern();
void myfunc_public() { myfunc_extern(); }
} $ readelf -sW foo/libs/armeabi-v7a/libfoo.so | grep myfunc_
3: 000006ab 4 FUNC GLOBAL DEFAULT 11 myfunc_public
4: 00000000 0 NOTYPE GLOBAL DEFAULT UND myfunc_extern The column that says |
@rcdailey Please check: your APK should not contain libzlib.so packed in any form. NDK build will use the system version of the library. The binary that is packed with NDK is intended to be used as a link stub only, I doubt if it has ever been tested. Now, if the system version is bad (which happens on private ROMs), there is an easy workaround. You can add the official sources to your project and use your version of zlib. The size of the library is quite small, and it uses no private APIs that could make the system version better. Actually, there are no reasons why zlib should be packaged in NDK, except historical. |
@alexcohn I am in fact packaging Do I still need to load the EDIT: I also package |
Well, this pretty much explains the problem that you encountered. Just like you don't package libc.so or lbm.so with your APK, libz.so is part of the public NDK libraries list. I agree that the distinction between libz.so and libc++_shared.so is not clear for users of NDK. In a parallel thread I have even suggested to provide a mechanism of sharing libc++_shared.so across applications. It's only under 1 MB, but still the benefit will also be that any fixes (including security fixes) will be available to everybody. But as of today, you should package libc++_shared.so and should not package libz.so. |
@alexcohn: There really is no way to "share" libc++_shared.so though. It is intentionally meant to be unstable, so that the implementation of it can be improved over time (and also ABI bugs fixed, etc.). The real trouble here is that C++ was never intended to be used for dynamic shared libraries. In the C++ model, everything is meant to be compiled at the same time, with the same components, rather than having a path to do partial upgrades. It is great that partial updates can work for the vast majority of developer scenarios, but upgrades of the STL really will never be possible based on the way that the C++ standard works today. |
Thanks @alexcohn. Unfortunately I won't be able to retest my scenario until Monday when I'm in the office. However I'll let you know the results then! |
@stephenhines: Did I ever say sharing libc++ is easy? But never say never… |
@DanAlbert: I ran the
There are 2 and neither are hidden based on your description. If this is problematic, what is the next step? Should I run readelf on my static libs too or something, to narrow it down to a specific third party library? |
Also I removed
It's consistently coming from |
So I did a grep of all my prebuilt Some libs that are showing
If they show up here, would it cause problems? Do I need to rebuild them even if there are no linker issues? |
Maybe start with the |
Exactly. The libz.so in the NDK is just a stub. Every function in it is
Yeah, this is correct. The C++ STLs are not ABI stable, so we ship real libraries in the NDK for you to package in your app.
You can determine which category a library falls into based on its location in the NDK. Anything in I think I'm going to be rewriting the C++ Libraries doc soon to account for the fact that we're advocating for people to switch to libc++ starting with r16. I'll make sure I mention this.
This is definitely part of the problem. You can't reliably mix STLs in the same app (there are some ways to do it that aren't exactly correct if you're very careful, but it's best to just avoid it). Should also be noted that you'll need to use the shared version of the STL (you mention above that you're using c++_shared, so you're fine, but just a note that you shouldn't switch to c++_static). This is the case whenever you have more than one shared library (the actual conditions are a bit more complex, but that's a good rule of thumb). Any chance those libraries were also built with an old version of the NDK? There were definitely some problems with the way we linked the unwinder until r12 or r13. If rebuilding the world still has those unwind symbols left undefined or public, then you're probably being hit by #379.
The linker can't catch everything, sadly. struct mystruct {
std::string s;
}
void foo(const mystruct&); If something like the above is used in a library, everything will still link fine even if the two |
Thanks for all the great feedback so far. I think I'm going to head down the road of just making sure all our third party libs build together with my normal targets. However, as I'm working towards that goal, I run into new issues each time... I have ImageMagick building with r15b using the same toolchain settings as my normal targets, and loading that shared lib now says it can't find "floor":
I tried explicitly doing Any reason for this? Floor is provided by the C library so it should be finding it... I don't understand. |
|
Ok so that means I do need to explicitly load |
Looks like loading
|
both libc and libm will already have been loaded by the zygote. what OS release and architecture is this on? did you pull the libm off the device and check it actually has a floor symbol? |
|
That's not the real libm though. |
@DanAlbert That's what I did, I used filezilla to copy it over but it's the same one. Sorry for the confusion. |
what version of Android is this? |
Jellybean API 17 is the actual device OS. I setup my NDK minimum to API 15 though |
My bad, I actually read that wrong (I just saw the long path into the NDK, but that was for readelf). Could you |
(sorry forgot the |
So much for that theory. |
Might give https://github.com/KeepSafe/ReLinker a try just to rule out any linker weirdness. |
I'll take a look at that. In the meantime here is one compile & the final *.so link command line output when I build with
|
@DanAlbert So I officially have all of my third party libs building in CMake in real time with my targets. There is nothing else left over that I'm linking that is pre-built. Should be no traces of GNU left. I still see
And my build command below (result of
Note that I'm out of ideas... |
You're still using the upstream CMake support, right? I don't see any mention of $ clang++ foo.o -lbar -lgcc -o libfoo.so -shared Will result in libfoo.so having undefined references to things in libgcc (like the unwind symbols) because the linker thinks it can get them from libbar. IMO, just switch to our cmake toolchain file. These sorts of problems are basically the whole reason we have our own. |
Does your toolchain file use CMake's modern Android NDK integration features (as documented here)? I want to avoid using "ghetto" toolchain files like takanome's, which we had to use in the "old days". |
Ours basically is take-no-me's, but given that your options seem to be "ghetto" and "broken", "ghetto" seems like a good choice. We're working on integrating ours with the modern CMake approach, but the modern approach didn't exist when we created it and these things take time. |
Sorry I didn't mean my comment to come off as rude or a complaint. What I'm trying to say is that I'd rather upstream CMake do everything your toolchain does. Brad has constantly told me that the design intent for toolchain files in CMake has been to be very simple things that do not do any system introspection. Takanome's is a violation of that, and a symptom of a larger problem: Neeidng better built in support for NDK. Sure, I do want my stuff working but another goal of mine is to help contribute these to upstream CMake so that maybe eventually the NDK won't need to package a toolchain file. In the future, it would be great if the NDK developers could work with the CMake devs to help contribute these issues. Dan you have a lot of valuable knowledge that I have not been able to find on my own. And it's a huge waste of your expertise and time to have to answer the same questions over and over again (either by dealing with people like me, or having to code them explicitly into a toolchain file). I'm willing to help (even if there's not too much I can do besides facilitate communication), but we really need to get that knowledge out of your brain & the toolchain file bundled with the NDK and get it into upstream CMake. Unless I'm misunderstanding some separation of concerns here, that seems to be the ideal long term solution. |
No worries, wasn't taken as such :) I'm well aware that ours isn't ideal right now. I just wanted to point out that although ours should be cleaned up, ours seems to work, whereas the clean implementation does not. If you want your build to work again ASAP, ours might be the better choice.
Yeah, we've had this conversation with him too. He's got us pointed in the right direction, it's just going to take some time to get it done. Essentially what we'll have when this is done is the built-in CMake pieces will code to hook into CMake modules that are shipped in the NDK. That keeps CMake out of the business of tracking a dozen variations for each NDK version and avoids the issue of CMake not working for NDK rN+1 until the next version of CMake is available. It looks like we don't have a bug for this filed right now. I'll track down the WIP commits and file one. I'll get you, Brad, and the Studio engineer that was working on this CC'd. |
That's great news Dan. I'm glad that it's being worked on behind the scenes. Happy to help where I can. For now I'll try to adopt the built in toolchain file. This seems to be the most scalable solution right now since as I upgrade the NDK I won't have to constantly mess with my own android-specific configuration. If nothing else I want to do it to see if it fixes the problems I'm seeing. I'll let you know what happens. |
Using your toolchain file, this is what I get:
Looking better (?) already... still need to run it on device. |
Yep! That's the expected output. The |
Looks like the bundled toolchain file solves all my problems. Sorry I've been hard headed about this; I know you told me it would a while back. Somehow we need to communicate in the CMake documentation that the built-in support isn't ready to use yet. It has numerous problems the minute you decide to use LLVM STL. Now my only gripe is that backtraces in tombstones end where Thanks for everything guys! |
(if you have problems unwinding on a modern release -- there's not much we can do about JellyBean -- please file a bug so our unwinding expert can take a look.) |
I have one last question. Regarding the unwinding problems, I'm still seeing issues with short backtraces in tombstones. This makes it very difficult to diagnose segfaults. This makes it hard to tell if this is a problem with the NDK, my code, the way I'm building librarires, or an issue with Android OS itself. Is there a way I can get in touch with some android OS developers (this "unwinding expert" maybe) to ask further about this issue? @enh mentioned filing a bug, but where would I do this? Also I'm not 100% sure this is a bug yet at the OS level. I just need a good next step to take. This is not something I can figure out by myself, that's for sure. Thanks for everything so far everyone. |
if you can create a reproduceable test case of a bad unwind (on a recent release, since there's nothing we can do about the distant past), @cferris1000 would be interested to see it... |
Well in my case, I'm stuck on certain platforms because we manufacture and maintain our own ARM devices in-house. Those platforms being API 15, 17, and 22. If we could make custom changes to AOSP to fix the problem, since we maintain our own fork of Android itself, that's a feasible option if we can't get any fixes from Google upstream for older OS releases. If this is truly a problem with an older Android OS release, would it be feasible to get his help so we can fix it ourselves? I'm also not sure how to provide a reproducible test case, I'd have to discuss with him to know what I can do to help. Most of what I'm using is proprietary. Not sure the best way to communicate with him other than this github issue... |
I found an internal issue from last year when I worked with our resident OS team to diagnose why back traces were not unwinding into our library code. At the time we were using GCC + GNU STL on NDK r10. Some quotes:
A long time when I googled "personality routines", I came up with: https://issuetracker.google.com/issues/36982950 (not sure if it's relevant; this is outside of my domain of expertise) Another member of the OS team stated:
Still a lot of uncertainty, but that's about all we were able to figure out. Maybe @cferris1000 has some insight we can use to make some changes on our side. |
you could rip out the unwinder and replace it with a newer one, but that would probably mean upgrading many other parts of the tree too! 15 and 17 are old enough that you're using libcorkscrew and don't have a decent STL. 22 will at least be libc++ and libunwind, but a pretty old libunwind. you might be able to backport the current version, but you should probably try running your crashing code on a new release first to see whether it's actually worthwhile. (we're actually moving on from libunwind to our own unwinder at this point, but that won't be the default until P.) |
So if 22 is libc++, what is 15 and 17? Are you talking about |
i meant the platform STL. 15 and 17 were still stlport. |
Well I can still pick gnustl or c++stl with platform set to android-15. You're saying that the underlying implementation is borrowed from stlport? Meaning that libc++ was not actually used (because libc was not up to par until API 21)? |
No. For platform code (things in an AOSP) you're not using the NDK unless the module has For the NDK you do not (cannot and should not) use those libraries because the ABI is not stable (/system/lib/libstlport.so doesn't even exist on a modern release), so you pack the STL with your application. This is where libc++_shared.so, libstlport_shared.so, and libgnustl_shared.so come in. You can use any of these to target any release. |
Thanks for the information Dan & enh. I'm going to leave the backtrace issues alone for now, I'm definitely getting better backtraces on API 22 devices; so it's an OS problem and out of scope for NDK at this point I think. Thank you for helping me with those questions even though it was a distraction. Back to NDK compatibility, I'm getting exceptions from different parts of boost that I wasn't seeing when I linked against GNU STL. Because of the lack of backtrace, I'm not able to deduce exactly why these things are causing segfaults. In one case, boost::filesystem reported a "Function not implemented" exception when using operator++ with directory_iterator:
The above was the string in the
Above is from Secondly, A really common and consistent one I'm getting as well is related to
All of these issues give me the feeling that there's something that boost or STL is using that isn't supported on this older version of Android. Maybe something related to the system libs, although after doing Note that in these failure cases, my minimum platform is Any hints that could point me in the right direction here? To stay within scope of the conversation here, I at least want to rule out any NDK or build configuration problems (CMake bugs, NDK toolchain issues, etc). If this keeps up and there is no solution (worst case: OS is too old and we can't use LLVM STL), then I'll be forced to switch back to GNU STL for the older platforms. What's odd is the OS hasn't changed when I switch to GNU STL, that's the only reason I'm not able to feel completely confident this is an OS issue. Again thanks for all the help so far, I really appreciate the continued support. I'm lost without you guys helping me out lol. |
The older devices don't support dwarf unwind information on arm, but newer devices do. So if something is using just dwarf for some functions, you would see this behavior. |
@cferris1000 is that something configurable on older devices? Or we're stuck? My hope is that even if it's not configurable, there are maybe some light code changes to android OS we can make to improve this. |
I don't think there is any easy way to do it, we switched to a completely different unwinder in newer versions, and isolated all of the unwind calls through a new library. You'd need to pull in external/libunwind and system/core/libbacktrace and then modify system/core/debuggerd to use the libbacktrace code. This has been done in the new OS versions, but that code is a lot different now and requires newer features to work. Worse yet, that code might require a newer version of clang (and might not compile with gcc), so you are fighting an uphill battle. |
I was afraid of that... would it be fair to say then, that on older platforms, native application developers are screwed? Without on-device debugging, there's no way I've found to be able to properly diagnose segfaults due to the lack of backtraces in tombstones. I wish I could magically introduce some library in my application code to do this instead of the OS, but that doesn't seem feasible. Thanks for the information. |
There is a plan to create an unwinder library in the NDK, so that app developers can use it everywhere to get good backtraces. There is no current eta on it though. Otherwise, yes, some crashes on these older systems might not give you good backtraces. |
Per my post earlier, I'm suspecting I may have to switch back to GNU STL. Hopefully @DanAlbert can prove me wrong, but I'm getting too many strange runtime issues that I don't know how to deal with. Optimistically, if these are NDK problems I'd like to help get them resolved however I can. BTW, thanks @cferris1000 for the information! |
This is a bug in Boost: https://svn.boost.org/trac10/ticket/13172 that can be fixed with https://gist.github.com/webmaster128/5912a70d100e9ef341df67b177c465d6 cc @rcdailey |
Excellent, thank you for finding and fixing that bug. I hit a wall and
assumed it was an LLVM STL issue, so I gave up and switched back to GNU
STL. I will look forward to the version of boost that includes your fix!
…On Mon, Aug 21, 2017, 1:40 PM Simon Warta ***@***.***> wrote:
Maybe start with the filesystem_error you are receiving from boost? What
is the root cause of it?
This is a bug in Boost: https://svn.boost.org/trac10/ticket/13172 that
can be fixed with
https://gist.github.com/webmaster128/5912a70d100e9ef341df67b177c465d6
cc @rcdailey <https://github.com/rcdailey>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#457 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABr6dluFNJbMVoLnezOmP-YWdcS7X8Ajks5sac8EgaJpZM4OYzR5>
.
|
@rcdailey
Forking from #230 (comment)
Not entirely clear what's going on just yet. It certainly looks like the unwinders are crossing the streams, but that shouldn't be happening with anything built with a modern NDK.
Could you check and make sure that all the
_Unwind_*
symbols in your libraries are hidden?The text was updated successfully, but these errors were encountered: