Collecting from unknown thread in relation to libuv & getaddrinfo on GNU libc #561
The reason is that you redirect malloc but don't redirect (intercept) pthread_create everywhere.
Yes.
No! No unregistered thread should call malloc, or even just manipulate (load/store) pointers returned by malloc. Otherwise bdwgc may reclaim an object that is still in use.
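To make this concrete, here is a minimal sketch of what intercepting `pthread_create` buys. The replacement starts each thread in a trampoline that registers it with the collector before any user code (and so any malloc call) runs, and unregisters it on exit. The names `wrapped_pthread_create`, `hypothetical_gc_register` and `hypothetical_gc_unregister` are hypothetical. With bdwgc itself this job is done by `GC_pthread_create` (which `gc.h` substitutes for `pthread_create` when `GC_THREADS` is defined), or, from inside a thread that was started some other way, by `GC_register_my_thread`/`GC_unregister_my_thread` after `GC_allow_register_threads()`. The hooks below only flip a thread-local flag so the sketch builds without libgc.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for collector registration. In bdwgc the real
 * work is done by GC_pthread_create, or by GC_register_my_thread() and
 * GC_unregister_my_thread(); here they only flip a thread-local flag so
 * the sketch builds and runs without libgc. */
static _Thread_local int thread_registered = 0;

static void hypothetical_gc_register(void) { thread_registered = 1; }
static void hypothetical_gc_unregister(void) { thread_registered = 0; }

/* The caller's start routine and argument, handed to the trampoline. */
struct start_info {
    void *(*fn)(void *);
    void *arg;
};

/* Cleanup handler: unregisters on normal return, pthread_exit() or
 * cancellation alike. */
static void unregister_cleanup(void *unused) {
    (void)unused;
    hypothetical_gc_unregister();
}

/* Runs first in every new thread: register before any user code (and
 * thus any malloc or pointer handling) executes, then run the user code. */
static void *registering_trampoline(void *p) {
    struct start_info info = *(struct start_info *)p;
    void *result;
    free(p);
    hypothetical_gc_register();
    pthread_cleanup_push(unregister_cleanup, NULL);
    result = info.fn(info.arg);
    pthread_cleanup_pop(1); /* 1 = also run the handler on normal return */
    return result;
}

/* Hypothetical replacement for pthread_create; it only helps for call
 * sites that are actually redirected to it. */
int wrapped_pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                           void *(*fn)(void *), void *arg) {
    struct start_info *info = malloc(sizeof *info);
    int rc;
    if (info == NULL)
        return EAGAIN;
    info->fn = fn;
    info->arg = arg;
    rc = pthread_create(thread, attr, registering_trampoline, info);
    if (rc != 0)
        free(info); /* the trampoline never ran, so free it here */
    return rc;
}

/* Example thread body: reports whether it ran as a registered thread. */
static void *report_registration(void *unused) {
    (void)unused;
    return (void *)(intptr_t)thread_registered;
}
```

The limitation that causes this issue is visible in the design: only call sites that are actually redirected go through the trampoline. A thread started with a plain `pthread_create`, for example inside a prebuilt library, never runs the registration step, and a check like `report_registration` would see it as unregistered.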
Thanks @ivmai, you are indeed right. I added back the dlopen wrapping and it works properly now. I wonder about macOS, though; IIRC dlopen wrapping isn't supported there. I hope its libc doesn't use threads for DNS, or I think I'm in trouble.
Funny how it took me this long to run into this problem. I guess libc mostly doesn't use threads, so this isn't an issue most of the time... I just hadn't done enough DNS requests. Anyhow, it's fixed for Acton on Linux, and I hope this isn't an issue on macOS. The comment you wrote in 6b73b6e, when you removed the error related to this, is spot on! I'm still curious about the situation on macOS. Do you think it would be possible to adapt the dlopen wrapping code to run on macOS too, or is there a more fundamental limitation?
I don't know (or at least I don't remember anyone reporting a related issue).
I'm getting a "Collecting from unknown thread" abort from bdwgc. The application is written in Acton, a programming language that uses bdwgc for garbage collection together with libuv for async I/O, and I'm primarily linking with GNU libc.
As I understand it, getaddrinfo is notoriously difficult to make asynchronous, so libuv runs DNS queries on its thread pool, and getaddrinfo itself is implemented in libc. I've noticed that when I link with musl libc, the problem doesn't seem to occur. I'm not sure whether that's conclusive or whether I'm just lucky (or unlucky).
I'm fumbling in the dark a bit here, and I'm opening this ticket in case someone (well, most likely you, Ivan :)) has some experience with this and has perhaps run into something similar.
My current reproduction repeatedly runs DNS queries, pretty much as fast as possible. The queries are for a non-existent name, so they fail quickly (against a local DNS resolver). After 20-100 queries, the application dies with "Collecting from unknown thread".
The backtrace looks like this:
I don't recognize this thread. I suspect it is something GNU libc sets up: the frames toward the bottom of the stack are in nss_files, nss and libio, with symbols that appear to belong to GNU libc.
I'm guessing that musl libc implements getaddrinfo differently, without a thread like that, which is why I'm not running into the same problem. It could also be because I link musl statically. I do link-time redirection of malloc & friends; does that mean I redirect musl's malloc too? I think not, based on the order in which we link them. Plus, malloc is part of libc itself, right?
There are many threads in total: most are worker threads of the Acton Run Time System, plus some auxiliary threads, the GC threads (I'm running the parallel collector) and some threads for the libuv thread pool. I believe collecting is OK from all the normal threads, i.e. I correctly hook thread creation so the GC wrapping is in place, but it's quite possible that we miss a thread created by GNU libc, if that's what we are seeing here.
I'm puzzled. I can't be the only one using bdwgc in combination with DNS requests on GNU libc, right? Why is no one else running into this?
I wonder whether it would be possible to simply cancel a GC collection run when we notice that it was triggered from an unknown thread. It feels wrong: my intuition says all threads need to be paused to avoid freeing something another thread is actively using. But perhaps GNU libc is written carefully enough, with a clever enough interface, and its internal threads are isolated enough, that it could actually work?
Any pointers? :)