Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-Threading support for Corrade::PluginManager? #65

Closed
xqms opened this issue Jun 12, 2019 · 17 comments
Closed

Multi-Threading support for Corrade::PluginManager? #65

xqms opened this issue Jun 12, 2019 · 17 comments

Comments

@xqms
Copy link
Contributor

xqms commented Jun 12, 2019

Use case: I want to load textures using the Magnum::Trade subsystem in a background thread. Basically, I have to load one texture per rendered frame, that's why I would like to minimize/hide I/O and decoding time.

Problem: I can't find a clean solution. Separate Corrade::PluginManager::Manager instances do not work, since they act upon a global, static GlobalPluginStorage without locking. The handy AnyImageImporter plugin even loads other plugins at runtime, which further complicates things.

Do you think it might be feasible to either introduce some locking on the global plugin storage or to completely decouple the Manager instances, i.e. remove any global state?

At least under Linux dlopen() and dlclose() seem to be robust against multiple calls, even from separate threads as they do locking and refcounting internally. I haven't looked at other platforms yet...

I can also understand if you say this is out of scope for Corrade. Probably I'd have to use a separate image loading library then...

@xqms
Copy link
Contributor Author

xqms commented Jun 12, 2019

... my current workaround: duplicate the logic from AnyImageImporter to figure out the specific importer based on the filename, and instantiate it in the main thread. Then call openFile() and image2D() in the worker thread.

(The duplication is required since AnyImageImporter::openFile() is already costly - it reads the complete file for most file types.)

Not ideal, but works for me.

@mosra
Copy link
Owner

mosra commented Jun 17, 2019

The global storage is needed for static plugins -- at app startup these need to be "registered" so any plugin manager instance can see them. This is done before main() is entered, so that's definitely single-threaded and thus no locking needed.

After that, there's definitely a bunch of problems -- when the manager gets instantiated, it "steals" static plugins from the global storage and then adds dynamic plugins to the global storage as well. This is needed in order to properly support inter-plugin dependencies, in particular plugins with different interface can depend on each other and there needs to be some manager responsible for loading them. This part could be done thread-local without problems, I think -- providing that Windows' LoadLibrary() is thread-safe and robust against multiple calls as well.

For "stealing" static plugins from the non-thread-local storage -- these would be all just reads, so I don't see a problem there.

I can also understand if you say this is out of scope for Corrade. Probably I'd have to use a separate image loading library then...

It's definitely not out of scope, no -- and reimplementing all that on your side sounds too much like a nuclear solution to me :D

I'll see what I can do in the next weeks.

@mosra mosra added this to the 2019.0b milestone Jun 17, 2019
@xqms
Copy link
Contributor Author

xqms commented Jun 24, 2019

providing that Windows' LoadLibrary() is thread-safe and robust against multiple calls as well.

LoadLibraryA() explicitly mentions reference counting:
https://docs.microsoft.com/en-us/windows/desktop/api/libloaderapi/nf-libloaderapi-loadlibrarya

I couldn't find an explicit mention that the calls are thread-safe, but there are unofficial statements that they are:
https://stackoverflow.com/a/11254061

@mosra mosra mentioned this issue Aug 12, 2019
60 tasks
@mosra
Copy link
Owner

mosra commented Oct 1, 2019

Soo... I finally had a chance to look at this. One part of this is solved in ddd57b1 and 491ff17 (Aug 31), where the static plugin registration is made allocation-less and thus practically failproof, and the global plugin storage file-local. The static plugin registration data is only being read from after, so there won't be no data races. With that in place, the only remaining thing to do (as far as I thought) was slapping thread_local on the global plugin storage. However, after writing a minimal test that instantiates the manager and loads/uses a plugin in two parallel threads, and running it via TSan, I got ... data races in libdl itself, of all places:

WARNING: ThreadSanitizer: data race (pid=65358)
  Write of size 8 at 0x7b1c00003870 by thread T4:
    #0 free <null> (PluginManagerManagerTest+0x61b30)
    #1 _dl_close_worker <null> (ld-linux-x86-64.so.2+0x155d4)
    #2 Corrade::PluginManager::AbstractManager::unloadInternal(Corrade::PluginManager::AbstractManager::Plugin&) /home/mosra/Code/corrade/build-clang-threadsanitizer/../src/Corrade/PluginManager/AbstractManager.cpp:736:8 (libCorradePluginManagerTestLib-d.so+0x1204f)

  Previous write of size 8 at 0x7b1c00003870 by thread T3:
    #0 malloc <null> (PluginManagerManagerTest+0x58bcf)
    #1 _dl_new_object <null> (ld-linux-x86-64.so.2+0xcac7)
    #2 Corrade::PluginManager::AbstractManager::loadInternal(Corrade::PluginManager::AbstractManager::Plugin&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /home/mosra/Code/corrade/build-clang-threadsanitizer/../src/Corrade/PluginManager/AbstractManager.cpp:567:20 (libCorradePluginManagerTestLib-d.so+0x15262)

Looking at man pages, dlopen(), dlclose(), dlsym() is all marked as thread-safe, so what the heck is happening? Any idea? :)

@xqms
Copy link
Contributor Author

xqms commented Oct 2, 2019

I found some people reporting thread safety problems in glibc with dl* functions when they did not link with -lpthread. Somehow glibc detects whether pthread is linked and has different behavior if that is not the case.

For example: http://lua-users.org/lists/lua-l/2015-04/msg00010.html

This looks like an interesting problem. I'd have time on the weekend to look at it - if you don't solve it sooner ;-) Could you share your test case?

@xqms
Copy link
Contributor Author

xqms commented Oct 2, 2019

One further note about the pthread theory: since you are already testing with multiple threads (probably using std::thread) I guess that pthread is already linked in. This is probably not the issue here.

@mosra
Copy link
Owner

mosra commented Oct 2, 2019

Here's the whole thing: https://github.com/mosra/corrade/compare/pluginmanager-threadlocal . I'm kinda inclined to merge it as-is, unless you come up with a better solution :) CI passes because I added libdl to TSan suppressions ... but since that's the only place where it's actually testing thread safety it's not much different from not using TSan at all 😅

Additionally -- since you have a real use case for this, could you give this branch a spin in your project (with manager instances being used in multiple threads)? I think this is all you need, but ... one never knows for sure :)

@xqms
Copy link
Contributor Author

xqms commented Oct 5, 2019

At first glance, dlopen() and dlclose() both lock a (recursive) mutex during operation, so there should be no race 😕

https://github.com/bminor/glibc/blob/5a82c74822d3272df2f5929133680478c0cfb4bd/elf/dl-open.c#L541
https://github.com/bminor/glibc/blob/5a82c74822d3272df2f5929133680478c0cfb4bd/elf/dl-close.c#L812

As expected, the logic is quite complex, so I'll have to dig deeper.

@xqms
Copy link
Contributor Author

xqms commented Oct 5, 2019

Ok, I think there is no problem in dlopen()/dlclose(). The thread sanitizer simply cannot detect synchronization primitives in uninstrumented code:

https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual#non-instrumented-code

So basically you'd have to compile glibc with -fsanitize=thread to get rid of the detections. I guess the suppression list is the more practical solution.

I'll test your branch in my setup next.

@xqms
Copy link
Contributor Author

xqms commented Oct 5, 2019

Everything seems to work from in my setup. I also tested with TSAN and it didn't find any issues.

Quick related question: I'm currently assuming that the Trade::AbstractImporter instance has to be retained for the entire lifetime of Trade::ImageData - and that it's not safe to destroy the ImageData instance in a thread different from the loading one, because there may be a custom deleter involved.
While your branch allows me to create and delete the importers in the worker threads, which is good, I have resorted to copying the image data to Magnum::Image2D instances to transfer them to the main thread. Probably there is some way to have the worker threads delete the importer and ImageData objects after the main thread is done using them (i.e. has uploaded the data to GPU). Do you see another solution?

Probably preprocessing my image dataset and converting everything to some GPU-friendly image format would be the best solution, but I was hoping to avoid that ;-)

@williamjcm
Copy link
Contributor

While your branch allows me to create and delete the importers in the worker threads, which is good, I have resorted to copying the image data to Magnum::Image2D instances to transfer them to the main thread. Probably there is some way to have the worker threads delete the importer and ImageData objects after the main thread is done using them (i.e. has uploaded the data to GPU). Do you see another solution?

Maybe you could use Magnum::ResourceManager and a custom loader derived from Magnum::AbstractResourceLoader for that.

@xqms
Copy link
Contributor Author

xqms commented Oct 6, 2019

@williamjcm Thanks for the suggestion - could you explain what Magnum::ResourceManager would offer in that case? My main problem is that I have ImageData objects in the loader thread, which I want to use in the main thread, but cannot safely destroy them there. Same applies to the Trade::AbstractImporter instance - it needs to be destroyed in the loader thread because it can influence that thread's plugin manager.

So probably I would need some sort of reference counting system where the main thread can signal that it is done with an image - and the loader thread can destroy the ImageData and Importer instances. I guess there is no other way.

[Also, the image dataset is way too big to be able to hold everything in memory, so I need to load/use/unload it file-by-file, as fast as possible.]

Anyway, my question probably belongs somewhere else than in this issue ;-)

@williamjcm
Copy link
Contributor

williamjcm commented Oct 7, 2019

Do you do any processing on the images before uploading them to the GPU ? If not, that's how the workflow would basically look like:

  1. In the main thread, tell the ResourceManager to get you the texture you want.
  2. ResourceManager will and use the loader you registered to, well, load the texture.
  3. The loader creates a thread that opens the required file (you'll have to do some key->filename mapping somewhere, though) and uploads the image to the GPU.
  4. Use the Resource you get from the manager to access your texture.

However, I can't give code examples, because I work on proprietary software, and even if I could share code, I only load stuff synchronously.

So, the best advice I can give is to check the docs for ResourceManager and AbstractResourceLoader. 😅

@mosra
Copy link
Owner

mosra commented Oct 7, 2019

So basically you'd have to compile glibc with -fsanitize=thread to get rid of the detections. I guess the suppression list is the more practical solution.

Right, that was my suspicion too. Yes, I think the suppression list is more practical. Will add an explanatory comment there and put this to master.

I'm currently assuming that the Trade::AbstractImporter instance has to be retained for the entire lifetime of Trade::ImageData - and that it's not safe to destroy the ImageData instance in a thread different from the loading one, because there may be a custom deleter involved.

This is explicitly disallowed -- not so much because some custom deleter could be non-thread-safe, but rather because unloading a plugin would mean the deleter is a dangling pointer (dangling function pointers? fun!). It's only documented for plugin implementers right now, but I think it makes sense to document that for plugin users (and add sanity asserts to all the image2D() / ... functions). Will do that now.

Probably preprocessing my image dataset and converting everything to some GPU-friendly image format would be the best solution, but I was hoping to avoid that ;-)

In short, transferring ownership of the ImageData2D across threads should be perfectly safe, and there's no (and there will never be any) relation/dependency between the imported data and the plugin module/dll or the importer instance, unless explicitly documented (e.g. importerState()) -- but even then deleting the importer shouldn't cause bad things to happen to imported data.

In a not-so-distant future I'll be looking back at zero-copy importers (mosra/magnum#240), those will add a dependency between a memory blob the assets are imported from and the Trade::*Data instances, but again not between the data instances and importer instance/plugin.

@xqms
Copy link
Contributor Author

xqms commented Oct 7, 2019

there's no (and there will never be any) relation/dependency between the imported data and the plugin module/dll or the importer instance

Thanks @mosra, that clears everything up and makes my design much simpler 👍
I think I just saw the importerState() thing and assumed that there is some dependency.

@mosra
Copy link
Owner

mosra commented Oct 8, 2019

This is now in master as d6e0ef0, and on magnum side the data dependencies were clarified (and deleters checked) in mosra/magnum@84fc685.

I guess that's all to be done here, right? :)

@xqms
Copy link
Contributor Author

xqms commented Oct 8, 2019

I guess that's all to be done here, right? :)

From my side yes ;) Thanks for following up on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

No branches or pull requests

3 participants