Make symbols visible to the object itself from threads created by INIT functions #1067

wkozaczuk · 2019-12-17T13:14:24Z

The original patch restricted access to symbols in a specific object until the object is fully loaded and initialized which includes running the INIT functions. It accomplished it by introducing simple visibility attribute that tracks which thread if any or all is supposed to be able to look up symbols in a given object.

Another patch which delayed calling INIT functions for application objects and its dependencies to a new thread so that these functions could access proper TLS variables if needed.

It turns out that sometimes the INIT functions may start new threads on their own and then any lookup by name (by elf_resolve_pltgot for example) of the symbols in the same object will fail as the new thread id will NOT match the thread id set when calling elf::setprivate before calling INIT functions. So we somehow need to loosen this restriction and allow lookups if symbols looked by the object itself (self-lookup). It could be accomplished in very similar way as we do it for versioning table (see https://groups.google.com/forum/#!msg/osv-dev/d56plMGXi6E/Jm9lgv7xCwAJ).

The text was updated successfully, but these errors were encountered:

nyh · 2019-12-18T12:58:25Z

I'm not convinced the "self lookup" flag will work in this case: If I understood that self-lookup flag trick, you use it when an object is relocating its own symbols, and it knows it is doing that. However, when the init function is being run, it just runs ordinary code and then calls something which needs resolving, and it doesn't really know this is a "self lookup" (unless I'm misunderstanding something).

How about the following idea? It might work, but then again I'm not sure I fully understod the use-case of the init function being run in a different thread:

We can have a per-thread variable saying that this thread is currently running an init function for object X:

__thread object* initializing_object = nullptr;

We set this to the current object when running the init function, and then reset it back to nullptr. We also write code so that when a new thread is created, this variable is copied from its parent (I have a patch for this feature, which I called INHERITED_THREAD_LOCAL_SIMPLE).

Then we change the definition of visible to something like:

bool object::visible(void) const
{
    auto v = _visibility.load(std::memory_order_acquire);
    return (v == nullptr) || (v == sched::thread::current()) || (initializing_object == this);
}

(the new part is just the addition of the initializing_object == this comparison)

I can think of another way that doesn't involve adding INHERITED_THREAD_LOCAL_SIMPLE (which although I have the patches already, isn't trivial):

We could add to OSv the notion of a "parent thread" (should be easy).
Then visible() will be something like:

bool object::visible(sched::thread *me = sched::thread::current()) const
{
    auto v = _visibility.load(std::memory_order_acquire);
    return (v == nullptr) || (v == me) || (me->parent() && visible(me->parent()));
}

The trick here is that if the object is not visible for this thread, we check (recursively) maybe it's visible to its parent thread - until we reach the root whose parent is null. In the usual case, visible() will, just like before, quickly return true. Only in the rare case, while loading an object and attempting to access a yet-hidden symbol, will this recursion use any non-trivial amount of computation.

wkozaczuk · 2019-12-18T17:48:08Z

The "self-lookup" flag is intended to be ON only for the cases where symbol is being looked using its index in the symbol table rather its name. Specifically these cases involve only calls to symbol_module object::symbol(unsigned idx, bool ignore_missing) method which are made by:

bool object::arch_relocate_rela(u32 type, u32 sym, void *addr, Elf64_Sxword addend)
bool object::arch_relocate_jump_slot(u32 sym, void *addr, Elf64_Sxword addend, bool ignore_missing)
void* object::resolve_pltgot(unsigned index)

This flag was originally intended to deal with symbols "deprecated" by versioning table which should be still visible to the object itself (see #1066) for example when the INIT functions are called. I thought same flag could also be used to deal with the "threads/INIT" issue. But I think you are right that it is not quite correct.

I must say I like your INHERITED_THREAD_LOCAL_SIMPLE proposal (could you send a patch?). It would be nice to see how setprivate method looks like. There are still many details I am not sure I know how this would work. Where exactly would initializing_object be set and/or change and reset?

The code around INIT looks like this now:

//
// After loading the object and all its needed objects, run these objects'
// init functions in reverse order (so those of deepest needed object runs
// first) and finally make the loaded objects visible in search order.
auto size = loaded_objects.size();
for (unsigned i = 0; i < size; i++) {
    loaded_objects[i]->setprivate(true);
}
for (int i = size - 1; i >= 0; i--) {
    loaded_objects[i]->run_init_funcs(argc, argv);
}
for (unsigned i = 0; i < size; i++) {
    loaded_objects[i]->setprivate(false);
}
_loaded_objects_stack.pop();

shouldn't the latter part of it change to this:

for (int i = size - 1; i >= 0; i--) {
    loaded_objects[i]->run_init_funcs(argc, argv);
    loaded_objects[i]->setprivate(false);
}

to make the object visible right away after its init functions are called. Imagine an init function in object A that depends on B needs resolving a symbol in B, in same thread or another created by init function? Now I am not sure how this would affect the INHERITED_THREAD_LOCAL_SIMPLE approach.

Now regarding the parent thread idea, I actually had same one but I also thought it would be more complicated and less efficient because of the recursion (what a coincidence :-)). But I just realised that having the concept of parent thread id could be extremely useful for debugging purposes or possibly others (imagine tracking which thread created another one, etc) regardless if we use it to solve this problem. So if you have patch just for that we should apply it.

wkozaczuk · 2019-12-18T19:45:53Z

The problem we are trying to solve manifests itself for example when one tries to run numpy with python on OSv (see these mail group discussions) and it fails to resolve one of the symbols:

./scripts/run.py --api -e "/python3 -c \"import numpy; print(numpy.__version__)\""
OSv v0.53.0-74-gef56fde7
eth0: 192.168.122.15
Booted up in 611.56 ms
/lib/python3.7/numpy/.libs/libopenblasp-r0-2ecf47d5.3.7.dev.so: failed looking up symbol blas_memory_alloc

[backtrace]
0x0000000040356209 <elf::object::symbol(unsigned int, bool)+969>
0x00000000403562cf <elf::object::resolve_pltgot(unsigned int)+127>
0x0000000040356494 <elf_resolve_pltgot+52>
0x000000004039b82f <???+1077524527>
0x000020000077ff6f <???+7864175>
0x0000000000000003 <???+3>

One of these example succeeds if OSv runs with single vCPU. Most likely OpenBLAS tries to create a thread pool in one of the init functions and fails resolving blas_memory_alloc symbol.

wkozaczuk · 2019-12-26T05:32:08Z

Regarding the parent thread approach, I think we can avoid recursion by using a simple loop:

bool object::visible(void) const
{
    auto v = _visibility.load(std::memory_order_acquire);
    if (v == nullptr) {
        return true;
    }
    for (auto t = sched::thread::current(); t != null; t = t->parent()) {
        if (v == t) {
            return true;
        }
    }
    return false;
}

But I think we have a bigger problem: what if parent thread terminates before the child does? Maybe we can deal with it by using some kind of indirection - "proxy" object that both child and parent will point to and parent will deactivate in its destructor:

struct thread_ref {
    thread *the_thread;
    std::atomic<bool> terminated;
}

Making parent keep track of its children and notifying the when destructed seems like a nightmare,

The commit fe37650 to address the issue #1067 relaxed the elf visibility rules while being loaded to child threads so that they can resolve symbols if any threads are created by the INIT functions. Unfortunately inadvertently that patch allowed access to elf ebjects too early in some cases and would lead to a crash. Here is a concrete scenario when this would happen with the 'cli' app: 1. Thread A resolves objects of the httpserver app. 2. Thread A starts the httpserver app on new thread B. 3. Thread A continues to load objects of the cli app. 4. Thread B tries to resolve some symbols through elf_resolve_pltgot() for example and ends up attempting to look up symbols of cli object that is just being loaded by thread A and is not "ready" yet. However because thread B is a child of A it can do so. Clearly the example above illustrates a hole in the original patch. This patch fills this hole by restricting back the access to the symbols of the object being loaded to the thread doing it only as it used to be be. And it only relaxes the access to child threads for the brief time while object is being initialized. It does so by introducing the elf access visibility levels: - Public - ThreadOnly - ThreadAndItsChildren The Public and ThreadOnly and equivalent to the old - nullptr, non-nullptr visibility attribute. The ThreadAndItsChildren is only set to for the period the INIT functions are being called. References #1067 Signed-off-by: Waldemar Kozaczuk <jwkozaczuk@gmail.com> Message-Id: <20200111180020.22335-1-jwkozaczuk@gmail.com>

wkozaczuk closed this as completed in fe37650 Jan 6, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make symbols visible to the object itself from threads created by INIT functions #1067

Make symbols visible to the object itself from threads created by INIT functions #1067

wkozaczuk commented Dec 17, 2019

nyh commented Dec 18, 2019 •

edited

Loading

wkozaczuk commented Dec 18, 2019 •

edited

Loading

wkozaczuk commented Dec 18, 2019 •

edited

Loading

wkozaczuk commented Dec 26, 2019 •

edited

Loading

Make symbols visible to the object itself from threads created by INIT functions #1067

Make symbols visible to the object itself from threads created by INIT functions #1067

Comments

wkozaczuk commented Dec 17, 2019

nyh commented Dec 18, 2019 • edited Loading

wkozaczuk commented Dec 18, 2019 • edited Loading

wkozaczuk commented Dec 18, 2019 • edited Loading

wkozaczuk commented Dec 26, 2019 • edited Loading

nyh commented Dec 18, 2019 •

edited

Loading

wkozaczuk commented Dec 18, 2019 •

edited

Loading

wkozaczuk commented Dec 18, 2019 •

edited

Loading

wkozaczuk commented Dec 26, 2019 •

edited

Loading