LKL does not support clone #155

Closed · davidchisnall opened this issue May 4, 2020 · 19 comments

Labels: area: sgx-lkl (Core SGX-LKL functionality), enhancement, p1 (Medium priority)

davidchisnall (Contributor) commented May 4, 2020

To fix the layering, we need to return to musl creating threads via the clone system call. Currently, LKL does not implement clone at all.

We need to provide an implementation that handles the flags required for pthread_create. The correct change in LKL may simply be to provide a host_ops hook that handles clone entirely in the LKL consumer. The musl implementation of pthread_create depends on the following clone flags (a validation sketch follows the list):

  • CLONE_VM: Share the address space with the parent. In a single-address-space world, we cannot support anything other than this.
  • CLONE_FS: Share a filesystem namespace with the parent. In a single-process world, this is the obvious thing to do.
  • CLONE_FILES: Share a file descriptor table with the parent. We probably want to support not having this so that our init process can have a separate FD table.
  • CLONE_SIGHAND: Share signal handlers. We probably want to support not having this so that our init process can have separate signal handlers.
  • CLONE_THREAD: The new thread is placed in the parent's thread group, so it shares the parent's PID (thread-group ID) and gets its own TID. It would be nice to support both variations of this so that we can have a distinct PID for init.
  • CLONE_SYSVSEM: Share the System V semaphore adjustment (semadj) values with the parent. It doesn't matter too much whether we support this, because our init process shouldn't use SysV IPC.
  • CLONE_SETTLS: Set the TLS pointer. Should simply set the %fs base value.
  • CLONE_PARENT_SETTID: Store the child's thread ID at the supplied address in the parent. Should be easy to support.
  • CLONE_CHILD_CLEARTID: Clear the child's thread ID at the supplied address on exit and wake a futex there. See "Intercept clone to handle CLONE_CHILD_CLEARTID" (#154).
  • CLONE_DETACHED: Has no effect in modern Linux; safe to ignore.
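As a minimal sketch of the validation this implies (a hypothetical helper, not SGX-LKL code), assuming we reject anything without CLONE_VM and anything outside the set above:

#include <errno.h>
#include <linux/sched.h>   /* CLONE_* flag definitions */

/* Hypothetical helper: accept only the clone flag combinations that a
 * single-address-space LKL port could plausibly honour. */
static long validate_clone_flags(unsigned long flags)
{
    /* In a single-address-space world, the address space must be shared. */
    if (!(flags & CLONE_VM))
        return -EINVAL;

    /* The flags musl's pthread_create passes (see the list above).
     * CLONE_DETACHED has no effect on modern kernels, so it is tolerated. */
    const unsigned long supported =
        CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_THREAD |
        CLONE_SYSVSEM | CLONE_SETTLS | CLONE_PARENT_SETTID |
        CLONE_CHILD_CLEARTID | CLONE_DETACHED;

    if (flags & ~supported)
        return -EINVAL;

    return 0;
}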
prp (Member) commented May 5, 2020

prp added the p1 (Medium priority) label and added this issue to the Milestone 1 milestone on May 5, 2020.

davidchisnall (Contributor, Author) commented:

Thanks. It looks as if setting __ARCH_WANT_SYS_CLONE gives us a clone implementation; we can then intercept it and validate the arguments, returning EINVAL if CLONE_VM is not set.

davidchisnall (Contributor, Author) commented May 5, 2020

With that in mind, I believe that we should:

  • Tweak LKL to expose clone.
  • Extend lthreads to have a futex value for lthread exit.
  • Add a clone syscall wrapper that:
    • Checks whether CLONE_VM is set and returns failure if not.
    • Checks CLONE_CHILD_CLEARTID; if it is set, sets the corresponding flag in the spawned lthread (probably clearing the flag).

This should also make it possible for lthreads to be aware of their Linux tid.
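As a rough illustration of the CLONE_CHILD_CLEARTID part, the obligation the lthread exit path would pick up looks like the sketch below (struct and function names are hypothetical, not the lthread API): on exit, the registered word is zeroed and one futex waiter, such as a joiner, is woken.

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical per-lthread exit state: the address registered via
 * CLONE_CHILD_CLEARTID (or set_tid_address). */
struct lthread_exit_state {
    int *clear_child_tid;
};

/* Run on the exit path of a cloned thread: zero the registered word and wake
 * one futex waiter so a thread blocked in pthread_join can proceed. */
static void lthread_exit_notify(struct lthread_exit_state *st)
{
    if (st->clear_child_tid) {
        *st->clear_child_tid = 0;
        syscall(SYS_futex, st->clear_child_tid, FUTEX_WAKE, 1, NULL, NULL, 0);
    }
}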

davidchisnall (Contributor, Author) commented:

I tried enabling the clone system call. Unfortunately, it then crashes in __alloc_pages_nodemask, I believe because the memory that we're passing to clone is memory that LKL doesn't think that it owns. I think that means that we need to fix #187 before we can do this.

prp (Member) commented May 6, 2020

> I tried enabling the clone system call. Unfortunately, it then crashes in __alloc_pages_nodemask, I believe because the memory that we're passing to clone is memory that LKL doesn't think that it owns. I think that means that we need to fix #187 before we can do this.

@davidchisnall where exactly does it crash? Is it due to an access check?

davidchisnall (Contributor, Author) commented:

Oh, never mind, my clone userspace wrapper was misaligning the stack.

davidchisnall (Contributor, Author) commented:

Okay, it looks as if it ends up in LKL's copy_thread function. This creates a new lthread (yay!) but then jumps to the new thread's stack pointer, rather than returning.

That works correctly for PF_KTHREAD threads, but it breaks for userspace threads. We can probably handle userspace threads by doing a __switch_to in the new thread, though we have to be careful that the cloned threads aren't accidentally aliasing kernel threads.

prp (Member) commented May 7, 2020

@davidchisnall, the problem that you will now face is that the kernel scheduler will want to run these (now kernel-visible) user-level threads. This is inconsistent with the lthread scheduler being in control of their scheduling. You could modify the kernel threads that represent user-level threads to return immediately, but then you will still have the overhead of LKL doing many spurious context-switches.

davidchisnall (Contributor, Author) commented:

How does this work for kernel threads? These are already created via the same code path and we have a lot of them.

prp (Member) commented May 7, 2020

When a userspace lthread does a system call, it assumes the identity of a unique host task (kernel thread) inside the kernel. After the system call has been executed, we have the LKL scheduler context-switch to all pending kernel tasks before returning to userspace.

IIRC the host tasks that represent the userspace lthreads are never selected by the kernel scheduler for execution. Perhaps it will be enough if you simply create the lthread/host task mapping at clone time (and not when the first system call is invoked).

davidchisnall (Contributor, Author) commented:

I think I am still a bit confused. When a new kernel thread is created, it goes into the copy_thread function in the LKL arch, which then spawns a new lthread. These are switched to with the kernel's __switch_to routine, but are they also run concurrently with the lthread scheduler? If not, how does the lthread scheduler know not to run these threads?

davidchisnall (Contributor, Author) commented:

Also, which direction of mapping are you talking about? The Linux task structure contains the lthread ID of the thread, which is set when the lthread is created in the LKL arch for any lthread created via the clone_thread call (currently only kernel threads; if we support clone, then also userspace threads). Is there also a mapping from lthread to Linux TID, or is the mapping unidirectional?

prp (Member) commented May 7, 2020

The kernel scheduler's __switch_to routine uses a host_ops semaphore associated with each lthread to signal to the lthread scheduler what can run. Only a single kernel-visible lthread will be unblocked at a time because LKL needs complete control over concurrency.

LKL retrieves the task_struct from the lthread's TLS.
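A minimal user-space sketch of that hand-off pattern (illustrative structure only, not LKL's actual __switch_to): each kernel-visible thread blocks on its own semaphore, and a switch posts the next thread's semaphore before waiting on its own, so only one such thread runs inside the kernel at a time.

#include <semaphore.h>

/* One of these per kernel-visible lthread/host task (illustrative only). */
struct host_thread {
    sem_t run;   /* posted when this thread is allowed to run inside LKL */
};

/* Hand execution from `prev` to `next`: wake the next thread, then block
 * until something switches back to us. */
static void switch_to_sketch(struct host_thread *prev, struct host_thread *next)
{
    sem_post(&next->run);
    sem_wait(&prev->run);
}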

davidchisnall (Contributor, Author) commented May 7, 2020

Thanks, that makes sense. To check I understand how this all fits together:

LKL has a notion of a 'host task', which is a task that is externally scheduled, but still has LKL state associated with it.

When an lthread that was not created by LKL calls lkl_syscall, it allocates a new host task. The kernel scheduler is aware of this thread, but it's not ever passed back to the scheduler, so it's never run?

Because LKL has a single process model, all of these threads are assumed to be threads of the init process (their task is looked up by pid [thread ID] in the init pid namespace [process]).

When a thread returns from a syscall, the thread_sched_jb function unlocks

I think we should be able to create cloned threads in almost the same way. Does this sound like a sensible plan?

  • Add another hook to the host ops structure that creates a new host thread given a PC and stack pointer.
  • Modify copy_thread to use this for new threads that don't have PF_KTHREAD set.
  • Explicitly set the task structure for the lthread on lthread creation.
  • Mark the newly created threads as TIF_HOST_THREAD so that LKL regards them as threads that it is not responsible for scheduling and that are allowed to make system calls.

As far as I can tell, lkl_syscall doesn't currently store the return address anywhere, so we'd need to add that into the thread state (__builtin_return_address() would be sufficient; we wouldn't need anything else, since a new thread only needs %rip and %rsp set and can then continue). In lthreads, we'd construct the new lthread and initialise the esp and eip values in its cpu_ctx so that the next _switch to it will jump to the new thread. I believe lthread_exit should still work correctly. The userspace code that wraps clone does an exit system call if the function that is passed to it returns, which will trigger LKL to exit the thread (calling lthread_exit), so we won't pop off the top of the stack.
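To make the proposed host-ops hook concrete, it might look roughly like the sketch below; the name and signature are hypothetical illustrations, not the actual lkl_host_operations interface.

/* Hypothetical host-ops addition: create a host lthread that starts executing
 * at a given program counter, on a given stack, with a given TLS pointer, and
 * return an opaque handle so copy_thread can associate it with the new
 * task_struct immediately rather than lazily at the first system call. */
struct proposed_clone_host_ops {
    void *(*thread_create_with_ctx)(void (*pc)(void), void *sp, void *tls);
};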

Does that make sense, or have I missed anything important?

prp (Member) commented May 7, 2020

Yes, that makes sense to me.

One minor thing is that our threads don't share the parent pid of init, but rather have a different host parent task; otherwise the kernel wouldn't deliver certain signals to pid=1.

davidchisnall (Contributor, Author) commented May 13, 2020

I have started working on this in the wip-clone branch. Current status:

  • Clone system call exists.
  • Calling it creates a new lthread.
  • The new lthread begins executing on the provided stack, with the PC set to the return address of the syscall caller.
  • The Linux task_struct created for the clone call is associated with the new thread.
  • Normal syscall return happens in the original lthread.
  • The first time either lthread does a syscall, both threads deadlock.

It seems to be nearly there, but I have not yet been able to diagnose the cause of the deadlock. In the test case, the timer thread is firing and delivering ticks, but nothing else happens after the first attempt at a syscall. With tracing enabled, we see this with 8 ethreads:

[[    LKL   ]] lkl_syscall(): calling run_syscall() (no=220 task=host2 current=host2)
[[    LKL   ]] alloc_thread_stack_node(): enter (task= node=-1)
[[    LKL   ]] init_ti(): enter
[[    LKL   ]] setup_thread_stack(): enter
[[    LKL   ]] copy_thread(): enter
[[    LKL   ]] lkl_syscall(): returned from run_syscall() (no=220 task=host2 current=host2)
[[    LKL   ]] lkl_syscall(): enter (no=64 current=host2 host0->TIF host0->TIF_SIGPENDING=1)
[[    LKL   ]] do_signal(): enter
[[    LKL   ]] __switch_to(): host2=>ksoftirqd/0
[[    LKL   ]] __switch_to(): ksoftirqd/0=>host2
[[    LKL   ]] lkl_syscall(): done (no=220 task=host2 current=ksoftirqd/0 ret=43)
[[    LKL   ]] lkl_syscall(): enter (no=66 current=host2 host0->TIF host0->TIF_SIGPENDING=1)

With 1 ethread, everything is serialised and we see this:

[[    LKL   ]] lkl_syscall(): enter (no=220 current=host2 host0->TIF host0->TIF_SIGPENDING=1)
[[    LKL   ]] lkl_syscall(): CPU lock acquired
[[    LKL   ]] lkl_syscall(): switching to host task (no=220 task=host2 current=host2)
[[    LKL   ]] switch_to_host_task(): enter (task=host2 current=host2 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] lkl_syscall(): calling run_syscall() (no=220 task=host2 current=host2)
[[    LKL   ]] alloc_thread_stack_node(): enter (task= node=-1)
[[    LKL   ]] init_ti(): enter
[[    LKL   ]] setup_thread_stack(): enter
[[    LKL   ]] copy_thread(): enter
[[    LKL   ]] lkl_syscall(): returned from run_syscall() (no=220 task=host2 current=host2)
[[    LKL   ]] do_signal(): enter
[[    LKL   ]] __switch_to(): host2=>ksoftirqd/0
[[    LKL   ]] lkl_syscall(): done (no=220 task=host2 current=ksoftirqd/0 ret=43)
[[    LKL   ]] lkl_syscall(): enter (no=66 current=ksoftirqd/0 host0->TIF host0->TIF_SIGPENDING=1)
[[    LKL   ]] lkl_syscall(): enter (no=64 current=ksoftirqd/0 host0->TIF host0->TIF_SIGPENDING=1)
[[    LKL   ]] __switch_to(): ksoftirqd/0=>host2

It appears as if one lthread enters lkl_syscall and yields, then the other lthread enters lkl_syscall and also yields. Neither lthread is ever rescheduled. Both are likely blocking on the same futex, but it's not yet clear which one or why.

davidchisnall (Contributor, Author) commented:

@prp, do you know how LKL's current macro works? I haven't yet been able to chase all of the macros. I suspect that, when we copy the TLS from the parent thread, we may accidentally be copying the currently running task, so LKL gets confused on syscall entry.

davidchisnall (Contributor, Author) commented:

Looking at the preprocessed source for syscalls.c, it appears that current is current_thread_info()->task and current_thread_info() just returns _current_thread_info, so this won't be affected by anything here.
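Stripped to its essentials, the expansion described above looks roughly like this (stand-in declarations for illustration; the real definitions live in the LKL arch headers):

/* Simplified view of the macro chain seen in the preprocessed source:
 * `current` is the task hanging off a single global thread_info pointer,
 * so it is not derived from the (copied) userspace TLS. */
struct task_struct;

struct thread_info {
    struct task_struct *task;
};

static struct thread_info *_current_thread_info;

#define current_thread_info() (_current_thread_info)
#define current (current_thread_info()->task)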

davidchisnall added a commit that referenced this issue May 15, 2020
davidchisnall added a commit that referenced this issue May 20, 2020
This implements two new LKL hooks. The first creates an lthread with a
specific initial register state (to capture the returns-twice behaviour
of clone, along with the caller's ability to define the stack and TLS
addresses).  The new thread is immediately associated with the Linux
task structure (normally, lthreads are associated with Linux tasks
lazily, when they perform their first system call).

The second hook destroys a thread.  This is done in response to an exit
system call.  This is somewhat complicated, because LKL never returns to
this thread and the thread's stack may be deallocated by the time we
exit it.

There is no easy way to add a mechanism to the lthread scheduler for
killing a thread without that thread running.  We can add one
eventually, but for now we create a temporary stack that lthreads can
use during teardown and make them run the teardown from there.

Disable the access02 test: it was spuriously passing and this change
makes it fail.
See #277 for more information.

Fixes #155
davidchisnall (Contributor, Author) commented:

This was fixed in #259.
