
Threads created via clone() intermittently cause application to hang #686

Closed
vtikoo opened this issue Jul 29, 2020 · 8 comments · Fixed by lsds/lkl#5
Labels: p0 Blocking priority

@vtikoo
Contributor

vtikoo commented Jul 29, 2020

Test setup:

  • A thread is created via clone() and the main thread waits on the child tid's futex; repeat 10k times (843c8d4). A sketch of the futex_wait() helper used here follows after this list.
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Abridged from the full test (843c8d4); child_stack_end1, child_tls1 and
 * futex_wait() are set up there. */

int parallelthr(void* arg)
{
    /* The child does no work and exits immediately. */
    (void)arg;
    return 0;
}

int main(int argc, char** argv)
{
    unsigned flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND
        | CLONE_THREAD | CLONE_SYSVSEM | CLONE_SETTLS | CLONE_CHILD_SETTID
        | CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID | CLONE_DETACHED;
    int i, runs = 10000;

    pid_t ptid;
    pid_t ctid_futex1;
    for (i = 0; i < runs; i++) {
        /* glibc clone() wrapper: fn, stack, flags, arg, parent_tid, tls, child_tid */
        pid_t ctid1 = clone(parallelthr, child_stack_end1,
                            flags, (void*)0, &ptid, &child_tls1,
                            &ctid_futex1);
        /* CLONE_CHILD_CLEARTID: the kernel clears ctid_futex1 and wakes the
         * futex when the child exits, so this wait returns once the child is
         * gone (or immediately with EAGAIN if it has already exited). */
        futex_wait(&ctid_futex1, ctid1);
    }

    return 0;
}
  • H/W: SGXLKL_ETHREADS=8 on a 2-core machine
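
The snippet above relies on a futex_wait() helper; a minimal sketch of such a helper on Linux (an assumption here, not necessarily what the test commit uses) is:

#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Block until *uaddr no longer holds val, or return immediately with
 * errno == EAGAIN if it has already changed. */
static int futex_wait(pid_t* uaddr, pid_t val)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT, val, NULL, NULL, 0);
}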

Observed behavior:

Sometimes the test will hang after a few thousand iterations.

Annotated LKL trace logs from a successful vs unsuccessful run of a single iteration:

Successful run:

t0: clone() lkl_syscall enter
[[    LKL   ]] lkl_syscall(): enter (no=220 current=host1 host0->TIF host0->TIF_SIGPENDING=1)
[[    LKL   ]] lkl_syscall(): switching to host task (no=220 task=host1 current=host1)
[[    LKL   ]] switch_to_host_task(): enter (task=host1 current=host1 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] lkl_syscall(): calling run_syscall() (no=220 task=host1 current=host1)

t1: thread creation
[[    LKL   ]] alloc_thread_stack_node(): enter (task=host_clone3311 node=-1)
[[    LKL   ]] init_ti(): enter
[[    LKL   ]] setup_thread_stack(): enter
[[    LKL   ]] copy_thread_tls(): enter
[[    LKL   ]] lkl_syscall(): returned from run_syscall() (no=220 task=host1 current=host1)

t2: cloned thread calls exit()
[[    LKL   ]] lkl_syscall(): enter (no=93 current=host1 host0->TIF host0->TIF_SIGPENDING=1) 
[[    LKL   ]] switch_to_host_task(): enter (task=host_clone3312 current=host1 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] __switch_to(): host1=>host_clone3312
[[    LKL   ]] switch_to_host_task(): calling sem_down (task=host_clone3312 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] switch_to_host_task(): calling schedule_tail (task=host_clone3312 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0 abs_prev=host1)
[[    LKL   ]] switch_to_host_task(): done (task=host_clone3312 current=host_clone3312 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] switch_to_host_task(): enter (task=host1 current=host_clone3312 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] __switch_to(): host_clone3312=>host1
[[    LKL   ]] switch_to_host_task(): calling sem_down (task=host1 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] switch_to_host_task(): calling schedule_tail (task=host1 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0 abs_prev=host_clone3312)
[[    LKL   ]] switch_to_host_task(): done (task=host1 current=host1 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] do_signal(): enter

t3: clone() syscall returns
[[    LKL   ]] lkl_syscall(): done (no=220 task=host1 current=host1 ret=3355)

t4: futex wait from main thread
[[    LKL   ]] lkl_syscall(): enter (no=98 current=host1 host0->TIF host0->TIF_SIGPENDING=1)
[[    LKL   ]] lkl_syscall(): switching to host task (no=93 task=host_clone3312 current=host1)
[[    LKL   ]] switch_to_host_task(): enter (task=host_clone3312 current=host1 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] __switch_to(): host1=>host_clone3312
[[    LKL   ]] switch_to_host_task(): calling sem_down (task=host_clone3312 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] switch_to_host_task(): calling schedule_tail (task=host_clone3312 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0 abs_prev=host1)
[[    LKL   ]] switch_to_host_task(): done (task=host_clone3312 current=host_clone3312 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)

t5: cloned thread cleanup
[[    LKL   ]] lkl_syscall(): calling run_syscall() (no=93 task=host_clone3312 current=host_clone3312)
[[    LKL   ]] __switch_to(): host_clone3312=>ksoftirqd/0
[[    LKL   ]] free_thread_stack(): enter (task=host_clone3312 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0 ti=0x7fe0402af800 current=ksoftirqd/0) <-- clone thread cleanup
[[    LKL   ]] kill_thread(): enter (task=host_clone3312 task->state=128 task->flags=32844ti=0x7fe0402af800 ti->flags=1282 ti->TIF_NO_TERMINATION=0 )

t6: context switches back to main thread(host1 task) 
[[    LKL   ]] __switch_to(): ksoftirqd/0=>swapper
[[    LKL   ]] __switch_to(): swapper=>idle_host_task
[[    LKL   ]] lkl_syscall(): switching to host task (no=98 task=host1 current=idle_host_task)
[[    LKL   ]] switch_to_host_task(): enter (task=host1 current=idle_host_task task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] __switch_to(): idle_host_task=>host1
[[    LKL   ]] switch_to_host_task(): calling sem_down (task=host1 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] switch_to_host_task(): calling schedule_tail (task=host1 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0 abs_prev=idle_host_task)
[[    LKL   ]] switch_to_host_task(): done (task=host1 current=host1 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] lkl_syscall(): calling run_syscall() (no=98 task=host1 current=host1)
[[    LKL   ]] lkl_syscall(): returned from run_syscall() (no=98 task=host1 current=host1)
[[    LKL   ]] do_signal(): enter

t7: futex wait returns with -EWOULDBLOCK (i.e. no need to wait, as the value pointed to by the futex has already changed)
[[    LKL   ]] lkl_syscall(): done (no=98 task=host1 current=host1 ret=-11)

Unsuccessful run (causes program execution to hang):

t0: clone() lkl_syscall enter
[[    LKL   ]] lkl_syscall(): enter (no=220 current=host1 host0->TIF host0->TIF_SIGPENDING=1)
[[    LKL   ]] lkl_syscall(): switching to host task (no=220 task=host1 current=host1)
[[    LKL   ]] switch_to_host_task(): enter (task=host1 current=host1 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] lkl_syscall(): calling run_syscall() (no=220 task=host1 current=host1)

t1: thread creation
[[    LKL   ]] alloc_thread_stack_node(): enter (task=host_clone3872 node=-1)
[[    LKL   ]] init_ti(): enter
[[    LKL   ]] setup_thread_stack(): enter
[[    LKL   ]] copy_thread_tls(): enter

t2: cloned thread calls exit()
[[    LKL   ]] lkl_syscall(): enter (no=93 current=host1 host0->TIF host0->TIF_SIGPENDING=1)
[[    LKL   ]] lkl_syscall(): returned from run_syscall() (no=220 task=host1 current=host1)
[[    LKL   ]] switch_to_host_task(): enter (task=host_clone3873 current=host1 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] __switch_to(): host1=>host_clone3873
[[    LKL   ]] switch_to_host_task(): calling sem_down (task=host_clone3873 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] switch_to_host_task(): calling schedule_tail (task=host_clone3873 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0 abs_prev=host1)
[[    LKL   ]] switch_to_host_task(): done (task=host_clone3873 current=host_clone3873 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] switch_to_host_task(): enter (task=host1 current=host_clone3873 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] __switch_to(): host_clone3873=>host1
[[    LKL   ]] switch_to_host_task(): calling sem_down (task=host1 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] switch_to_host_task(): calling schedule_tail (task=host1 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0 abs_prev=host_clone3873)
[[    LKL   ]] switch_to_host_task(): done (task=host1 current=host1 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] do_signal(): enter

t3: clone() syscall returns
[[    LKL   ]] lkl_syscall(): done (no=220 task=host1 current=host1 ret=3916)

t4: futex wait from main thread
[[    LKL   ]] lkl_syscall(): enter (no=98 current=host1 host0->TIF host0->TIF_SIGPENDING=1)
[[    LKL   ]] lkl_syscall(): switching to host task (no=93 task=host_clone3873 current=host1)
[[    LKL   ]] switch_to_host_task(): enter (task=host_clone3873 current=host1 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] __switch_to(): host1=>host_clone3873
[[    LKL   ]] switch_to_host_task(): calling sem_down (task=host_clone3873 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] switch_to_host_task(): calling schedule_tail (task=host_clone3873 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0 abs_prev=host1)
[[    LKL   ]] switch_to_host_task(): done (task=host_clone3873 current=host_clone3873 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)

t5: cloned thread cleanup
[[    LKL   ]] lkl_syscall(): calling run_syscall() (no=93 task=host_clone3873 current=host_clone3873)
[[    LKL   ]] __switch_to(): host_clone3873=>ksoftirqd/0
[[    LKL   ]] free_thread_stack(): enter (task=host_clone3873 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0 ti=0x7fe0402af800 current=ksoftirqd/0)
[[    LKL   ]] kill_thread(): enter (task=host_clone3873 task->state=128 task->flags=32844ti=0x7fe0402af800 ti->flags=1282 ti->TIF_NO_TERMINATION=0 )

t6: context switch to swapper and execution hangs ...
[[    LKL   ]] __switch_to(): ksoftirqd/0=>swapper

On the face of it, it looks like the swapper is not able to context switch back to the host task.

cc: @davidchisnall @prp

@davidchisnall
Contributor

The cloned thread cleanup bit (t5) should reach this line:

https://github.com/lsds/lkl/blob/86819ac5446e1fa31ca67db68aa165bbf07b90a9/kernel/fork.c#L1307

This should then reach the futex_wake call here:

https://github.com/lsds/lkl/blob/86819ac5446e1fa31ca67db68aa165bbf07b90a9/kernel/futex.c#L1598

Can you try adding some logging to see if the wake is actually happening here:

https://github.com/lsds/lkl/blob/86819ac5446e1fa31ca67db68aa165bbf07b90a9/kernel/futex.c#L1465

I wondered if this was a memory order issue, but it looks as if, for this trace, the futex wait is definitely ordered before the futex wake (the ocalls for the logging are equivalent to strong memory barriers). I can see two possible causes here:

  • The futex wake is sometimes not being called.
  • The futex wake is being called and wake_q_add_safe adds the thread to a wake queue, but the work queue is not being scheduled.

I think you could also test the second case by modifying the test to add a background thread that loops doing ten-second (or longer) sleeps. If the hung test resumes only when the sleeping thread wakes up, then the problem is LKL not running the work queues.

This actually sounds like the right answer: As I recall, LKL runs work queues on syscall return, but the exit syscall never returns. The work queue item that wakes the main thread is therefore never run.
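
A background sleeper for the check suggested above could be as simple as the following (a sketch; it would be started with clone() or pthread_create() alongside the test loop):

#include <time.h>

/* Hypothetical diagnostic thread: loops doing ten-second sleeps. If the
 * hung futex wait only resumes when this thread wakes up, LKL is not
 * running its work queues until some other activity occurs. */
static int background_sleeper(void* arg)
{
    (void)arg;
    for (;;) {
        struct timespec ts = { .tv_sec = 10, .tv_nsec = 0 };
        nanosleep(&ts, NULL);
    }
    return 0;
}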

@vtikoo
Contributor Author

vtikoo commented Jul 29, 2020

Looking at "[[ LKL ]] lkl_syscall(): calling run_syscall() (no=98 task=host1 current=host1)" in the successful trace above, the futex wait syscall is serviced after the futex wake.
And in the unsuccessful trace, futex wait has not yet reached the run_syscall stage.

I had added traces to the enqueue and dequeue operations on the futex wait queue; they aren't showing up here because those code paths were not reached. For the contrived example here, where the child thread does no work and exits quickly, in most cases there are no processes to wake up during the futex wake, and the subsequent futex wait returns early when it sees that the value pointed to by the futex has changed.

Also, FWIW, I've yet to see the hang happen when the futex wake had a process to wake up.

@davidchisnall davidchisnall added the needs-triage Bug does not yet have a priority assigned label Jul 31, 2020
@davidchisnall davidchisnall added p0 Blocking priority and removed needs-triage Bug does not yet have a priority assigned labels Aug 3, 2020
@prp
Member

prp commented Aug 4, 2020

@davidchisnall @vtikoo What I observe is that when thread_destroy_host() is called, the thread that is being destroyed (tid=47 in this run) still shows enclave futex calls (see below). A futex_wait() will then add deallocated memory to the futex queue, which eventually leads to a cycle in the queue when the memory is allocated again. This causes the hang when the lthread scheduler traverses the queue.

The following shows the problem. I've added instrumentation and an assert that checks if tid==0, which should never happen. (I presume that at that point, we have deallocated the memory for the lthread struct):

[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b09730 (i=15)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b09350 (i=16)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b09540 (i=17)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b0a0e0 (i=18)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b085c0 (i=19)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b09ef0 (i=20)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b09d00 (i=21)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b09160 (i=22)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b08b90 (i=23)
[[  SGX-LKL ]] futex_wake(): [  47] FUTEX_WAKE with key: 0xb0ac60, woke 0
[[  SGX-LKL ]] futex_wake(): [  47] FUTEX_WAKE with key: 0xb08bf0, num 2147483647
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b06280 (i=0)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b08d80 (i=1)
[[  SGX-LKL ]] futex_wake(): if entered
[[  SGX-LKL ]] futex_wake(): enqueuing lt=7f9000b08c10
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b0a2d0 (i=2)
[[  SGX-LKL ]] futex_wait(): [  13] FUTEX_WAITING woke up
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b07ff0 (i=3)
===> [[  SGX-LKL ]] thread_destroy_host(): destroying lt=0x7f9000b0ac80 tid=47 fq=0x7f9000b0adf0 <===
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b07db0 (i=4)
[[  SGX-LKL ]] thread_destroy_host(): futex_dequeue lt=0x7f9000b0ac80 tid=47 fq=0x7f9000b0adf0
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b0aa10 (i=5)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b09970 (i=6)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b0ac00 (i=7)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b083d0 (i=8)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b0a860 (i=9)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b087b0 (i=10)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b08f70 (i=11)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b0a650 (i=12)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b089a0 (i=13)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b0a4a0 (i=14)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b09730 (i=15)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b09350 (i=16)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b09540 (i=17)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b0a0e0 (i=18)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b085c0 (i=19)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b09ef0 (i=20)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b09d00 (i=21)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b09160 (i=22)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f9000b08b90 (i=23)
[[  SGX-LKL ]] futex_wake(): [  47] FUTEX_WAKE with key: 0xb08bf0, woke 1
[[  SGX-LKL ]] futex_wait(): [  47] FUTEX_WAIT with key: 0xb0ac60, timeout: 0 usec
[[  SGX-LKL ]] __futex_wait_new(): [   47] created new fq=0x7f9000b0adf0 lt=0x7f9000b0ac80
[[  SGX-LKL ]] __futex_wait_new(): lt=0x7f9000b0ac80 tid=0
SGXLKL ASSERTION FAILED: lt->tid (sched/futex.c: 140: __futex_wait_new)
2020-08-04T21:40:10.000000Z [(H)ERROR] tid(0x7fa782263700) | :OE_ENCLAVE_ABORTING [/home/v-pepiet/devel/sgx-lkl/openenclave/host/calls.c:oe_call_enclave_function_by_table_id:91]
[   SGX-LKL  ] ethread (1: 19) [   SGX-LKL  ] FAIL: sgxlkl_ethread_init() failed (id=1 result=19 (OE_ENCLAVE_ABORTING))

@vtikoo
Contributor Author

vtikoo commented Aug 4, 2020

@prp David had mentioned a similar issue, where it's possible for the LKL scheduler to call sem_down() on an exited process. That may be the source of the futex wait on deallocated memory.

@prp
Member

prp commented Aug 5, 2020

Here is a trace with LKL_DEBUG that shows the sem_down() call after thread_destroy_host():

[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b0a170 (i=22)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b09ba0 (i=23)
[[  SGX-LKL ]] futex_wake(): [  47] FUTEX_WAKE with key: 0xb0bc60, woke 0
[[    LKL   ]] switch_to_host_task(): calling sem_down (task=host_clone0 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[  SGX-LKL ]] sem_down(): [  47] sem_down()
[[    LKL   ]] switch_to_host_task(): calling schedule_tail (task=host_clone0 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0 abs_prev=host1)
[[    LKL   ]] switch_to_host_task(): done (task=host_clone0 current=host_clone0 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0)
[[    LKL   ]] lkl_syscall(): calling run_syscall() (no=93 task=host_clone0 current=host_clone0)
[[    LKL   ]] __switch_to(): host_clone0=>ksoftirqd/0
[[  SGX-LKL ]] sem_up(): [  47] sem_up()
[[  SGX-LKL ]] futex_wake(): [  47] FUTEX_WAKE with key: 0xb09c00, num 2147483647
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b072b0 (i=0)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b09d90 (i=1)
[[  SGX-LKL ]] futex_wake(): if entered
[[  SGX-LKL ]] futex_wake(): enqueuing lt=7f7000b09c20
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b0a930 (i=2)
[[  SGX-LKL ]] futex_wait(): [  13] FUTEX_WAITING woke up
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b09000 (i=3)
[[    LKL   ]] free_thread_stack(): enter (task=host_clone0 task->TIF_HOST_THREAD=1 task->TIF_SIGPENDING=0 ti=0x7f703fe6b800 current=ksoftirqd/0)
[[    LKL   ]] kill_thread(): enter (task=host_clone0 task->state=128 task->flags=32844ti=0x7f703fe6b800 ti->flags=1282 ti->TIF_NO_TERMINATION=0 )
[[  SGX-LKL ]] thread_destroy_host(): destroying lt=0x7f7000b0bc80 tid=47 fq=0x7f7000b0bdf0
[[  SGX-LKL ]] thread_destroy_host(): futex_dequeue lt=0x7f7000b0bc80 tid=47 fq=0x7f7000b0bdf0
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b08de0 (i=4)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b0ba10 (i=5)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b0b140 (i=6)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b0bc00 (i=7)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b093e0 (i=8)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b0b7d0 (i=9)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b097c0 (i=10)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b09f80 (i=11)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b0b5e0 (i=12)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b099b0 (i=13)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b0b330 (i=14)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b0af00 (i=15)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b0ab20 (i=16)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b0ad10 (i=17)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b0a740 (i=18)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b095d0 (i=19)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b0a550 (i=20)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b0a360 (i=21)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b0a170 (i=22)
[[  SGX-LKL ]] futex_wake(): [  47] checking fq=0x7f7000b09ba0 (i=23)
[[  SGX-LKL ]] futex_wake(): [  47] FUTEX_WAKE with key: 0xb09c00, woke 1
[[  SGX-LKL ]] sem_down(): [  47] sem_down()
[[  SGX-LKL ]] futex_wait(): [   6] FUTEX_WAITING woke up
[[  SGX-LKL ]] futex_wait(): [  47] FUTEX_WAIT with key: 0xb0bc60, timeout: 0 usec
[[  SGX-LKL ]] __futex_wait_new(): [   0] created new fq=0x7f7000b0bdf0 lt=0x7f7000b0bc80
[[  SGX-LKL ]] __futex_wait_new(): lt=0x7f7000b0bc80 tid=0
SGXLKL ASSERTION FAILED: lt->tid (sched/futex.c: 140: __futex_wait_new)
2020-08-05T06:16:43.000000Z [(H)ERROR] tid(0x7f8590210700) | :OE_ENCLAVE_ABORTING [/home/v-pepiet/devel/sgx-lkl/openenclave/host/calls.c:oe_call_enclave_function_by_table_id:91]
[   SGX-LKL  ] ethread (3: 19) [   SGX-LKL  ] FAIL: sgxlkl_ethread_init() failed (id=3 result=19 (OE_ENCLAVE_ABORTING))

So the bug is caused by how thread_destroy_host() is invoked here and the fact that __switch_to() then still calls sem_down().

@davidchisnall, since you wrote this code as part of the clone support, can you have a look?

@prp prp self-assigned this Aug 5, 2020
@davidchisnall
Contributor

I did some debugging of this yesterday and we discussed it in the planning meeting. #719 adds use-after-free checking that can trigger this reliably.

@prp
Member

prp commented Aug 5, 2020

I can trigger this reliably as above by running the clone test from here on my machine.

@davidchisnall
Contributor

The checking in that PR causes the crash to happen on the first use-after-free, not at some point in the future. The root cause appears to be related to rescheduling the dead task here:

https://github.com/lsds/lkl/blob/86819ac5446e1fa31ca67db68aa165bbf07b90a9/arch/lkl/kernel/threads.c#L219

davidchisnall added a commit to lsds/lkl that referenced this issue Aug 5, 2020
We were not serialising deallocation of the thread with the Linux
scheduler.  This caused some use-after free errors when, in some
situations, the kernel could try to manipulate the deallocated scheduler
semaphore.

The fix treats cloned threads as kernel threads. The kernel already gets
the dying thread to a quiescence point and enters the scheduler, just as
it does for kernel threads. We reuse that logic with a minor
modification: we need to prevent the TLS destructor associated with the
host thread from running, so that the exit call does not attempt to make
the thread exit again.

Fixes lsds/sgx-lkl#686
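
One generic way to express that last guard (purely illustrative, with assumed names; not the actual patch in lsds/lkl#5) is a thread-local flag that the host thread's TLS destructor checks before triggering the exit path again:

#include <stdbool.h>

/* Illustrative sketch: mark the thread once the kernel has already taken it
 * through its exit path, so the host-side TLS destructor becomes a no-op
 * instead of making the same thread exit a second time. */
static __thread bool kernel_exit_done;

static void host_thread_tls_destructor(void* unused)
{
    (void)unused;
    if (kernel_exit_done)
        return;            /* the kernel already tore this thread down */
    /* ... otherwise perform the normal host-thread exit handling ... */
}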
davidchisnall added a commit to lsds/lkl that referenced this issue Aug 5, 2020
vtikoo pushed a commit to lsds/lkl that referenced this issue Aug 5, 2020