Support the clone system call. #259

Merged
merged 1 commit into from
May 20, 2020


davidchisnall
Contributor

This implements two new LKL hooks. The first creates an lthread
with a specific initial register state (to capture the returns-twice
behaviour of clone, along with the caller's ability to define the stack
and TLS addresses). The new thread is immediately associated with the
Linux task structure (normally, lthreads are associated with Linux tasks
lazily when they perform a system call).
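
For readers less familiar with clone, here is a minimal sketch of the caller-supplied stack in ordinary Linux userspace. It is not SGX-LKL code: it uses the glibc clone() wrapper, and child_fn and run_clone_demo are hypothetical names chosen for illustration. The wrapper takes a function pointer, so it hides the returns-twice behaviour of the raw system call that the hook has to reproduce through the initial register state. TLS setup (CLONE_SETTLS) is not shown.

```c
#define _GNU_SOURCE /* must precede all includes so <sched.h> declares clone() */
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical child entry point: runs on the stack the caller supplied. */
static int child_fn(void *arg)
{
    int *flag = arg;
    *flag = 42; /* visible to the parent because CLONE_VM shares memory */
    return 0;   /* the glibc wrapper turns this into the child's exit status */
}

/* Returns the value the child wrote, or -1 on failure. */
int run_clone_demo(void)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    int flag = 0;
    /* Stacks grow down on x86-64, so pass the top of the allocation. */
    pid_t pid = clone(child_fn, stack + stack_size, CLONE_VM | SIGCHLD, &flag);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1; /* child may still be running: deliberately leak the stack */
    assert(WIFEXITED(status));
    free(stack); /* safe only once the child has exited */
    return flag;
}
```

The parent frees the stack only after the child has exited, which is the same ownership problem the second hook has to deal with.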

The second hook destroys a thread. This is done in response to an exit
system call. This is somewhat complicated, because LKL never returns to
this thread and the thread's stack may be deallocated by the time we
exit it.

The lthread scheduler has no easy way to kill a thread without that
thread running. We can add one eventually, but for now we create a
temporary stack that lthreads can use during teardown and make them run
the teardown from there.
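
The idea of running teardown on a temporary stack can be sketched outside the enclave with the ucontext API. This is an illustration only, not the lthread implementation: it assumes a libc that provides makecontext and swapcontext (glibc does), and teardown, teardown_stack and run_teardown_on_temp_stack are hypothetical names. Unlike the real hook, the sketch switches back to the caller afterwards so that the result can be observed.

```c
#include <assert.h>
#include <stdint.h>
#include <ucontext.h>

static ucontext_t main_ctx, teardown_ctx;
static char teardown_stack[64 * 1024]; /* temporary stack used only for teardown */
static int ran_on_temp_stack;

/* Hypothetical teardown routine. A real one would release the thread's
 * original stack here, which is safe because we are no longer running on it. */
static void teardown(void)
{
    char local;
    uintptr_t sp = (uintptr_t)&local;
    uintptr_t lo = (uintptr_t)teardown_stack;

    /* Record whether this frame lives inside the temporary stack. */
    ran_on_temp_stack = (sp >= lo && sp < lo + sizeof teardown_stack);
}

/* Returns 1 if teardown() executed on the temporary stack. */
int run_teardown_on_temp_stack(void)
{
    int rc = getcontext(&teardown_ctx);
    assert(rc == 0);

    teardown_ctx.uc_stack.ss_sp = teardown_stack;
    teardown_ctx.uc_stack.ss_size = sizeof teardown_stack;
    teardown_ctx.uc_link = &main_ctx; /* resume the caller when teardown returns */
    makecontext(&teardown_ctx, teardown, 0);

    /* Switch stacks: teardown() now runs on teardown_stack. */
    rc = swapcontext(&main_ctx, &teardown_ctx);
    assert(rc == 0);
    (void)rc;

    return ran_on_temp_stack;
}
```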

Fixes #155

@davidchisnall davidchisnall requested review from prp and mikbras May 18, 2020 09:26
@davidchisnall
Contributor Author

Don't merge this yet: the LKL commit will change once lsds/lkl#1 is merged. I'll force-push to this branch to update the LKL submodule once that's done.

@davidchisnall
Contributor Author

Note that this doesn't yet fix #154.

Member

@prp prp left a comment

LGTM, some minor comments. I see CI failures though...

Resolved review threads: src/include/enclave/lthread.h (3), src/lkl/posix-host.c (4, one outdated), src/sched/lthread.c (1), tests/basic/clone/clone.s (1, outdated).

@davidchisnall
Contributor Author

CI is failing because the weak symbol for lkl_syscall isn't found. It is found when I build locally, so I'm not entirely sure what's going on here...

@@ -0,0 +1,11 @@
FROM alpine:3.6 AS builder
Contributor

Out of curiosity, given that 3.6 is rather dated: is there a "no later than X" dependency for sgx-lkl that I'm not aware of?

Member

We should use a consistent Alpine version in all tests. @letmaik?

Contributor

I opened #260. For this PR, it's fine to leave it at an arbitrary version. The maximum supported currently is 3.10.

@davidchisnall davidchisnall force-pushed the clone branch 3 times, most recently from 6fe9619 to 23f27cf Compare May 19, 2020 16:28
@SeanTAllen
Contributor

SeanTAllen commented May 19, 2020

I started looking into the failing access02 LTP test. Here's how to reproduce it.

  • Start from fresh repo of the clone branch.
  • make DEBUG=true
  • cd tests/ltp/ltp-batch1
  • edit ../batch.mk and comment out the commands for the run-hw and run-sw targets (those commands run all tests):
run-hw: $(ROOT_FS)
        #@${LTP_TEST_SCRIPT} run-hw

run-sw: $(ROOT_FS)
        #@${LTP_TEST_SCRIPT} run-sw
  • make clean
  • make DEBUG=true
  • make run

At this point, you are prepped to run the single test for HW mode:

  • SGXLKL_VERBOSE=1 SGXLKL_KERNEL_VERBOSE=0 ../../../build/sgx-lkl-run-oe --hw-debug sgxlkl-miniroot-fs.img /ltp/testcases/kernel/syscalls/access/access02

for SW mode:

SGXLKL_VERBOSE=1 SGXLKL_KERNEL_VERBOSE=0 ../../../build/sgx-lkl-run-oe --sw-debug sgxlkl-miniroot-fs.img /ltp/testcases/kernel/syscalls/access/access02

Results I get:

HW:

tst_test.c:1106: INFO: Timeout per run is 0h 05m 00s
tst_test.c:1125: INFO: No fork support
access02.c:144: PASS: access(file_f, F_OK) as root behaviour is correct.
access02.c:144: PASS: access(file_f, F_OK) as nobody behaviour is correct.
access02.c:144: PASS: access(file_r, R_OK) as root behaviour is correct.
access02.c:144: PASS: access(file_r, R_OK) as nobody behaviour is correct.
access02.c:144: PASS: access(file_w, W_OK) as root behaviour is correct.
access02.c:144: PASS: access(file_w, W_OK) as nobody behaviour is correct.
bad count while changing owner
[[ SGX-LKL ]] FAIL: Kernel panic! Run DEBUG build with SGXLKL_KERNEL_VERBOSE=1 for more information. Aborting...
2020-05-19T19:52:36.000000Z [(H)ERROR] tid(0x7fe8a8ff1700) | :OE_ENCLAVE_ABORTING [/home/sean/openenclave-sgxlkl.git/host/calls.c:oe_call_enclave_function_by_table_id:91]
[ SGX-LKL ] ethread (4: 19) [ SGX-LKL ] FAIL: sgxlkl_ethread_init() failed (id=4 result=19 (OE_ENCLAVE_ABORTING))

SW:

It hangs after printing access02.c:144: PASS: access(file_w, W_OK) as nobody behaviour is correct.


This differs somewhat from what I am seeing in CI, so before I proceed any further, @davidchisnall, can you try reproducing the above?


The source of the failing test is https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/access/access02.c

@davidchisnall
Copy link
Contributor Author

Thanks @SeanTAllen. I actually see something more interesting when I try to reproduce this locally:

tst_test.c:1106: INFO: Timeout per run is 0h 05m 00s
tst_test.c:1125: INFO: No fork support
access02.c:144: PASS: access(file_f, F_OK) as root behaviour is correct.
access02.c:144: PASS: access(file_f, F_OK) as nobody behaviour is correct.
access02.c:144: PASS: access(file_r, R_OK) as root behaviour is correct.
access02.c:144: PASS: access(file_r, R_OK) as nobody behaviour is correct.
access02.c:144: PASS: access(file_w, W_OK) as root behaviour is correct.
access02.c:144: PASS: access(file_w, W_OK) as nobody behaviour is correct.
Created new host task 7f2fc02f4540 (for 7f2fbeada4c0)

So it appears that we're hitting the code path for creating a new clone'd task (for a test that shouldn't be calling clone). I'll take a look.

This implements two new LKL hooks. The first creates an lthread
with a specific initial register state (to capture the returns-twice
behaviour of clone, along with the caller's ability to define the stack
and TLS addresses).  The new thread is immediately associated with the
Linux task structure (normally, lthreads are associated with Linux tasks
lazily when they perform a system call).

The second hook destroys a thread.  This is done in response to an exit
system call.  This is somewhat complicated, because LKL never returns to
this thread and the thread's stack may be deallocated by the time we
exit it.

The lthread scheduler has no easy way to kill a thread without that
thread running.  We can add one eventually, but for now we create a
temporary stack that lthreads can use during teardown and make them run
the teardown from there.

Disable access02 test.  It was passing spuriously, and this change makes
it fail.  See #277 for more information.

Fixes #155