Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tests] basic/pthread_join test gets stuck #198

Closed
letmaik opened this issue May 6, 2020 · 5 comments
Closed

[Tests] basic/pthread_join test gets stuck #198

letmaik opened this issue May 6, 2020 · 5 comments
Assignees
Labels
area: sgx-lkl Core SGX-LKL functionality bug p0 Blocking priority
Milestone

Comments

@letmaik
Copy link
Contributor

letmaik commented May 6, 2020

This test finishes regularly in ~10s but in CI it timed out once after 60s:

Execution timeout: 60
Will run tests for run-hw
/opt/sgx-lkl-debug/bin/sgx-lkl-disk create --size=5M --docker=./Dockerfile sgx-lkl-rootfs.img
Using detected host DNS server in /etc/resolv.conf: 168.63.129.16
Creating sgx-lkl-rootfs.img from Dockerfile ./Dockerfile...
Building Docker image...
Creating disk image file from Docker container...
Succesfully created sgx-lkl-rootfs.img.
SGXLKL_ETHREADS=8 SGXLKL_VERBOSE=1 SGXLKL_KERNEL_VERBOSE=1 /opt/sgx-lkl-debug/bin/sgx-lkl-run-oe --hw-debug sgx-lkl-rootfs.img pthread_join-test
[[  SGX-LKL ]] sgxlkl_enclave_init(): enter
[[  SGX-LKL ]] sgxlkl_enclave_show_attribute(): enclave base=0x7f2480000000 size=1.042 GB
[[  SGX-LKL ]] sgxlkl_enclave_show_attribute(): enclave heap base=0x7f2480b37000 size=1024.00 M end=0x7f24c0b37000
[[  SGX-LKL ]] sgxlkl_enclave_init(): calling _dlstart_c()
[[  SGX-LKL ]] _register_enclave_signal_handlers(): Registering OE exception handler...
[[  SGX-LKL ]] lkl_start_init(): kernel command line: 'mem=32M console=hvc0'
[[  SGX-LKL ]] lkl_start_init(): lkl_start_kernel() called
[    0.000000] Linux version 5.3.0+ (root@cisca24ba00001P) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #1 Wed May 6 02:56:24 UTC 2020
[    0.000000] memblock address range: 0x7f24be8b5000 - 0x7f24c08b4000
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 8079
[    0.000000] Kernel command line: mem=32M console=hvc0 virtio_mmio.device=256@0x1000000:1
[    0.000000] Dentry cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
[    0.000000] Inode-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory available: 32172k/32764k RAM
[    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] NR_IRQS: 4096
[    0.000000] lkl: irqs initialized
[    0.000000] clocksource: lkl: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.000000] lkl: time and timers initialized (irq2)
[    0.000000] printk: console [hvc0] enabled
[    0.000000] pid_max: default: 4096 minimum: 301
[    0.000000] Mount-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.000000] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.008406] devtmpfs: initialized
[    0.008407] random: get_random_bytes called from 0x00007f2480656500 with crng_init=0
[    0.008407] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.008407] xor: automatically using best checksumming function   8regs     
[    0.008407] NET: Registered protocol family 16
[    0.013908] clocksource: Switched to clocksource lkl
[    0.014526] NET: Registered protocol family 2
[    0.015088] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 4096 bytes, linear)
[    0.015088] TCP established hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.015088] TCP bind hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.015088] TCP: Hash tables configured (established 512 bind 512)
[    0.015088] UDP hash table entries: 128 (order: 0, 4096 bytes, linear)
[    0.015088] UDP-Lite hash table entries: 128 (order: 0, 4096 bytes, linear)
[    0.015088] NET: Registered protocol family 1
[    0.015628] virtio-mmio: Registering device virtio-mmio.0 at 0x1000000-0x10000ff, IRQ 1.
[    0.015628] workingset: timestamp_bits=62 max_order=14 bucket_order=0
[    0.016646] NET: Registered protocol family 38
[    0.016646] io scheduler mq-deadline registered
[    0.016646] io scheduler kyber registered
[    0.161609] software IO TLB: mapped [mem 0x7f250ee05000-0x7f2512e05000] (64MB)
[    0.164918] loop: module loaded
[    0.165473] device-mapper: ioctl: 4.40.0-ioctl (2019-01-18) initialised: dm-devel@redhat.com
[    0.168210] wireguard: WireGuard 0.0.20191219 loaded. See www.wireguard.com for information.
[    0.168210] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
[    0.168210] NET: Registered protocol family 10
[    0.168765] Segment Routing with IPv6
[    0.168765] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    0.169290] This architecture does not have kernel memory protection.
[    0.173253] Run /init as init process
[[  SGX-LKL ]] lkl_start_init(): lkl_start_kernel() finished
[[  SGX-LKL ]] lkl_start_init(): creating LKL termination thread
[[  SGX-LKL ]] create_lkl_termination_thread(): enter
[[  SGX-LKL ]] lkl_start_init(): calling lkl_mount_virtial()
[[  SGX-LKL ]] lkl_termination_thread(): enter
[[  SGX-LKL ]] lkl_termination_thread(): Performed LKL syscall to get host task allocated (pid=36)
[[  SGX-LKL ]] lkl_start_init(): calling init_random()
[[  SGX-LKL ]] init_random(): Adding entropy to entropy pool
[    0.175255] random: crng init done
[[  SGX-LKL ]] init_enclave_clock(): Setting enclave realtime clock
[    0.178114] virtio_blk virtio1: [vda] 10240 512-byte logical blocks (5.24 MB/5.00 MiB)
[[  SGX-LKL ]] lkl_mount_disk(): lkl_mount_disk(dev="/dev/vda", mnt="/mnt/vda", ro=0)
[    0.180963] EXT4-fs (vda): mounted filesystem without journal. Opts: 
[[  SGX-LKL ]] lkl_mount_disks(): Set working directory: /
[[  SGX-LKL ]] libc_start_main_stage2(): Calling app main: pthread_join-test
Creating worker thread (run=0)
Joining worker thread...
Thread joined (ret=0, run=0)
Creating worker thread (run=1000)
Joining worker thread...
Thread joined (ret=0, run=1000)
Creating worker thread (run=2000)
Joining worker thread...
Thread joined (ret=0, run=2000)
Creating worker thread (run=3000)
Joining worker thread...
Thread joined (ret=0, run=3000)
Creating worker thread (run=4000)
Joining worker thread...
Thread joined (ret=0, run=4000)
Creating worker thread (run=5000)
Joining worker thread...
Thread joined (ret=0, run=5000)
Creating worker thread (run=6000)
Joining worker thread...
Thread joined (ret=0, run=6000)
Creating worker thread (run=7000)
Joining worker thread...
Thread joined (ret=0, run=7000)
Creating worker thread (run=8000)
Joining worker thread...
Thread joined (ret=0, run=8000)
Creating worker thread (run=9000)
Joining worker thread...
Thread joined (ret=0, run=9000)
Makefile:29: recipe for target 'run-hw' failed
make run-hw: TIMED OUT after 60 secs
Test run completed with EXIT CODE 124
@letmaik letmaik added bug area: sgx-lkl Core SGX-LKL functionality labels May 6, 2020
@prp prp added the p1 Medium priority label May 15, 2020
@prp prp added this to the Milestone 1 milestone May 15, 2020
@prp
Copy link
Member

prp commented May 15, 2020

This will be affected by #155.

@prp prp changed the title basic/pthread_join test gets stuck [Tests] basic/pthread_join test gets stuck Jun 27, 2020
@davidchisnall davidchisnall added the needs-triage Bug does not yet have a priority assigned label Jul 28, 2020
@SeanTAllen SeanTAllen removed the needs-triage Bug does not yet have a priority assigned label Jul 29, 2020
@vtikoo vtikoo added the p0 Blocking priority label Aug 3, 2020
@bodzhang bodzhang removed the p1 Medium priority label Aug 3, 2020
@prp prp self-assigned this Aug 3, 2020
@prp
Copy link
Member

prp commented Aug 3, 2020

@vtikoo do you have any suggestions how I can trigger this?

@vtikoo
Copy link
Contributor

vtikoo commented Aug 3, 2020

@prp you could try the repro steps in #686. There is a difference in how threads are created in this issue and 686, i.e directly calling lthread_create vs clone() - but there is a good chance the root cause for hanging is the same.

@prp
Copy link
Member

prp commented Aug 4, 2020

@vtikoo Hmm, I would first like to understand if this is indeed an old problem or a new regression introduced by clone. I ran the above test with 16 ethreads in hw mode in a loop, and so far had 10,000+ successful runs. If we can't repro this, we should close this issue and instead focus in #686.

@letmaik have you ever seen this happen again? Any suggestions on how to repro this?

@prp
Copy link
Member

prp commented Aug 5, 2020

Since I cannot reproduce this issue and #686 seems to be caused by a new code path specific to the clone support, I am closing this issue. Let's reopen it if we encounter this again.

@prp prp closed this as completed Aug 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: sgx-lkl Core SGX-LKL functionality bug p0 Blocking priority
Projects
None yet
Development

No branches or pull requests

6 participants