Skip to content
This repository has been archived by the owner on Mar 28, 2018. It is now read-only.

networkstatic/iperf3 containers fails to run with cors #152

Open
mcastelino opened this issue Aug 22, 2016 · 27 comments
Open

networkstatic/iperf3 containers fails to run with cors #152

mcastelino opened this issue Aug 22, 2016 · 27 comments

Comments

@mcastelino
Copy link
Contributor

mcastelino commented Aug 22, 2016

If you try to run iperf3 with runc as the runtime it works.
However if the same container is run with cor as a runtime it will fail with the error

iperf3: error - unable to create a new stream: No such file or directory

This may have something to do with the setup of /tmp or an environment variable in cor.

@jodh-intel
Copy link
Contributor

This is looking like a 9p bug. I created a test program (attached) that makes the following calls (similar to what iperf_new_stream() in iperf3 is doing. The program works fine in a normal environment, but fails in a CC on the call to ftruncate(). Here's some sample strace output:

execve("./test_iperf3-manohar", ["./test_iperf3-manohar"], [/* 9 vars */]) = 0
uname({sysname="Linux", nodename="7e55c4113e73", ...}) = 0
brk(NULL)                               = 0x186b000
brk(0x186c1c0)                          = 0x186c1c0
arch_prctl(ARCH_SET_FS, 0x186b880)      = 0
readlink("/proc/self/exe", "/shared/test_iperf3-manohar", 4096) = 27
brk(0x188d1c0)                          = 0x188d1c0
brk(0x188e000)                          = 0x188e000
fstat(1, {st_mode=S_IFREG|0644, st_size=468, ...}) = 0
gettimeofday({1471958259, 200678}, NULL) = 0
getpid()                                = 360
open("/tmp/foo.O2LCdw", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlink("/tmp/foo.O2LCdw")               = 0
ftruncate(3, 0)                       
[test_iperf3-manohar.c.txt](https://github.com/01org/cc-oci-runtime/files/432478/test_iperf3-manohar.c.txt)
  = -1 ENOENT (No such file or directory)
write(2, "ERROR: ftruncate(fd=3, 0) failed"..., 42ERROR: ftruncate(fd=3, 0) failed: errno=2
) = 42
write(1, "DEBUG: template='/tmp/foo.XXXXXX"..., 115DEBUG: template='/tmp/foo.XXXXXX'
SUCCESS: mkstemp(/tmp/foo.O2LCdw) worked
SUCCESS: unlink(/tmp/foo.O2LCdw) worked
) = 115
exit_group(1)                           = ?
+++ exited with 1 +++

A workaround is to run the following in the container before starting the app:

# mount -t ramfs -o size=20M ramfs /tmp

@jodh-intel
Copy link
Contributor

@jodh-intel
Copy link
Contributor

Problem is similar to #47.

@jcvenegas jcvenegas self-assigned this Aug 30, 2016
@grahamwhaley
Copy link

just an update. Yes, I believe the problem is similar to #47, but different as #47 did unlink-fstat, and this one is doing unlink-ftruncate . Same fundamental issue likely (that at some point in the code path the lookup is done by the pathname and not the fd/handle and thus fails to find the unlinked file), but through a slightly different code path.
This is reasonably hard to track down, and then if in the VFS layer on the client it may be reasonably hard to fix. It could be over in the qemu host side though, which is probably the first place to look (if only as it is probably a bit easier to debug)

@gorozco1
Copy link
Contributor

@anthonyzxu do you have any comments on this issue?

@grahamwhaley
Copy link

For the record. After chasing this down for rkt/rkt#3475, there was a patch set in test that would fix this:
https://sourceforge.net/p/v9fs/mailman/v9fs-developer/thread/146659832556.15781.17414806975641516683.stgit%40bahia.lan/#msg35175775

but a race condition still exists, and those patches are still in flight and not upstreamed yet.
If this issue does rise in importance then we could 'update' to those latest patches and re-evaluate.

@grahamwhaley
Copy link

As this is well understood, and I believe we should be able to use a ramfs /tmp as a workaround, can we drop the priority of this down from a P1 do we think @jodh-intel @dlespiau

@jodh-intel
Copy link
Contributor

@grahamwhaley - I concur.

The problem is that the /tmp workaround is only going to work for /tmp: iperf3 seems well-behaved in using a temporary directory, but if any other apps try to use ftruncate(2) on, say, /, they will see the problem.

As such, we'll have to document this limitation carefully. In fact, I almost wonder if we should generate some sort of warning message if these problematic 9p calls (via a patch). That way, we could audit the scope of the problem in say the top 100 images (#226). As and when a full solution is available, we can drop the 9p audit patch.

@jodh-intel
Copy link
Contributor

For completeness, here's some sample strace output from a successful run of the program attached in comment #152 (comment):

execve("./test_iperf3-manohar", ["./test_iperf3-manohar"], [/* 111 vars */]) = 0
brk(NULL)                               = 0xa63000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4974cb8000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=143417, ...}) = 0
mmap(NULL, 143417, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f4974c94000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\10\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2089496, ...}) = 0
mmap(NULL, 3938656, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f49746d4000
mprotect(0x7f497488d000, 2093056, PROT_NONE) = 0
mmap(0x7f4974a8c000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b8000) = 0x7f4974a8c000
mmap(0x7f4974a92000, 14688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f4974a92000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4974c93000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4974c92000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4974c91000
arch_prctl(ARCH_SET_FS, 0x7f4974c92700) = 0
mprotect(0x7f4974a8c000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ)     = 0
mprotect(0x7f4974cb9000, 4096, PROT_READ) = 0
munmap(0x7f4974c94000, 143417)          = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 8), ...}) = 0
brk(NULL)                               = 0xa63000
brk(0xa84000)                           = 0xa84000
brk(NULL)                               = 0xa84000
write(1, "DEBUG: template='/tmp/foo.XXXXXX"..., 34DEBUG: template='/tmp/foo.XXXXXX'
) = 34
getpid()                                = 438
open("/tmp/foo.dewhr0", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
write(1, "SUCCESS: mkstemp(/tmp/foo.dewhr0"..., 41SUCCESS: mkstemp(/tmp/foo.dewhr0) worked
) = 41
unlink("/tmp/foo.dewhr0")               = 0
write(1, "SUCCESS: unlink(/tmp/foo.dewhr0)"..., 40SUCCESS: unlink(/tmp/foo.dewhr0) worked
) = 40
ftruncate(3, 0)                         = 0
write(1, "SUCCESS: ftruncate(fd=3, 0) work"..., 35SUCCESS: ftruncate(fd=3, 0) worked
) = 35
close(3)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

@GabyCT
Copy link
Contributor

GabyCT commented Apr 3, 2017

@jcvenegas are any updates about this?

@jcvenegas
Copy link
Contributor

@GabyCT not so far, I did not take a look, I think the last update is @grahamwhaley research.

@grahamwhaley
Copy link

Right, I peeked at this issue last week and thought we should be able to 'fix' it by taking the latest kernel and QEMU patchsets for the 9pfs unlink fixes. @GabyCT , if you want to develop the tests then I think you could use the ramfs tmp mount 'trick' listed early in the ticket. I'll at least open an Issue to note we need to check and probably update the 9pfs code.

@jodh-intel
Copy link
Contributor

This sounds like a gnarly one to fix. But how about we downgrade to P2 given that there is a workaround of sorts?

@grahamwhaley
Copy link

I could agree to downgrade to P2. I'm not going to get to look at the patch upgrades in the immediate future. Worth pulling in @anthonyzxu for his thoughts in case it is a work in progress already?

@rarindam
Copy link

rarindam commented Jun 6, 2017

Can you test with the following changes in /usr/share/defaults/cc-oci-runtime/hypervisor.args:

In the qemu command line, following option is being used:
fsdev local,id=test_dev,path=,security_model=none

Instead please use the following:
-fsdev handle,id=test_dev,path=

Manohar saw the issue being fixed with the above change.

@wuzhy
Copy link

wuzhy commented Jul 20, 2017

@rarindam no, this issue still exists after your suggested method is used.

-fsdev local,id=extra-9p-hyperShared,path=/tmp/hyper/shared/pods/7cefd7d5746cef74f9fb04c9f246b05bf6ac7e0bc38c0457115f4d443307546e,security_model=none

@rarindam
Copy link

rarindam commented Jul 20, 2017 via email

@wuzhy
Copy link

wuzhy commented Jul 31, 2017

@rarindam Sorry for the late reply, Just i tried, and found that Clear Container failed when "-fsdev handle" was used. Below is the log:

2017-08-01 01:38:22.896147955 +0800 CST:7266:cc-runtime:error:Container still running, should be stopped

@rarindam
Copy link

rarindam commented Aug 4, 2017

@wuzhy, it has be some other issue other than the option I provided. You have to remove security_model option if using handle, instead of local. Please try it out.
I am providing this update from my end:

  1. -fsdev handle certainly is helping me run the test-iperf code with success:
    DEBUG: template='/tmp/foo.XXXXXX'
    SUCCESS: mkstemp(/tmp/foo.mqbCsO) worked
    SUCCESS: unlink(/tmp/foo.mqbCsO) worked
    SUCCESS: ftruncate(fd=3, 0) worked

@mcastelino
2. This issue seems to be actively tracked here:
https://bugs.launchpad.net/qemu/+bug/1336794
Here instead of ftruncate they are using fstat, hitting the same issue.
I tried the solutions provided, but that does not fix out issue. In fact this solution itself is not committed hence the bug is still open.

  1. Using 9pVhost:
    The same test also passes if I use 9pvhost. The issue is not seen there.

  2. In my testing, I could see the following:
    After UNLINK is executed, ftruncate is implemented using setattr (with set size flag).
    In the case when 9p-local is used as server, this results in error:

[ 26.551849] 9pnet: (00000550) >>> TSETATTR fid 4
[ 26.551854] 9pnet: (00000550) valid=68 mode=0 uid=2 gid=0 size=0
atime_sec=1501796060 atime_nsec=0
mtime_sec=1501796060 mtime_nsec=0
[ 26.551872] 9pnet: -- p9_client_prepare_req (550): client ffff9359f473f000 op 26
[ 26.552121] 9pnet: -- p9_idpool_get (550): id 1 pool ffff9359f2c1ac00
[ 26.552127] 9pnet: (00000550) >>> size=67 type: 26 tag: 1
[ 26.552132] 9pnet: -- p9_virtio_request (550): 9p debug: virtio request
[ 26.552149] 9pnet: -- p9_virtio_request (550): virtio request kicked
[ 26.552424] 9pnet: -- req_done (0): : request done
[ 26.552438] 9pnet: -- p9_client_cb (0): tag 1
[ 26.552454] 9pnet: -- p9_client_cb (0): wakeup: 1
[ 26.552505] 9pnet: (00000550) <<< size=11 type: 7 tag: 1
[ 26.552507] 9pnet: (00000550) <<< RLERROR (-2)
[ 26.552510] 9pnet: -- p9_free_req (550): clnt ffff9359f473f000 req ffff9359f2c0c070 tag: 1
[ 26.552512] 9pnet: -- p9_idpool_put (550): id 1 pool ffff9359f2c1ac00
[ 26.555121] 9pnet: -- v9fs_dir_release (550): inode: ffff9359f18319e0 filp: ffff9359f6300200 fid: 4
[ 26.555124] 9pnet: (00000550) >>> TCLUNK fid 4 (try 0)

But if we use 9p-handle as server, everything works fine:
[ 74.256036] 9pnet: -- v9fs_fid_find (554): dentry: foo.fk2Whc (ffff905ff4db4300) uid 0 any 0
[ 74.256037] 9pnet: -- v9fs_fid_find_inode (554): inode: ffff905ff2025750
[ 74.256038] 9pnet: (00000554) >>> TSETATTR fid 4
[ 74.256040] 9pnet: (00000554) valid=68 mode=0 uid=2 gid=0 size=0
atime_sec=1501795991 atime_nsec=0
mtime_sec=1501795991 mtime_nsec=0
[ 74.256041] 9pnet: -- p9_client_prepare_req (554): client ffff905ff42e5000 op 26
[ 74.256043] 9pnet: -- p9_idpool_get (554): id 1 pool ffff905ff5251800
[ 74.256045] 9pnet: (00000554) >>> size=67 type: 26 tag: 1
[ 74.256045] 9pnet: -- p9_virtio_request (554): 9p debug: virtio request
[ 74.256053] 9pnet: -- p9_virtio_request (554): virtio request kicked
[ 74.256275] 9pnet: -- req_done (0): : request done
[ 74.256277] 9pnet: -- p9_client_cb (0): tag 1
[ 74.256283] 9pnet: -- p9_client_cb (0): wakeup: 1
[ 74.256315] 9pnet: (00000554) <<< size=7 type: 27 tag: 1
[ 74.256316] 9pnet: (00000554) <<< RSETATTR fid 4

@rarindam
Copy link

rarindam commented Aug 4, 2017

execve("./test_iperf3-manohar", ["./test_iperf3-manohar"], [/* 25 vars */]) = 0
brk(NULL) = 0x55909c273000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1d50817000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=13660, ...}) = 0
mmap(NULL, 13660, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f1d50813000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\5\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1856752, ...}) = 0
mmap(NULL, 3959200, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f1d5022e000
mprotect(0x7f1d503eb000, 2097152, PROT_NONE) = 0
mmap(0x7f1d505eb000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x7f1d505eb000
mmap(0x7f1d505f1000, 14752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f1d505f1000
close(3) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1d50811000
arch_prctl(ARCH_SET_FS, 0x7f1d50811700) = 0
mprotect(0x7f1d505eb000, 16384, PROT_READ) = 0
mprotect(0x55909be50000, 4096, PROT_READ) = 0
mprotect(0x7f1d5081a000, 4096, PROT_READ) = 0
munmap(0x7f1d50813000, 13660) = 0
fstat(1, {st_mode=S_IFREG|0644, st_size=1633, ...}) = 0
brk(NULL) = 0x55909c273000
brk(0x55909c295000) = 0x55909c295000
getpid() = 560
open("/tmp/foo.F4F4ul", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlink("/tmp/foo.F4F4ul") = 0
ftruncate(3, 0) = 0
close(3) = 0
write(1, "DEBUG: template='/tmp/foo.XXXXXX"..., 150DEBUG: template='/tmp/foo.XXXXXX'
SUCCESS: mkstemp(/tmp/foo.F4F4ul) worked
SUCCESS: unlink(/tmp/foo.F4F4ul) worked
SUCCESS: ftruncate(fd=3, 0) worked
) = 150
exit_group(0) = ?
+++ exited with 0 +++

@mcastelino
Copy link
Contributor Author

@gorozco1 @devimc we need to also run DPDK pmd to verify that all the issues have been fixed.

@devimc
Copy link
Contributor

devimc commented Nov 17, 2017

still occurs using device driver overlay2

@rarindam
Copy link

@devimc , have you tested this with 9p-handle instead of 9p-local?

@devimc
Copy link
Contributor

devimc commented Nov 17, 2017

@rarindam nop, but I can try
I tested with qemu 2.10.1 to see if it was fixed, but still occurs 😢

@rarindam
Copy link

rarindam commented Nov 22, 2017

./test_iperf3-manohar
DEBUG: template='/tmp/foo.XXXXXX'
SUCCESS: mkstemp(/tmp/foo.mDEbT7) worked
SUCCESS: unlink(/tmp/foo.mDEbT7) worked
SUCCESS: ftruncate(fd=3, 0) worked

Kernel 4.14, Qemu 2.10:
-fsdev local,id=fsdev0,security_model=none,path=/tmp/9p_shared -device virtio-9p-pci,id=fs0,fsdev=fsdev0,mount_tag=qtest

execve("./test_iperf3-manohar",

["./test_iperf3-manohar"], [/* 28 vars */]) = 0
brk(NULL) = 0x55892f9fe000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fab98d1e000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=13660, ...}) = 0
mmap(NULL, 13660, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fab98d1a000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\5\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1856752, ...}) = 0
mmap(NULL, 3959200, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fab98735000
mprotect(0x7fab988f2000, 2097152, PROT_NONE) = 0
mmap(0x7fab98af2000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x7fab98af2000
mmap(0x7fab98af8000, 14752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fab98af8000
close(3) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fab98d18000
arch_prctl(ARCH_SET_FS, 0x7fab98d18700) = 0
mprotect(0x7fab98af2000, 16384, PROT_READ) = 0
mprotect(0x55892dec7000, 4096, PROT_READ) = 0
mprotect(0x7fab98d21000, 4096, PROT_READ) = 0
munmap(0x7fab98d1a000, 13660) = 0
fstat(1, {st_mode=S_IFREG|0644, st_size=1633, ...}) = 0
brk(NULL) = 0x55892f9fe000
brk(0x55892fa20000) = 0x55892fa20000
getpid() = 426
open("/tmp/foo.1zEZ5T", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
unlink("/tmp/foo.1zEZ5T") = 0
ftruncate(3, 0) = 0
close(3) = 0
write(1, "DEBUG: template='/tmp/foo.XXXXXX"..., 150DEBUG: template='/tmp/foo.XXXXXX'
SUCCESS: mkstemp(/tmp/foo.1zEZ5T) worked
SUCCESS: unlink(/tmp/foo.1zEZ5T) worked
SUCCESS: ftruncate(fd=3, 0) worked
) = 150
exit_group(0) = ?
+++ exited with 0 +++

@devimc
Copy link
Contributor

devimc commented Nov 23, 2017

@rarindam
what kernel did you use?
would you mind running the iperf test?

in clear containers this is the cmdline:

-device virtio-9p-pci,fsdev=ctr-9p-0,mount_tag=ctr-rootfs-0 -fsdev local,id=ctr-9p-0,path=/var/lib/docker/overlay2/232a11afb1dd6db15ef9a79d039c15ffdf85b09fea336420f5eb31eb186db2af/merged,security_model=none

@rarindam
Copy link

rarindam commented Dec 1, 2017

@devimc
Kernel 4.14, Qemu 2.10:
-fsdev local,id=fsdev0,security_model=none,path=/tmp/9p_shared -device virtio-9p-pci,id=fs0,fsdev=fsdev0,mount_tag=qtest

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

10 participants