Io #5

anasazi · 2013-07-03T00:32:15Z

UDP sockets backed by libuv. UDP stream interface built on top of UDP sockets. IPv6 support for both UDP and TCP.

…ests.

…4 data structure.

Conflicts: src/libstd/rt/uvio.rs

…guage syntax change.

…them manually.

Conflicts: src/rt/rustrt.def.in

…ve semantics rather than ~.

Conflicts: src/libstd/rt/test.rs src/rt/rustrt.def.in

terminfo parameterized strings supports a limited subset of printf-style formatting operations, such as %#5.3d.

Conflicts: src/libstd/rt/uvio.rs

Fix grammar

Add small-copy optimization for copy_from_slice ## Summary During benchmarking, I found that one of my programs spent between 5 and 10 percent of the time doing memmoves. Ultimately I tracked these down to single-byte slices being copied with a memcopy. Doing a manual copy if the slice contains only one element can speed things up significantly. For my program, this reduced the running time by 20%. ## Background I am optimizing a program that relies heavily on reading a single byte at a time. To avoid IO overhead, I read all data into a vector once, and then I use a `Cursor` around that vector to read from. During profiling, I noticed that `__memmove_avx_unaligned_erms` was hot, taking up 7.3% of the running time. It turns out that these were caused by calls to `Cursor::read()`, which calls `<&[u8] as Read>::read()`, which calls `&[T]::copy_from_slice()`, which calls `ptr::copy_nonoverlapping()`. This one is implemented as a memcopy. Copying a single byte with a memcopy is very wasteful, because (at least on my platform) it involves calling `memcpy` in libc. This is an indirect call when libc is linked dynamically, and furthermore `memcpy` is optimized for copying large amounts of data at the cost of a bit of overhead for small copies. ## Benchmarks Before I made this change, `perf` reported the following for my program. I only included the relevant functions, and how they rank. (This is on a different machine than where I ran the original benchmarks. It has an older CPU, so `__memmove_sse2_unaligned_erms` is called instead of `__memmove_avx_unaligned_erms`.) ``` #3 5.47% bench_decode libc-2.24.so [.] __memmove_sse2_unaligned_erms #5 1.67% bench_decode libc-2.24.so [.] memcpy@GLIBC_2.2.5 #6 1.51% bench_decode bench_decode [.] memcpy@plt ``` `memcpy` is eating up 8.65% of the total running time, and the overhead of dispatching to a specialized fast copy function (`memcpy@GLIBC` showing up) is clearly visible. The price of dynamic linking (`memcpy@plt` showing up) is visible too. After this change, this is what `perf` reports: ``` #5 0.33% bench_decode libc-2.24.so [.] __memmove_sse2_unaligned_erms #14 0.01% bench_decode libc-2.24.so [.] memcpy@GLIBC_2.2.5 ``` Now only 0.34% of the running time is spent on memcopies. The dynamic linking overhead is not significant at all any more. To add some more data, my program generates timing results for the operation in its main loop. These are the timings before and after the change: | Time before | Time after | After/Before | |---------------|---------------|--------------| | 29.8 ± 0.8 ns | 23.6 ± 0.5 ns | 0.79 ± 0.03 | The time is basically the total running time divided by a constant; the actual numbers are not important. This change reduced the total running time by 21% (much more than the original 9% spent on memmoves, likely because the CPU is stalling a lot less because data dependencies are more transparent). Of course YMMV and for most programs this will not matter at all. But when it does, the gains can be significant! ## Alternatives * At first I implemented this in `io::Cursor`. I moved it to `&[T]::copy_from_slice()` instead, but this might be too intrusive, especially because it applies to all `T`, not just `u8`. To restrict this to `io::Read`, `<&[u8] as Read>::read()` is probably the best place. * I tried copying bytes in a loop up to 64 or 8 bytes before calling `Read::read`, but both resulted in about a 20% slowdown instead of speedup.

LeakSanitizer, ThreadSanitizer, AddressSanitizer and MemorySanitizer support ``` $ cargo new --bin leak && cd $_ $ edit Cargo.toml && tail -n3 $_ ``` ``` toml [profile.dev] opt-level = 1 ``` ``` $ edit src/main.rs && cat $_ ``` ``` rust use std::mem; fn main() { let xs = vec![0, 1, 2, 3]; mem::forget(xs); } ``` ``` $ RUSTFLAGS="-Z sanitizer=leak" cargo run --target x86_64-unknown-linux-gnu; echo $? Finished dev [optimized + debuginfo] target(s) in 0.0 secs Running `target/debug/leak` ================================================================= ==10848==ERROR: LeakSanitizer: detected memory leaks Direct leak of 16 byte(s) in 1 object(s) allocated from: #0 0x557c3488db1f in __interceptor_malloc /shared/rust/checkouts/lsan/src/compiler-rt/lib/lsan/lsan_interceptors.cc:55 #1 0x557c34888aaa in alloc::heap::exchange_malloc::h68f3f8b376a0da42 /shared/rust/checkouts/lsan/src/liballoc/heap.rs:138 #2 0x557c34888afc in leak::main::hc56ab767de6d653a $PWD/src/main.rs:4 #3 0x557c348c0806 in __rust_maybe_catch_panic ($PWD/target/debug/leak+0x3d806) SUMMARY: LeakSanitizer: 16 byte(s) leaked in 1 allocation(s). 23 ``` ``` $ cargo new --bin racy && cd $_ $ edit src/main.rs && cat $_ ``` ``` rust use std::thread; static mut ANSWER: i32 = 0; fn main() { let t1 = thread::spawn(|| unsafe { ANSWER = 42 }); unsafe { ANSWER = 24; } t1.join().ok(); } ``` ``` $ RUSTFLAGS="-Z sanitizer=thread" cargo run --target x86_64-unknown-linux-gnu; echo $? ================== WARNING: ThreadSanitizer: data race (pid=12019) Write of size 4 at 0x562105989bb4 by thread T1: #0 racy::main::_$u7b$$u7b$closure$u7d$$u7d$::hbe13ea9e8ac73f7e $PWD/src/main.rs:6 (racy+0x000000010e3f) #1 _$LT$std..panic..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::h2e466a92accacc78 /shared/rust/checkouts/lsan/src/libstd/panic.rs:296 (racy+0x000000010cc5) #2 std::panicking::try::do_call::h7f4d2b38069e4042 /shared/rust/checkouts/lsan/src/libstd/panicking.rs:460 (racy+0x00000000c8f2) #3 __rust_maybe_catch_panic <null> (racy+0x0000000b4e56) #4 std::panic::catch_unwind::h31ca45621ad66d5a /shared/rust/checkouts/lsan/src/libstd/panic.rs:361 (racy+0x00000000b517) #5 std::thread::Builder::spawn::_$u7b$$u7b$closure$u7d$$u7d$::hccfc37175dea0b01 /shared/rust/checkouts/lsan/src/libstd/thread/mod.rs:357 (racy+0x00000000c226) #6 _$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$::call_box::hd880bbf91561e033 /shared/rust/checkouts/lsan/src/liballoc/boxed.rs:605 (racy+0x00000000f27e) #7 std::sys::imp::thread::Thread::new::thread_start::hebdfc4b3d17afc85 <null> (racy+0x0000000abd40) Previous write of size 4 at 0x562105989bb4 by main thread: #0 racy::main::h23e6e5ca46d085c3 $PWD/src/main.rs:8 (racy+0x000000010d7c) #1 __rust_maybe_catch_panic <null> (racy+0x0000000b4e56) #2 __libc_start_main <null> (libc.so.6+0x000000020290) Location is global 'racy::ANSWER::h543d2b139f819b19' of size 4 at 0x562105989bb4 (racy+0x0000002f8bb4) Thread T1 (tid=12028, running) created by main thread at: #0 pthread_create /shared/rust/checkouts/lsan/src/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:902 (racy+0x00000001aedb) #1 std::sys::imp::thread::Thread::new::hce44187bf4a36222 <null> (racy+0x0000000ab9ae) #2 std::thread::spawn::he382608373eb667e /shared/rust/checkouts/lsan/src/libstd/thread/mod.rs:412 (racy+0x00000000b5aa) #3 racy::main::h23e6e5ca46d085c3 $PWD/src/main.rs:6 (racy+0x000000010d5c) #4 __rust_maybe_catch_panic <null> (racy+0x0000000b4e56) #5 __libc_start_main <null> (libc.so.6+0x000000020290) SUMMARY: ThreadSanitizer: data race $PWD/src/main.rs:6 in racy::main::_$u7b$$u7b$closure$u7d$$u7d$::hbe13ea9e8ac73f7e ================== ThreadSanitizer: reported 1 warnings 66 ``` ``` $ cargo new --bin oob && cd $_ $ edit src/main.rs && cat $_ ``` ``` rust fn main() { let xs = [0, 1, 2, 3]; let y = unsafe { *xs.as_ptr().offset(4) }; } ``` ``` $ RUSTFLAGS="-Z sanitizer=address" cargo run --target x86_64-unknown-linux-gnu; echo $? ================================================================= ==13328==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff29f3ecd0 at pc 0x55802dc6bf7e bp 0x7fff29f3ec90 sp 0x7fff29f3ec88 READ of size 4 at 0x7fff29f3ecd0 thread T0 #0 0x55802dc6bf7d in oob::main::h0adc7b67e5feb2e7 $PWD/src/main.rs:3 #1 0x55802dd60426 in __rust_maybe_catch_panic ($PWD/target/debug/oob+0xfe426) #2 0x55802dd58dd9 in std::rt::lang_start::hb2951fc8a59d62a7 ($PWD/target/debug/oob+0xf6dd9) #3 0x55802dc6c002 in main ($PWD/target/debug/oob+0xa002) #4 0x7fad8c3b3290 in __libc_start_main (/usr/lib/libc.so.6+0x20290) #5 0x55802dc6b719 in _start ($PWD/target/debug/oob+0x9719) Address 0x7fff29f3ecd0 is located in stack of thread T0 at offset 48 in frame #0 0x55802dc6bd5f in oob::main::h0adc7b67e5feb2e7 $PWD/src/main.rs:1 This frame has 1 object(s): [32, 48) 'xs' <== Memory access at offset 48 overflows this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext (longjmp and C++ exceptions *are* supported) SUMMARY: AddressSanitizer: stack-buffer-overflow $PWD/src/main.rs:3 in oob::main::h0adc7b67e5feb2e7 Shadow bytes around the buggy address: 0x1000653dfd40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1000653dfd50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1000653dfd60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1000653dfd70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1000653dfd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x1000653dfd90: 00 00 00 00 f1 f1 f1 f1 00 00[f3]f3 00 00 00 00 0x1000653dfda0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1000653dfdb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1000653dfdc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1000653dfdd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x1000653dfde0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Heap right redzone: fb Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack partial redzone: f4 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==13328==ABORTING 1 ``` ``` $ cargo new --bin uninit && cd $_ $ edit src/main.rs && cat $_ ``` ``` rust use std::mem; fn main() { let xs: [u8; 4] = unsafe { mem::uninitialized() }; let y = xs[0] + xs[1]; } ``` ``` $ RUSTFLAGS="-Z sanitizer=memory" cargo run; echo $? ==30198==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x563f4b6867da in uninit::main::hc2731cd4f2ed48f8 $PWD/src/main.rs:5 #1 0x563f4b7033b6 in __rust_maybe_catch_panic ($PWD/target/debug/uninit+0x873b6) #2 0x563f4b6fbd69 in std::rt::lang_start::hb2951fc8a59d62a7 ($PWD/target/debug/uninit+0x7fd69) #3 0x563f4b6868a9 in main ($PWD/target/debug/uninit+0xa8a9) #4 0x7fe844354290 in __libc_start_main (/usr/lib/libc.so.6+0x20290) #5 0x563f4b6864f9 in _start ($PWD/target/debug/uninit+0xa4f9) SUMMARY: MemorySanitizer: use-of-uninitialized-value $PWD/src/main.rs:5 in uninit::main::hc2731cd4f2ed48f8 Exiting 77 ```

Without that flag, LLVM generates unaligned memory access instructions, which are not allowed on ARMv5. For example, the 'hello world' example from `cargo --new` failed with: ``` $ ./hello Hello, world! thread 'main' panicked at 'assertion failed: end <= len', src/libcollections/vec.rs:1113 note: Run with `RUST_BACKTRACE=1` for a backtrace. ``` I traced this error back to the following assembler code in `BufWriter::flush_buf`: ``` 6f44: e28d0018 add r0, sp, #24 [...] 6f54: e280b005 add fp, r0, #5 [...] 7018: e5cd001c strb r0, [sp, #28] 701c: e1a0082a lsr r0, sl, #16 7020: 03a01001 moveq r1, #1 7024: e5cb0002 strb r0, [fp, #2] 7028: e1cba0b0 strh sl, [fp] ``` Note that `fp` points to `sp + 29`, so the three `str*`-instructions should fill up a 32bit - value at `sp + 28`, which is later used as the value `n` in `Ok(n) => written += n`. This doesn't work on ARMv5 as the `strh` can't write to the unaligned contents of `fp`, so the upper bits of `n` won't get cleared, leading to the assertion failure in Vec::drain. With `+strict-align`, the code works as expected.

ARMv5 needs +strict-align Without that flag, LLVM generates unaligned memory access instructions, which are not allowed on ARMv5. For example, the 'hello world' example from `cargo --new` failed with: ``` $ ./hello Hello, world! thread 'main' panicked at 'assertion failed: end <= len', src/libcollections/vec.rs:1113 note: Run with `RUST_BACKTRACE=1` for a backtrace. ``` I traced this error back to the following assembler code in `BufWriter::flush_buf`: ``` 6f44: e28d0018 add r0, sp, #24 [...] 6f54: e280b005 add fp, r0, #5 [...] 7018: e5cd001c strb r0, [sp, #28] 701c: e1a0082a lsr r0, sl, #16 7020: 03a01001 moveq r1, #1 7024: e5cb0002 strb r0, [fp, #2] 7028: e1cba0b0 strh sl, [fp] ``` Note that `fp` points to `sp + 29`, so the three `str*`-instructions should fill up a 32bit - value at `sp + 28`, which is later used as the value `n` in `Ok(n) => written += n`. This doesn't work on ARMv5 as the `strh` can't write to the unaligned contents of `fp`, so the upper bits of `n` won't get cleared, leading to the assertion failure in Vec::drain. With `+strict-align`, the code works as expected.

Eric Reed added 30 commits June 12, 2013 14:15

Removing redundant libuv bindings

eb11274

Added libuv UDP function bindings.

39a575f

Corrected libuv UDP bindings.

5393e43

Added a utility function to extract the udp handle from udp send requ…

74e7255

…ests.

added bindings to extract udp handle from udp send requests

03fe59a

Added a UdpWatcher and UdpSendRequest with associated callbacks

a7f92c9

added wrappers about uv_ip{4,6}_{port,name}

9687437

Added a RtioUdpStream trait

b51d188

added a function to convert C's ipv4 data structure into the Rust ipv…

7e022c5

…4 data structure.

added Eq and TotalEq instances for IpAddr

4744375

stated to implement UdpStream

e42f28c

Started to implemented UdpStream

33ae193

Merge remote-tracking branch 'upstream/io' into io

35f3fa6

Conflicts: src/libstd/rt/uvio.rs

Wrote the Eq instance of IpAddr in a slightly different way.

d777ba0

Changed visibility from being on the impl to being on methods per lan…

083c692

…guage syntax change.

socket based UDP io

ac49b74

derived instances of Eq and TotalEq for IpAddr rather than implement …

36c0e04

…them manually.

Merge remote-tracking branch 'upstream/io' into io

55dda46

UDP networking with tests

794923c

Merge remote-tracking branch 'upstream/io' into io

4870dce

Conflicts: src/rt/rustrt.def.in

removed unncessary unsafe block that was stopping compliation.

1af2016

satisfy the formatting check

f202713

removed obsolete FIXMEs. formatting changes.

2c5cfe1

IPv6 struct

d0c812f

changed outdated match on IpAddr

c5b19f0

converted UvUdpSocket into a newtype struct

f604686

Converted UdpSocket into a newtype struct and (dis)connecting uses mo…

34b1135

…ve semantics rather than ~.

removed unecessary method

d0dc697

converted TCP interface to newtype structs

87ecfb7

cleaned up uv/net

ce97bd4

Eric Reed added 4 commits June 26, 2013 13:48

changed NOTE to TODO

42f3f06

IPv6 support for UDP and TCP.

e6c5779

Merge remote-tracking branch 'upstream/io' into io

6a1a781

Conflicts: src/libstd/rt/test.rs src/rt/rustrt.def.in

converted TODOs into XXXs

b60cf0c

brson pushed a commit that referenced this pull request Jul 3, 2013

Support printf formats in terminfo strings

c1b1091

terminfo parameterized strings supports a limited subset of printf-style formatting operations, such as %#5.3d.

Eric Reed added 3 commits July 8, 2013 13:03

Merge remote-tracking branch 'upstream/io' into io

cf23292

Conflicts: src/libstd/rt/uvio.rs

renamed finalize to drop in Drop impl for UvUdpSocket

6b2abca

changed .each() to .iter().advance()

5e0be46

brson merged commit 5e0be46 into brson:io Jul 8, 2013

brson pushed a commit that referenced this pull request Jul 30, 2015

Merge pull request #5 from Manishearth/patch-1

77e5f70

Fix grammar

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Io #5

Io #5

anasazi commented Jul 3, 2013

Io #5

Io #5

Conversation

anasazi commented Jul 3, 2013