Add benchmarks for no-op submission #144

ollie-etl · 2022-10-24T12:50:45Z

This PR adds some baseline benchmarks for NoOp submission / completion. These let users make a very rough estimate of the upper bound on the throughput achievable with tokio-uring.

Two benchmarks are added: Criterion, measuring wall time, and Iai, measuring estimated CPU cycles. Running the Iai benchmarks requires cachegrind.

### Notes
I've no doubt these can be improved, and should someone more experienced in micro-benchmarking wish to point out the glaring inefficiencies, I'd be very happy.

There are two lines of commented-out code. They trigger a panic, tracked in issue #145.

Some results

Hardware: 6 core i7

Criterion

no_op/1                 time:   [145.16 ms 145.48 ms 145.82 ms]
                        thrpt:  [685.78 Kelem/s 687.38 Kelem/s 688.92 Kelem/s]
                 change:
                        time:   [-2.8876% -2.4510% -2.0114%] (p = 0.00 < 0.05)
                        thrpt:  [+2.0526% +2.5126% +2.9734%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
  
no_op/32                time:   [27.023 ms 27.078 ms 27.148 ms]
                        thrpt:  [3.6835 Melem/s 3.6930 Melem/s 3.7005 Melem/s]
                 change:
                        time:   [-1.3035% -0.9492% -0.5849%] (p = 0.00 < 0.05)
                        thrpt:  [+0.5883% +0.9583% +1.3207%]
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
  
no_op/64                time:   [25.175 ms 25.221 ms 25.274 ms]
                        thrpt:  [3.9567 Melem/s 3.9650 Melem/s 3.9722 Melem/s]
                 change:
                        time:   [-0.4796% -0.0752% +0.3019%] (p = 0.72 > 0.05)
                        thrpt:  [-0.3010% +0.0753% +0.4819%]
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
  
no_op/256               time:   [23.487 ms 23.539 ms 23.601 ms]
                        thrpt:  [4.2372 Melem/s 4.2482 Melem/s 4.2576 Melem/s]
                 change:
                        time:   [-2.9132% -2.5070% -2.0627%] (p = 0.00 < 0.05)
                        thrpt:  [+2.1062% +2.5715% +3.0006%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
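
As a rough aid to reading these numbers, the reported mean times and throughputs can be turned into a per-operation cost. The sketch below uses only the mean values quoted above; the helper names are hypothetical, and the figure of roughly 100,000 no-ops per benchmark iteration is inferred from time multiplied by throughput rather than stated anywhere in this PR.

```rust
/// Number of elements processed per benchmark iteration, inferred from
/// the mean iteration time (ms) and the mean throughput (elements/s).
fn elements_per_iteration(mean_ms: f64, elems_per_sec: f64) -> f64 {
    mean_ms / 1e3 * elems_per_sec
}

/// Mean wall time per element in nanoseconds.
fn ns_per_op(elems_per_sec: f64) -> f64 {
    1e9 / elems_per_sec
}

fn main() {
    // (benchmark id, mean time in ms, mean throughput in elements/s),
    // copied from the Criterion output above.
    let results = [
        ("no_op/1", 145.48, 687.38e3),
        ("no_op/32", 27.078, 3.6930e6),
        ("no_op/64", 25.221, 3.9650e6),
        ("no_op/256", 23.539, 4.2482e6),
    ];

    for (id, mean_ms, thrpt) in results {
        println!(
            "{id:<10} {:>8.0} elems/iter {:>8.1} ns/op",
            elements_per_iteration(mean_ms, thrpt),
            ns_per_op(thrpt),
        );
    }
}
```

Read this way, the gain from raising concurrency is large between 1 and 32 and much smaller beyond that.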

Iai

runtime_only
  Instructions:               28479 (No change)
  L1 Accesses:                40528 (No change)
  L2 Accesses:                  243 (No change)
  RAM Accesses:                 977 (No change)
  Estimated Cycles:           75938 (No change)

no_op_x1
  Instructions:           500699958 (+0.023513%)
  L1 Accesses:            754985269 (+0.015636%)
  L2 Accesses:              1575761 (No change)
  RAM Accesses:                1308 (No change)
  Estimated Cycles:       762909854 (+0.015474%)

no_op_x32
  Instructions:           129944394 (-0.008193%)
  L1 Accesses:            204870254 (-0.005207%)
  L2 Accesses:                49632 (No change)
  RAM Accesses:                1373 (No change)
  Estimated Cycles:       205166469 (-0.005200%)

no_op_x64
  Instructions:           123956020 (+0.003089%)
  L1 Accesses:            195968649 (+0.001997%)
  L2 Accesses:                39143 (No change)
  RAM Accesses:                1441 (No change)
  Estimated Cycles:       196214799 (+0.001994%)

no_op_x256
  Instructions:           119631265 (+0.000911%)
  L1 Accesses:            188921752 (+0.000577%)
  L2 Accesses:               649009 (No change)
  RAM Accesses:                1869 (No change)
  Estimated Cycles:       192232212 (+0.000567%)
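
The "Estimated Cycles" rows above are consistent with a weighted sum of the cache-access counts: L1 accesses plus 5 times L2 accesses plus 35 times RAM accesses. The sketch below checks that relationship against the reported figures. The function name is hypothetical and the weights are an assumption that happens to reproduce every row here, so treat them as a reading aid rather than a statement about the tool's internals.

```rust
/// Estimated cycles as a weighted sum of cache-level accesses.
/// The weights (1, 5, 35) are an assumption that matches the rows above.
fn estimated_cycles(l1: u64, l2: u64, ram: u64) -> u64 {
    l1 + 5 * l2 + 35 * ram
}

fn main() {
    // (benchmark, L1 accesses, L2 accesses, RAM accesses, reported cycles),
    // copied from the results above.
    let rows: [(&str, u64, u64, u64, u64); 5] = [
        ("runtime_only", 40_528, 243, 977, 75_938),
        ("no_op_x1", 754_985_269, 1_575_761, 1_308, 762_909_854),
        ("no_op_x32", 204_870_254, 49_632, 1_373, 205_166_469),
        ("no_op_x64", 195_968_649, 39_143, 1_441, 196_214_799),
        ("no_op_x256", 188_921_752, 649_009, 1_869, 192_232_212),
    ];

    for (name, l1, l2, ram, reported) in rows {
        let computed = estimated_cycles(l1, l2, ram);
        println!("{name:<13} computed={computed:>10} reported={reported:>10}");
    }
}
```

Because L1 accesses dominate every row, the estimated-cycle totals track the instruction counts closely, which is why both fall by a similar factor between `no_op_x1` and `no_op_x256`.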

Fixes #145.

A data race was observed in PR #144, where with high concurrency in user-space, and single threaded sqpoll in the ring, we could trigger a panic when submitting an entry to the queue.

The speculation is that the submission queue is full, but no ops have yet been executed and placed in the completion queue by the ring. The submit call therefore submits the queue, but doesn't free any SQEs.

The fix, which is not very elegant, does busy-polling on full.

Co-authored-by: ollie-etl <Oliver Bunting@etlsystems.com>
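
The busy-poll-on-full idea in this commit message can be illustrated with a small simulation. This is a minimal sketch and not tokio-uring's actual code: `MockRing`, `kernel_lag`, and `push_busy_poll` are hypothetical names, and the "kernel" is modelled as a counter that only drains the queue after a few `submit` calls, mimicking a submit that does not immediately free any entries.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a ring: a bounded submission queue whose
/// slots are freed only once the simulated kernel side has caught up.
struct MockRing {
    sq: VecDeque<u64>,
    capacity: usize,
    /// Number of `submit` calls that pass before the queue is drained.
    kernel_lag: u32,
    completed: Vec<u64>,
}

impl MockRing {
    fn new(capacity: usize, kernel_lag: u32) -> Self {
        Self { sq: VecDeque::new(), capacity, kernel_lag, completed: Vec::new() }
    }

    /// Queue an entry, handing it back as an error if the queue is full.
    fn push(&mut self, entry: u64) -> Result<(), u64> {
        if self.sq.len() == self.capacity {
            Err(entry)
        } else {
            self.sq.push_back(entry);
            Ok(())
        }
    }

    /// Notify the simulated kernel. Slots are freed only after the lag
    /// has elapsed, so an early call may free nothing.
    fn submit(&mut self) {
        if self.kernel_lag > 0 {
            self.kernel_lag -= 1;
        } else {
            self.completed.extend(self.sq.drain(..));
        }
    }
}

/// Busy-poll on full: keep submitting until a slot frees up instead of
/// treating a full queue as fatal. Returns the number of retries.
fn push_busy_poll(ring: &mut MockRing, entry: u64) -> u32 {
    let mut retries = 0;
    while ring.push(entry).is_err() {
        ring.submit();
        retries += 1;
        std::hint::spin_loop();
    }
    retries
}

fn main() {
    let mut ring = MockRing::new(2, 3);
    for id in 0..3 {
        let retries = push_busy_poll(&mut ring, id);
        println!("entry {id} queued after {retries} retries");
    }
    println!("completed: {:?}, still queued: {}", ring.completed, ring.sq.len());
}
```

The trade-off matches the one the commit message concedes: spinning burns CPU while the queue is full, but a single submit call is not assumed to free a slot.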

# 0.4.0 (November 5th, 2022)

### Fixed

- Fix panic in Deref/DerefMut for Slice extending into uninitialized part of the buffer ([#52])
- docs: all-features = true ([#84])
- fix fs unit tests to avoid parallelism ([#121])
- Box the socket address to allow moving the Connect future ([#126])
- rt: Fix data race ([#146])

### Added

- Implement fs::File::readv_at()/writev_at() ([#87])
- fs: implement FromRawFd for File ([#89])
- Implement `AsRawFd` for `TcpStream` ([#94])
- net: add TcpListener.local_addr method ([#107])
- net: add TcpStream.write_all ([#111])
- driver: add Builder API as an option to start ([#113])
- Socket and TcpStream shutdown ([#124])
- fs: implement fs::File::from_std ([#131])
- net: implement FromRawFd for TcpStream ([#132])
- fs: implement OpenOptionsExt for OpenOptions ([#133])
- Add NoOp support ([#134])
- Add writev to TcpStream ([#136])
- sync TcpStream, UnixStream and UdpSocket functionality ([#141])
- Add benchmarks for no-op submission ([#144])
- Expose runtime structure ([#148])

### Changed

- driver: batch submit requests and add benchmark ([#78])
- Depend on io-uring version ^0.5.8 ([#153])

### Internal Improvements

- chore: fix clippy lints ([#99])
- io: refactor post-op logic in ops into Completable ([#116])
- Support multi completion events: v2 ([#130])
- simplify driver operation futures ([#139])
- rt: refactor runtime to avoid Rc\<RefCell\<...>> ([#142])
- Remove unused dev-dependencies ([#143])
- chore: types and fields explicitly named ([#149])
- Ignore errors from uring while cleaning up ([#154])
- rt: drop runtime before driver during shutdown ([#155])
- rt: refactor drop logic ([#157])
- rt: fix error when calling block_on twice ([#162])

### CI changes

- chore: update actions/checkout action to v3 ([#90])
- chore: add all-systems-go ci check ([#98])
- chore: add clippy to ci ([#100])
- ci: run cargo test --doc ([#135])

[#52]: #52
[#78]: #78
[#84]: #84
[#87]: #87
[#89]: #89
[#90]: #90
[#94]: #94
[#98]: #98
[#99]: #99
[#100]: #100
[#107]: #107
[#111]: #111
[#113]: #113
[#116]: #116
[#121]: #121
[#124]: #124
[#126]: #126
[#130]: #130
[#131]: #131
[#132]: #132
[#133]: #133
[#134]: #134
[#135]: #135
[#136]: #136
[#139]: #139
[#141]: #141
[#142]: #142
[#143]: #143
[#144]: #144
[#146]: #146
[#148]: #148
[#149]: #149
[#153]: #153
[#154]: #154
[#155]: #155
[#157]: #157
[#162]: #162

ollie-etl added 2 commits October 24, 2022 13:37

Add benchmarks for no-op

2e4713e

Build, don't run benchmarks in ci

397ee48

This was referenced Oct 24, 2022

driver::op::submit_with() panic #145

Closed

rt: Fix data race #146

Merged

rt: refactor runtime to avoid Rc<RefCell<...>> #142

Merged

Noah-Kennedy approved these changes Oct 24, 2022

Merge branch 'master' into simple-benchmark

1bb1d54

Noah-Kennedy merged commit c6b884e into tokio-rs:master Oct 24, 2022

ollie-etl mentioned this pull request Oct 24, 2022

rt: Hang on too small completion queue #147

Closed

FrankReh mentioned this pull request Nov 5, 2022

chore: prepare tokio-uring v0.4.0 #166

Merged

mladedav mentioned this pull request Oct 14, 2024

Fix clippy warnings #313

Open
