Add benchmarks for no-op submission #144

Merged
merged 3 commits into from
Oct 24, 2022

@ollie-etl (Contributor) commented Oct 24, 2022

This PR adds baseline benchmarks for NoOp submission and completion. They give users a rough upper bound on the throughput achievable with tokio-uring.

Two benchmarks are added: Criterion, measuring wall time, and Iai, measuring estimated CPU cycles. Running the Iai benchmarks requires Cachegrind.

### Notes
I've no doubt these can be improved, and should someone more experienced in micro-benchmarking wish to point out the glaring inefficiencies, I'd be very happy.

There are two lines of commented-out code. These trigger a panic; see issue #145.

Some results

Hardware: 6 core i7

Criterion

no_op/1                 time:   [145.16 ms 145.48 ms 145.82 ms]
                        thrpt:  [685.78 Kelem/s 687.38 Kelem/s 688.92 Kelem/s]
                 change:
                        time:   [-2.8876% -2.4510% -2.0114%] (p = 0.00 < 0.05)
                        thrpt:  [+2.0526% +2.5126% +2.9734%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
  
no_op/32                time:   [27.023 ms 27.078 ms 27.148 ms]
                        thrpt:  [3.6835 Melem/s 3.6930 Melem/s 3.7005 Melem/s]
                 change:
                        time:   [-1.3035% -0.9492% -0.5849%] (p = 0.00 < 0.05)
                        thrpt:  [+0.5883% +0.9583% +1.3207%]
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
  
no_op/64                time:   [25.175 ms 25.221 ms 25.274 ms]
                        thrpt:  [3.9567 Melem/s 3.9650 Melem/s 3.9722 Melem/s]
                 change:
                        time:   [-0.4796% -0.0752% +0.3019%] (p = 0.72 > 0.05)
                        thrpt:  [-0.3010% +0.0753% +0.4819%]
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
  
no_op/256               time:   [23.487 ms 23.539 ms 23.601 ms]
                        thrpt:  [4.2372 Melem/s 4.2482 Melem/s 4.2576 Melem/s]
                 change:
                        time:   [-2.9132% -2.5070% -2.0627%] (p = 0.00 < 0.05)
                        thrpt:  [+2.1062% +2.5715% +3.0006%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
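
As a sanity check on the Criterion figures, throughput is the element count divided by wall time. The medians above imply roughly 100,000 no-ops per iteration (145.48 ms × 687.38 Kelem/s ≈ 100,000). That count is inferred from the reported numbers and is not stated in the PR, so treat it as an assumption. A minimal sketch that recomputes the median throughput for each run:

```rust
/// Throughput in elements per second for `elems` operations completed in `millis` ms.
fn throughput_per_sec(elems: f64, millis: f64) -> f64 {
    elems / (millis / 1_000.0)
}

fn main() {
    // Assumption: 100,000 no-ops per iteration, inferred from time x throughput.
    let elems = 100_000.0;
    // Median wall times reported by Criterion, keyed by the benchmark parameter.
    let medians_ms = [(1, 145.48), (32, 27.078), (64, 25.221), (256, 23.539)];
    for (param, ms) in medians_ms {
        let melem = throughput_per_sec(elems, ms) / 1e6;
        println!("no_op/{param}: {:.4} Melem/s", melem);
    }
}
```

The recomputed values agree with the reported medians to within rounding, which also shows that most of the gain comes from moving from 1 to 32: throughput rises about 5.4x there, and only about 15% more between 32 and 256.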

Iai

runtime_only
  Instructions:               28479 (No change)
  L1 Accesses:                40528 (No change)
  L2 Accesses:                  243 (No change)
  RAM Accesses:                 977 (No change)
  Estimated Cycles:           75938 (No change)

no_op_x1
  Instructions:           500699958 (+0.023513%)
  L1 Accesses:            754985269 (+0.015636%)
  L2 Accesses:              1575761 (No change)
  RAM Accesses:                1308 (No change)
  Estimated Cycles:       762909854 (+0.015474%)

no_op_x32
  Instructions:           129944394 (-0.008193%)
  L1 Accesses:            204870254 (-0.005207%)
  L2 Accesses:                49632 (No change)
  RAM Accesses:                1373 (No change)
  Estimated Cycles:       205166469 (-0.005200%)

no_op_x64
  Instructions:           123956020 (+0.003089%)
  L1 Accesses:            195968649 (+0.001997%)
  L2 Accesses:                39143 (No change)
  RAM Accesses:                1441 (No change)
  Estimated Cycles:       196214799 (+0.001994%)

no_op_x256
  Instructions:           119631265 (+0.000911%)
  L1 Accesses:            188921752 (+0.000577%)
  L2 Accesses:               649009 (No change)
  RAM Accesses:                1869 (No change)
  Estimated Cycles:       192232212 (+0.000567%)
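
The Estimated Cycles rows are consistent with a weighted sum of the access counts: L1 + 5 × L2 + 35 × RAM. The weights here are an assumption, chosen because they reproduce every reported row exactly; the sketch below just redoes that arithmetic on the figures above.

```rust
/// Estimated cycles as a weighted sum of access counts per memory level.
/// The weights (1, 5, 35) are an assumption checked against the reported rows.
fn estimated_cycles(l1: u64, l2: u64, ram: u64) -> u64 {
    l1 + 5 * l2 + 35 * ram
}

fn main() {
    // (name, L1 accesses, L2 accesses, RAM accesses, reported estimated cycles)
    let rows = [
        ("runtime_only", 40_528, 243, 977, 75_938),
        ("no_op_x1", 754_985_269, 1_575_761, 1_308, 762_909_854),
        ("no_op_x32", 204_870_254, 49_632, 1_373, 205_166_469),
        ("no_op_x64", 195_968_649, 39_143, 1_441, 196_214_799),
        ("no_op_x256", 188_921_752, 649_009, 1_869, 192_232_212),
    ];
    for (name, l1, l2, ram, reported) in rows {
        let est = estimated_cycles(l1, l2, ram);
        println!("{name}: computed {est}, reported {reported}, match = {}", est == reported);
    }
}
```

Because L1 accesses dominate every row, the cycle estimate tracks the instruction count closely, and the drop from no_op_x1 to no_op_x32 mirrors the wall-time improvement seen in the Criterion run.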

Noah-Kennedy pushed a commit that referenced this pull request Oct 24, 2022
Fixes #145. A data race was observed in PR #144: with high concurrency in user space and single-threaded SQPOLL in the ring, submitting an entry to the queue could trigger a panic.

The speculation is that the submission queue is full, but the ring has not yet executed any ops and placed them in the completion queue. The submit call therefore submits the queue but doesn't free any SQEs.

The fix, which is not very elegant, busy-polls when the queue is full.

Co-authored-by: ollie-etl <Oliver Bunting@etlsystems.com>
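
The busy-polling idea in the commit message above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyRing`, `submit`, `push_busy_poll` and `polls_until_drain` are hypothetical names, not the tokio-uring or io-uring API, and the "kernel" is simulated by a counter. It shows why a single submit may free no slots, and how retrying until space appears avoids pushing into a full queue.

```rust
use std::collections::VecDeque;

/// Toy stand-in for a submission queue; NOT the tokio-uring or io-uring API.
struct ToyRing {
    sq: VecDeque<u32>,
    capacity: usize,
    /// Submit calls remaining before the simulated kernel poller drains the queue.
    polls_until_drain: u32,
}

impl ToyRing {
    /// Hand the queue to the simulated kernel. Slots are freed only once the
    /// poller has caught up, so a call may free nothing at all.
    fn submit(&mut self) {
        if self.polls_until_drain == 0 {
            self.sq.clear();
        } else {
            self.polls_until_drain -= 1;
        }
    }

    /// Push an entry, busy-polling while the queue is full.
    /// Returns how many times the loop had to spin.
    fn push_busy_poll(&mut self, entry: u32) -> u32 {
        let mut spins = 0;
        while self.sq.len() == self.capacity {
            self.submit();
            std::hint::spin_loop();
            spins += 1;
        }
        self.sq.push_back(entry);
        spins
    }
}

fn main() {
    let mut ring = ToyRing { sq: VecDeque::new(), capacity: 4, polls_until_drain: 3 };
    // The first four pushes fit without spinning.
    for entry in 0..4 {
        ring.push_busy_poll(entry);
    }
    // The fifth push finds the queue full and must wait for the drain.
    let spins = ring.push_busy_poll(4);
    println!("queue was full; spun {spins} times before a slot freed up");
}
```

Spinning trades CPU time for simplicity, which matches the commit's own description of the fix as not very elegant.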
@Noah-Kennedy Noah-Kennedy merged commit c6b884e into tokio-rs:master Oct 24, 2022
Noah-Kennedy pushed a commit that referenced this pull request Nov 5, 2022
# 0.4.0 (November 5th, 2022)

### Fixed

- Fix panic in Deref/DerefMut for Slice extending into uninitialized
part of the buffer ([#52])
- docs: all-features = true ([#84])
- fix fs unit tests to avoid parallelism ([#121])
- Box the socket address to allow moving the Connect future ([#126])
- rt: Fix data race ([#146])

### Added

- Implement fs::File::readv_at()/writev_at() ([#87])
- fs: implement FromRawFd for File ([#89])
- Implement `AsRawFd` for `TcpStream` ([#94])
- net: add TcpListener.local_addr method ([#107])
- net: add TcpStream.write_all ([#111])
- driver: add Builder API as an option to start ([#113])
- Socket and TcpStream shutdown ([#124])
- fs: implement fs::File::from_std ([#131])
- net: implement FromRawFd for TcpStream ([#132])
- fs: implement OpenOptionsExt for OpenOptions ([#133])
- Add NoOp support ([#134])
- Add writev to TcpStream ([#136])
- sync TcpStream, UnixStream and UdpSocket functionality ([#141])
- Add benchmarks for no-op submission ([#144])
- Expose runtime structure ([#148])

### Changed

- driver: batch submit requests and add benchmark ([#78])
- Depend on io-uring version ^0.5.8 ([#153])

### Internal Improvements

- chore: fix clippy lints ([#99])
- io: refactor post-op logic in ops into Completable ([#116])
- Support multi completion events: v2 ([#130])
- simplify driver operation futures ([#139])
- rt: refactor runtime to avoid Rc\<RefCell\<...>> ([#142])
- Remove unused dev-dependencies ([#143])
- chore: types and fields explicitly named ([#149])
- Ignore errors from uring while cleaning up ([#154])
- rt: drop runtime before driver during shutdown ([#155])
- rt: refactor drop logic ([#157])
- rt: fix error when calling block_on twice ([#162])

### CI changes

- chore: update actions/checkout action to v3 ([#90])
- chore: add all-systems-go ci check ([#98])
- chore: add clippy to ci ([#100])
- ci: run cargo test --doc ([#135])


[#52]: #52
[#78]: #78
[#84]: #84
[#87]: #87
[#89]: #89
[#90]: #90
[#94]: #94
[#98]: #98
[#99]: #99
[#100]: #100
[#107]: #107
[#111]: #111
[#113]: #113
[#116]: #116
[#121]: #121
[#124]: #124
[#126]: #126
[#130]: #130
[#131]: #131
[#132]: #132
[#133]: #133
[#134]: #134
[#135]: #135
[#136]: #136
[#139]: #139
[#141]: #141
[#142]: #142
[#143]: #143
[#144]: #144
[#146]: #146
[#148]: #148
[#149]: #149
[#153]: #153
[#154]: #154
[#155]: #155
[#157]: #157
[#162]: #162
@mladedav mladedav mentioned this pull request Oct 14, 2024