Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: only enable quinn-udp fast-apple-datapath on arm #2427

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mxinden
Copy link
Collaborator

@mxinden mxinden commented Feb 7, 2025

Only enable fast-apple-datapath on aarch64.

Old MacOS versions (e.g. 10.15) don't fully support sendmsg_x and recvmsg_x, both enabled by fast-apple-datapath. Instead of feature gating by OS version, simply do so by CPU arch, leveraging the fact that newer MacOS devices use ARM.


Draft for now. Debugging mozilla-central failure on MacOS 10.15.

@mxinden mxinden force-pushed the fast-apple-datapath-arm branch from 4f94edf to 3b8f248 Compare February 7, 2025 16:00
Copy link

github-actions bot commented Feb 7, 2025

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to b6e4cfc.

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

@larseggert
Copy link
Collaborator

This is odd, because IIRC these syscalls predate Catalina by years.

@mxinden
Copy link
Collaborator Author

mxinden commented Feb 9, 2025

Thanks @larseggert for the info.

All suspicions for now. I just know it is related to Neqo v0.12 and MacOS 10.15.

Might as well be a syscall from the mtu crate which we are introducing through Neqo v0.12 as well.

(Thus far I have no other way to reproduce than Mozilla's CI, a single run take ~6h, and logs have not been helpful.)

I will continue to debug this.

@mxinden mxinden force-pushed the fast-apple-datapath-arm branch from 3b8f248 to f53d539 Compare February 10, 2025 08:40
Copy link

codecov bot commented Feb 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.28%. Comparing base (f8946d5) to head (f53d539).
Report is 26 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2427      +/-   ##
==========================================
- Coverage   95.32%   95.28%   -0.04%     
==========================================
  Files         114      114              
  Lines       36868    37111     +243     
  Branches    36868    37111     +243     
==========================================
+ Hits        35145    35363     +218     
- Misses       1719     1742      +23     
- Partials        4        6       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

Benchmark results

Performance differences relative to b6e4cfc.

decode 4096 bytes, mask ff: No change in performance detected.
       time:   [10.855 µs 10.886 µs 10.926 µs]
       change: [-0.4051% +0.1434% +0.7503%] (p = 0.66 > 0.05)

Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) low severe
2 (2.00%) low mild
1 (1.00%) high mild
7 (7.00%) high severe

decode 1048576 bytes, mask ff: No change in performance detected.
       time:   [3.1285 ms 3.1377 ms 3.1482 ms]
       change: [-0.3673% +0.0949% +0.5487%] (p = 0.67 > 0.05)

Found 10 outliers among 100 measurements (10.00%)
10 (10.00%) high severe

decode 4096 bytes, mask 7f: No change in performance detected.
       time:   [17.615 µs 17.652 µs 17.696 µs]
       change: [-1.1465% -0.4724% +0.1059%] (p = 0.16 > 0.05)

Found 17 outliers among 100 measurements (17.00%)
1 (1.00%) low severe
2 (2.00%) low mild
2 (2.00%) high mild
12 (12.00%) high severe

decode 1048576 bytes, mask 7f: No change in performance detected.
       time:   [5.4007 ms 5.4140 ms 5.4292 ms]
       change: [-0.4491% -0.0948% +0.2608%] (p = 0.62 > 0.05)

Found 12 outliers among 100 measurements (12.00%)
12 (12.00%) high severe

decode 4096 bytes, mask 3f: No change in performance detected.
       time:   [6.6813 µs 6.7256 µs 6.7734 µs]
       change: [-0.3220% +0.5041% +1.3374%] (p = 0.25 > 0.05)

Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) low mild
5 (5.00%) high mild
4 (4.00%) high severe

decode 1048576 bytes, mask 3f: No change in performance detected.
       time:   [1.7585 ms 1.7615 ms 1.7657 ms]
       change: [-0.0040% +0.1601% +0.4043%] (p = 0.14 > 0.05)

Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
3 (3.00%) high mild
4 (4.00%) high severe

coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [91.176 ns 91.543 ns 91.905 ns]
       change: [-0.6672% -0.1426% +0.3368%] (p = 0.59 > 0.05)

Found 13 outliers among 100 measurements (13.00%)
11 (11.00%) high mild
2 (2.00%) high severe

coalesce_acked_from_zero 3+1 entries: Change within noise threshold.
       time:   [109.51 ns 109.87 ns 110.24 ns]
       change: [+0.0489% +0.3224% +0.6528%] (p = 0.04 < 0.05)

Found 18 outliers among 100 measurements (18.00%)
1 (1.00%) low mild
5 (5.00%) high mild
12 (12.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [108.95 ns 109.36 ns 109.85 ns]
       change: [-1.1430% -0.3768% +0.2520%] (p = 0.33 > 0.05)

Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) low mild
2 (2.00%) high mild
6 (6.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [93.407 ns 93.573 ns 93.760 ns]
       change: [-0.4809% +0.4096% +1.3240%] (p = 0.38 > 0.05)

Found 14 outliers among 100 measurements (14.00%)
6 (6.00%) high mild
8 (8.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [111.16 ms 111.23 ms 111.29 ms]
       change: [-0.1858% -0.1061% -0.0254%] (p = 0.01 < 0.05)

Found 22 outliers among 100 measurements (22.00%)
11 (11.00%) low mild
9 (9.00%) high mild
2 (2.00%) high severe

SentPackets::take_ranges: No change in performance detected.
       time:   [5.1737 µs 5.2566 µs 5.3367 µs]
       change: [-17.793% -5.8337% +2.7663%] (p = 0.49 > 0.05)

Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe

transfer/pacing-false/varying-seeds: Change within noise threshold.
       time:   [36.784 ms 36.855 ms 36.928 ms]
       change: [-0.9907% -0.7265% -0.4425%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

transfer/pacing-true/varying-seeds: Change within noise threshold.
       time:   [36.695 ms 36.762 ms 36.830 ms]
       change: [-1.1623% -0.9000% -0.6245%] (p = 0.00 < 0.05)
transfer/pacing-false/same-seed: Change within noise threshold.
       time:   [36.501 ms 36.574 ms 36.645 ms]
       change: [-1.3617% -1.1059% -0.8641%] (p = 0.00 < 0.05)
transfer/pacing-true/same-seed: Change within noise threshold.
       time:   [36.843 ms 36.899 ms 36.955 ms]
       change: [-0.7508% -0.5247% -0.3075%] (p = 0.00 < 0.05)
1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: No change in performance detected.
       time:   [861.13 ms 871.38 ms 882.03 ms]
       thrpt:  [113.38 MiB/s 114.76 MiB/s 116.13 MiB/s]
change:
       time:   [-1.7234% -0.1261% +1.5011%] (p = 0.88 > 0.05)
       thrpt:  [-1.4789% +0.1263% +1.7536%]
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected.
       time:   [319.81 ms 323.15 ms 326.44 ms]
       thrpt:  [30.633 Kelem/s 30.946 Kelem/s 31.269 Kelem/s]
change:
       time:   [-0.6712% +0.8043% +2.4273%] (p = 0.32 > 0.05)
       thrpt:  [-2.3698% -0.7978% +0.6757%]
1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.
       time:   [25.489 ms 25.654 ms 25.821 ms]
       thrpt:  [38.729  elem/s 38.980  elem/s 39.232  elem/s]
change:
       time:   [-0.6669% +0.2354% +1.1855%] (p = 0.61 > 0.05)
       thrpt:  [-1.1716% -0.2348% +0.6714%]
1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: No change in performance detected.
       time:   [1.8596 s 1.8787 s 1.8981 s]
       thrpt:  [52.684 MiB/s 53.227 MiB/s 53.776 MiB/s]
change:
       time:   [-1.1150% +0.2648% +1.6227%] (p = 0.71 > 0.05)
       thrpt:  [-1.5968% -0.2641% +1.1275%]

Client/server transfer results

Performance differences relative to b6e4cfc.

Transfer of 33554432 bytes over loopback, 30 runs. All unit-less numbers are in milliseconds.

Client Server CC Pacing Mean ± σ Min Max Δ main Δ main
neqo neqo reno on 515.7 ± 39.9 446.8 594.7 -28.9 1.4%
neqo neqo reno 511.2 ± 42.2 457.8 655.6 -44.5 2.1%
neqo neqo cubic on 529.4 ± 31.7 452.4 583.7 💚 -13.9 0.6%
neqo neqo cubic 531.7 ± 39.3 456.5 601.2 💚 -2.8 0.1%
google neqo reno on 899.4 ± 99.4 656.2 994.0 💚 -2.7 0.1%
google neqo reno 898.4 ± 99.2 644.2 999.8 💚 -3.6 0.1%
google neqo cubic on 893.9 ± 100.9 650.7 1007.4 💚 -1.1 0.0%
google neqo cubic 894.4 ± 93.6 666.8 988.1 💚 -0.1 0.0%
google google 530.9 ± 11.1 518.4 565.0 💚 -4.0 0.2%
neqo msquic reno on 222.4 ± 23.2 203.4 333.3 💚 -3.7 0.4%
neqo msquic reno 228.9 ± 42.2 201.0 383.6 💔 7.6 0.8%
neqo msquic cubic on 213.7 ± 10.6 196.9 241.9 💚 -12.0 1.4%
neqo msquic cubic 219.8 ± 10.7 202.9 242.3 💔 0.2 0.0%
msquic msquic 111.8 ± 10.0 98.9 137.7 💚 -7.0 1.5%

⬇️ Download logs

@larseggert
Copy link
Collaborator

@mxinden in case it helps, I have an old Intel MacBook laying around somewhere.

@mxinden
Copy link
Collaborator Author

mxinden commented Feb 11, 2025

A datapoint for sendmsg_x failure on MacOS <= 13:

oven-sh/bun@26526cb#diff-8060260f9888a3a03c4d236f22769baacc4c9aec45abf7f49b5b2bbab7e39579R307

I reached out to the author.

@mxinden
Copy link
Collaborator Author

mxinden commented Feb 11, 2025

@mxinden in case it helps, I have an old Intel MacBook laying around somewhere.

@larseggert that would be very helpful! Could you checkout the quinn project and run its unit tests with the fast-apple-datapath feature enabled:

cargo test -p quinn-udp --features fast-apple-datapath

Which MacOS version is it running?

@larseggert
Copy link
Collaborator

larseggert commented Feb 11, 2025

System Version: macOS 10.15.7 (19H2026)

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/tests.rs (target/debug/deps/tests-ba553e2506d92958)

running 6 tests
test basic has been running for over 60 seconds
test ecn_v4 has been running for over 60 seconds
test ecn_v4_mapped_v6 has been running for over 60 seconds
test ecn_v6 has been running for over 60 seconds

And it's hanging there, with the CPU maxed out.

@mxinden
Copy link
Collaborator Author

mxinden commented Feb 11, 2025

That is good news. Thanks for reproducing. This matches what we are seeing on Mozilla's CI and above linked commit on oven-sh/bun.

@mxinden
Copy link
Collaborator Author

mxinden commented Feb 11, 2025

Another reference to the issue: https://bun.sh/blog/bun-v1.1.22#udp-socket-hangs-on-macos-13

@larseggert
Copy link
Collaborator

So what's happening is that in the receive path, the cmsg_iter loop in decode_recv never terminates, because the iterator returns a cmsghdr where all fields are zero, which hits the _ case in the loop.

I wonder if libc::CMSG_NXTHDR is not correct for older macOS versions... Anyway, we can work around this in quinn-udp.

mxinden added a commit to mxinden/quinn that referenced this pull request Feb 12, 2025
On MacOS < 14, with `fast-apple-datapath` feature, calls to
`libc::CMSG_NXTHDR` might continuously return empty (i.e. all zero)
`libc::cmsghdr` instead of a null pointer. This results in a busy loop
in `decode_recv`:

``` rust
let cmsg_iter = unsafe { cmsg::Iter::new(hdr) };
for cmsg in cmsg_iter {
	match (cmsg.cmsg_level, cmsg.cmsg_type) {
```
https://github.com/quinn-rs/quinn/blob/b4378bb39dab4b58a1e6a3fea4fff9f87033dab6/quinn-udp/src/unix.rs#L685C1-L687C50

This commit fixes the above, returning a `null_mut()` pointer on an
empty `libc::cmsgdhr`, thus terminating the `cmsg_iter`.

See also mozilla/neqo#2427 for details.
mxinden added a commit to mxinden/quinn that referenced this pull request Feb 12, 2025
On MacOS < 14, with `fast-apple-datapath` feature, calls to
`libc::CMSG_NXTHDR` might continuously return empty (i.e. all zero)
`libc::cmsghdr` instead of a null pointer. This results in a busy loop
in `decode_recv`:

``` rust
let cmsg_iter = unsafe { cmsg::Iter::new(hdr) };
for cmsg in cmsg_iter {
	match (cmsg.cmsg_level, cmsg.cmsg_type) {
```
https://github.com/quinn-rs/quinn/blob/b4378bb39dab4b58a1e6a3fea4fff9f87033dab6/quinn-udp/src/unix.rs#L685C1-L687C50

This commit fixes the above, returning a `null_mut()` pointer on an
empty `libc::cmsgdhr`, thus terminating the `cmsg_iter`.

See also mozilla/neqo#2427 for details.
mxinden added a commit to mxinden/quinn that referenced this pull request Feb 12, 2025
On MacOS < 14, with `fast-apple-datapath` feature, calls to
`libc::CMSG_NXTHDR` might continuously return empty (i.e. all zero)
`libc::cmsghdr` instead of a null pointer. This results in a busy loop
in `decode_recv`:

``` rust
let cmsg_iter = unsafe { cmsg::Iter::new(hdr) };
for cmsg in cmsg_iter {
	match (cmsg.cmsg_level, cmsg.cmsg_type) {
```
https://github.com/quinn-rs/quinn/blob/b4378bb39dab4b58a1e6a3fea4fff9f87033dab6/quinn-udp/src/unix.rs#L685C1-L687C50

This commit fixes the above, returning a `null_mut()` pointer on an
empty `libc::cmsgdhr`, thus terminating the `cmsg_iter`.

See also mozilla/neqo#2427 for details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants