feat: Use FxHasher in places where we don't need DDoS resistance #2342
base: main
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@           Coverage Diff           @@
##             main    #2342   +/-  ##
=======================================
  Coverage   95.39%   95.40%
=======================================
  Files         115      115
  Lines       36982    36982
  Branches    36982    36982
=======================================
+ Hits        35279    35281     +2
+ Misses       1697     1694     -3
- Partials        6        7     +1
```

☔ View full report in Codecov by Sentry.
Failed Interop Tests: QUIC Interop Runner, client vs. server; differences relative to 1df2a5a. (neqo-latest as client, neqo-latest as server)
Succeeded Interop Tests: QUIC Interop Runner, client vs. server. (neqo-latest as client, neqo-latest as server)
Unsupported Interop Tests: QUIC Interop Runner, client vs. server. (neqo-latest as client, neqo-latest as server)
Benchmark results: Performance differences relative to 1df2a5a.

- decode 4096 bytes, mask ff: No change in performance detected. time: [12.320 µs 12.361 µs 12.411 µs]; change: [-1.1343% -0.2400% +0.4683%] (p = 0.60 > 0.05)
- decode 1048576 bytes, mask ff: 💚 Performance has improved. time: [2.8331 ms 2.8421 ms 2.8516 ms]; change: [-8.2453% -7.8524% -7.4374%] (p = 0.00 < 0.05)
- decode 4096 bytes, mask 7f: No change in performance detected. time: [20.859 µs 20.915 µs 20.975 µs]; change: [-0.5860% -0.0010% +0.5818%] (p = 1.00 > 0.05)
- decode 1048576 bytes, mask 7f: 💚 Performance has improved. time: [4.5404 ms 4.5619 ms 4.5952 ms]; change: [-13.094% -12.629% -11.975%] (p = 0.00 < 0.05)
- decode 4096 bytes, mask 3f: No change in performance detected. time: [8.2845 µs 8.3215 µs 8.3636 µs]; change: [+0.0551% +1.1111% +2.5631%] (p = 0.09 > 0.05)
- decode 1048576 bytes, mask 3f: 💚 Performance has improved. time: [1.5913 ms 1.6016 ms 1.6150 ms]; change: [-9.5069% -8.9206% -8.1949%] (p = 0.00 < 0.05)
- coalesce_acked_from_zero 1+1 entries: No change in performance detected. time: [90.959 ns 91.253 ns 91.546 ns]; change: [-0.5692% -0.0696% +0.4224%] (p = 0.80 > 0.05)
- coalesce_acked_from_zero 3+1 entries: No change in performance detected. time: [109.59 ns 109.87 ns 110.18 ns]; change: [-1.2378% -0.5216% -0.0079%] (p = 0.10 > 0.05)
- coalesce_acked_from_zero 10+1 entries: No change in performance detected. time: [109.47 ns 110.11 ns 110.86 ns]; change: [-1.1644% -0.3243% +0.3417%] (p = 0.43 > 0.05)
- coalesce_acked_from_zero 1000+1 entries: No change in performance detected. time: [93.224 ns 93.416 ns 93.637 ns]; change: [-1.8505% -0.5499% +0.6714%] (p = 0.42 > 0.05)
- RxStreamOrderer::inbound_frame(): Change within noise threshold. time: [111.71 ms 111.84 ms 112.06 ms]; change: [-0.5380% -0.4001% -0.1859%] (p = 0.00 < 0.05)
- SentPackets::take_ranges: No change in performance detected. time: [5.2751 µs 5.4262 µs 5.5801 µs]; change: [-5.2229% +3.1104% +18.372%] (p = 0.74 > 0.05)
- transfer/pacing-false/varying-seeds: Change within noise threshold. time: [34.013 ms 34.082 ms 34.162 ms]; change: [-0.8593% -0.5632% -0.2669%] (p = 0.00 < 0.05)
- transfer/pacing-true/varying-seeds: Change within noise threshold. time: [34.250 ms 34.313 ms 34.380 ms]; change: [-0.7082% -0.4630% -0.2111%] (p = 0.00 < 0.05)
- transfer/pacing-false/same-seed: Change within noise threshold. time: [34.302 ms 34.354 ms 34.410 ms]; change: [-1.3476% -1.0841% -0.8252%] (p = 0.00 < 0.05)
- transfer/pacing-true/same-seed: Change within noise threshold. time: [34.257 ms 34.319 ms 34.388 ms]; change: [-1.9185% -1.6382% -1.3676%] (p = 0.00 < 0.05)
- 1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: 💚 Performance has improved. time: [800.15 ms 811.32 ms 822.73 ms]; thrpt: [121.55 MiB/s 123.26 MiB/s 124.98 MiB/s]; change: time: [-7.9392% -6.1693% -4.3384%] (p = 0.00 < 0.05), thrpt: [+4.5352% +6.5749% +8.6239%]
- 1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: 💚 Performance has improved. time: [301.67 ms 305.86 ms 310.08 ms]; thrpt: [32.250 Kelem/s 32.695 Kelem/s 33.149 Kelem/s]; change: time: [-7.3405% -5.7200% -4.0728%] (p = 0.00 < 0.05), thrpt: [+4.2457% +6.0670% +7.9220%]
- 1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected. time: [25.409 ms 25.558 ms 25.708 ms]; thrpt: [38.899 elem/s 39.127 elem/s 39.357 elem/s]; change: time: [-0.6559% +0.1866% +1.0525%] (p = 0.66 > 0.05), thrpt: [-1.0415% -0.1862% +0.6602%]
- 1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: 💚 Performance has improved. time: [1.7796 s 1.8019 s 1.8241 s]; thrpt: [54.821 MiB/s 55.498 MiB/s 56.193 MiB/s]; change: time: [-5.4595% -4.0571% -2.6755%] (p = 0.00 < 0.05), thrpt: [+2.7491% +4.2286% +5.7748%]

Client/server transfer results: Performance differences relative to 1df2a5a. Transfer of 33554432 bytes over loopback, 30 runs. All unit-less numbers are in milliseconds.
Signed-off-by: Lars Eggert <lars@eggert.org>
👍 in general.
That said, I would prefer only replacing the std::collections Hash* types where it proves beneficial, e.g. not in unit tests.
I've spotted two places where EnumMap could give us bigger wins.
I think that your performance gains largely derive from the changes to the client and server code. There, the security risk is limited (we're not using this server in real deployments).
Still, you should review the changes for security risk. This hasher could expose us to DoS if the hashed values are controlled by an adversary. I've checked the usage in our server code, which is fine because attackers don't get to control memory allocations (we use pointer values for the hash). Still, that makes me wonder whether we should be using Pin.
@martinthomson thanks for the analysis. My plan is to add some benches first in another PR; I'll add some for those instances you suggest looking into. Even if some of the macro benefits come from speeding up the demo client and server code, it's IMO still worth doing, since eliminating those overheads makes it easier to spot other bottlenecks. About security, I didn't do much of an analysis, but I think the main use of this insecure hasher would be when looking up items (streams, unacked chunks) that, while under the control of an attacker, are quite limited in what valid values they can take without immediately causing a connection closure.
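To make the threat model concrete: the DoS concern exists because FxHasher is unseeded, so its output is predictable across runs. A minimal sketch of the difference (assuming the rustc_hash crate; illustrative, not code from this PR):

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, BuildHasherDefault, Hasher};

use rustc_hash::FxHasher;

fn main() {
    // FxHasher is unseeded: this prints the same value on every run,
    // so an adversary who controls the keys can precompute collisions.
    let mut h = BuildHasherDefault::<FxHasher>::default().build_hasher();
    h.write_u64(0x1234);
    println!("fx:  {:016x}", h.finish());

    // std's default RandomState seeds SipHash per process, which is
    // what provides the hash-flooding (DoS) resistance being traded away.
    let s = RandomState::new();
    println!("std: {:016x}", s.hash_one(0x1234_u64));
}
```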
I definitely agree with the point about removing the overheads from our toy code as much as possible. This seems like a pretty substantial win there, so it's worth doing. I doubt that my …
I think the …
```diff
-    pub send_streams: HashMap<StreamId, Box<dyn SendStream>>,
     pub recv_streams: HashMap<StreamId, Box<dyn RecvStream>>,
     streams_with_pending_data: HashSet<StreamId>,
+    pub send_streams: IndexMap<StreamId, Box<dyn SendStream>>,
```
Conceptually, why is an IndexMap faster than a HashMap here? Both end up hashing the key, right?
It's not faster; it's possibly even slower. I made the change based on this comment: https://github.com/mozilla/neqo/pull/2342/files/bd061693b40e91b846c4d4cd1bc0ecfcd27c4e45#r1945509653
Do I understand correctly that we want to be able to iterate send_streams and the like in order? Given that HashMap does not guarantee order, you are suggesting to use IndexMap like we do in neqo-transport.
If so, the requirement for ordering might be worth documenting. It wasn't obvious to me.
In that case, my preference would be to use BTreeMap, solely because it is in std, and to only use IndexMap if it proves meaningfully more performant than BTreeMap.
Given that IndexMap is a fairly small dependency, the above is not a strong opinion.
So I'm not completely sure that we need consistently ordered iteration here. We rely on that in the transport crate because we want frame sending to follow a predictable pattern (clear the old stuff out first). However, that's not something we rely on at this layer of the stack, so whatever is fastest might be the right answer.
Maybe the first question is whether we need (insertion-)ordered iteration over this collection at all?
First, I think we should differentiate between RX and TX.
On RX (both here and in transport), I don't see a point in using anything that maintains order or has semantics that go beyond a simple map or set. I think all we do here is lookups based on received data.
On TX (both here and in transport), do we really care that iteration is consistently ordered? All these data structures should mostly have a stable order while there are no insertions or removals. When there are, do we care that we'll have a different order afterwards? I can't convince myself that we do.
In other words, I think we should use whatever is fastest.
(Also, unrelated, I find the duplication/overlap between the transport and http3 crates really odd.)
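To make the ordering trade-off in this thread concrete, a small illustrative sketch (not code from this PR):

```rust
use std::collections::HashMap;

use indexmap::IndexMap;

fn main() {
    let ids = [3u64, 1, 2];
    let im: IndexMap<u64, ()> = ids.iter().map(|&id| (id, ())).collect();
    let hm: HashMap<u64, ()> = ids.iter().map(|&id| (id, ())).collect();

    // IndexMap iterates in insertion order...
    assert_eq!(im.keys().copied().collect::<Vec<_>>(), vec![3, 1, 2]);
    // ...while HashMap iteration order is arbitrary (and, with a seeded
    // hasher, even varies between runs of the same program).
    println!("{:?}", hm.keys().collect::<Vec<_>>());
}
```

Note that removals matter too: IndexMap::swap_remove is O(1) but perturbs the order, while shift_remove preserves it at O(n) cost.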
```diff
@@ -43,8 +44,8 @@ pub struct WebTransportSession {
     state: SessionState,
     frame_reader: FrameReader,
     events: Box<dyn ExtendedConnectEvents>,
-    send_streams: BTreeSet<StreamId>,
-    recv_streams: BTreeSet<StreamId>,
+    send_streams: HashSet<StreamId>,
```
What is the benefit of a HashSet over a BTreeSet here?
I was assuming HashSet to be faster?
I think we should only make these changes if we have proof that they are more performant.
Connecting this to the discussion above: might ordering (and thus BTreeSet) be relevant here as well?
Fair point about only changing this after benchmarking.
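Since the thread keeps coming back to benchmarking first, a hypothetical criterion sketch of the comparison (the names, sizes, and ID distribution here are invented, not taken from #2444):

```rust
use std::collections::{BTreeSet, HashSet};
use std::hint::black_box;

use criterion::{criterion_group, criterion_main, Criterion};

fn set_lookup(c: &mut Criterion) {
    // Assumed workload: a moderate number of sparse stream IDs.
    let ids: Vec<u64> = (0..1024).map(|i| i * 4).collect();
    let hash: HashSet<u64> = ids.iter().copied().collect();
    let btree: BTreeSet<u64> = ids.iter().copied().collect();

    c.bench_function("HashSet::contains", |b| {
        b.iter(|| ids.iter().filter(|id| hash.contains(black_box(*id))).count())
    });
    c.bench_function("BTreeSet::contains", |b| {
        b.iter(|| ids.iter().filter(|id| btree.contains(black_box(*id))).count())
    });
}

criterion_group!(benches, set_lookup);
criterion_main!(benches);
```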
```diff
@@ -75,6 +78,8 @@ pub use self::{
     version::Version,
 };
 
+pub type IndexMap<K, V> = indexmap::IndexMap<K, V, BuildHasherDefault<FxHasher>>;
```
Why is this more performant? Can you add a comment?
It's probably not; see above as to why.
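For context on the alias pattern itself: the third type parameter is what replaces the default seeded SipHash. A minimal sketch (assuming the rustc_hash crate; the alias names are illustrative, not necessarily the ones in this PR):

```rust
use std::hash::BuildHasherDefault;

use rustc_hash::FxHasher;

// Swap the default hasher for the faster, non-DoS-resistant FxHasher.
pub type FxHashMap<K, V> =
    std::collections::HashMap<K, V, BuildHasherDefault<FxHasher>>;
pub type FxIndexMap<K, V> =
    indexmap::IndexMap<K, V, BuildHasherDefault<FxHasher>>;

fn main() {
    // With a custom hasher there is no `new()`; construct via `default()`.
    let mut m: FxHashMap<u64, &str> = FxHashMap::default();
    m.insert(0, "stream 0");
    assert_eq!(m.get(&0), Some(&"stream 0"));
}
```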
Good point. Though before we introduce the complexity of …
Signed-off-by: Lars Eggert <lars@eggert.org>
Signed-off-by: Lars Eggert <lars@eggert.org>
Definitely the right question to be asking. I think that it might be possible to use the first connection ID as a key for this sort of thing, but we don't tend to keep that around today, once we stop using it. Everything else -- as far as I know -- is ephemeral and therefore not suitable.
I'm doing a benchmark in #2444 to quantify the benefits first. (It's not going well; there's a lot of variation run to run for some reason.)
That is counterintuitive for me, given that it uses …
I wonder if it's the CPU scheduler and frequency control on my Mac. Bencher seems much more stable.
For what it is worth, here is #2444 on my machine:
I don't see much deviation. Am I running the wrong version, @larseggert?
Can you run it again and see if there are changes run to run? That is where I see random improvements or regressions.
Signed-off-by: Lars Eggert <lars@eggert.org>
Here are two more runs with vanilla #2444. No significant deviations. Note that I am not running your optimizations in this pull request.
Oh good. I think it is core pinning being awkward on macOS then. BTW, I came across https://manuel.bernhardt.io/posts/2023-11-16-core-pinning/ today, and we should change the bencher accordingly.
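A sketch of what pinning in the bencher could look like, using the core_affinity crate (an assumption, not what the bencher currently uses; note also that macOS treats affinity as a hint at best, which may be exactly the awkwardness mentioned):

```rust
fn main() {
    // Pin the benchmarking thread to a single core to reduce scheduler noise.
    let cores = core_affinity::get_core_ids().expect("could not query core IDs");
    // Arbitrary choice: the last core often sees less OS housekeeping than core 0.
    let core = *cores.last().expect("at least one core");
    if !core_affinity::set_for_current(core) {
        eprintln!("core pinning not supported on this platform");
    }
    // ... run the benchmark workload here ...
}
```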
I think this may be worthwhile. The cargo bench results don't consistently show a benefit, but the loopback transfers on the bencher machine are faster, e.g., … without this PR but … with it.
(I'll see if I can improve CI so that we also see the differences to main for the table results.)