bug: WebTransport session establishment failed. Too many pending WebTransport sessions (64) #1896

SgtPooki · 2023-07-25T21:08:56Z

Sometimes I see thousands of instances of this warning in Chrome:

WebTransport session establishment failed. Too many pending WebTransport sessions (64)

This module may need some sort of dial queue to ensure it doesn't open too many connections and trigger this error.

ported over from libp2p/js-libp2p-webtransport#64

SgtPooki · 2023-08-08T17:58:01Z

I believe this is a blocker for ipfs/helia#182 because:

WebTransport is one of the few consistent ways we can connect to other nodes from the browser.
A Helia node's connectivity becomes unstable once we have too many pending dials.

@maschad are you actively working on this?

maschad · 2023-08-09T17:01:48Z

I'm not actively working on it at the moment @SgtPooki although I think @achingbrain 's PR #1947 may be related.

achingbrain · 2023-08-09T17:03:08Z

I think #1947 will help with the unstable bit but I can't help but wonder if there's some cleanup we need to do that we're missing to prevent the "Too many pending sessions" thing in the first place.

Refactors session closing to happen in one function and call that function when the session has closed or failed to init. Doesn't quite solve the "Too many pending WebTransport Sessions" problem but does slow it down a little bit. Refs: #1896

achingbrain · 2023-08-24T05:37:11Z

This may be a bug in Chrome.

When we forcibly close WebTransport connections whose .ready promise hasn't resolved within the connection timeout (the peer has gone away, is on a slow connection, is overloaded, is firewalled, etc), Chrome may not be cleaning up properly in the background, so we hit this limit.

More details here: https://bugs.chromium.org/p/chromium/issues/detail?id=1473980
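The forced close described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code in `@libp2p/webtransport`: `SessionLike` and `openWithTimeout` are hypothetical names, and the session is created through an injected factory so the sketch does not depend on a browser.

```typescript
// Minimal shape of the parts of a WebTransport session this sketch relies on.
// A browser WebTransport object satisfies it structurally.
interface SessionLike {
  ready: Promise<unknown>
  close: () => void
}

// Hypothetical helper: resolve with the session once `ready` resolves,
// or close the session and reject if `timeoutMs` elapses first.
async function openWithTimeout<T extends SessionLike> (
  create: () => T,
  timeoutMs: number
): Promise<T> {
  const session = create()
  let timer: ReturnType<typeof setTimeout> | undefined

  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`session not ready after ${timeoutMs}ms`))
    }, timeoutMs)
  })

  try {
    await Promise.race([session.ready, timeout])
    return session
  } catch (err) {
    // Either the timeout fired or `ready` rejected: close the session either way.
    session.close()
    throw err
  } finally {
    if (timer !== undefined) clearTimeout(timer)
  }
}
```

In a browser this would be called as `openWithTimeout(() => new WebTransport(url), 5000)`, where the URL and the timeout value are placeholders. The suspicion in this comment is that sessions closed this way are not released by Chrome.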

achingbrain · 2023-08-25T11:21:50Z

I've tried to add a global count to the WebTransport transport to ensure we don't go over 64 "pending" connections, taking "pending" as meaning "has yet to resolve/reject the .ready/.closed promises" but it doesn't solve the problem.

Counting the various WebTransport sessions that have been opened and what happened to them, it seems sessions that reject* their .ready/.closed promises are still counted as "pending".

Therefore regardless of any limit we set on how many connections we open simultaneously, once the number of errored connections plus the number of yet-to-resolve/reject connections reaches 65 no further WebTransport connections can be opened.

This is bad news and needs a browser fix because once 65 connections have errored it's essentially game over until the page is reloaded.
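To make the accounting above concrete, here is a small model of it. It only mirrors the behaviour observed from the page and is not Chromium's implementation. `PendingSessionModel` is a hypothetical name, and the default limit of 64 is taken from the warning text.

```typescript
// Model of the observed accounting: errored sessions keep counting
// against the limit alongside sessions that have not settled yet.
class PendingSessionModel {
  private readonly limit: number
  private inFlight = 0
  private errored = 0

  // `limit` defaults to the 64 quoted in the warning message.
  constructor (limit: number = 64) {
    this.limit = limit
  }

  // Returns false when a new session would be refused.
  tryOpen (): boolean {
    if (this.inFlight + this.errored >= this.limit) return false
    this.inFlight++
    return true
  }

  // `ready` resolved: the session no longer counts as pending.
  onReady (): void {
    this.inFlight--
  }

  // `ready`/`closed` rejected: the session is no longer in flight but,
  // per the observation above, still counts against the limit.
  onError (): void {
    this.inFlight--
    this.errored++
  }

  // Number of sessions that could still be opened.
  get remaining (): number {
    return this.limit - this.inFlight - this.errored
  }
}
```

In this model, a client-side cap on simultaneous dials only bounds the in-flight count. The errored count never decreases, so `tryOpen()` eventually returns false even when nothing is in flight.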

I've updated the chromium bug report with this information.

* = The rejection reasons are normal network things - an unreachable host, a handshake timeout, etc.

achingbrain · 2023-08-30T09:56:30Z

A comment on the Chromium bug links to this design doc - it seems Chromium unilaterally applies an anti-DoS measure by keeping "failed" connections in the "pending" state for 5 minutes after the failure.

This also seems to include sessions that have had their .close method called before .ready has resolved, which is how we cancel connections when, for example, we dial a peer on all available addresses and abort the other dials once one succeeds.

This seriously limits the number of connections that can be opened over time.

The Chromium bug is still valid, I think - because the 5 minute delay does not seem to be applied, failed connections are "pending" ~~forever~~ maybe not forever, but for a lot longer than 5 minutes.

I've put a simple demo page together here that doesn't have any libp2p code in it - https://webtransport-pending-sessions.on.fleek.co/

We can use this to see if the issue has been resolved over time.

Interestingly, Firefox does not apply the 5 minute wait, though it does crash quite reliably.

I've tried adding a dial queue to the WebTransport transport that applies the 5 minute wait for new dials once 64 have errored, but we request dial slots quicker than the old ones time out so everything sort of grinds to a halt.

We may be able to do something about this by increasing the auto dial retry threshold to something over 5 minutes. This should give Chromium enough time to reach its internal timeout, once the bug that means it never reaches its internal timeout is fixed 🫠
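One way to picture the dial queue with a 5 minute wait is as a slot tracker in which a failed dial holds its slot until the cooldown passes. This is a sketch of one possible reading of that experiment, not the code that was tried. `CooldownSlots` and its method names are hypothetical, and the caller passes the clock in as a millisecond timestamp such as `Date.now()`.

```typescript
// Hypothetical slot tracker: a failed dial keeps its slot for `cooldownMs`
// before the slot can be handed out again.
class CooldownSlots {
  private readonly maxSlots: number
  private readonly cooldownMs: number
  private inFlight = 0
  // Timestamps at which slots held by failed dials become free again.
  private releaseTimes: number[] = []

  constructor (maxSlots: number = 64, cooldownMs: number = 5 * 60 * 1000) {
    this.maxSlots = maxSlots
    this.cooldownMs = cooldownMs
  }

  // Try to take a slot at time `now` (ms). Returns false if none is free.
  tryAcquire (now: number): boolean {
    // Drop failed slots whose cooldown has elapsed.
    this.releaseTimes = this.releaseTimes.filter(t => t > now)
    if (this.inFlight + this.releaseTimes.length >= this.maxSlots) return false
    this.inFlight++
    return true
  }

  // The dial succeeded: its slot is free immediately.
  succeed (): void {
    this.inFlight--
  }

  // The dial failed at time `now`: its slot stays taken for `cooldownMs`.
  fail (now: number): void {
    this.inFlight--
    this.releaseTimes.push(now + this.cooldownMs)
  }
}
```

With the defaults, a burst of 64 failed dials leaves no free slot until 5 minutes after the first failure. Sustained throughput is therefore capped at 64 failed dials per 5 minutes, which matches the "grinds to a halt" behaviour when dial slots are requested faster than that.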

SgtPooki · 2023-08-30T23:36:18Z

Thanks for staying on top of this one and keeping us updated @achingbrain

achingbrain · 2024-02-20T16:27:50Z

Ref: ipfs/in-web-browsers#211 (comment)

dhuseby · 2024-04-30T15:41:22Z

@lidel and Javier from Igalia are working with the Chrome team to get a fix into Chrome. Firefox nightly does have WebTransport and seems to work.

dhuseby · 2024-04-30T15:42:03Z

link to test page: https://libp2p-webtransport-sessions.on.fleek.co/

dhuseby · 2024-05-07T15:40:20Z

Waiting on Igalia to submit a patch to Chrome that fixes this.

achingbrain · 2024-05-30T17:32:00Z

Notes from Igalia work stream (under various Handling pending WebTransport sessions headers): https://hackmd.io/SaJIHZmyRUKfl_fQwoYfog

SgtPooki added the need/triage Needs initial labeling and prioritization label Jul 25, 2023

SgtPooki mentioned this issue Jul 25, 2023

feat: auto-dialer max queue should allow specifying max queue per transport #1897

Closed

SgtPooki changed the title ~~WebTransport session establishment failed. Too many pending WebTransport sessions (64)~~ bug: WebTransport session establishment failed. Too many pending WebTransport sessions (64) Jul 25, 2023

SgtPooki added the kind/bug A bug in existing code (including security flaws) label Jul 25, 2023

maschad self-assigned this Jul 26, 2023

maschad removed their assignment Aug 9, 2023

SgtPooki self-assigned this Aug 11, 2023

achingbrain mentioned this issue Aug 15, 2023

fix(@libp2p/webtransport): be more thorough about closing sessions #1969

Merged

p-shahi mentioned this issue Aug 16, 2023

Trying to Dial Kubo Node with Web Transport #1951

Closed

achingbrain added status/blocked Unable to be worked further until needs are met and removed need/triage Needs initial labeling and prioritization labels Aug 30, 2023

SgtPooki assigned achingbrain and unassigned SgtPooki Aug 30, 2023

p-shahi mentioned this issue Aug 31, 2023

Introduce smart dialing in js-libp2p #2010

Closed

This was referenced Jan 18, 2024

chore(main): release 1.0.0 #2365

Closed

chore(main): release 1.0.0 #2366

Closed

oblique mentioned this issue Jun 3, 2024

Chromium throttles WebTransport connections causing connection instability eigerco/lumina#287

Open
