
Timeout batch downloads, not each download #6285

Merged: 1 commit merged into rust-lang:master from more-timeouts on Nov 9, 2018

Conversation

@alexcrichton (Member)

This commit switches the timeout logic implemented in #6130 to time out
an entire batch of downloads instead of each download individually.
Previously we would time out if *any* pending download received no data
for 30s, or if *any* pending download received fewer than 10 bytes in a
30s window. On very slow network connections this is highly likely to
happen, as a trickle of incoming bytes may not be spread equally amongst
all connections, and not all connections may actually be active at any
one point in time.

The fix is to instead apply the timeout logic to an entire batch of
downloads: we only time out if no data at all is received within the
timeout window. In other words, if any data for any download is
received, the batch is not considered timed out, and progress on any
download counts as progress towards our speed limit.

Closes #6284
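
As a rough illustration of the batch-wide rule, here's a minimal sketch (not Cargo's actual implementation; the `BatchTimeout` type and its thresholds are hypothetical stand-ins). Bytes received on *any* transfer feed one shared window, and only a window with too little total data trips the timeout:

```rust
use std::time::{Duration, Instant};

/// Hypothetical batch-wide tracker: one timer for *all* downloads.
struct BatchTimeout {
    dur: Duration,     // e.g. a 30s window
    low_speed: u64,    // e.g. 10 bytes/s, measured across the whole batch
    deadline: Instant, // when the current window expires
    bytes: u64,        // bytes received (on any transfer) in this window
}

impl BatchTimeout {
    fn new(dur: Duration, low_speed: u64) -> BatchTimeout {
        BatchTimeout { dur, low_speed, deadline: Instant::now() + dur, bytes: 0 }
    }

    /// Called from every download's write callback.
    fn data_received(&mut self, n: u64) {
        self.bytes += n;
        // Enough total data arrived in this window: the batch is making
        // progress, so open a fresh window rather than timing anything out.
        if self.bytes >= self.low_speed * self.dur.as_secs() {
            self.bytes = 0;
            self.deadline = Instant::now() + self.dur;
        }
    }

    /// Called periodically from the loop driving all transfers at once.
    fn check(&self) -> Result<(), String> {
        if Instant::now() < self.deadline {
            Ok(())
        } else {
            Err(format!("no batch progress within {:?}", self.dur))
        }
    }
}
```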

@alexcrichton (Member Author)

@ehuss this actually means that the complexity of #6130 is, I think, required: curl only does per-connection timeouts, and it's a very good point that we basically don't want those any more with parallel downloads.
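
For contrast, the timeout libcurl does support natively is per-`Easy`, i.e. per transfer. A minimal sketch through the Rust `curl` crate (`low_speed_limit` and `low_speed_time` are the real options; the URL is a made-up example):

```rust
use curl::easy::Easy;
use std::time::Duration;

fn main() -> Result<(), curl::Error> {
    let mut easy = Easy::new();
    // Hypothetical URL, for illustration only.
    easy.url("https://static.crates.io/crates/foo/foo-0.1.0.crate")?;
    // libcurl's built-in low-speed timeout: abort *this one transfer*
    // if it averages under 10 bytes/s for 30 seconds. There is no
    // equivalent knob spanning several Easy handles at once.
    easy.low_speed_limit(10)?;
    easy.low_speed_time(Duration::from_secs(30))?;
    easy.perform()?;
    Ok(())
}
```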

@alexcrichton (Member Author)

r? @ehuss

@dwijnand (Member) commented Nov 8, 2018

Look at that turnaround time, from issue to PR 😀

src/cargo/core/package.rs (review thread; outdated, resolved)
@alexcrichton (Member Author)

Updated! Ideally with working tests this time too

@ehuss (Contributor) commented Nov 9, 2018

curl only does per-connection timeouts, and it's a very good point that we basically don't want those any more with parallel downloads

I'm not sure I understand this statement; per-connection timeouts would be a good thing, right? The problem wasn't per-connection timeouts, but that #6130 implemented per-package timeouts? Packages could be enqueued into libcurl but not necessarily started right away, and the timeout code didn't know when they actually started (and if it took more than 30 seconds to get started, all of them would time out at once).

IIUC, with this change, if one connection hangs, it remains hung until all other transfers finish, and then times out, and then retries. That may not be catastrophic, and is hopefully rare, but seems a little odd.

Am I understanding this correctly?

Unfortunately I can't think of any workarounds; it seems like libcurl doesn't give cargo enough internal visibility to know what's happening. The only other option I can think of is to fix the original problem of extraction blocking the main thread.

@alexcrichton (Member Author)

Er sorry, the term "per connection" there wasn't quite right; it's "per Easy", which we currently map to "per package". So given the way we're using libcurl right now there's no easy way (afaik) to get a per-connection timeout; what this logic basically switches to instead is timing out the batch as a whole.

Otherwise yeah, you're right: if you have two connections and one hangs, you don't realize it until the other one finishes entirely. I don't think that'll happen much in practice, though, as it's basically just crates.io or static.crates.io, which are likely both soon to be behind CloudFront as the same fronting service.

@alexcrichton (Member Author)

Oh also, even with extraction not blocking the main thread, we'll still want this logic. Before this series of parallel-download changes, all the timeout logic was always applied to just one Easy handle, which corresponds to just one package. With parallel downloads we have a lot of Easy handles active at the same time (a bunch of packages all at once), and internally curl does all the enqueueing and scheduling of what to download when.

If we timed out per-Easy, though, then if 1000 packages started and the first 999 didn't finish downloading within 30s, the last package would immediately time out for having no network activity. All in all I think we'll unconditionally want this "time out the entire transfer" logic, as it's more in line with the purpose of the timeout: making sure that Cargo doesn't hang when nothing is happening on the network for too long.
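
For context, here's a minimal sketch of the "many `Easy` handles on one `Multi`" setup via the Rust `curl` crate (URLs are placeholders). libcurl alone decides which enqueued transfers are active on the wire at any instant, which is why a per-`Easy` timer can fire for a transfer that simply hasn't been scheduled yet:

```rust
use curl::easy::{Easy2, Handler, WriteError};
use curl::multi::Multi;
use std::time::Duration;

/// Collects one download's bytes in memory.
struct Collect(Vec<u8>);

impl Handler for Collect {
    fn write(&mut self, data: &[u8]) -> Result<usize, WriteError> {
        self.0.extend_from_slice(data);
        Ok(data.len())
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let multi = Multi::new();
    let mut handles = Vec::new();
    // Enqueue several transfers at once; curl schedules them internally.
    for url in ["https://example.com/a.crate", "https://example.com/b.crate"] {
        let mut easy = Easy2::new(Collect(Vec::new()));
        easy.url(url)?;
        handles.push(multi.add2(easy)?);
    }
    // Drive all transfers together until everything completes.
    while multi.perform()? > 0 {
        multi.wait(&mut [], Duration::from_secs(1))?;
    }
    Ok(())
}
```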

@ehuss (Contributor) commented Nov 9, 2018

Oh also, even with extraction not blocking the main thread, we'll still want this logic.

I meant rolling back #6130 and doing non-blocking extraction instead, and relying on libcurl to do the right thing with its own internal timeout handling. Custom timeout handling wouldn't be necessary if the wait function was fast, correct?

@ehuss (Contributor) commented Nov 9, 2018

@bors r+

Do you want to backport this to beta?

@bors (Collaborator) commented Nov 9, 2018

📌 Commit 4e1e3f7 has been approved by ehuss

@alexcrichton (Member Author)

With my current understanding of libcurl (which is most certainly not perfect by any measure) I believe that if we rolled back #6130 and then did non-blocking extraction we would still find ourselves requiring this PR (and reapplying #6130 before it). I don't think libcurl has a native way of doing the timeout logic that we desire.

To be clear as well, though: I think the absolutely perfect timeout logic would have two limits, a no-activity duration and a speed-requirement duration (if a transfer goes slower than that, we kill it), and those two limits would be applied per TCP connection that libcurl makes. This PR is a bit coarser: it applies those limits to all active TCP connections collectively instead of to each individual one.

@bors (Collaborator) commented Nov 9, 2018

⌛ Testing commit 4e1e3f7 with merge 5f448d5...

bors added a commit that referenced this pull request Nov 9, 2018
Timeout batch downloads, not each download

@bors (Collaborator) commented Nov 9, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: ehuss
Pushing 5f448d5 to master...

@bors merged commit 4e1e3f7 into rust-lang:master on Nov 9, 2018
@alexcrichton deleted the more-timeouts branch on November 9, 2018 15:33
bors added a commit that referenced this pull request Nov 10, 2018
[beta] Timeout batch downloads, not each download

This is a beta backport of #6285
@ehuss modified the milestones: 1.32.0, 1.31.0 on Feb 6, 2022