
Fix timeouts firing while tarballs are extracted #6130

Merged: 3 commits merged into rust-lang:master from alexcrichton:less-timeouts-u on Nov 6, 2018

Conversation

@alexcrichton (Member)

This commit fixes #6125 by ensuring that while we're extracting tarballs or doing other synchronous work, such as grabbing file locks, we're not letting the timeout timers of each HTTP transfer keep ticking. That is curl's default behavior, which we don't want in this scenario. Instead, the timeout logic is inlined directly and we manually ensure that time spent on synchronous work doesn't count towards the timeout limits.

Closes #6125
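
As a rough illustration of that approach (this is not Cargo's actual code; the type and method names below are hypothetical), the idea is to stop relying on curl's built-in timers and instead track each transfer's deadline manually, crediting back any time spent in synchronous work:

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-transfer timeout tracker: rather than letting curl's
/// own timers run, we record when the transfer last made progress and can
/// "pause" the clock while Cargo does synchronous work.
struct TimeoutTracker {
    timeout: Duration,
    last_progress: Instant,
}

impl TimeoutTracker {
    fn new(timeout: Duration) -> TimeoutTracker {
        TimeoutTracker { timeout, last_progress: Instant::now() }
    }

    /// Called from curl's progress callback whenever bytes arrive.
    fn on_progress(&mut self) {
        self.last_progress = Instant::now();
    }

    /// True if the transfer went `timeout` without any progress.
    fn timed_out(&self) -> bool {
        self.last_progress.elapsed() > self.timeout
    }

    /// Run a blocking operation (tarball extraction, acquiring a file lock)
    /// and push the progress timestamp forward by however long it took, so
    /// the synchronous work doesn't count against the timeout.
    fn pause_for<T>(&mut self, work: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let result = work();
        self.last_progress += start.elapsed();
        result
    }
}
```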

@rust-highfive

r? @matklad

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton (Member, Author)

r? @ehuss

@SimonSapin (Contributor)

Can this synchronous work cause Cargo to pause reading bytes from a TCP socket that is in the middle of fetching another HTTP response? Not timing out unnecessarily in the client is good, but it’d also be good if we don’t give servers a reason to time out.

@alexcrichton (Member, Author)

It can indeed! To fix that we'll need to do tarball extraction in parallel.

@ehuss (Contributor) commented Oct 5, 2018

I've been looking through this. I don't think I fully understand the details, but it seems to work. Would this continue to be necessary if the extraction was done in a separate thread?

@alexcrichton (Member, Author)

It sort of depends on how we'd do extraction, but I think it would still be required, yeah. Right now, before we actually extract a tarball, we reacquire a file lock that was previously dropped after we saw that it didn't exist. This acquisition can take a long time and, if it does, a message is printed to the console saying we're waiting on a lock. We wouldn't want to fork off 5 tarball extractions that all print "Waiting on ..." at once, so I think the main thread will continue to be the one that acquires file locks.

To that end we can still have long pauses on the main thread that we'll need to account for when timing out.

I am slightly worried, though, that this feature is definitely ballooning in complexity quickly, although I'm not sure how that's best handled...

@bors (Collaborator) commented Oct 12, 2018

☔ The latest upstream changes (presumably #6166) made this pull request unmergeable. Please resolve the merge conflicts.

Commit messages from the three commits in this PR:

This is what curl recommends we use, so let's be sure to not mess up any timers by accident!

Previously retries were tracked per-session, which meant that if you were downloading 100 packages and they all failed spuriously for the same reason, you'd immediately abort. Instead, let's keep track of retries per package, so that if they're all coupled on one connection a failure of one will end up retrying all of them. We want to make sure that we actually retry again!

This commit fixes rust-lang#6125 by ensuring that while we're extracting tarballs or doing other synchronous work, such as grabbing file locks, we're not letting the timeout timers of each HTTP transfer keep ticking. That is curl's default behavior, which we don't want in this scenario. Instead, the timeout logic is inlined directly and we manually ensure that time spent on synchronous work doesn't count towards the timeout limits.

Closes rust-lang#6125
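
A minimal sketch of the per-package retry change described in the second commit message above; the names are hypothetical and a plain string stands in for Cargo's package id:

```rust
use std::collections::HashMap;

/// Track spurious-error retries per package rather than with one
/// session-wide counter, so a single flaky connection failing doesn't
/// exhaust the retry budget for every in-flight download at once.
struct Retries {
    remaining: HashMap<String, u32>, // keyed by package id, e.g. "serde 1.0.80"
    limit: u32,
}

impl Retries {
    fn new(limit: u32) -> Retries {
        Retries { remaining: HashMap::new(), limit }
    }

    /// Returns true if this package still has retry budget left after a
    /// spurious error, decrementing its own counter only.
    fn should_retry(&mut self, pkg: &str) -> bool {
        let left = self.remaining.entry(pkg.to_string()).or_insert(self.limit);
        if *left == 0 {
            return false;
        }
        *left -= 1;
        true
    }
}
```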
@alexcrichton (Member, Author)

Rebased!

@alexcrichton (Member, Author)

ping @ehuss, have you had more time to think about this? With the beta cutoff happening soon, I think we'll definitely want to either merge this or back out parallel downloads on the beta branch.

@ehuss (Contributor) commented Oct 17, 2018

My concern is that this seems to add a lot of complexity. If the core issue is that wait blocks too long, would an alternative that avoids blocking be reasonable? I have a prototype that does extraction in the background, and it seems to work well and resolves the original issue. It also has the benefit that it makes things faster for everyone. I can probably polish it off in a day or two, although I had some questions about it.

Also, I'm not sure if I see it being as critical, since parallel downloads is opt-in, and it only seems to have an issue in a pathological case. Unless I'm missing something? I would definitely opt to merge this over backing anything out.

@SimonSapin (Contributor)

> since parallel downloads is opt-in

Is it? When I hit #6125 I’m pretty sure I had upgraded the toolchain without opting into anything new.

@SimonSapin (Contributor)

Also you can call this case pathological but it was with a real crate, not a synthetic stress test.

@alexcrichton (Member, Author)

Oh so I should probably have been more clear, but #6005 actually did enable parallel downloads by default. It just disabled HTTP/2 by default! The parallel downloads part comes about from the restructuring of the code to use curl's Multi, and the parallelism by default comes from multiple connections to a domain rather than multiplexing over one connection with HTTP/2.

This indeed is a pathological case if we have one-at-a-time downloads, but with parallel I think it can actually happen somewhat commonly!

@ehuss this does remind me, though, that I could work on an alternative patch to switch to one-by-one downloads using the same infrastructure we have now, I think that'd be a much smaller patch compared to this. Do you think that'd be best to land for beta?
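
For context on the distinction described above, this is roughly how the two modes are expressed on the curl crate's Multi handle; the connection counts are illustrative, not the values Cargo actually uses:

```rust
use curl::multi::Multi;

// A hedged sketch, assuming the Multi handle's `pipelining` and
// `set_max_host_connections` setters. With HTTP/2, concurrency comes from
// multiplexing many transfers over one connection; without it, from
// opening several connections to the same host.
fn configure_multi(http2: bool) -> Result<Multi, curl::MultiError> {
    let mut multi = Multi::new();
    // The second argument enables HTTP/2 multiplexing on this handle.
    multi.pipelining(false, http2)?;
    // Cap the number of parallel connections per host.
    multi.set_max_host_connections(if http2 { 1 } else { 8 })?;
    Ok(multi)
}
```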

@ehuss (Contributor) commented Oct 17, 2018

Ah, for some reason I was thinking it only mattered that the http/2 connection was being held open, but I had missed the other scenario with unfinished transfers still in flight.

I think the background extraction would be the best approach. It helps everyone, and doesn't look too risky. Do you not agree?

(Also, I think being able to configure the concurrency would be good regardless of what else we do.)

@alexcrichton (Member, Author)

Yeah, background extraction is definitely the best way to go here long term, I think; I was just brainstorming other short-term solutions while that's being worked on. If you've got it almost ready to go, though, then we can do that instead!

@ehuss (Contributor) commented Oct 17, 2018

I can probably post it today or tomorrow, and we can see how risky it looks. I just had some questions, which I guess I'll just ask here:

  1. How generic should Sources be? Right now I implemented it as sending a closure to a thread, which allows the Source to decide exactly what to do. However, only one impl really matters (RegistrySource), and using closures makes it a little more complex. An alternative is to have the Source respond with (cache_path, src_path, filename) and the background thread would just be hard-coded to save and extract in those directories. That would be simpler and more straightforward, but maybe less flexible. Does either of these approaches sound better?

  2. How detailed should the status bar be? Right now, after all the downloads are finished, there might be some background extractions still in progress. Currently it just says "Extracting ..." without any more detail. The vast majority of the time that status will only display for a fraction of a second. However, in some cases (a very slow hard drive, very large crates), it might be displayed for a few seconds. Is it worth printing more information? If so, I would need to keep track of in-flight extractions (either just a count, or the names).

@alexcrichton (Member, Author)

While Source started out as uber-generic it's become much less so over time, so I'd lean towards the simpler solution of just passing a tuple/struct around instead of using generic closures. For the status bar I think just saying "extracting ..." is fine! We can always add more detailed information later if necessary
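
A minimal sketch of that simpler, non-generic shape, assuming a hypothetical ExtractJob struct sent over a channel to a single background thread (the actual tar extraction is elided):

```rust
use std::path::PathBuf;
use std::sync::mpsc;
use std::thread;

/// Hypothetical message for the background extractor: the Source hands back
/// plain paths and data, and the thread hard-codes "save, then extract".
struct ExtractJob {
    cache_path: PathBuf, // where the downloaded .crate file is stored
    src_path: PathBuf,   // where the sources get unpacked
    filename: String,
    data: Vec<u8>,
}

fn spawn_extractor() -> mpsc::Sender<ExtractJob> {
    let (tx, rx) = mpsc::channel::<ExtractJob>();
    thread::spawn(move || {
        for job in rx {
            // Save the tarball into the cache directory...
            let tarball = job.cache_path.join(&job.filename);
            std::fs::write(&tarball, &job.data).expect("write tarball");
            // ...then unpacking `tarball` into `job.src_path` would go here.
            let _ = &job.src_path;
        }
    });
    tx
}
```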

@ehuss (Contributor) commented Oct 18, 2018

I spent some time cleaning up my background branch, and decided it is probably too risky. I'm unhappy with the way it turned out. Some problems I ran into:

  • Acquiring a file lock requires holding a Config (for shell output to say "Blocking"), and I don't know how to send that to a thread. So the file locking is still synchronous.
  • The progress bar was a little trickier than I anticipated. It now says "Downloading (extracting...)", which is unsatisfying. This could be improved, but it would complicate things quite a bit.
  • Downloading during builds still blocks for extraction in between each batch through the unit_dependencies loop. It's better than before, and it shouldn't cause timeout issues (because there shouldn't be any in-flight transfers), but it's a little weird.
  • The code still looks quite ugly to me.

On the bright side, it does fix the original issue, and it does make things slightly faster (~20% faster for that version of Servo on a slow hard drive). Here's the code if you want to look: ehuss@872bcb6. Let me know if you'd still be interested in a PR.

So I think I'll r+ this PR if you are OK with it. Or if you want to do something different, let me know and I'll try to help. I'm not sure whether by the one-by-one comment you meant just changing max_host_connections or something else. Sorry for the delay and flip-flopping.

@alexcrichton (Member, Author)

Ok, thanks for the info! I'm ok going with this version for now, and we can always figure out how to get async extraction later. I suspect that we'll want to go "full async" here eventually and basically have an async event loop (like curl already is) where work for tarball extraction and/or acquiring file locks is all farmed out to that async loop. That means that something like a download would simply return a Future from the Box<Source> and it'd internally capture everything like file locks, tarball extraction, curl download handling, etc.

That's a bit of a larger-scale refactor, but I think it's probably the best-looking implementation in terms of cleanliness and whatnot?
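
A rough sketch of what that direction could look like as a signature; everything here is hypothetical rather than an existing Cargo API:

```rust
use std::future::Future;
use std::path::PathBuf;
use std::pin::Pin;

// One event loop would drive these futures, which internally cover the file
// lock, the HTTP transfer, and the tarball extraction.
type DownloadFuture = Pin<Box<dyn Future<Output = Result<PathBuf, String>> + Send>>;

trait AsyncSource {
    /// Start downloading `pkg`; resolves to the unpacked source directory.
    fn download(&mut self, pkg: &str) -> DownloadFuture;
}
```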

@alexcrichton (Member, Author)

Oh, and for one-by-one: it's not actually as easy to retrofit as I thought it would be, but I don't think it'd be the end of the world to throw it in at the last second.

@ehuss (Contributor) commented Nov 3, 2018

@alexcrichton Have you decided what you want to do about this?

@alexcrichton (Member, Author)

Oh jeez, I've sort of forgotten the context here... I think I'd probably prefer to land this as-is at this point; I'm finding it hard to keep all the other possibilities in my head :(

@ehuss (Contributor) commented Nov 6, 2018

@bors r+

Yeah, I don't see anything specifically wrong with this other than that it is complicated. Maybe someday in the future it will be easier to extract asynchronously, which might make this less necessary.

@bors (Collaborator) commented Nov 6, 2018

📌 Commit 0b0f089 has been approved by ehuss

@bors (Collaborator) commented Nov 6, 2018

⌛ Testing commit 0b0f089 with merge 022bfa397829fdbcbd283fd11aa2898402c0b56d...

@bors (Collaborator) commented Nov 6, 2018

💔 Test failed - status-appveyor

@alexcrichton (Member, Author)

@bors (Collaborator) commented Nov 6, 2018

⌛ Testing commit 0b0f089 with merge f938c4f...

bors added a commit that referenced this pull request Nov 6, 2018
Fix timeouts firing while tarballs are extracted

@bors (Collaborator) commented Nov 6, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: ehuss
Pushing f938c4f to master...

@bors bors merged commit 0b0f089 into rust-lang:master Nov 6, 2018
@alexcrichton alexcrichton deleted the less-timeouts-u branch November 6, 2018 21:02
bors added a commit that referenced this pull request Nov 6, 2018
[beta]: Fix timeouts firing with tarball extraction

This is a backport of #6130 to the 1.31.0 branch
alexcrichton added a commit to alexcrichton/cargo that referenced this pull request Nov 8, 2018
This commit switches the timeout logic implemented in rust-lang#6130 to time out an entire batch of downloads instead of each download individually. Previously, if *any* pending download didn't receive data in 30s we would time out, or if *any* pending download didn't receive 10 bytes in 30s we would time out. On very slow network connections this is highly likely to happen, as a trickle of incoming bytes may not be spread equally amongst all connections, and not all connections may actually be active at any one point in time.

The fix is to instead apply the timeout logic to an entire batch of downloads: we only time out if no data at all is received within the timeout window. In other words, if any data is received for any download, the batch is considered not to have timed out. Similarly, progress on any download counts towards our speed limit.

Closes rust-lang#6284
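
A minimal sketch of the batch-level timeout described above, with hypothetical names; the key point is that progress on any download feeds a single shared timer:

```rust
use std::time::{Duration, Instant};

/// One timeout shared by all in-flight downloads: a trickle of bytes on any
/// connection keeps the whole batch alive.
struct BatchTimeout {
    window: Duration,        // e.g. 30 seconds
    low_speed_limit: u64,    // bytes that must arrive within `window`
    bytes_since_reset: u64,
    last_reset: Instant,
}

impl BatchTimeout {
    fn new(window: Duration, low_speed_limit: u64) -> BatchTimeout {
        BatchTimeout {
            window,
            low_speed_limit,
            bytes_since_reset: 0,
            last_reset: Instant::now(),
        }
    }

    /// Called from any download's write callback; data from any transfer
    /// counts towards the shared speed limit.
    fn data_received(&mut self, bytes: u64) {
        self.bytes_since_reset += bytes;
        if self.bytes_since_reset >= self.low_speed_limit {
            self.bytes_since_reset = 0;
            self.last_reset = Instant::now();
        }
    }

    /// The batch times out only if no download met the speed limit within
    /// the window.
    fn timed_out(&self) -> bool {
        self.last_reset.elapsed() > self.window
    }
}
```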
bors added a commit that referenced this pull request Nov 9, 2018
Timeout batch downloads, not each download

alexcrichton added a commit to alexcrichton/cargo that referenced this pull request Nov 9, 2018
@ehuss modified the milestones: 1.32.0, 1.31.0 (Feb 6, 2022)

Successfully merging this pull request may close these issues:

Timeout while fetching Servo’s dependencies on Windows