Timeout while downloading branches #11265

Closed
kevans91 opened this issue May 1, 2020 · 2 comments · Fixed by #11296
Labels
issue/confirmed: Issue has been reviewed and confirmed to be present or accepted to be implemented
type/enhancement: An improvement of existing functionality
Milestone

Comments

@kevans91
Contributor

kevans91 commented May 1, 2020

Hi,

This is something I intend to look at, but I'm filing an issue for it in case others have thoughts. For large repos/branches, downloading a branch may take a long time because the .zip archive has to be created on the backend first. There are actually two related issues here that I think have the same solution:

1.) We may time out on the download if the archive hasn't been created yet, and
2.) Gitea may hand out an archive that's not yet complete

I think the solution is to push archiving into a queue and make this an async process that checks in for completion; then the download endpoint can check if archiving this particular commit/branch is in progress rather than handing out an incomplete archive.

This can be observed on lower-end hardware and/or with large repos, e.g. any of the branches here on my instance: https://git.kevans.dev/kevans/freebsd/branches/ -> these should all hit nginx's default timeout (and actually caused a DoS of sorts when people were apparently attempting to download one of my branches). I'd push it to try.gitea.io, but I think the problem is easily enough visualized to not need to push the ~2-4GB repo over there.

@guillep2k guillep2k added the type/enhancement An improvement of existing functionality label May 3, 2020
kevans91 added a commit to kevans91/gitea that referenced this issue May 5, 2020
The prime benefit being sought here is for large archives to not
clog up the rendering process and cause unsightly proxy timeouts.
As a secondary benefit, archive-in-progress is moved out of the
way into a /tmp file so that new archival requests for the same
commit will not get fulfilled based on an archive that isn't yet
finished.

This asynchronous system is fairly primitive; request comes in, we'll
spawn off a new goroutine to handle it, then we'll mark it as done.
Status requests will see if the file exists in the final location,
and report the archival as done when it exists.
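
A minimal sketch of the scheme described above, not the code from the commit: the archive is written to a temporary file and renamed into place only once it is complete, so a status check that merely looks for the final file never sees a partial archive. The names `buildArchive`, `isDone`, and `runDemo` are hypothetical, the payload stands in for real zip data, and the temporary file is created next to the final path (rather than in /tmp) on the assumption that the rename should stay on one filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// buildArchive writes payload to a temporary file in the same directory as
// finalPath and renames it into place only after the write has finished.
func buildArchive(finalPath string, payload []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(finalPath), "archive-*.tmp")
	if err != nil {
		return err
	}
	// Cleans up on failure; after a successful rename this finds nothing.
	defer os.Remove(tmp.Name())

	if _, err := tmp.Write(payload); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), finalPath)
}

// isDone reports completion purely by checking the final location.
func isDone(finalPath string) bool {
	_, err := os.Stat(finalPath)
	return err == nil
}

// runDemo checks the status before and after one asynchronous archival.
func runDemo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "archiver-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	final := filepath.Join(dir, "repo.zip")

	before := isDone(final)

	// Spawn a goroutine to handle the request, as the commit describes.
	errc := make(chan error, 1)
	go func() { errc <- buildArchive(final, []byte("not a real zip")) }()
	if err := <-errc; err != nil {
		return before, false, err
	}
	return before, isDone(final), nil
}

func main() {
	before, after, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("done before archival:", before)
	fmt.Println("done after archival:", after)
}
```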

Fixes go-gitea#11265
@stale

stale bot commented Jul 2, 2020

This issue has been automatically marked as stale because it has not had recent activity. I am here to help clear issues left open even if solved or waiting for more insight. This issue will be closed if no further activity occurs during the next 2 weeks. If the issue is still valid just add a comment to keep it alive. Thank you for your contributions.

@stale stale bot added the issue/stale label Jul 2, 2020
@piersh-aetheros

piersh-aetheros commented Jul 2, 2020

I have a ~5GB repo with 671 branches. I have a 5 minute timeout on my web proxy, and it always times out fetching the branches page.

This is a bug, not an enhancement request.

@stale stale bot removed the issue/stale label Jul 2, 2020
@lunny lunny added the issue/confirmed Issue has been reviewed and confirmed to be present or accepted to be implemented label Jul 5, 2020
lafriks added a commit that referenced this issue Nov 7, 2020
* Make archival asynchronous

The prime benefit being sought here is for large archives to not
clog up the rendering process and cause unsightly proxy timeouts.
As a secondary benefit, archive-in-progress is moved out of the
way into a /tmp file so that new archival requests for the same
commit will not get fulfilled based on an archive that isn't yet
finished.

This asynchronous system is fairly primitive; request comes in, we'll
spawn off a new goroutine to handle it, then we'll mark it as done.
Status requests will see if the file exists in the final location,
and report the archival as done when it exists.

Fixes #11265

* Archive links: drop initial delay to three-quarters of a second

Some, or perhaps even most, archives will not take all that long to archive.
The archive process starts as soon as the download button is initially
clicked, so in theory they could be done quite quickly.  Drop the initial
delay down to three-quarters of a second to make it more responsive in the
common case of the archive being quickly created.

* archiver: restructure a little bit to facilitate testing

This introduces two sync.Cond pointers to the archiver package. If they're
non-nil when we go to process a request, we'll wait until signalled (at all)
to proceed. The tests will then create the sync.Cond so that it can signal
at-will and sanity-check the state of the queue at different phases.

The author believes that nil-checking these two sync.Cond pointers on every
archive processing will introduce minimal overhead with no impact on
maintainability.

* gofmt nit: no space around binary + operator

* services: archiver: appease golangci-lint, lock queueMutex

Locking/unlocking the queueMutex is allowed, but not required, for
Cond.Signal() and Cond.Broadcast().  The magic at play here is just a little
too much for golangci-lint, as we take the address of queueMutex and this is
mostly used in archiver.go; the variable still gets flagged as unused.

* archiver: tests: fix several timing nits

Once we've signaled a cond var, it may take some small amount of time for
the goroutines released to hit the spot we're wanting them to be at. Give
them an appropriate amount of time.

* archiver: tests: no underscore in var name, ungh

* archiver: tests: Test* is run in a separate context than TestMain

We must set up the mutex/cond variables at the beginning of any test that's
going to use them, or else these will be nil when the test is actually run.

* archiver: tests: hopefully final tweak

Things got shuffled around such that we carefully build up and release
requests from the queue, so we can validate the state of the queue at each
step. Fix some assertions that no longer hold true as fallout.

* repo: Download: restore some semblance of previous behavior

When archival was made async, the GET endpoint was only useful if a previous
POST had initiated the download. This commit restores the previous behavior,
to an extent; we'll now submit the archive request there and return a
"202 Accepted" to indicate that it's processing if we didn't manage to
complete the request within ~2 seconds of submission.

This lets a client directly GET the archive, and gives them some indication
that they may attempt to GET it again at a later time.
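
The behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the handler from the commit: `downloadHandler`, `statusFor`, and `demoStatuses` are hypothetical names, the URL is made up, and a sleep stands in for creating the archive. The handler waits a short grace period for the job and answers "202 Accepted" if it has not finished by then.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// downloadHandler starts a simulated archive job that takes workTime and
// waits up to grace for it to finish before answering.
func downloadHandler(workTime, grace time.Duration) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		done := make(chan struct{})
		go func() {
			time.Sleep(workTime) // stand-in for creating the archive
			close(done)
		}()

		select {
		case <-done:
			w.WriteHeader(http.StatusOK)
			fmt.Fprint(w, "archive bytes")
		case <-time.After(grace):
			// Still processing: the client may GET again later.
			w.WriteHeader(http.StatusAccepted)
		}
	}
}

// statusFor issues one GET against the handler and returns the status code.
func statusFor(workTime, grace time.Duration) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/archive/main.zip", nil)
	downloadHandler(workTime, grace)(rec, req)
	return rec.Code
}

// demoStatuses returns the status for a quick archive and for a slow one.
func demoStatuses() (int, int) {
	fast := statusFor(10*time.Millisecond, 2*time.Second)
	slow := statusFor(500*time.Millisecond, 50*time.Millisecond)
	return fast, slow
}

func main() {
	fast, slow := demoStatuses()
	fmt.Println("quick archive:", fast)
	fmt.Println("slow archive:", slow)
}
```

In this sketch the slow job keeps running after the 202 response, so a later request could find the finished archive.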

* archiver: tests: simplify a bit further

We don't need to risk failure and use time.ParseDuration to get 2 *
time.Second.

else if isn't really necessary if the conditions are simple enough and lead
to the same result.

* archiver: tests: resolve potential source of flakiness

Increase all timeouts to 10 seconds; these aren't hard-coded sleeps, so
there's no guarantee we'll actually take that long. If we need longer to
not have a false-positive, then so be it.

While here, various assert.{Not,}Equal arguments are flipped around so that
the wording in error output reflects reality, where the expected argument is
second and actual third.

* archiver: set up infrastructure for notifying consumers of completion

This API will *not* allow consumers to subscribe to specific requests being
completed, just *any* request being completed. The caller is responsible for
determining if their request is satisfied and waiting again if needed.
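
One way to picture this "any request completed" notification is a condition variable that is broadcast on every completion, with each waiter re-checking its own request in a loop. The sketch below assumes that shape; `tracker`, `complete`, and `waitFor` are hypothetical names and are not taken from the archiver package.

```go
package main

import (
	"fmt"
	"sync"
)

// tracker records which requests are done and wakes all waiters whenever
// any one of them completes.
type tracker struct {
	mu   sync.Mutex
	cond *sync.Cond
	done map[string]bool
}

func newTracker() *tracker {
	t := &tracker{done: make(map[string]bool)}
	t.cond = sync.NewCond(&t.mu)
	return t
}

// complete marks one request finished and notifies every waiter.
func (t *tracker) complete(name string) {
	t.mu.Lock()
	t.done[name] = true
	t.mu.Unlock()
	t.cond.Broadcast()
}

// waitFor blocks until the named request is done. A wake-up only means that
// some request finished, so the caller checks its own and waits again.
func (t *tracker) waitFor(name string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for !t.done[name] {
		t.cond.Wait()
	}
}

// runDemo completes two requests in order while waiting only for the second.
func runDemo() bool {
	t := newTracker()
	go func() {
		t.complete("a.zip") // may wake the waiter, which then waits again
		t.complete("b.zip")
	}()
	t.waitFor("b.zip")

	t.mu.Lock()
	defer t.mu.Unlock()
	return t.done["a.zip"] && t.done["b.zip"]
}

func main() {
	fmt.Println("both requests done:", runDemo())
}
```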

* repo: archive: make GET endpoint synchronous again

If the request isn't complete, this endpoint will now submit the request and
wait for completion using the new API. This may still be susceptible to
timeouts for larger repos, but other endpoints now exist that the web
interface will use to negotiate its way through larger archive processes.

* archiver: tests: amend test to include WaitForCompletion()

This is a trivial one, so go ahead and include it.

* archiver: tests: fix test by calling NewContext()

The mutex is otherwise uninitialized, so we need to ensure that we're
actually initializing it if we plan to test it.

* archiver: tests: integrate new WaitForCompletion a little better

We can use this to wait for archives to come in, rather than spinning and
hoping with a timeout.

* archiver: tests: combine numQueued declaration with next-instruction assignment

* routers: repo: reap unused archiving flag from DownloadStatus()

This had some planned usage before, indicating whether this request
initiated the archival process or not. After several rounds of refactoring,
this use was deemed not necessary for much of anything and got boiled down
to !complete in all cases.

* services: archiver: restructure to use a channel

We now offer two forms of waiting for a request:
- WaitForCompletion: wait for completion with no timeout
- TimedWaitForCompletion: wait for completion with timeout

In both cases, we wait for the given request's cchan to close; in the latter
case, we do so with the caller-provided timeout. This completely removes the
need for busy-wait loops in Download/InitiateDownload, as it's fairly clean
to wait on a channel with timeout.
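
A rough sketch of the two waiting forms, assuming the channel is closed exactly when the archive completes. The method names and `cchan` come from the commit message; the signatures, the `archiveRequest` type, and the `startArchive` helper are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// archiveRequest carries a channel that is closed once the archive is done.
type archiveRequest struct {
	cchan chan struct{}
}

// startArchive runs work in a goroutine and closes cchan when it returns.
func startArchive(work func()) *archiveRequest {
	r := &archiveRequest{cchan: make(chan struct{})}
	go func() {
		work()
		close(r.cchan)
	}()
	return r
}

// WaitForCompletion blocks with no timeout.
func (r *archiveRequest) WaitForCompletion() {
	<-r.cchan
}

// TimedWaitForCompletion reports whether the request completed within d.
func (r *archiveRequest) TimedWaitForCompletion(d time.Duration) bool {
	select {
	case <-r.cchan:
		return true
	case <-time.After(d):
		return false
	}
}

// runDemo waits on one quick request and one that stays blocked.
func runDemo() (bool, bool) {
	quick := startArchive(func() { time.Sleep(10 * time.Millisecond) })
	quick.WaitForCompletion()
	// Receiving from a closed channel never blocks, so this returns at once.
	completed := quick.TimedWaitForCompletion(time.Second)

	block := make(chan struct{})
	defer close(block) // let the blocked goroutine finish afterwards
	slow := startArchive(func() { <-block })
	timedOut := !slow.TimedWaitForCompletion(20 * time.Millisecond)

	return completed, timedOut
}

func main() {
	completed, timedOut := runDemo()
	fmt.Println("quick request completed:", completed)
	fmt.Println("slow request timed out:", timedOut)
}
```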

* services: archiver: use defer to unlock now that we can

This previously carried the lock into the goroutine, but an intermediate
step just added the request to archiveInProgress outside of the new
goroutine and removed the need for the goroutine to start out with it.

* Revert "archiver: tests: combine numQueued declaration with next-instruction assignment"

This reverts commit bcc5214.

Revert "archiver: tests: integrate new WaitForCompletion a little better"

This reverts commit 9fc8bed.

Revert "archiver: tests: fix test by calling NewContext()"

This reverts commit 709c356.

Revert "archiver: tests: amend test to include WaitForCompletion()"

This reverts commit 75261f5.

* archiver: tests: first attempt at WaitForCompletion() tests

* archiver: tests: slight improvement, less busy-loop

Just wait for the requests to complete in order, instead of busy-waiting
with a timeout.  This is slightly less fragile.

While here, reverse the arguments of a nearby assert.Equal() so that
expected/actual are correct in any test output.

* archiver: address lint nits

* services: archiver: only close the channel once

* services: archiver: use a struct{} for the wait channel

This makes it obvious that the channel is only being used as a signal,
rather than anything useful being piped through it.
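
The two commits above touch on a common Go pattern: a `chan struct{}` that carries no data and is closed as a signal, where a second `close` would panic. The sketch below guards the close with `sync.Once`; that is a technique swapped in for illustration, since the commit message does not say how the double close was avoided. The `request` type and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// request uses an empty-struct channel purely as a completion signal.
type request struct {
	cchan     chan struct{}
	closeOnce sync.Once
}

func newRequest() *request {
	return &request{cchan: make(chan struct{})}
}

// markComplete closes the channel; repeated calls are no-ops instead of
// panicking on a double close.
func (r *request) markComplete() {
	r.closeOnce.Do(func() { close(r.cchan) })
}

// isComplete checks the signal without blocking.
func (r *request) isComplete() bool {
	select {
	case <-r.cchan:
		return true
	default:
		return false
	}
}

func main() {
	r := newRequest()
	fmt.Println("complete before:", r.isComplete())
	r.markComplete()
	r.markComplete() // safe: the channel is only closed once
	fmt.Println("complete after:", r.isComplete())
}
```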

* archiver: tests: fix expectations

Move the close of the channel into doArchive() itself; notably, before these
goroutines move on to waiting on the Release cond.

The tests are adjusted to reflect that we can't WaitForCompletion() after
they've already completed, as WaitForCompletion() doesn't indicate that
they've been released from the queue yet.

* archiver: tests: set cchan to nil for comparison

* archiver: move ctx.Error's back into the route handlers

We shouldn't be setting this in a service, we should just be validating the
request that we were handed.

* services: archiver: use regex to match a hash

This makes sure we don't try and use refName as a hash when it's clearly not
one, e.g. heads/pull/foo.
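
As an illustration of the idea, a ref name can be checked against a hash pattern before being treated as a commit ID. The exact pattern used in the commit is not shown here; the one below (4 to 40 lowercase hex digits, allowing abbreviated SHA-1 hashes) and the name `looksLikeHash` are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// hashPattern is an assumed pattern: a full or abbreviated SHA-1 in
// lowercase hexadecimal.
var hashPattern = regexp.MustCompile(`^[0-9a-f]{4,40}$`)

// looksLikeHash reports whether refName could plausibly be a commit hash.
func looksLikeHash(refName string) bool {
	return hashPattern.MatchString(refName)
}

func main() {
	for _, ref := range []string{
		"heads/pull/foo",
		"deadbeef",
		"0123456789abcdef0123456789abcdef01234567",
	} {
		fmt.Printf("%-42s %v\n", ref, looksLikeHash(ref))
	}
}
```

A branch whose name happens to be all hex digits would still match, so a check like this only rules out refs that are clearly not hashes.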

* routers: repo: remove the weird /archive/status endpoint

We don't need to do this anymore, we can just continue POSTing to the
archive/* endpoint until we're told the download's complete. This avoids a
potential naming conflict, where a ref could start with "status/"

* archiver: tests: bump reasonable timeout to 15s

* archiver: tests: actually release timedReq

* archiver: tests: run through inFlight instead of manually checking

While we're here, add a test for manually re-processing an archive that's
already been completed. Re-open the channel and mark it incomplete, so that
doArchive can just mark it complete again.

* initArchiveLinks: prevent default behavior from clicking

* archiver: alias gitea's context, golang context import pending

* archiver: simplify logic, just reconstruct slices

While the previous logic was perhaps slightly more efficient, the
new variant's readability is much improved.

* archiver: don't block shutdown on waiting for archive

The technique established launches a goroutine to do the wait,
which will close a wait channel upon termination. For the timeout
case, we also send back a value indicating whether the timeout was
hit or not.

The timeouts are expected to be relatively small, but still a multi-
second delay to shutdown due to this could be unfortunate.

* archiver: simplify shutdown logic

We can just grab the shutdown channel from the graceful manager instead of
constructing a channel to halt the caller and/or pass a result back.
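
A sketch of what waiting with a shutdown channel might look like, under assumptions: a plain `shutdown` channel stands in for the one obtained from Gitea's graceful manager, and `timedWait` and `demoShutdown` are hypothetical names. The wait returns as soon as the archive completes, the timeout expires, or shutdown begins, so a pending archive does not delay shutdown.

```go
package main

import (
	"fmt"
	"time"
)

// timedWait waits for done, but gives up on timeout or on shutdown.
// It reports whether the work completed and whether the timeout was hit.
func timedWait(done, shutdown <-chan struct{}, timeout time.Duration) (completed, timedOut bool) {
	select {
	case <-done:
		return true, false
	case <-time.After(timeout):
		return false, true
	case <-shutdown:
		// Shutting down: stop waiting without reporting a timeout.
		return false, false
	}
}

// demoShutdown waits on an archive that never finishes while the server is
// already shutting down; the long timeout is never reached.
func demoShutdown() (bool, bool) {
	done := make(chan struct{}) // never closed in this demo
	shutdown := make(chan struct{})
	close(shutdown) // simulate shutdown having started
	return timedWait(done, shutdown, time.Hour)
}

func main() {
	completed, timedOut := demoShutdown()
	fmt.Println("completed:", completed, "timed out:", timedOut)
}
```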

* Style issues

* Fix mis-merge

Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
Co-authored-by: Lauris BH <lauris@nix.lv>
@lafriks lafriks added this to the 1.14.0 milestone Nov 7, 2020
@go-gitea go-gitea locked and limited conversation to collaborators Dec 14, 2020
5 participants