Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

504 error when trying to upload the source code of a new version #1846

Open
willdurand opened this issue Oct 13, 2021 · 57 comments
Open

504 error when trying to upload the source code of a new version #1846

willdurand opened this issue Oct 13, 2021 · 57 comments

Comments

@willdurand
Copy link
Member

willdurand commented Oct 13, 2021

Describe the problem and steps to reproduce it:

(Please include as many details as possible.)

  1. Go to AMO -dev: https://addons-dev.allizom.org/en-US/developers/addon/submit/upload-listed
  2. Submit a new add-on (doesn't matter which one as long as it passes validation, you can use a dummy one from https://staticfil.es/webext-generator/)
  3. On the next page, check that you want to attach source code to the submission
  4. Select a zip file that is rather big (e.g., I tried with a 50MB zip file)
  5. Click to continue

What happened?

This error is displayed:

504 ERROR
The request could not be satisfied.

CloudFront attempted to establish a connection with the origin, but either the attempt failed or the origin closed the connection. We can't connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.
If you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.

Generated by cloudfront (CloudFront)
Request ID: 0hn0qJOZeCrgZLbhjaPj4mtquw6NDlZ4re9fB70kbvnGcLA8jMBUUw==

What did you expect to happen?

No error.

┆Issue is synchronized with this Jira Task

@willdurand
Copy link
Member Author

This issue seems to affect all envs.

@willdurand
Copy link
Member Author

@bqbn would you be able to look into that please?

@bqbn
Copy link
Contributor

bqbn commented Oct 13, 2021

What workflow gets triggered after a user uploads a zip file?

504 means gateway timeout, could it be Nginx timed out while waiting for something to process the zip file, or it didn't time out but just that something failed to process a big zip file?

@diox
Copy link
Member

diox commented Oct 13, 2021

The endpoint is /{locale}/developers/addon/{slug}/versions/{id}

We open the zip (at this point it should be on the local filesystem, since it's over the threshold Django has to deal with uploads in memory) validate that the files in it and their size are ok, then move the zip to EFS. That hasn't been a problem in the past (up until our limit at least, which should be way above 50 MB)

@bqbn
Copy link
Contributor

bqbn commented Oct 13, 2021

What about the custom scanner and MAD service? Do those two get invoked after a user uploads a zip file?

@diox
Copy link
Member

diox commented Oct 13, 2021

No, no tasks are triggered for source uploads - and even if that was the case, tasks like these are triggered async, a 200 response is returned and clients come back to check on it periodically. Here it's failing right at upload.

@diox
Copy link
Member

diox commented Oct 13, 2021

FWIW: I tried with my local addons-server instance, setting client_max_body_size 200m; in nginx config (we default to 50M in our local config) and having it use addons-server directly (to ensure it didn't hit addons-frontend), and that worked fine.

@bqbn
Copy link
Contributor

bqbn commented Oct 14, 2021

It turns out Cloudfront has a default 30 seconds "Origin Response Timeout" [1].

I tried it with a 70 MB zip file and it looks like the app took 98.081 seconds to process the POST.

1634159656.645 "*, 130.176.100.98" US addons-dev.allizom.org "POST /en-US/developers/addon/mossonk-locer-malmorming/submit/source-listed HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/mossonk-locer-malmorming/submit/source-listed" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36" 204.776 98.081 "-"

In this case, the Cloudfront edge server already returned 504 to the browser while Nginx was busy transmitting the file to its upstream.

I think we have a couple of options.

  1. To increase the value of "Origin Response Timeout". It can be bumped up to 60 seconds, which is probably not enough for some big files. If we need a value larger than 60, we'd have to contact AWS and make a case for it.
  2. To set uwsgi_request_buffering to off [2]. This would allow Nginx to start transmitting the file to the app without buffering it first. That way when the browser finishes transmitting the file, the app should receive the file shortly.

Option 2 is Cloud agnostic, so it should probably be the way to go. However it also means this option (uwsgi_request_buffering) will be applied to all requests sent to the app, so we'll need to test it thoroughly.

That said, if there is a dedicated path prefix for uploading files, then we could make a new location (in Nginx config) for it and only apply uwsgi_request_buffering to that location. This will make the Nginx config file more complicated (because we'll need an additional named location for this purpose), but it can be our third option nonetheless.

Please let me know what you think.

@diox
Copy link
Member

diox commented Oct 18, 2021

There are only a few endpoints that do file upload, but there is no dedicated path prefix for them, and some are part of the API so it's quite tricky. uwsgi_request_buffering to off seems promising, I'm not sure we need to have it to on, but as you said it would need thorough testing.

@bqbn
Copy link
Contributor

bqbn commented Oct 19, 2021

I've set uwsgi_request_buffering to off in the dev environment. Please test it and let me know when it is ready for production.

It might be a good idea to have a dedicated path prefix for uploading files in the long run though, so that way it can be better managed/controlled.

@AlexandraMoga
Copy link

@diox other that source code upload and addon submissions, are there any other areas that need to be tested?

@diox
Copy link
Member

diox commented Oct 19, 2021

@AlexandraMoga just browsing the site normally (particularly devhub) but that will happen anyway over the course of the week.

@AlexandraMoga
Copy link

I've verified add-on and source code uploads of various sizes and I haven't run into 504 errors.

More specifically, I've uploaded a .zip file of ~55Mb and, even though the upload process was slow, the file was uploaded eventually. I could also download the file afterwards from devhub and rev tools. - see example https://reviewers.addons-dev.allizom.org/en-US/reviewers/review/620865

I've also tried uploading an archive of ~60Mb as source code and this time the upload was not completed. There wasn't a 504 triggered but the upload finished with the following error, which, I believe, is expected:
image

I haven't run into any other issues while navigating the site, but maybe more coverage will be added here over the course of the week.

@diox
Copy link
Member

diox commented Oct 20, 2021

Uncompressed size is too large would happen if the total size of the zip contents, once extracted, was over 200 MB.

FWIW I just got a 504 on dev with a 60MB zip file (the total once extracted was well below the 200 MB limit).

@bqbn
Copy link
Contributor

bqbn commented Oct 20, 2021

FWIW I just got a 504 on dev with a 60MB zip file (the total once extracted was well below the 200 MB limit).

What info can you provide so that I can search for your session in the Nginx access logs? Can you share your zip file with me so that I can try to re-produce the error?

I tried uploading a 70MB zip file and it succeeded.

@diox
Copy link
Member

diox commented Oct 20, 2021

@bqbn
Copy link
Contributor

bqbn commented Oct 20, 2021

I tried with your zip file, and even though it took 90+ seconds to upload the file, I was able to finish the source uploading step successfully.

There were 4 POST in my whole session.

"POST /en-US/developers/upload HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/submit/upload-listed" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36" 0.265 0.263 "-"
"POST /en-US/developers/addon/submit/upload-listed HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/submit/upload-listed" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36" 0.641 0.641 "-"
"POST /en-US/developers/addon/seasp-scorse-duckabugo2/submit/source-listed HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/seasp-scorse-duckabugo2/submit/source-listed" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36" 95.080 95.077 "-"
"POST /en-US/developers/addon/seasp-scorse-duckabugo2/submit/details HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/seasp-scorse-duckabugo2/submit/details" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36" 1.213 1.213 "-"

However, the source uploading step for me is to POST to /en-US/developers/addon/<slug>/submit/source-listed whereas yours is to /en-US/developers/addon/<slug>/versions/<id>.

I don't know what exact steps to take to re-produce the issue you're seeing. For me, I followed the steps in the first comment and everything seems to be fine.

@diox
Copy link
Member

diox commented Oct 20, 2021

Interesting. There are 2 ways to submit source: at version/add-on submission time, or afterwards, while editing the version. You did the former, I did the latter. To follow my steps:

  • Upload an add-on
  • Don't submit source at submission time
  • Once submitted, go to Manage Status & Versions in developer hub for that add-on
  • Click on the version
  • You should see the edit version form at this point, where you can upload the source

The code to handle the 2 different paths is mostly shared, but there are some differences, so it would be interesting to see if only one of them is broken.

@bqbn
Copy link
Contributor

bqbn commented Oct 20, 2021

I followed your step and still couldn't reproduce it. This time I even used my bigger zip file (74MB). It took 118 seconds to finish but I did get the "Changes successfully saved." message on the web page.

"POST /en-US/developers/addon/moodpermad-cronker-tigarron/versions/1541784 HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/moodpermad-cronker-tigarron/versions/1541784" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36" 118.420 118.421 "-"

I searched the logs and I see both your attempts got 302 and they only took about 70+ seconds.

"POST /en-US/developers/addon/goran-carey-leopperre/versions/1692599 HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/goran-carey-leopperre/versions/1692599" "Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0" 73.834 73.834 "-"
"POST /en-US/developers/addon/goran-carey-leopperre/versions/1692599 HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/goran-carey-leopperre/versions/1692599" "Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0" 75.769 75.769 "-"

I can't think of a reason why 504 happened to you but not to me.

@AlexandraMoga
Copy link

I also reproduced the 504 error now with a 60Mb zip when I've uploaded the file from the devhub manage versions page:

image

@bqbn
Copy link
Contributor

bqbn commented Oct 27, 2021

What info can you provide so that I can search for your session in the Nginx access logs?

Also, can you reproduce the 504 constantly, or is it intermittent?

On my side, I tried to upload a 70 MB zip file again, and it worked fine for me.

@bqbn
Copy link
Contributor

bqbn commented Oct 27, 2021

And in which environment did you test it?

@AlexandraMoga
Copy link

@bqbn I've tested on -dev.
I've tried again today with a ~60Mb zip file and one instance of the reproduction happened at 14:03 UTC (Oct 28). The addon in cause is https://addons-dev.allizom.org/en-US/developers/addon/grammarly-for-dev/versions/1692796.
Another reproduction was at 14:08 UTC (Oct 28) and another at 14:10 UTC- both consecutive source code uploads for the same addon => so the error reproduced three times in a row without intermittence.
I've tried with another add-on after that - https://addons-dev.allizom.org/en-US/developers/addon/ee2fc01910e04967ae5f/versions/1692927 at 14:13 UTC and the 504 error was also reproduced.

At 14:36 UTC I've tried again for https://addons-dev.allizom.org/en-US/developers/addon/rlmt/versions/1692905, this time with a ~54Mb zip file => the 504 error reproduced but when I opened the addon version page again, I've noticed that the source code has actually been uploaded.

Another test, at 14:41 UTC for https://addons-dev.allizom.org/en-US/developers/addon/3f22e7b50ac64eb8a017/versions/1692907, with a 45Mb zip also reproduced the error, again the archive proved to be uploaded when I've checked the version page again.

I've added the zip files I've tested with here - you should have access to the folder with your mozilla account.

@bqbn
Copy link
Contributor

bqbn commented Oct 28, 2021

@AlexandraMoga if you've been using Firefox for testing, could you try a different browser (e.g.: the latest Chrome)? I'm curious if that would make any difference.

@bqbn
Copy link
Contributor

bqbn commented Oct 29, 2021

@AlexandraMoga, in the cloudfront logs, I see 2 requests that had the same IP address as yours uploaded ~60 MB file successfully.

date    time    location    method    uri    status    request_bytes    time_taken    time_to_first_byte    x_edge_detailed_result_type
2021-10-28    07:01:49    SFO53-C1    POST    /en-US/developers/addon/unlistedexample-16-04/versions/1693053    200    62034917    176.422    176.366    Miss
2021-10-28    07:06:01    SFO53-C1    POST    /en-US/developers/addon/mv2-firefox-versions-3/versions/1692880    200    62035016    179.529    179.473    Miss

Those requests took over 175 seconds to finish, but nonetheless it succeeded.

And I can see your failed attempts in Cloudfront logs too,

date    time    location    method    uri    status    request_bytes    time_taken    time_to_first_byte    x_edge_detailed_result_type
2021-10-28    14:02:55    OTP50-C1    POST    /en-US/developers/addon/grammarly-for-dev/versions/1692796    504    62034930    55.9    55.9    OriginCommError
2021-10-28    14:08:11    OTP50-C1    POST    /en-US/developers/addon/grammarly-for-dev/versions/1692796    504    62034938    57.861    57.861    OriginCommError
2021-10-28    14:10:53    OTP50-C1    POST    /en-US/developers/addon/grammarly-for-dev/versions/1692796    504    62034874    55.262    55.262    OriginCommError
2021-10-28    14:14:02    OTP50-C1    POST    /en-US/developers/addon/ee2fc01910e04967ae5f/versions/1692927    504    62034956    55.071    55.071    OriginCommError
2021-10-28    14:34:10    OTP50-C1    POST    /en-US/developers/addon/grammarly-for-dev/versions/1692796    504    55827395    52.313    52.313    OriginCommError
2021-10-28    14:36:09    OTP50-C1    POST    /en-US/developers/addon/rlmt/versions/1692905    504    55827312    52.567    52.567    OriginCommError
2021-10-28    14:41:20    OTP50-C1    POST    /en-US/developers/addon/3f22e7b50ac64eb8a017/versions/1692907    504    46515967    49.096    49.096    OriginCommError

In the failed cases, Cloudfront returned 504 much sooner than 175 seconds (even with about the same sized files). Do you think it was possible that you didn't wait long enough for the failed cases?

Another difference is the edge location. Even though all the above requests were from the same IP, the successful ones were from SFO53-C1, whereas the failed ones were from OTP50-C1.

If the 2 successful requests were from you, do you remember what you did differently that might cause you to have a different edge server?

And lastly, as far as Nginx is concerned, all requests, even the failed ones, were completed successfully (e.g. Nginx received either 200 or 302 from the application).

@bqbn
Copy link
Contributor

bqbn commented Oct 29, 2021

And just to report, I tried https://addons-dev.allizom.org/en-US/developers/addon/grammarly-for-dev/versions/1692796 with @AlexandraMoga's testing files. For the ~60Mb zip file, I got the Uncompressed size is too large error. For the ~54Mb zip file, I was able to successfully upload it. In my case the cloudfront logs look like this:

date    time    location    method    uri    status    request_bytes    time_taken    time_to_first_byte    x_edge_detailed_result_type
2021-10-28    23:37:13    SEA19-C1    POST    /en-US/developers/addon/grammarly-for-dev/versions/1692796    200    62034555    86.22    86.158    Miss
2021-10-28    23:40:25    SEA19-C1    POST    /en-US/developers/addon/grammarly-for-dev/versions/1692796    302    55827042    87.741    87.741    Miss

I'm using Chrome though. Please give that a try and see if it makes any difference.

@AlexandraMoga
Copy link

AlexandraMoga commented Oct 29, 2021

@bqbn these are the results of my new tests (Oct 29) with Chrome, on Windows (note that the UTC time mentioned below corresponds with the moment the error appeared in the browser):

  1. 13:06 UTC - https://addons-dev.allizom.org/en-US/developers/addon/django32/versions/1692875, with the ~60Mb file example => 504 error (I haven't done anything to interrupt the upload; I just waited for the result and the 504 error was triggered after a considerable amount of time - ~1 minute); I've checked the version page after the error was triggered and there was no source code uploaded;
  2. 13:14 UTC - https://addons-dev.allizom.org/en-US/developers/addon/temporary/versions/1692835, with the ~60Mb file example => 504 error, source code file not uploaded;
  3. 13:19 UTC - https://addons-dev.allizom.org/en-US/developers/addon/terms-of-service-didn-t-read/versions/1692600, with the ~60Mb file example => 504 error, source code file not uploaded;
  4. 13:22 UTC - https://addons-dev.allizom.org/en-US/developers/addon/mozilla-labs-snowl/versions/1692165 with the ~50Mb file example => 504 error, source code file was uploaded when I checked the version page again.

Also, to answer your questions:

@AlexandraMoga, in the cloudfront logs, I see 2 requests that had the same IP address as yours uploaded ~60 MB file successfully.

date    time    location    method    uri    status    request_bytes    time_taken    time_to_first_byte    x_edge_detailed_result_type
2021-10-28    07:01:49    SFO53-C1    POST    /en-US/developers/addon/unlistedexample-16-04/versions/1693053    200    62034917    176.422    176.366    Miss
2021-10-28    07:06:01    SFO53-C1    POST    /en-US/developers/addon/mv2-firefox-versions-3/versions/1692880    200    62035016    179.529    179.473    Miss

Those requests took over 175 seconds to finish, but nonetheless it succeeded.

I've made those uploads before noon (around 07:00 UTC according to the logs) with a different account - an admin user.
I've tried to upload source code with that user again on Chrome but this time I ran into the 504 error. So, the only difference between the successful and the failed requests I can currently think about is the submission time (before noon/ afternoon).

In the failed cases, Cloudfront returned 504 much sooner than 175 seconds (even with about the same sized files). Do you think it was possible that you didn't wait long enough for the failed cases?

I didn't touch the page from the moment I've uploaded the zip file until the error was received, so I'm not sure how I could have impacted the response time.

Also, if it has any relevance, I wasn't using any active VPN software while running the above tests.

@diox
Copy link
Member

diox commented Nov 8, 2021

I also made a couple other attempts with a different file, using https://addons-dev.allizom.org/en-US/developers/addon/62764c36250f4af789cf/versions/1692459:

  • A 32 MB file worked
  • A 90 MB file (which would have returned a page with Uncompressed size is too large error, never saving the file to EFS) failed

@bqbn
Copy link
Contributor

bqbn commented Nov 8, 2021

We increased the instance size for the olympia_web component in the dev environment. Please feel free to test it again and see if that makes any difference.

We just changed the -dev instances to its original size as the size doesn't seem to be a decisive factor in this matter.

@diox
Copy link
Member

diox commented Nov 10, 2021

From discussion in standup, next steps are:

  • Setting uwsgi_request_buffering to off in stage & prod as well
  • Setting 60 seconds timeout on the CDN in stage & prod as well
  • Adding some logging in addons-server around version edit/source submission to see if we can narrow things down (separate issue)

That should allow us to downgrade the priority of this issue (since testing on dev has proven the issue is now more intermittent with the aforementioned changes), we'll keep an eye on and see if the added logging helps us pinpoint the source of the problem.

@bqbn
Copy link
Contributor

bqbn commented Nov 10, 2021

From discussion in standup, next steps are:

  • Setting uwsgi_request_buffering to off in stage & prod as well
  • Setting 60 seconds timeout on the CDN in stage & prod as well
  • Adding some logging in addons-server around version edit/source submission to see if we can narrow things down (separate issue)

Is the plan to add the logging first, then push out the other 2 changes to -stage and -prod? And that's what I'd prefer by the way.

(And please cc me on the logging PR.)

@diox
Copy link
Member

diox commented Nov 17, 2021

Filed #8593 for the logging. I've also filed https://github.com/mozilla/addons-server/issues/18336 to look into having fewer endpoints where we can upload source code.

@diox diox removed the priority:p2 label Nov 17, 2021
@conoremclaughlin
Copy link

Unsure if this is related. We're unable to upload our extension currently. A 31.2 MB zipped file fails when it reaches 60% with the following error message:

Received an empty response from the server; status: 504

Current result we're receiving when submitting a new build on Firefox.

@diox
Copy link
Member

diox commented Nov 25, 2021

@conoremclaughlin this is likely the same problem, yes. Sorry about that, we're still investigating and aim to have some mitigations in place soon.

@diox
Copy link
Member

diox commented Dec 6, 2021

We've made some changes to our configuration that should help for bigger uploads, and we have some monitoring in place for this specific issue as well. Let us know if you still encounter any issues with large source uploads.

@keinagae
Copy link

Hi i am having following error now when submitting file with size of 141MB
image

@diox
Copy link
Member

diox commented Jan 13, 2022

@keinagae that's not a bug, we don't accept zip files that would uncompress to over 200 MB total of content.

@keinagae
Copy link

then for code submission should we skip that part?

@diox
Copy link
Member

diox commented Jan 14, 2022

No - Please contact amo-admins by email as this is off-topic for this issue.

@stale
Copy link

stale bot commented Jul 31, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this bug should stay open, please comment on the issue with further details. Thank you for your contributions.

@stale stale bot added the state:stale Issues marked as stale. These can be re-opened should there be plans to fix them. label Jul 31, 2022
@stale stale bot removed the state:stale Issues marked as stale. These can be re-opened should there be plans to fix them. label Jan 19, 2023
@AbhishekMahawarsf

This comment was marked as off-topic.

@valenvb-ag
Copy link

Looks like we're still seeing this issue - with a source archive around 52 MB. It appears to actually successfully upload, but the fact it returns an error anyway is breaking automation for us.

@KevinMind
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants