504 error when trying to upload the source code of a new version #1846
Comments
This issue seems to affect all environments.
@bqbn would you be able to look into that please?
What workflow gets triggered after a user uploads a zip file? A 504 means a gateway timeout: could Nginx have timed out while waiting for something to process the zip file, or did it not time out and something simply failed to process a big zip file?
The endpoint is We open the zip (at this point it should be on the local filesystem, since it is over the size threshold below which Django handles uploads in memory), validate that the files in it and their sizes are OK, then move the zip to EFS. That hasn't been a problem in the past (up to our limit at least, which should be well above 50 MB).
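The size validation step described above can be sketched roughly as follows. This is a minimal illustration using Python's standard `zipfile` module, not the actual addons-server code: the function names and the exact byte value of the limit are assumptions (the thread only states a 200 MB cap on total uncompressed content).

```python
import zipfile

# Assumption: the thread mentions a 200 MB cap on uncompressed content;
# the exact byte value used by the real service is not stated.
MAX_UNCOMPRESSED_BYTES = 200 * 1024 * 1024


def total_uncompressed_size(zip_path):
    """Sum the declared uncompressed sizes of all entries in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(info.file_size for info in archive.infolist())


def validate_source_zip(zip_path, limit=MAX_UNCOMPRESSED_BYTES):
    """Return (ok, reason) for a zip file already saved on local disk.

    Hypothetical helper: it only checks that the file is a zip archive
    and that its declared uncompressed size is within the limit.
    """
    if not zipfile.is_zipfile(zip_path):
        return False, "not a valid zip archive"
    total = total_uncompressed_size(zip_path)
    if total > limit:
        return False, f"uncompressed content is {total} bytes, over the {limit} byte limit"
    return True, "ok"
```

Reading the sizes from the archive's central directory avoids extracting anything, so a check like this should stay cheap even for large uploads.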
What about the custom scanner and MAD service? Do those two get invoked after a user uploads a zip file?
No, no tasks are triggered for source uploads. Even if that were the case, tasks like these are triggered asynchronously: a 200 response is returned and clients come back to check on the result periodically. Here it's failing right at upload.
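For readers unfamiliar with the asynchronous pattern mentioned here (respond immediately, let the client check back later), a generic client-side polling loop might look like the sketch below. `check_status` and the `"pending"` status value are hypothetical placeholders, not part of any real addons-server API.

```python
import time


def poll_until_done(check_status, interval=1.0, timeout=60.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call check_status() repeatedly until it stops returning "pending".

    check_status is a hypothetical callable supplied by the caller; it
    stands in for whatever request the client makes to check on a task.
    Raises TimeoutError if the task is still pending at the deadline.
    """
    deadline = clock() + timeout
    while True:
        status = check_status()
        if status != "pending":
            return status
        if clock() >= deadline:
            raise TimeoutError("task still pending after timeout")
        sleep(interval)
```

As the comment above notes, this pattern does not apply to source uploads, which fail during the upload request itself.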
FWIW: I tried with my local addons-server instance, setting |
It turns out CloudFront has a default 30-second "Origin Response Timeout" [1]. I tried it with a 70 MB zip file, and it looks like the app took 98.081 seconds to process the POST.
In this case, the CloudFront edge server had already returned a 504 to the browser while Nginx was still busy transmitting the file to its upstream. I think we have a couple of options.
Option 2 is cloud-agnostic, so it should probably be the way to go. However, it also means this option ( That said, if there is a dedicated path prefix for uploading files, then we could make a new location (in the Nginx config) for it and only apply Please let me know what you think.
There are only a few endpoints that do file upload, but there is no dedicated path prefix for them, and some are part of the API so it's quite tricky.
I've set It might be a good idea to have a dedicated path prefix for uploading files in the long run though, so that it can be better managed and controlled.
@diox other than source code uploads and add-on submissions, are there any other areas that need to be tested?
@AlexandraMoga just browsing the site normally (particularly devhub), but that will happen anyway over the course of the week.
I've verified add-on and source code uploads of various sizes and I haven't run into 504 errors. More specifically, I've uploaded a I've also tried uploading an archive of ~60 MB as source code, and this time the upload was not completed. No 504 was triggered, but the upload finished with the following error, which, I believe, is expected: I haven't run into any other issues while navigating the site, but maybe more coverage will be added here over the course of the week.
FWIW I just got a 504 on dev with a 60 MB zip file (the total once extracted was well below the 200 MB limit).
What info can you provide so that I can search for your session in the Nginx access logs? Can you share your zip file with me so that I can try to reproduce the error? I tried uploading a 70 MB zip file and it succeeded.
I just tried and reproduced the error at 17:43 UTC. URL was https://addons-dev.allizom.org/en-US/developers/addon/goran-carey-leopperre/versions/1692599. I've uploaded the zip to https://drive.google.com/file/d/1c-lRSnIw_bNb07iy4r26SXHSkRWs8bIt/view?usp=sharing
I tried with your zip file, and even though it took 90+ seconds to upload the file, I was able to finish the source uploading step successfully. There were 4
However, the source uploading step for me is to POST to I don't know what exact steps to take to reproduce the issue you're seeing. I followed the steps in the first comment and everything seems to be fine.
Interesting. There are 2 ways to submit source: at version/add-on submission time, or afterwards, while editing the version. You did the former, I did the latter. To follow my steps:
The code to handle the 2 different paths is mostly shared, but there are some differences, so it would be interesting to see if only one of them is broken.
I followed your steps and still couldn't reproduce it. This time I even used my bigger zip file (74 MB). It took 118 seconds to finish, but I did get the "Changes successfully saved." message on the web page.
I searched the logs and I see both your attempts got a 302, and they only took about 70+ seconds.
I can't think of a reason why the 504 happened to you but not to me.
What info can you provide so that I can search for your session in the Nginx access logs? Also, can you reproduce the 504 consistently, or is it intermittent? On my side, I tried to upload a 70 MB zip file again, and it worked fine for me.
And in which environment did you test it?
@bqbn I've tested on -dev. At 14:36 UTC I tried again for https://addons-dev.allizom.org/en-US/developers/addon/rlmt/versions/1692905, this time with a ~54 MB zip file. The 504 error reproduced, but when I opened the add-on version page again, I noticed that the source code had actually been uploaded. Another test, at 14:41 UTC for https://addons-dev.allizom.org/en-US/developers/addon/3f22e7b50ac64eb8a017/versions/1692907, with a 45 MB zip also reproduced the error; again, the archive proved to have been uploaded when I checked the version page. I've added the zip files I tested with here - you should have access to the folder with your mozilla account.
@AlexandraMoga if you've been using Firefox for testing, could you try a different browser (e.g.: the latest Chrome)? I'm curious if that would make any difference.
@AlexandraMoga, in the CloudFront logs, I see 2 requests from the same IP address as yours that uploaded a ~60 MB file successfully.
Those requests took over 175 seconds to finish, but they succeeded nonetheless. And I can see your failed attempts in the CloudFront logs too,
In the failed cases, CloudFront returned a 504 much sooner than 175 seconds (even with files of about the same size). Is it possible that you didn't wait long enough in the failed cases? Another difference is the edge location. Even though all the above requests were from the same IP, the successful ones were from If the 2 successful requests were from you, do you remember what you did differently that might have caused you to hit a different edge server? And lastly, as far as Nginx is concerned, all requests, even the failed ones, completed successfully (e.g. Nginx received either a 200 or a 302 from the application).
And just to report, I tried https://addons-dev.allizom.org/en-US/developers/addon/grammarly-for-dev/versions/1692796 with @AlexandraMoga's testing files. For the ~60 MB zip file, I got the
I'm using Chrome though. Please give that a try and see if it makes any difference.
@bqbn these are the results of my new tests (Oct 29) with Chrome, on Windows (note that the UTC time mentioned below corresponds with the moment the error appeared in the browser):
Also, to answer your questions:
I've made those uploads before noon (around 07:00 UTC according to the logs) with a different account - an admin user.
I didn't touch the page from the moment I uploaded the zip file until the error was received, so I'm not sure how I could have affected the response time. Also, if it is relevant, I wasn't using any active VPN software while running the above tests.
I also made a couple other attempts with a different file, using https://addons-dev.allizom.org/en-US/developers/addon/62764c36250f4af789cf/versions/1692459:
We just changed the -dev instances back to their original size, as the size doesn't seem to be a decisive factor in this matter.
From discussion in standup, next steps are:
That should allow us to downgrade the priority of this issue (since testing on dev has shown the issue is now more intermittent with the aforementioned changes). We'll keep an eye on it and see if the added logging helps us pinpoint the source of the problem.
Is the plan to add the logging first, then push out the other 2 changes to -stage and -prod? That's what I'd prefer, by the way. (And please cc me on the logging PR.)
Filed #8593 for the logging. I've also filed https://github.com/mozilla/addons-server/issues/18336 to look into having fewer endpoints where we can upload source code.
Unsure if this is related. We're unable to upload our extension currently. A 31.2 MB zipped file fails when it reaches 60% with the following error message:
Current result we're receiving when submitting a new build on Firefox.
@conoremclaughlin this is likely the same problem, yes. Sorry about that, we're still investigating and aim to have some mitigations in place soon.
We've made some changes to our configuration that should help for bigger uploads, and we have some monitoring in place for this specific issue as well. Let us know if you still encounter any issues with large source uploads.
@keinagae that's not a bug, we don't accept zip files that would uncompress to over 200 MB total of content.
Then for code submission, should we skip that part?
No. Please contact amo-admins by email, as this is off-topic for this issue.
This issue has been automatically marked as stale because it has not had recent activity. If you think this bug should stay open, please comment on the issue with further details. Thank you for your contributions.
Looks like we're still seeing this issue - with a source archive around 52 MB. It appears to actually successfully upload, but the fact it returns an error anyway is breaking automation for us.
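Until the spurious 504 is fixed, automation hitting this could, as a stopgap, treat a 504 as "unknown" and check whether the source actually landed before failing. The sketch below is only an illustration of that idea: `do_upload` and `source_is_present` are hypothetical callables the caller would have to implement, and nothing here reflects a documented addons-server client API.

```python
def upload_with_verification(do_upload, source_is_present):
    """Return True if the source upload can be considered successful.

    do_upload: hypothetical callable that performs the upload and
        returns the HTTP status code it received.
    source_is_present: hypothetical callable that checks afterwards
        whether the version now has source attached.
    """
    status = do_upload()
    # The thread reports both 200 and 302 as successful responses.
    if 200 <= status < 400:
        return True
    # A 504 from the edge does not prove the origin failed: several
    # reports above found the archive uploaded despite the error.
    if status == 504:
        return bool(source_is_present())
    return False
```

A caller would probably want to wait briefly before running the presence check, since the thread shows the origin can take well over a minute to finish processing.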
Old Jira Ticket: https://mozilla-hub.atlassian.net/browse/ADDSRV-60
Describe the problem and steps to reproduce it:
(Please include as many details as possible.)
What happened?
This error is displayed:
What did you expect to happen?
No error.
┆Issue is synchronized with this Jira Task