Slow-down on 429 maybe not optimal #1750

Closed
uriesk opened this issue Jan 23, 2023 · 7 comments · Fixed by #1751

@uriesk
Collaborator

uriesk commented Jan 23, 2023

Currently, when we get a 429, we slow down by running fewer downloads in parallel. So basically, we are indirectly limiting bandwidth.
But if the upstream rate limit is measured in requests per minute, this is not optimal: we will eventually run into lots of small files, hit the limit sooner, and slow down much further than we need to once we download larger files again.
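
To make the point above concrete, here is a rough back-of-the-envelope sketch. It assumes a simple model in which every parallel slot finishes one request per average request duration. The function names and all numbers are hypothetical and are not taken from mwoffliner or from Wikimedia's actual limits.

```typescript
// Rough model: each of `concurrency` parallel slots completes one request
// every `avgSecondsPerRequest` seconds.
function requestsPerMinute(concurrency: number, avgSecondsPerRequest: number): number {
  return (concurrency * 60) / avgSecondsPerRequest;
}

// Largest concurrency that stays at or below a requests-per-minute limit
// (never below 1, so downloads can still make progress).
function maxConcurrencyFor(limitPerMinute: number, avgSecondsPerRequest: number): number {
  return Math.max(1, Math.floor((limitPerMinute * avgSecondsPerRequest) / 60));
}

// Hypothetical numbers, chosen only for illustration.
const largeFileRate = requestsPerMinute(10, 2);    // 300 requests per minute
const smallFileRate = requestsPerMinute(10, 0.25); // 2400 requests per minute

console.log(largeFileRate, smallFileRate);
console.log(maxConcurrencyFor(600, 2), maxConcurrencyFor(600, 0.25)); // 20 and 2
```

With the same concurrency, small files produce eight times as many requests per minute in this model, so a concurrency level tuned during a run of small files would be far lower than what larger files need.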

Do we know how the upstream Wikimedia rate limit works for file downloads?

@uriesk
Collaborator Author

uriesk commented Jan 23, 2023

Another thing I just noticed: when you visit one of the URLs that mwoffliner gets rate-limited on in the zimfarm, you also see the 429.
image

So did their CDN / cache store the 429 and show it to everyone, or is the backend rate-limiting the cache?
Looks like they have a weird setup there.
Can recreate it with:

npm start -- --addNamespaces=100 --adminEmail=contact@kiwix.org --customMainPage=User:The_other_Kiwix_guy/Landing --format=novid:maxi --mwUrl=https://en.wikipedia.org/ --articleList=Canicattì --webp

@kelson42
Collaborator

kelson42 commented Jan 23, 2023

@uriesk I agree that the current system is suboptimal. But all of this really depends on how the WAF is configured, which can vary from one MediaWiki instance to another. We should investigate whether there is a bug in HTTP 429 response caching at Wikimedia.

@kelson42 kelson42 added this to the 1.13.0 milestone Jan 23, 2023
@uriesk
Collaborator Author

uriesk commented Jan 23, 2023

Now the image gives Internal Server Error 500.
I think Wikimedia had an issue while creating the thumbnail: it gives a 429 while it is still trying, and once it has failed it returns a 500.

This causes mwoffliner to slow down for no reason.
If there is no way for us to tell the difference between a legit rate limit and a faulty thumbnail, we can't improve it. Hmm.

@uriesk
Collaborator Author

uriesk commented Jan 23, 2023

Looking at it a bit more, most cases look like this:

500 Error -> Immediately try again -> 500 Error -> try again -> 500 -> try again -> ... -> Rate Limited

So a basic improvement would be to wait a minute before retrying after an error. The weird upstream caches might not have much of an effect after all.

[log] [2023-01-23T00:32:39.382Z] Not able to download content for https://upload.wikimedia.org/wikipedia/en/thumb/1/15/Phalanx_-_The_Enforce_Fighter_A-144000.png/220px-Phalanx_-_The_Enforce_Fighter_A-144000.png due to AxiosError: Request failed with status code 500
[log] [2023-01-23T00:32:40.066Z] Not able to download content for https://upload.wikimedia.org/wikipedia/en/thumb/1/15/Phalanx_-_The_Enforce_Fighter_A-144000.png/220px-Phalanx_-_The_Enforce_Fighter_A-144000.png due to AxiosError: Request failed with status code 500
[log] [2023-01-23T00:32:41.574Z] Not able to download content for https://upload.wikimedia.org/wikipedia/en/thumb/1/15/Phalanx_-_The_Enforce_Fighter_A-144000.png/220px-Phalanx_-_The_Enforce_Fighter_A-144000.png due to AxiosError: Request failed with status code 500
[log] [2023-01-23T00:32:42.619Z] Not able to download content for https://upload.wikimedia.org/wikipedia/en/thumb/1/15/Phalanx_-_The_Enforce_Fighter_A-144000.png/220px-Phalanx_-_The_Enforce_Fighter_A-144000.png due to AxiosError: Request failed with status code 500
[log] [2023-01-23T00:32:44.285Z] Not able to download content for https://upload.wikimedia.org/wikipedia/en/thumb/1/15/Phalanx_-_The_Enforce_Fighter_A-144000.png/220px-Phalanx_-_The_Enforce_Fighter_A-144000.png due to AxiosError: Request failed with status code 500
[log] [2023-01-23T00:32:47.169Z] Received a [status=429], slowing down
[log] [2023-01-23T00:32:47.169Z] Setting maxActiveRequests from [71] to [70]
[log] [2023-01-23T00:32:47.169Z] Not able to download content for https://upload.wikimedia.org/wikipedia/en/thumb/1/15/Phalanx_-_The_Enforce_Fighter_A-144000.png/220px-Phalanx_-_The_Enforce_Fighter_A-144000.png due to AxiosError: Request failed with status code 429
[log] [2023-01-23T00:32:51.281Z] Received a [status=429], slowing down
[log] [2023-01-23T00:32:51.281Z] Setting maxActiveRequests from [70] to [69]
[log] [2023-01-23T00:32:51.281Z] Not able to download content for https://upload.wikimedia.org/wikipedia/en/thumb/1/15/Phalanx_-_The_Enforce_Fighter_A-144000.png/220px-Phalanx_-_The_Enforce_Fighter_A-144000.png due to AxiosError: Request failed with status code 429
[log] [2023-01-23T00:32:58.304Z] Received a [status=429], slowing down
[log] [2023-01-23T00:32:58.304Z] Setting maxActiveRequests from [69] to [68]
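
The "wait a minute before retrying" idea could be sketched roughly as follows. This is only an illustration, not mwoffliner code: `RetryGate`, its methods, and the choice to apply the waiting period to 5xx responses only are assumptions made for the example.

```typescript
// Tracks, per URL, the earliest time at which a failed request may be retried.
// Hypothetical helper, not part of mwoffliner.
class RetryGate {
  private readonly delayMs: number;
  private readonly notBefore = new Map<string, number>();

  constructor(delayMs: number = 60000) {
    this.delayMs = delayMs;
  }

  // Record a failed response; only server errors (5xx) start the waiting period.
  recordFailure(url: string, status: number, nowMs: number): void {
    if (status >= 500 && status <= 599) {
      this.notBefore.set(url, nowMs + this.delayMs);
    }
  }

  // True if the URL has no pending waiting period or the period has elapsed.
  canRequest(url: string, nowMs: number): boolean {
    const earliest = this.notBefore.get(url);
    return earliest === undefined || nowMs >= earliest;
  }
}

const gate = new RetryGate();
const thumbUrl = 'https://example.org/thumb.png'; // placeholder URL
gate.recordFailure(thumbUrl, 500, 0);
console.log(gate.canRequest(thumbUrl, 1000));  // false
console.log(gate.canRequest(thumbUrl, 60000)); // true
```

Passing the current time in as a parameter keeps the sketch free of timers, so a download loop could simply skip or re-queue a URL while `canRequest` returns false instead of hammering the same failing thumbnail.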

@kelson42
Collaborator

kelson42 commented Jan 23, 2023

I'm in favour of a proper backoff strategy using Fibonacci or similar. The current strategy is primitive. I guess it should be doable using a module which already exists. But at the core of the problem is the HTTP 500. Like you said, errors while generating thumbnails are pretty common on the Wikimedia image backend.
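
For reference, a Fibonacci backoff schedule can be computed in a few lines. This is a minimal sketch under assumptions: `fibonacciDelays`, the base delay, and the cap are hypothetical and are not taken from mwoffliner or from any particular backoff module.

```typescript
// Returns the wait time (in ms) before each retry attempt, following the
// Fibonacci sequence 1, 1, 2, 3, 5, 8, ... scaled by `baseMs` and capped at `maxMs`.
function fibonacciDelays(attempts: number, baseMs: number, maxMs: number): number[] {
  const delays: number[] = [];
  let prev = 0;
  let curr = 1;
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(curr * baseMs, maxMs));
    [prev, curr] = [curr, prev + curr];
  }
  return delays;
}

// Hypothetical settings: 1 second base delay, capped at 10 seconds.
console.log(fibonacciDelays(8, 1000, 10000));
// [1000, 1000, 2000, 3000, 5000, 8000, 10000, 10000]
```

Compared with exponential backoff, the Fibonacci sequence grows more slowly, which may suit a backend that usually recovers quickly.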

@uriesk
Collaborator Author

uriesk commented Jan 23, 2023

We will never get a satisfying backoff strategy, because downloadContent is used in different places with probably different rate limits.
We should treat downloadFiles differently from the rest.
I have something in mind and will try it later today. If it's good, it should be low-effort and we can still do this for 1.12.
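
One way to picture treating file downloads separately is to keep one concurrency limit per kind of request, so that a 429 on a thumbnail does not throttle everything else. The sketch below is purely illustrative and is not necessarily what was later implemented: `RequestKind`, `PerKindLimits`, and the decrement-by-one rule (mirroring the `maxActiveRequests` log lines above) are assumptions.

```typescript
// Hypothetical split: file downloads versus every other request.
type RequestKind = 'file' | 'other';

class PerKindLimits {
  private readonly limits: Record<RequestKind, number>;

  constructor(initialMax: number) {
    this.limits = { file: initialMax, other: initialMax };
  }

  // Called on a 429: lower only the limit of the kind that was rate-limited,
  // never going below one active request.
  onRateLimited(kind: RequestKind): number {
    this.limits[kind] = Math.max(1, this.limits[kind] - 1);
    return this.limits[kind];
  }

  // Current maximum number of active requests for the given kind.
  maxActive(kind: RequestKind): number {
    return this.limits[kind];
  }
}

const limits = new PerKindLimits(71);
limits.onRateLimited('file');
console.log(limits.maxActive('file'), limits.maxActive('other')); // 70 71
```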

@uriesk uriesk self-assigned this Jan 23, 2023
@kelson42
Collaborator

@uriesk Agreed, images are probably better treated differently from text content.
