For openzim/zim-requests#1172, we are going to have 40 recipes, one per category on shamela.ws, to get practical recipe durations and practical ZIM sizes.
However, there is only one upstream server, so we need to ensure that we do not run more than one task (out of these 40 recipes) per worker, or probably even only one task on the whole platform, to be fair to their servers.
Shall we create a new shamela.ws platform we would assign manually to the corresponding recipes?
I am torn. On one hand, it has virtually no cost for us, so I'd be in favor; on the other, it's just 40 concurrent crawls of different parts of the website (with little overlap), so just setting a reasonable delay should do it.
We can discuss it today.
I see there's a contact email, and they've gone to great lengths to make this available through Android/iOS/Windows apps, so they seem keen on distributing it widely offline. We could simply ask.
The main reason we are creating 40 recipes is that there are about 10M links to explore in total, and we are using the zimit scraper. I already had to set worker: 4 to parallelize the recipe, and even with this level of parallelism we need about 3 months to grab the 10M links. I would prefer to be fair to their server and not run multiple recipes in parallel.
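For a rough sense of the load these figures imply, here is a back-of-the-envelope sketch. It assumes "about 3 months" means roughly 90 days and that the 10M links are crawled at a steady pace; the function name `crawl_rates` is only illustrative. It counts pages only, so the real number of HTTP requests hitting the server would be higher, since each page also loads its own resources.

```python
# Back-of-the-envelope numbers taken from the comment above.
TOTAL_LINKS = 10_000_000   # "about 10M links"
DURATION_DAYS = 90         # "about 3 months", assumed to be ~90 days
WORKERS = 4                # parallelism set on the recipe


def crawl_rates(total_links, duration_days, workers):
    """Return (aggregate pages/s, pages/s per worker) for a steady crawl."""
    seconds = duration_days * 24 * 3600
    aggregate = total_links / seconds
    return aggregate, aggregate / workers


aggregate, per_worker = crawl_rates(TOTAL_LINKS, DURATION_DAYS, WORKERS)
print(f"aggregate: {aggregate:.2f} pages/s, per worker: {per_worker:.2f} pages/s")
```

Under these assumptions a single recipe run already averages a bit over one page per second against the upstream server, which is the baseline that any additional recipe running in parallel would add to.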
But since, as you found, there is already another offline version based on apps, it may be worth asking them about other ways to access their content so that we could create a custom scraper.