Twitter giving frequent rate limits #3557
Ever since the Twitter extractor was fixed after it broke, I've been getting frequent rate limits. This doesn't happen with snscrape; I haven't tested other scrapers. I used to run 10+ instances of gallery-dl at a time, very fast and without rate limiting. snscrape scrapes fast as usual (probably because it isn't creating a ton of files like gallery-dl) and doesn't get rate limited. Is there something that can be done to fix this? I've tried two different accounts with username and password, but that didn't fix the issue.
Thanks :)
Please do a test run using …
I think this is because of cached guest tokens and Twitter reducing the rate limit for searches to 350 per 15 minutes. Twitter rate limits are bound to a guest token or account, and gallery-dl reuses the same guest token for up to one hour, even across multiple gallery-dl instances. snscrape, on the other hand, requests a new token each time it is run. You can prevent guest token reuse by disabling gallery-dl's cache: …
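A minimal sketch of that, going by the documented `cache.file` option (setting it to `null` disables the cache entirely):

```json
{
    "cache": {
        "file": null
    }
}
```

Note that disabling the cache also means login sessions, cookies, and tokens are no longer persisted between runs.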
Then there is nothing that can be done, I'm afraid, or at least nothing that I'm aware of. When you are logged in, you have a rate limit separate from any guest tokens, also 350 requests every 15 minutes, and it applies to all requests that your account sends. Sending a guest token together with your login cookies does not help either (gallery-dl currently sends either a guest token when logged out or cookies when logged in, never both); in that case Twitter still uses your account's rate limit and ignores the token. You might be able to use the syndication API while not logged in, if that's an option for you. snscrape doesn't support login/cookies for Twitter, does it?
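If you want to try it, enabling the syndication fallback is a config switch; this sketch assumes the `twitter.syndication` extractor option (worth double-checking the exact name and accepted values in the docs):

```json
{
    "extractor": {
        "twitter": {
            "syndication": true
        }
    }
}
```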
The snscrape dev has made it very clear he'll never add support for authentication (source). Perhaps there's a way to search only age-restricted tweets while logged in, after the rest of the download? How well would the syndication API work for me? Would I still get every tweet I would have gotten if I was logged in, and is the metadata much different? Metadata is really important to me.
I've used

```json
"twitter": {
    "sleep": 0.5,
    "sleep-request": 0.5
}
```

together with a dummy account for a few weeks now: nowhere near as much rate limiting since they reduced the request count. `gallery-dl -v 'https://twitter.com/MidPrem' 'https://twitter.com/MidPrem/media'` alone always got rate-limited a couple of times (I think), even with the archive file. I chose …
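For anyone copying that snippet: in a full gallery-dl config file it sits under the top-level `extractor` key, roughly like this (`sleep` delays downloads, `sleep-request` delays HTTP requests during extraction):

```json
{
    "extractor": {
        "twitter": {
            "sleep": 0.5,
            "sleep-request": 0.5
        }
    }
}
```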
I have way too many accounts to download for only one instance of the downloader to ever get through them. I usually have 10 running at once. Adding a 0.5-second delay wouldn't fix it for me.
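(For scale: 350 requests per 15 minutes works out to one request every ~2.6 seconds for a single token, and since parallel instances reuse the same cached guest token, ten of them would each need delays of well over 25 seconds to stay under that limit.)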
Yeah, your use case is too extreme for simple delays. Only now realized you were the OP, to boot.
Probably. As long as Twitter returns the IDs of age-restricted tweets there would be no difference.
The only difference I'd noticed was the metadata for users. I implemented the … The caveat with the syndication API is that it needs to be called once for each age-restricted tweet, so you'll probably run into rate limits as well.
There's always the option of investing in a Raspberry Pi and letting your download jobs run 24/7. I don't have a lot of accounts to download, so I don't care if I have to set a 10-second delay and let it run for several days.
Was the rate limit reduced even more? I'm only able to download 50 posts every 15 minutes. I'm using an input file with a bunch of links.
Same here. I tried with an old and a new account, and I can only download/see 50 posts every 15 minutes.