DeviantArt Scraps Downloader 403s randomly #655
I'm gonna piggyback on this topic instead of making my own, since it's also related to the question about scraps in the config. I'm still not very good at writing JSON, so I was hoping for some assistance in adding the scraps extractor to both my deviantart and furaffinity extractors in my config. Here's what I have currently:
This isn't going to solve the underlying problem, but it should at least provide the server response when those errors happen.
@sledgehammer93 I suspect there is a hidden rate limit for the …

@biznizz If you want to download scraps as well as the regular gallery when using user profile links as input, you can add

    "deviantart":
    {
        "include": "gallery,scraps",
        "...": "..."
    },
    "furaffinity":
    {
        "include": "gallery,scraps",
        "...": "..."
    },

to your config, or just use two URLs per user, e.g. https://www.deviantart.com/USER/gallery/ and https://www.deviantart.com/USER/gallery/scraps, instead of https://www.deviantart.com/USER
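For context on where those fragments go: gallery-dl reads a JSON config file in which site-specific settings typically live under a top-level "extractor" object. A minimal sketch of a complete file using only the "include" option discussed above (all other options omitted; this is an illustration of the documented structure, not a copy of anyone's actual config):

    {
        "extractor":
        {
            "deviantart":
            {
                "include": "gallery,scraps"
            },
            "furaffinity":
            {
                "include": "gallery,scraps"
            }
        }
    }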
This is the output that I have just received (some sensitive data has been blocked out; if that matters, just let me know). It occurs on random URLs for me:

Edit: Added some more info that my computer apparently didn't select.
That worked; now I can rip main and scraps galleries with a single command run, thank you! By the by, what is this …
If I'm not mistaken, …
This error is apparently from DeviantArt's CDN, CloudFront, which will block all requests to deviantart.com after getting too much traffic from the same address. I've set up an endless loop to fetch scraps from the same artist over and over again, and after some time I got the same 403 error. Even visiting the website in a normal browser showed me the following: … Going slower through all Deviations by using a delay (…) … And …
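The comment above suggests adding a delay between requests; a minimal sketch of one way to configure that, assuming gallery-dl's per-extractor "sleep-request" option (seconds to wait between HTTP requests). The option name and value shown here are assumptions rather than something stated in this thread:

    {
        "extractor":
        {
            "deviantart":
            {
                "sleep-request": 5.0
            }
        }
    }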
@mikf, just did a test run with the settings you recommended, and it still managed to 403 after a while. I also wasn't able to access any of DeviantArt in a normal browser either. Here's the biggest change I made to my config file:
@sledgehammer93 Did you try these new settings with a new client IP address on your side first?
I can report that I just had a 403 error as well; it interrupted a rip of a DA gallery and temporarily gave me the error page in my browser. After using a VPN to change my IP address, I was able to finish downloading, and when I turned the VPN off I could get on the site again normally. It seems that ripping a large number of images from them now triggers some kind of DDoS-prevention routine.
I would get banned after ~300 images each time I tried to download that artist, and I used a different IP each time. The artist has ~585 images total and only 5 of them are scraps. I gave up and haven't tried downloading since. This was on March 17th, by the way. I was never able to finish downloading them. The ban only affects your current IP, not your API key.
It'd still kick you even if you had already downloaded the first 300 images? Like, even though you'd already have them, it'd stop in the same position every time? Odd; you'd think that, since images you've already downloaded are automatically skipped (unless your config is set to redownload them every time), it would skip each image you already have until image 301 and continue to rip normally afterwards.
DeviantArt's developers did something on March 17th, it seems. There is even an entry in the API changelog: https://www.deviantart.com/developers/changelog and, of course, their first update in 4½ years only removed stuff. @sledgehammer93 try … @Twi-Hard use … @biznizz It's not downloading images that's causing the ban - most of them are hosted on …
That appears to work reasonably well, @mikf. At the moment, I just have …

Update: Tried it again with …
I have … Is there any benefit to having abort over exit in …?
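To put the two values side by side: as I understand gallery-dl's "skip" option, "abort" (or "abort:N") stops only the current extractor run when already-downloaded files are encountered, while "exit" (or "exit:N") ends the whole program. A sketch of both variants, purely as an illustration of the documented option and not a quote from anyone's actual config:

    {
        "extractor":
        {
            "deviantart":
            {
                "skip": "abort:10"
            },
            "furaffinity":
            {
                "skip": "exit:10"
            }
        }
    }

So with a URL file, abort would only stop the current extractor and let the remaining URLs continue, while exit would end gallery-dl entirely.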
'/extended_fetch' as well as Deviation webpages now again contain the Deviation UUIDs needed to grab Deviation info through the OAuth API, meaning cookies are no longer necessary to grab original files. The only instance where cookies are still needed is for scraps marked as "mature", since those entries are hidden from public users. (#655, #657, #660)
- add a 2 second wait time between requests to deviantart.com
- catch 403 "Request blocked" errors and wait for 3 minutes before retrying
I've been trying to figure out under what conditions these "Request blocked" errors occur by writing a little script that continuously sends HTTP requests to https://www.deviantart.com/ and waits a certain time in between. When waiting less than 1 second, I'd get the error after ca. 250 requests, and after roughly 10 minutes you could send another batch of ~250 requests until it happened again. Waiting for 5 seconds didn't result in any errors at all, even after 1500 requests. ff7c0b7 and f9a590f now add a mandatory 2 second wait time between all regular non-OAuth requests and, should this error still happen, will wait 3 minutes before trying again, in the hope that the internal rate limit has expired. (It'll keep waiting until the block is actually gone.) @Twi-Hard Somewhere in between I've also fixed …
@mikf, that appears to have done the trick. I'm currently downloading a massive number of scraps and it has yet to 403 once. Thanks again!
This was already done for non-OAuth requests (#655) but CF is now blocking OAuth API requests as well.
Hello again,
I appear to be having issues involving the scraps downloader of gallery-dl this time. It is now randomly crashing on different links in my URL file and then pulling 403 errors for the rest of my links. For example, the downloader will work perfectly fine for the first few URLs, only to show the following error (verbose output):
It then continues to attempt to download from the various links set in my URL file, except it will 403 on every last one. I have my cookies file properly configured. Here is my gallery-dl.conf file:
Thanks again.