[Tumblr] Blog extraction tripping over some posts #2957
Comments
It depends on the The HTML page is supposed to always have a URL with the correct token and is what gets used by gallery-dl to upgrade a low-res image URL to its original size.
For some reason step 3) returned the same URL as step 2) in this case, which is obviously not supposed to happen. I guess we could retry step 3) until it returns a different result than step 2).
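A minimal sketch of that retry idea, not gallery-dl's actual code: `fetch` stands in for whatever performs step 3), and `previous` is the URL from step 2). Both names, and the default of three attempts, are assumptions made for illustration.

```python
import time
from typing import Callable, Optional


def fetch_until_changed(
    fetch: Callable[[], str],
    previous: str,
    retries: int = 3,
    delay: float = 0.0,
) -> Optional[str]:
    """Call fetch() up to `retries` times and return the first result
    that differs from `previous`, or None if every attempt matched it."""
    for attempt in range(retries):
        result = fetch()
        if result != previous:
            return result
        # Optionally pause before the next attempt (hypothetical knob).
        if delay and attempt < retries - 1:
            time.sleep(delay)
    return None
```

Returning `None` instead of raising leaves it to the caller to decide whether to fail or to keep the lower-resolution URL.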
Yeah, that's kind of a problem. I let it go through the first 1k-2k posts of Was this the only time such an error happened, or was it several times?
It's not. The HTTP request in step 3) doesn't use any form of authentication, regardless of your OAuth settings.
Would be useful, but doing this with the current infrastructure would be painful since each extractor would have to manually update its current post URL value. |
Two attempts with
Yeah, I realize that |
@mikf I just did a quick test with the Python one, and I ended up with the same issues I've been seeing in their web API console, e.g. always getting a 404 when trying to fetch a specific post. Funnily enough, there's a fork on GitHub (and PyPI) ( I think this confirms that the results from their API console don't really reflect the actual workings of their live API, and should probably not be relied on. Still, if it's not related to authentication, as you said as well, I'm wondering what the culprit here might be. Just trying to rule out the possibilities. The config for Tumblr? |
The problem is not API related, in that gallery-dl uses a non-API way to get higher-resolution images than the ones returned by the API. The API will most likely always return images only up to a certain size. The culprit is the method used not being 100% reliable, or at least sometimes not updating the token at the end of an image URL. What I would like to see is the |
and download the smaller version instead of failing with a 404 error
It now retries fetching the higher-resolution version and prints a better warning when even that fails. It currently also downloads the lower-resolution version instead, which might not be the best idea ...
|
Wait... did this lower-res fallback also happen to you in your initial test? Or is this unrelated to the latest changes now? |
The lower-res "fallback" (it's not a real fallback that can be disabled) did not happen before the latest changes. |
I've now implemented this as a proper fallback that can be disabled |
If it was working (working better) for you before these changes I'd consider this a regression? Okay, I'm testing this specific post again, first with the old revision and then with the latest commits. 1. gallery-dl version used as in the original post of this issue, i.e. without 32c3075 and f728b5c
So, the second GET request here is for This means it's getting the correct token for the low-res upgrade by gallery-dl in this case, right? In any case, the created JPG file is:
bespokeprovocateur2_686406675559841792_01.jpg.json
{
"bkey": "",
"blog": {
"ask": true,
"ask_anon": true,
"ask_page_title": "Ask",
"asks_allow_media": true,
"avatar": [
{
"height": 512,
"url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s512x512u_c1/cc2663fea6ccc0e5eb0164977dedb3e05f102133.png",
"width": 512
},
{
"height": 128,
"url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s128x128u_c1/476a1836ec3d368494e897c7daf3b6c38fd3687e.png",
"width": 128
},
{
"height": 96,
"url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s96x96u_c1/3cc11b810cb9696e8786cf5a967c332762897828.png",
"width": 96
},
{
"height": 64,
"url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s64x64u_c1/58ef4f69c1766c5b51a7adb09167048b844868f4.png",
"width": 64
}
],
"can_chat": false,
"can_send_fan_mail": true,
"can_submit": true,
"can_subscribe": true,
"description": "<p>TX/30M/🔓</p>",
"followed": true,
"is_blocked_from_primary": false,
"is_nsfw": false,
"name": "bespokeprovocateur2",
"posts": 22118,
"share_likes": false,
"submission_page_title": "Submit",
"submission_terms": {
"accepted_types": [
"text",
"photo",
"quote",
"link",
"video"
],
"guidelines": "",
"tags": [
"submission"
],
"title": "Submit"
},
"subscribed": false,
"theme": {
"avatar_shape": "square",
"background_color": "#000000",
"body_font": "Helvetica Neue",
"header_bounds": "",
"header_full_height": 1055,
"header_full_width": 3000,
"header_image": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s3000x1055/e24516c8fcf8e9dc81b6065d326e46b274a5047c.png",
"header_image_focused": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s2048x3072/ded99f0016c56b65e0b9cad2f7af5fc89d167f1f.png",
"header_image_poster": "",
"header_image_scaled": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s2048x3072/ded99f0016c56b65e0b9cad2f7af5fc89d167f1f.png",
"header_stretch": true,
"link_color": "#b6b4b4",
"show_avatar": true,
"show_description": true,
"show_header_image": false,
"show_title": true,
"title_color": "#ffffff",
"title_font": "Bodoni Recut FS",
"title_font_weight": "regular"
},
"title": "Bespoke Provocateur",
"total_posts": 22118,
"updated": 1664387332,
"url": "https://bespokeprovocateur2.tumblr.com/",
"uuid": "bespokeprovocateur2.tumblr.com"
},
"blog_name": "bespokeprovocateur2",
"body": "<p><a class=\"tumblr_blog\" href=\"https://risiblesvmours.tumblr.com/post/685723365852545024\" target=\"_blank\">risiblesvmours</a>:</p><blockquote><figure class=\"tmblr-full\" data-orig-height=\"1533\" data-orig-width=\"1242\"><img src=\"https://64.media.tumblr.com/1d78c58335cb946162352f2b63bb2740/7ca740d84fe9eb21-63/s640x960/cee6f8e33799615ac0acc958e244a0d1fbc5ef0c.jpg\" data-orig-height=\"1533\" data-orig-width=\"1242\"/></figure></blockquote>",
"can_like": true,
"can_reblog": true,
"can_reply": true,
"can_send_in_message": true,
"category": "tumblr",
"ckey": "",
"count": 1,
"date": "2022-06-07 13:26:57",
"display_avatar": true,
"extension": "jpg",
"filename": "c76d6df4266c74173985c757304a2a9bf214859b",
"followed": true,
"format": "html",
"hash": "c76d6df4266c74173985c757304a2a9bf214859b",
"id": 686406675559841792,
"id_string": "686406675559841792",
"interactability_reblog": "everyone",
"liked": false,
"mkey": "",
"note_count": 241,
"num": 1,
"post_url": "https://bespokeprovocateur2.tumblr.com/post/686406675559841792",
"reblog": {
"comment": "",
"tree_html": "<p><a class=\"tumblr_blog\" href=\"https://risiblesvmours.tumblr.com/post/685723365852545024\" target=\"_blank\">risiblesvmours</a>:</p><blockquote><figure class=\"tmblr-full\" data-orig-height=\"1533\" data-orig-width=\"1242\"><img src=\"https://64.media.tumblr.com/1d78c58335cb946162352f2b63bb2740/7ca740d84fe9eb21-63/s640x960/cee6f8e33799615ac0acc958e244a0d1fbc5ef0c.jpg\" data-orig-height=\"1533\" data-orig-width=\"1242\"></figure></blockquote>"
},
"reblog_key": "VzmnKPLR",
"reblogged": true,
"reblogged_from_can_message": true,
"reblogged_from_following": false,
"reblogged_from_id": "686153284087627776",
"reblogged_from_name": "honeyandrosewater",
"reblogged_from_title": "Honey & Rose Water",
"reblogged_from_url": "https://honeyandrosewater.tumblr.com/post/686153284087627776",
"reblogged_from_uuid": "t:XXI_6JjZWGCBZ39HYtKpfA",
"reblogged_root_can_message": true,
"reblogged_root_following": false,
"reblogged_root_id": "685723365852545024",
"reblogged_root_name": "risiblesvmours",
"reblogged_root_title": "Martinelli",
"reblogged_root_url": "https://risiblesvmours.tumblr.com/post/685723365852545024",
"reblogged_root_uuid": "t:sp6ON9v2PhXlW9pxdJkAaQ",
"recommended_color": null,
"recommended_source": null,
"short_url": "https://tmblr.co/ZbBQMSc6cpmKCu00",
"should_open_in_legacy": false,
"skey": "",
"slug": "",
"state": "published",
"subcategory": "post",
"summary": "",
"tags": [],
"timestamp": 1654608417,
"title": "",
"tkey": "",
"type": "text"
}
info.json
{
"bkey": "",
"blog": {
"ask": true,
"ask_anon": true,
"ask_page_title": "Ask",
"asks_allow_media": true,
"avatar": [
{
"height": 512,
"url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s512x512u_c1/cc2663fea6ccc0e5eb0164977dedb3e05f102133.png",
"width": 512
},
{
"height": 128,
"url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s128x128u_c1/476a1836ec3d368494e897c7daf3b6c38fd3687e.png",
"width": 128
},
{
"height": 96,
"url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s96x96u_c1/3cc11b810cb9696e8786cf5a967c332762897828.png",
"width": 96
},
{
"height": 64,
"url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s64x64u_c1/58ef4f69c1766c5b51a7adb09167048b844868f4.png",
"width": 64
}
],
"can_chat": false,
"can_send_fan_mail": true,
"can_submit": true,
"can_subscribe": true,
"description": "<p>TX/30M/🔓</p>",
"followed": true,
"is_blocked_from_primary": false,
"is_nsfw": false,
"name": "bespokeprovocateur2",
"posts": 22118,
"share_likes": false,
"submission_page_title": "Submit",
"submission_terms": {
"accepted_types": [
"text",
"photo",
"quote",
"link",
"video"
],
"guidelines": "",
"tags": [
"submission"
],
"title": "Submit"
},
"subscribed": false,
"theme": {
"avatar_shape": "square",
"background_color": "#000000",
"body_font": "Helvetica Neue",
"header_bounds": "",
"header_full_height": 1055,
"header_full_width": 3000,
"header_image": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s3000x1055/e24516c8fcf8e9dc81b6065d326e46b274a5047c.png",
"header_image_focused": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s2048x3072/ded99f0016c56b65e0b9cad2f7af5fc89d167f1f.png",
"header_image_poster": "",
"header_image_scaled": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s2048x3072/ded99f0016c56b65e0b9cad2f7af5fc89d167f1f.png",
"header_stretch": true,
"link_color": "#b6b4b4",
"show_avatar": true,
"show_description": true,
"show_header_image": false,
"show_title": true,
"title_color": "#ffffff",
"title_font": "Bodoni Recut FS",
"title_font_weight": "regular"
},
"title": "Bespoke Provocateur",
"total_posts": 22118,
"updated": 1664387332,
"url": "https://bespokeprovocateur2.tumblr.com/",
"uuid": "bespokeprovocateur2.tumblr.com"
},
"blog_name": "bespokeprovocateur2",
"body": "<p><a class=\"tumblr_blog\" href=\"https://risiblesvmours.tumblr.com/post/685723365852545024\" target=\"_blank\">risiblesvmours</a>:</p><blockquote><figure class=\"tmblr-full\" data-orig-height=\"1533\" data-orig-width=\"1242\"><img src=\"https://64.media.tumblr.com/1d78c58335cb946162352f2b63bb2740/7ca740d84fe9eb21-63/s640x960/cee6f8e33799615ac0acc958e244a0d1fbc5ef0c.jpg\" data-orig-height=\"1533\" data-orig-width=\"1242\"/></figure></blockquote>",
"can_like": true,
"can_reblog": true,
"can_reply": true,
"can_send_in_message": true,
"category": "tumblr",
"ckey": "",
"count": 1,
"date": "2022-06-07 13:26:57",
"display_avatar": true,
"followed": true,
"format": "html",
"id": 686406675559841792,
"id_string": "686406675559841792",
"interactability_reblog": "everyone",
"liked": false,
"mkey": "",
"note_count": 241,
"post_url": "https://bespokeprovocateur2.tumblr.com/post/686406675559841792",
"reblog": {
"comment": "",
"tree_html": "<p><a class=\"tumblr_blog\" href=\"https://risiblesvmours.tumblr.com/post/685723365852545024\" target=\"_blank\">risiblesvmours</a>:</p><blockquote><figure class=\"tmblr-full\" data-orig-height=\"1533\" data-orig-width=\"1242\"><img src=\"https://64.media.tumblr.com/1d78c58335cb946162352f2b63bb2740/7ca740d84fe9eb21-63/s640x960/cee6f8e33799615ac0acc958e244a0d1fbc5ef0c.jpg\" data-orig-height=\"1533\" data-orig-width=\"1242\"></figure></blockquote>"
},
"reblog_key": "VzmnKPLR",
"reblogged": true,
"reblogged_from_can_message": true,
"reblogged_from_following": false,
"reblogged_from_id": "686153284087627776",
"reblogged_from_name": "honeyandrosewater",
"reblogged_from_title": "Honey & Rose Water",
"reblogged_from_url": "https://honeyandrosewater.tumblr.com/post/686153284087627776",
"reblogged_from_uuid": "t:XXI_6JjZWGCBZ39HYtKpfA",
"reblogged_root_can_message": true,
"reblogged_root_following": false,
"reblogged_root_id": "685723365852545024",
"reblogged_root_name": "risiblesvmours",
"reblogged_root_title": "Martinelli",
"reblogged_root_url": "https://risiblesvmours.tumblr.com/post/685723365852545024",
"reblogged_root_uuid": "t:sp6ON9v2PhXlW9pxdJkAaQ",
"recommended_color": null,
"recommended_source": null,
"short_url": "https://tmblr.co/ZbBQMSc6cpmKCu00",
"should_open_in_legacy": false,
"skey": "",
"slug": "",
"state": "published",
"subcategory": "post",
"summary": "",
"tags": [],
"timestamp": 1654608417,
"title": "",
"tkey": "",
"type": "text"
}
Btw, the only difference seems to be that the first file has these four lines that do not exist in the second file
is this normal?
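One way to double-check which keys differ is to compare the key sets of the two metadata files directly. This is a small sketch; the helper names are made up for illustration, and the file names in the usage comment are the ones quoted above. Judging by the dumps in this comment, the extra keys appear to be `extension`, `filename`, `hash` and `num`.

```python
import json
from pathlib import Path


def load_json(path: str) -> dict:
    """Read one metadata file into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def extra_keys(file_meta: dict, post_meta: dict) -> list:
    """Top-level keys present in the per-file metadata but not in info.json."""
    return sorted(file_meta.keys() - post_meta.keys())


# Usage (paths as named in this comment):
# print(extra_keys(
#     load_json("bespokeprovocateur2_686406675559841792_01.jpg.json"),
#     load_json("info.json"),
# ))
```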
1. gallery-dl latest dev version from the repo
In any case, the created JPG file is:
(phew, at least something is working 😄 )
bespokeprovocateur2_686406675559841792_01.jpg.json
{
"bkey": "",
"blog": {
"ask": true,
"ask_anon": true,
"ask_page_title": "Ask",
"asks_allow_media": true,
"avatar": [
{
"height": 512,
"url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s512x512u_c1/cc2663fea6ccc0e5eb0164977dedb3e05f102133.png",
"width": 512
},
{
"height": 128,
"url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s128x128u_c1/476a1836ec3d368494e897c7daf3b6c38fd3687e.png",
"width": 128
},
{
"height": 96,
"url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s96x96u_c1/3cc11b810cb9696e8786cf5a967c332762897828.png",
"width": 96
},
{
"height": 64,
"url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s64x64u_c1/58ef4f69c1766c5b51a7adb09167048b844868f4.png",
"width": 64
}
],
"can_chat": false,
"can_send_fan_mail": true,
"can_submit": true,
"can_subscribe": true,
"description": "<p>TX/30M/🔓</p>",
"followed": true,
"is_blocked_from_primary": false,
"is_nsfw": false,
"name": "bespokeprovocateur2",
"posts": 22118,
"share_likes": false,
"submission_page_title": "Submit",
"submission_terms": {
"accepted_types": [
"text",
"photo",
"quote",
"link",
"video"
],
"guidelines": "",
"tags": [
"submission"
],
"title": "Submit"
},
"subscribed": false,
"theme": {
"avatar_shape": "square",
"background_color": "#000000",
"body_font": "Helvetica Neue",
"header_bounds": "",
"header_full_height": 1055,
"header_full_width": 3000,
"header_image": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s3000x1055/e24516c8fcf8e9dc81b6065d326e46b274a5047c.png",
"header_image_focused": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s2048x3072/ded99f0016c56b65e0b9cad2f7af5fc89d167f1f.png",
"header_image_poster": "",
"header_image_scaled": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s2048x3072/ded99f0016c56b65e0b9cad2f7af5fc89d167f1f.png",
"header_stretch": true,
"link_color": "#b6b4b4",
"show_avatar": true,
"show_description": true,
"show_header_image": false,
"show_title": true,
"title_color": "#ffffff",
"title_font": "Bodoni Recut FS",
"title_font_weight": "regular"
},
"title": "Bespoke Provocateur",
"total_posts": 22118,
"updated": 1664387332,
"url": "https://bespokeprovocateur2.tumblr.com/",
"uuid": "bespokeprovocateur2.tumblr.com"
},
"blog_name": "bespokeprovocateur2",
"body": "<p><a class=\"tumblr_blog\" href=\"https://risiblesvmours.tumblr.com/post/685723365852545024\" target=\"_blank\">risiblesvmours</a>:</p><blockquote><figure class=\"tmblr-full\" data-orig-height=\"1533\" data-orig-width=\"1242\"><img src=\"https://64.media.tumblr.com/1d78c58335cb946162352f2b63bb2740/7ca740d84fe9eb21-63/s640x960/cee6f8e33799615ac0acc958e244a0d1fbc5ef0c.jpg\" data-orig-height=\"1533\" data-orig-width=\"1242\"/></figure></blockquote>",
"can_like": true,
"can_reblog": true,
"can_reply": true,
"can_send_in_message": true,
"category": "tumblr",
"ckey": "",
"count": 1,
"date": "2022-06-07 13:26:57",
"display_avatar": true,
"extension": "jpg",
"filename": "c76d6df4266c74173985c757304a2a9bf214859b",
"followed": true,
"format": "html",
"hash": "c76d6df4266c74173985c757304a2a9bf214859b",
"id": 686406675559841792,
"id_string": "686406675559841792",
"interactability_reblog": "everyone",
"liked": false,
"mkey": "",
"note_count": 241,
"num": 1,
"post_url": "https://bespokeprovocateur2.tumblr.com/post/686406675559841792",
"reblog": {
"comment": "",
"tree_html": "<p><a class=\"tumblr_blog\" href=\"https://risiblesvmours.tumblr.com/post/685723365852545024\" target=\"_blank\">risiblesvmours</a>:</p><blockquote><figure class=\"tmblr-full\" data-orig-height=\"1533\" data-orig-width=\"1242\"><img src=\"https://64.media.tumblr.com/1d78c58335cb946162352f2b63bb2740/7ca740d84fe9eb21-63/s640x960/cee6f8e33799615ac0acc958e244a0d1fbc5ef0c.jpg\" data-orig-height=\"1533\" data-orig-width=\"1242\"></figure></blockquote>"
},
"reblog_key": "VzmnKPLR",
"reblogged": true,
"reblogged_from_can_message": true,
"reblogged_from_following": false,
"reblogged_from_id": "686153284087627776",
"reblogged_from_name": "honeyandrosewater",
"reblogged_from_title": "Honey & Rose Water",
"reblogged_from_url": "https://honeyandrosewater.tumblr.com/post/686153284087627776",
"reblogged_from_uuid": "t:XXI_6JjZWGCBZ39HYtKpfA",
"reblogged_root_can_message": true,
"reblogged_root_following": false,
"reblogged_root_id": "685723365852545024",
"reblogged_root_name": "risiblesvmours",
"reblogged_root_title": "Martinelli",
"reblogged_root_url": "https://risiblesvmours.tumblr.com/post/685723365852545024",
"reblogged_root_uuid": "t:sp6ON9v2PhXlW9pxdJkAaQ",
"recommended_color": null,
"recommended_source": null,
"short_url": "https://tmblr.co/ZbBQMSc6cpmKCu00",
"should_open_in_legacy": false,
"skey": "",
"slug": "",
"state": "published",
"subcategory": "post",
"summary": "",
"tags": [],
"timestamp": 1654608417,
"title": "",
"tkey": "",
"type": "text"
}
info.json
{
"bkey": "",
"blog": {
"ask": true,
"ask_anon": true,
"ask_page_title": "Ask",
"asks_allow_media": true,
"avatar": [
{
"height": 512,
"url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s512x512u_c1/cc2663fea6ccc0e5eb0164977dedb3e05f102133.png",
"width": 512
},
{
"height": 128,
"url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s128x128u_c1/476a1836ec3d368494e897c7daf3b6c38fd3687e.png",
"width": 128
},
{
"height": 96,
"url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s96x96u_c1/3cc11b810cb9696e8786cf5a967c332762897828.png",
"width": 96
},
{
"height": 64,
"url": "https://64.media.tumblr.com/fb9a6870e7df61d39357513e04a86947/003da3d2eca6f293-88/s64x64u_c1/58ef4f69c1766c5b51a7adb09167048b844868f4.png",
"width": 64
}
],
"can_chat": false,
"can_send_fan_mail": true,
"can_submit": true,
"can_subscribe": true,
"description": "<p>TX/30M/🔓</p>",
"followed": true,
"is_blocked_from_primary": false,
"is_nsfw": false,
"name": "bespokeprovocateur2",
"posts": 22118,
"share_likes": false,
"submission_page_title": "Submit",
"submission_terms": {
"accepted_types": [
"text",
"photo",
"quote",
"link",
"video"
],
"guidelines": "",
"tags": [
"submission"
],
"title": "Submit"
},
"subscribed": false,
"theme": {
"avatar_shape": "square",
"background_color": "#000000",
"body_font": "Helvetica Neue",
"header_bounds": "",
"header_full_height": 1055,
"header_full_width": 3000,
"header_image": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s3000x1055/e24516c8fcf8e9dc81b6065d326e46b274a5047c.png",
"header_image_focused": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s2048x3072/ded99f0016c56b65e0b9cad2f7af5fc89d167f1f.png",
"header_image_poster": "",
"header_image_scaled": "https://64.media.tumblr.com/9791e3ac6b55616ef7caa5d5fffa1886/003da3d2eca6f293-f0/s2048x3072/ded99f0016c56b65e0b9cad2f7af5fc89d167f1f.png",
"header_stretch": true,
"link_color": "#b6b4b4",
"show_avatar": true,
"show_description": true,
"show_header_image": false,
"show_title": true,
"title_color": "#ffffff",
"title_font": "Bodoni Recut FS",
"title_font_weight": "regular"
},
"title": "Bespoke Provocateur",
"total_posts": 22118,
"updated": 1664387332,
"url": "https://bespokeprovocateur2.tumblr.com/",
"uuid": "bespokeprovocateur2.tumblr.com"
},
"blog_name": "bespokeprovocateur2",
"body": "<p><a class=\"tumblr_blog\" href=\"https://risiblesvmours.tumblr.com/post/685723365852545024\" target=\"_blank\">risiblesvmours</a>:</p><blockquote><figure class=\"tmblr-full\" data-orig-height=\"1533\" data-orig-width=\"1242\"><img src=\"https://64.media.tumblr.com/1d78c58335cb946162352f2b63bb2740/7ca740d84fe9eb21-63/s640x960/cee6f8e33799615ac0acc958e244a0d1fbc5ef0c.jpg\" data-orig-height=\"1533\" data-orig-width=\"1242\"/></figure></blockquote>",
"can_like": true,
"can_reblog": true,
"can_reply": true,
"can_send_in_message": true,
"category": "tumblr",
"ckey": "",
"count": 1,
"date": "2022-06-07 13:26:57",
"display_avatar": true,
"followed": true,
"format": "html",
"id": 686406675559841792,
"id_string": "686406675559841792",
"interactability_reblog": "everyone",
"liked": false,
"mkey": "",
"note_count": 241,
"post_url": "https://bespokeprovocateur2.tumblr.com/post/686406675559841792",
"reblog": {
"comment": "",
"tree_html": "<p><a class=\"tumblr_blog\" href=\"https://risiblesvmours.tumblr.com/post/685723365852545024\" target=\"_blank\">risiblesvmours</a>:</p><blockquote><figure class=\"tmblr-full\" data-orig-height=\"1533\" data-orig-width=\"1242\"><img src=\"https://64.media.tumblr.com/1d78c58335cb946162352f2b63bb2740/7ca740d84fe9eb21-63/s640x960/cee6f8e33799615ac0acc958e244a0d1fbc5ef0c.jpg\" data-orig-height=\"1533\" data-orig-width=\"1242\"></figure></blockquote>"
},
"reblog_key": "VzmnKPLR",
"reblogged": true,
"reblogged_from_can_message": true,
"reblogged_from_following": false,
"reblogged_from_id": "686153284087627776",
"reblogged_from_name": "honeyandrosewater",
"reblogged_from_title": "Honey & Rose Water",
"reblogged_from_url": "https://honeyandrosewater.tumblr.com/post/686153284087627776",
"reblogged_from_uuid": "t:XXI_6JjZWGCBZ39HYtKpfA",
"reblogged_root_can_message": true,
"reblogged_root_following": false,
"reblogged_root_id": "685723365852545024",
"reblogged_root_name": "risiblesvmours",
"reblogged_root_title": "Martinelli",
"reblogged_root_url": "https://risiblesvmours.tumblr.com/post/685723365852545024",
"reblogged_root_uuid": "t:sp6ON9v2PhXlW9pxdJkAaQ",
"recommended_color": null,
"recommended_source": null,
"short_url": "https://tmblr.co/ZbBQMSc6cpmKCu00",
"should_open_in_legacy": false,
"skey": "",
"slug": "",
"state": "published",
"subcategory": "post",
"summary": "",
"tags": [],
"timestamp": 1654608417,
"title": "",
"tkey": "",
"type": "text"
}
The
The JPG files are identical as well
The rest is also the same..
|
I reckon the I'll start a full blog extraction of this thing right now.. |
Nope... that was pretty quick this time. After just 22 downloaded files:
Same thing in the logfile: I'll bet that using the post extractor would work here, again. |
Thanks for the thorough bug report. I gave it another shot with e1d7149 by catching any 404 Not Found errors and relying on the fallback to sort it out. I've also finally managed to trigger this error myself after only 3 file downloads, but from then on never again ... and of course I didn't use |
Oh, this is getting interesting now.. 😄 I'll update to e1d7149 now and will start a new run later today... By the way, I noticed this while scrolling through gallery-dl/gallery_dl/extractor/tumblr.py Lines 252 to 258 in e1d7149
What's the point of these three yield statements? One more thing, just to rule out possible causes step-by-step, I've also tested a full blog extraction run with the standalone executable from here: Result: Shows exactly the same issue as my system's python interpreter with the python package. |
So the changes from e1d7149 still result in HttpErrors coming from the extractor side? It really shouldn't ...
Three extra/fallback attempts to grab the correct URL.
But that one is from before e1d7149 ... |
True. I was aware, this was simply a test on my end, to confirm that there would be no differences between my system python interpreter and the bundled standalone executable. And, as shown, both indeed had the same behaviour for me..
No, it's making progress now. gallery-dl/gallery_dl/extractor/tumblr.py Lines 244 to 250 in e1d7149
I think the exception is now handled here, and I could download the whole blog thing (23.6 GiB - thanks, hugely inefficient GIF format!) but it seems like it's still getting that unexpected token thing... gallery-dl log for this tumblr blog extraction
But I noticed something: these errors often did not happen randomly and in isolation. They seemed to come bunched together, i.e. a dozen or so of them in direct succession. If this is indeed not related to any authentication, and not some wrong result from the API, I'm inclined to believe that this could be caused by the response from (in this case) |
Maybe there is some sort of invisible rate limit? Would explain why these errors come bundled. In any case, retrying these requests 3 times doesn't solve anything it seems. |
Yup.. but maybe adding a forced delay between these 3 request repetitions would help? |
I've now added a 2 minute wait time between each fallback. Maybe that helps. (e5d229c) |
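A rough sketch of how a delayed fallback can be modelled as a generator. This is not the code from e5d229c; `upgrade` is a hypothetical callable that asks for the full-resolution URL again, and the defaults (3 attempts, 120 seconds) simply mirror the numbers mentioned in this thread.

```python
import time
from typing import Callable, Iterator


def fallback_urls(
    upgrade: Callable[[str], str],
    low_res_url: str,
    retries: int = 3,
    delay: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Lazily yield candidate URLs: fresh upgrade attempts first,
    then the low-resolution URL as a last resort."""
    for _ in range(retries):
        sleep(delay)  # wait before asking again
        candidate = upgrade(low_res_url)
        if candidate != low_res_url:
            yield candidate
    yield low_res_url
```

Because this is a generator, each wait only happens when the consumer asks for the next candidate, i.e. after the previous one has failed. Passing `sleep` as a parameter keeps the sketch testable without real waiting.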
Okay, I've downloaded this whole thing again with the latest gallery-dl release (v1.23.3). While the last run with the old version logged 594 failed request attempts, this one logged just 35. Here, the first 30 lines from the logfile:
You can see the two-minute gaps in the timestamps. Also, you can see the "using fallback 1, 2, 3" messages, and then just "fallback 1, 2" and also a couple of "using fallback 1" before it proceeds. I'll check the remaining error-emitting URLs manually, to see if the image res upgrade is actually making a difference here. Edit: Maybe I should also add, for the sake of completeness, that this was done with the latest stable Python release, i.e.
While the old run was still on 3.10.7 |
Okay, I've checked the remaining URLs from my log... Ignoring duplicate entries, this is basically all that remained:
The only thing remarkable here, is that contrary to my earlier tests in this thread, all of these (except 4) did not work in a clean browser profile this time. They only seem to work for me when I use my normal browser profile, where I'm signed in to Tumblr. Still no dice with curl either, even if I'm setting an
Regarding the hi-res URL replacement, I did not see any difference in this case. The replaced URL works in the browser, but the image would've been within normal Tumblr size anyway. This might depend on the blog; I assume this one only has stuff circulating on Tumblr without any other material, which might explain why.

And Tumblr being Tumblr, all of the content from the logged URLs turned out to be duplicates anyway, already downloaded by the initial blog grab. Well, except for one picture maybe, but don't pin me down on this; maybe this one slipped through and I forgot it or something.

The hardcoded wait time between the fallbacks made an improvement here, so I'm definitely in favor of keeping it.

The only other "solution" that would help in a case like this is to log the actual post URLs causing any HTTP error, as I've mentioned in an earlier comment. The logfile could then basically be reused as an input file, in effect feeding all "leftover" URLs to the TumblrPostExtractor in one step, and pronto, done. That would be the only real improvement I can think of right now.

Otherwise, I'd be inclined to close this issue as solved, because I believe there isn't anything more that can be done on the client side. |
specifically 'fallback-delay' and 'fallback-retries' and change default number of retries to 2 (down from 3)
It's not hardcoded anymore: 7c6af27
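For illustration, here is a sketch that prints a config fragment setting both options. The option names come from the commit message quoted above; their placement under `extractor` / `tumblr` and the unit of the delay (seconds, so 2 minutes becomes 120) are assumptions here and should be checked against the configuration documentation.

```python
import json

# Assumed nesting: extractor -> tumblr -> option name.
config = {
    "extractor": {
        "tumblr": {
            "fallback-delay": 120.0,  # assumed to be seconds (2 minutes)
            "fallback-retries": 2,    # default per the commit message
        }
    }
}

print(json.dumps(config, indent=4))
```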
It's only the post ID and not the entire URL, but the number at the end of a final warning message is the ID of the post the failing image is from. For example it'd be
so the referenced post is at https://bespokeprovocateur2.tumblr.com/post/686362190566080512 |
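Rebuilding the post URL from the blog name and the ID is a one-liner. The function name is made up for this example, and the URL pattern is simply the one shown in the link above.

```python
def tumblr_post_url(blog_name: str, post_id: int) -> str:
    """Build a post URL following the pattern of the link above."""
    return f"https://{blog_name}.tumblr.com/post/{post_id}"


print(tumblr_post_url("bespokeprovocateur2", 686362190566080512))
```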
Thanks for these examples, appreciate it!
I'll take that as well, if you absolutely insist.. 😉
Well.. you're right, obviously. The Post ID is enough to be able to always reconstruct the URL.
Yeah, reading does help. Should've looked at the actual source itself, because even I understand enough of Python for this.. gallery-dl/gallery_dl/extractor/tumblr.py Lines 258 to 259 in c8af1f5
Okay, knowing that I can extract those IDs from the log is all that's really needed here.. |
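A possible way to turn such a log back into a list of post URLs. This sketch only relies on the statement above that the post ID is the number at the end of the final warning message; the exact log line format is not shown in this thread, so the optional `marker` filter, the function names and the file names in the usage comment are assumptions.

```python
import re
from typing import Iterable, List

# A run of digits at the very end of a line (trailing whitespace ignored).
_TRAILING_ID = re.compile(r"(\d+)\s*$")


def post_ids_from_log(lines: Iterable[str], marker: str = "") -> List[str]:
    """Collect unique trailing numbers from log lines, in order of appearance.

    `marker` can restrict the search to lines containing a given substring,
    e.g. part of the warning text, to skip unrelated lines ending in digits.
    """
    seen = set()
    ids = []
    for line in lines:
        if marker not in line:
            continue
        match = _TRAILING_ID.search(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            ids.append(match.group(1))
    return ids


def post_urls(blog_name: str, ids: Iterable[str]) -> List[str]:
    """Turn post IDs into post URLs for one blog."""
    return [f"https://{blog_name}.tumblr.com/post/{i}" for i in ids]


# Usage (hypothetical file names):
# with open("gallery-dl.log", encoding="utf-8") as log:
#     urls = post_urls("bespokeprovocateur2", post_ids_from_log(log))
# with open("leftover.txt", "w", encoding="utf-8") as out:
#     out.write("\n".join(urls) + "\n")
```

The resulting text file could then serve as an input file for a second run over the leftover posts.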
Been trying Tumblr extraction for the first time since, well, last year or so, and encountered some strange 404 error causing blog extraction to halt.
From my logfile:
What's a bit strange to me here: This URL works in the browser, even in incognito/empty profile.
Please note, contrary to the URL itself which indicates a direct link to a JPG file, I end up redirected in the browser.
(This has been a redesign of their site, not too long ago. Direct URLs like this used to work)
"View Source" of that redirection page from Firefox private tab
Fun fact, if I do the usual Copy Image Link or View Image from the context menu, I'll get the actual JPG image. The URL is now
It's not the same URL, the path segment in the URL now ends in c76d6df4266c74173985c757304a2a9bf214859b (right before .jpg)
Would be interesting to know if this URL could be directly derived from the original URL that ends up in the log..
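To make the comparison easier, a media URL can be split into its path segments. The sketch below assumes the four-segment layout seen in the URLs quoted in this thread; the field names are invented for illustration and are not official Tumblr terminology. In the avatar URLs from the metadata dumps, the first two segments stay the same across sizes while the size segment and the trailing token both change, which suggests (but does not prove) that the token is specific to each size and cannot simply be computed from the low-res URL.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class MediaURL(NamedTuple):
    host: str
    media_id: str   # first path segment (name is an assumption)
    key: str        # second path segment (name is an assumption)
    size: str       # e.g. "s640x960"
    token: str      # file name without extension
    extension: str


def split_media_url(url: str) -> MediaURL:
    """Split a media URL of the form /<id>/<key>/<size>/<token>.<ext>."""
    parts = urlsplit(url)
    media_id, key, size, filename = parts.path.strip("/").split("/")
    token, _, extension = filename.rpartition(".")
    return MediaURL(parts.netloc, media_id, key, size, token, extension)
```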
While working in a browser (tested here with Firefox and Chrome), it actually does not work with curl, even when changing the User-Agent, it'll simply show this 404 error...
Full curl output
So, if anyone knows the missing headers or something, please share here..
One more thing, the resulting redirected page from the browser seems to hint at this post URL here:
Which is not from the blog I've used with gallery-dl, because that would be
so it seems the logged URL from gallery-dl is already a step into the extraction process and following a reblog here?
I've checked the /archive of the target blog and compared with the IDs of already downloaded files to confirm the order and check whether there would be anything unusual about this specific post, but it seems perfectly normal, straight from the middle.
The post URL seems to be:
Short Intermission here:
I'm using {extractor.url} in the log settings, which always gives the default blog URL, i.e. the "input" URL as fed to gallery-dl, is it possible to directly log the offending post URL somehow?
If it's not possible, maybe consider this a feature request..
As you may have guessed, using this post URL with my normal config actually works.
Even better:
So, I'm afraid I'm a little bit at a loss as to how to reliably reproduce this. I've run this test twice now, including deleting gallery-dl's cache file and archive file for Tumblr etc., but unfortunately it tripped over a different post on each attempt.
I also checked with https://api.tumblr.com/console/, the output is as follows:
https://api.tumblr.com/console/calls/blog/info:
JSON
So, total post count is 21994 at the moment, so it's not that small of a blog, unfortunately.
https://api.tumblr.com/console/calls/blog/posts with ID = 686406675559841792:
API Console does not seem to work either here...
But, for what it's worth, I get the same 404 result for all posts here.
Even tried it with my own test blog, but it's the same error here as well, so either I'm using Tumblr's API console wrong, or it's actually not working as implied..
gallery-dl and Python versions:
Using my own OAuth v1 API key of course, i.e. I see this message in my verbose log:
gallery-dl/gallery_dl/oauth.py
Line 126 in 583bee7
So it's probably not related to authentication?