
[question] get reddit metadata without category transfer for DirectlinkExtractor. #6703

Closed
dajotim937 opened this issue Dec 22, 2024 · 12 comments


@dajotim937

NSFW: https://www.reddit.com/r/SauceSharingCommunity/comments/1gjuyzr/sauce_please/

So, before the mass purge on imgur, even when a Reddit submission was just a direct imgur link, I set "parent-metadata": "reddit_metadata" in the reddit extractor config, and for the imgur extractor:

"album": {
 "filename": {
  "'reddit_metadata' in locals() and reddit_metadata['subcategory'] == 'user'": "*first version of filename for albums*.{extension}",
  "'reddit_metadata' in locals()": "*second version of filename for albums*.{extension}",
  "": "*third version of filename for albums*.{extension}"
 },
 "directory": [
  "*directory for album submission*"
 ]
},
"filename": {
  "'reddit_metadata' in locals() and reddit_metadata['subcategory'] == 'user'": "*first version of filename*.{extension}",
  "'reddit_metadata' in locals()": "*second version of filename*.{extension}",
  "": "*third version of filename*.{extension}"
},
"directory": [
 "."
],
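A rough Python model of how a conditional "filename" object like the one above might be resolved. This is a sketch under stated assumptions, not gallery-dl's actual code: conditions are tried in order, the first one that evaluates to true picks the format string, and the "" key is the fallback. It assumes, as the 'reddit_metadata' in locals() idiom implies, that each condition is evaluated with the file's keyword dict as the local namespace. The select_format helper and the placeholder format strings are hypothetical.

```python
def select_format(formats, kwdict):
    """Return the first format string whose condition is true for kwdict.

    Hypothetical helper: conditions are tried in insertion order and the
    empty-string key acts as the unconditional fallback.
    """
    for condition, fmt in formats.items():
        if not condition:
            continue  # the "" fallback is handled after all conditions
        # kwdict is passed as the local namespace, so locals() inside the
        # expression returns kwdict and bare names resolve to its keys.
        if eval(condition, {}, kwdict):
            return fmt
    return formats.get("")


# Placeholder format strings standing in for the real ones.
album_filename = {
    "'reddit_metadata' in locals() and reddit_metadata['subcategory'] == 'user'": "user-post.{extension}",
    "'reddit_metadata' in locals()": "reddit-post.{extension}",
    "": "standalone.{extension}",
}

from_user_page = {"extension": "jpg", "reddit_metadata": {"subcategory": "user"}}
from_subreddit = {"extension": "jpg", "reddit_metadata": {"subcategory": "subreddit"}}
standalone = {"extension": "jpg"}

for kwdict in (from_user_page, from_subreddit, standalone):
    print(select_format(album_filename, kwdict).format_map(kwdict))
# user-post.jpg
# reddit-post.jpg
# standalone.jpg
```

Because "and" short-circuits, the first condition never touches reddit_metadata when the key is absent, which is why the locals() check has to come first.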

It worked fine for the imgur and redgifs extractors.

But I can't get the same result for imgchest. No matter what I put in the imagechest extractor config (with keywords copied from the -K output for a random post on imgchest.com), gallery-dl always spawns DirectlinkExtractor and uses .\cdn.imgchest.com_files_w7w6cw5o8gy.jpg as the filename instead of the one from my imagechest config (the config works fine when I download albums/posts from imgchest.com links).
I can't use category-transfer, because I have a config similar to the imgur one for redgifs, where I need both the Reddit metadata and the redgifs metadata. How can I solve this?
Do I now need to create a config for the directlink extractor that checks domain == 'cdn.imgchest.com', then checks 'reddit_metadata' in locals(), and duplicate the imagechest extractor config there?

@mikf
Owner

mikf commented Dec 22, 2024

Kind of, but you don't need to check for 'reddit_metadata' in locals() if you use reddit>directlink as the category. I don't think there is a better way to distinguish between reddit:user and reddit in general other than reddit_metadata['subcategory'] == 'user'.

{
    "extractor": {
        "reddit>directlink": {
            "filename": {
                "domain == 'cdn.imgchest.com' and reddit_metadata['subcategory'] == 'user'": "...",
                "domain == 'cdn.imgchest.com'"                                             : "..."
            }
        }
    }
}

@dajotim937
Author

Okay.
Then is there a reason why direct imgur/redgifs links found by the reddit extractor are handled by the imgur/redgifs extractors, while a direct imgchest link is handled by the directlink extractor?
For example, https://www.reddit.com/r/memes/comments/gi8l07/they_what/ has https://i.imgur.com/KpwIuSO.png as its submission, and it still triggers the imgur extractor even though it is a direct link.

@Hrxn
Contributor

Hrxn commented Dec 22, 2024

Yes, because this URL matches the URL pattern of

class ImgurImageExtractor(ImgurExtractor):
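To illustrate what "this URL matches" means in practice: roughly, each extractor declares a URL pattern, the first pattern that matches decides which extractor handles the URL, and the generic directlink extractor only gets URLs that no site-specific pattern claimed. The sketch below uses deliberately simplified, hypothetical regexes and a hypothetical extractor order; they are not gallery-dl's real patterns. In this model, a files.catbox.moe link lands in a catbox extractor while a cdn.imgchest.com link falls through to directlink.

```python
import re

# Hypothetical, simplified patterns for illustration only; gallery-dl's
# real extractor patterns are more complex.
EXTRACTORS = [
    ("imgur:image", r"https?://(?:i\.)?imgur\.com/(\w{7})(?:\.\w+)?"),
    ("redgifs:image", r"https?://(?:www\.|i\.)?redgifs\.com/(?:watch/|i/)?(\w+)"),
    ("catbox:file", r"https?://files\.catbox\.moe/\w+\.\w+"),
    # Generic fallback: any URL whose path ends in a known media extension.
    ("directlink", r"https?://[^/?#]+/[^?#]+\.(?:jpe?g|png|gif|webp|mp4)"),
]


def find_extractor(url):
    """Return the name of the first extractor whose pattern matches url."""
    for name, pattern in EXTRACTORS:
        if re.match(pattern, url):
            return name
    return None


for url in (
    "https://i.imgur.com/KpwIuSO.png",
    "https://cdn.imgchest.com/files/w7w6cw5o8gy.jpg",
    "https://files.catbox.moe/0ovtiw.jpg",
):
    print(url, "->", find_extractor(url))
# https://i.imgur.com/KpwIuSO.png -> imgur:image
# https://cdn.imgchest.com/files/w7w6cw5o8gy.jpg -> directlink
# https://files.catbox.moe/0ovtiw.jpg -> catbox:file
```

Ordering matters in this model: the fallback pattern would also match the imgur and catbox URLs, so it has to be tried last.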

@dajotim937
Author

I'm asking for the logical reason, not the technical one.

@mikf
Owner

mikf commented Dec 22, 2024

Idea: Use a directlink URL's domain (imgchest.com in this case) as subcategory, so the domain == 'cdn.imgchest.com' check can be simplified.
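A purely hypothetical sketch of this idea: derive the subcategory from the direct link's host, so that a check like subcategory == 'imgchest.com' could replace domain == 'cdn.imgchest.com'. The domain_subcategory helper is illustrative only and naively keeps the last two host labels.

```python
from urllib.parse import urlsplit


def domain_subcategory(url):
    """Hypothetical: derive a subcategory from a direct link's host.

    Naively keeps the last two labels ("cdn.imgchest.com" -> "imgchest.com"),
    which would be wrong for suffixes like "co.uk"; a real implementation
    would need a public-suffix list.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


print(domain_subcategory("https://cdn.imgchest.com/files/w7w6cw5o8gy.jpg"))
# imgchest.com
```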

Idea 2: Support subcategories for parent extractor categories, e.g. reddit:user>directlink, so reddit_metadata['subcategory'] == 'user' can be simplified. It should be a similarly simple change to the one for * wildcards (5ab2ae1).
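Under Idea 2, a directlink spawned from a Reddit user page could be configured in up to three places. The sketch below lists them from most to least specific; the reddit:user>directlink form is only the proposed syntax, not existing behavior, and the candidate_sections helper is hypothetical.

```python
def candidate_sections(parent, parent_sub, child):
    """Hypothetical lookup order under Idea 2: most specific section first.

    The "parent:subcategory>child" form is only the proposed syntax.
    """
    return [
        f"{parent}:{parent_sub}>{child}",  # proposed: reddit:user>directlink
        f"{parent}>{child}",               # existing: reddit>directlink
        child,                             # plain:    directlink
    ]


print(candidate_sections("reddit", "user", "directlink"))
# ['reddit:user>directlink', 'reddit>directlink', 'directlink']
```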

@mikf
Owner

mikf commented Dec 22, 2024

> I'm asking for the logical reason, not the technical one.

There is no special extractor for imagechest direct links since there's no extra metadata to be extracted, so they get handled by the generic directlink extractor.

imgur and redgifs direct links include post IDs and allow for the full set of metadata to be extracted, hence they get handled by site-specific extractors.

@dajotim937
Author

> Idea: Use a directlink URL's domain (imgchest.com in this case) as subcategory, so the domain == 'cdn.imgchest.com' check can be simplified.

Well, for this specific case it would be better. But in general it would just make the config a little prettier, because users probably wouldn't have many domains in their directlink extractor config (in my case this would be the only domain I need to set, so a simple check without this feature is no problem).

> Idea 2: Support subcategories for parent extractor categories, e.g. reddit:user>directlink, so reddit_metadata['subcategory'] == 'user' can be simplified.

This one would probably be better for the app in general, because users would be able to split their configs more easily.

As for me, I don't have many checks and splits by subcategory>subextractor, so a couple of checks don't really stand out in my config. I will refactor my config, replacing 'reddit_metadata' in locals() in a few subextractors with reddit>subextractor sections, and that will be fine for me.

Also, a quick question. If I have this config:

"*some_extractor*": {
    "filename": "...",
    "postprocessors": [...]
},
"reddit>*some_extractor*": {
    "filename": "..."
}

Will the postprocessors from *some_extractor* be triggered for reddit>*some_extractor*?

@mikf
Owner

mikf commented Dec 22, 2024

> Will the postprocessors from *some_extractor* be triggered for reddit>*some_extractor*?

It will. All settings from *some_extractor* still apply, unless overwritten by a setting from reddit>*some_extractor*.
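A simplified model of this answer, assuming a plain per-key override: the base section is applied first and the reddit>... section second, so "filename" is replaced while "postprocessors" is inherited. The effective_settings helper and the config values are hypothetical; gallery-dl's own config lookup is more involved.

```python
def effective_settings(config, sections):
    """Merge config sections from least to most specific (hypothetical).

    Keys from a more specific section override the same keys from the
    base section; every other key is inherited unchanged.
    """
    merged = {}
    for section in sections:  # least specific first
        merged.update(config.get(section, {}))
    return merged


config = {
    "some_extractor": {
        "filename": "{filename}.{extension}",
        "postprocessors": [{"name": "metadata"}],
    },
    "reddit>some_extractor": {
        "filename": "reddit_{filename}.{extension}",
    },
}

settings = effective_settings(config, ["some_extractor", "reddit>some_extractor"])
print(settings["filename"])        # reddit_{filename}.{extension}
print(settings["postprocessors"])  # [{'name': 'metadata'}]
```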

@dajotim937
Author

Thanks. All right then. Feel free to close this issue.

@dajotim937
Author

dajotim937 commented Dec 24, 2024

Another question; sorry if I should have created a new issue instead.

Is it possible to write to both extractors' archives instead of only the subextractor's archive?
For the link from the first post, gallery-dl writes to the directlink archive instead of the reddit one.
That is good, because on Reddit there can be many submissions with the same link. But since the main link is a Reddit one, it would be nice to also write to its archive.

Unless the default format for the reddit archive doesn't contain the submission ID; in that case, never mind.

@mikf
Owner

mikf commented Dec 24, 2024

Archives currently record only downloaded files. Since the Reddit post does not directly contain a file, it cannot be and is not recorded. The archive would also not be checked before processing the Reddit post.
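A minimal model of the behavior described here, assuming an archive that stores one key per downloaded file. The DownloadArchive class, the process helper, and the key format are hypothetical, not gallery-dl's default archive format: the Reddit post itself yields no file, so only the child extractor's file entry is written, and nothing for the post exists to be checked up front.

```python
class DownloadArchive:
    """Toy in-memory archive that remembers keys of downloaded files only."""

    def __init__(self):
        self.keys = set()

    def check(self, key):
        return key in self.keys

    def add(self, key):
        self.keys.add(key)


def process(archive, items):
    """items are (kind, key) tuples; only 'file' items are downloaded and recorded."""
    for kind, key in items:
        if kind != "file":
            continue  # a post has no file of its own, so nothing is recorded
        if archive.check(key):
            continue  # already downloaded: skip
        archive.add(key)  # pretend to download the file, then record it


archive = DownloadArchive()
# Hypothetical keys: the Reddit submission only points to a direct link,
# which the directlink extractor turns into a single file.
process(archive, [
    ("post", "reddit1gjuyzr"),
    ("file", "directlinkcdn.imgchest.com/files/w7w6cw5o8gy.jpg"),
])
print(sorted(archive.keys))
# ['directlinkcdn.imgchest.com/files/w7w6cw5o8gy.jpg']
```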

@dajotim937
Author

Sorry for still writing in a closed issue.
NSFW: https://www.reddit.com/r/HentaiSource/comments/12w28df/lf_color_source_1girl_1boy_grandma_white_hair_big/

This one is handled by CatboxFileExtractor instead of directlink, even though its extracted metadata is no different from that of the link in the first post:

gallery-dl.exe https://cdn.imgchest.com/files/w7w6cw5o8gy.jpg -K
Keywords for directory names:
-----------------------------
category
  directlink
domain
  cdn.imgchest.com
extension
  jpg
filename
  w7w6cw5o8gy
fragment
  None
path
  files
query
  None
subcategory


Keywords for filenames and --filter:
------------------------------------
category
  directlink
domain
  cdn.imgchest.com
extension
  jpg
filename
  w7w6cw5o8gy
fragment
  None
path
  files
query
  None
subcategory

and this one:

gallery-dl.exe https://files.catbox.moe/0ovtiw.jpg -K
Keywords for directory names:
-----------------------------
category
  catbox
extension
  jpg
filename
  0ovtiw
subcategory
  file
url
  https://files.catbox.moe/0ovtiw.jpg

Keywords for filenames and --filter:
------------------------------------
category
  catbox
extension
  jpg
filename
  0ovtiw
subcategory
  file
url
  https://files.catbox.moe/0ovtiw.jpg

Both are direct links and both yield essentially the same metadata, but one is handled by a site-specific extractor and the other by the generic extractor, when in my opinion there shouldn't be a difference.
