Questions, Feedback, Suggestions #5 #6582

Open
mikf opened this issue Dec 1, 2024 · 46 comments

@mikf
Owner

mikf commented Dec 1, 2024

Continuation of the previous issue, serving as a central place for any sort of question or suggestion that doesn't deserve its own separate issue.

Links to older issues: #11, #74, #146, #5262.

@SpiffyChatterbox

Any news or thoughts on version 2.0? Anything we can do to help?

@noshii117

Is Facebook supported? Whenever I try to download from a page using this:

gallery-dl --cookies-from-browser firefox https://www.facebook.com/pagenamehere

it says Unsupported URL

@Hrxn
Contributor

Hrxn commented Dec 5, 2024

@noshii117 Yes, it should be. Make sure you are actually running the latest version of gallery-dl.

@biggestsonicfan

Got a bit of a pickle. I use gallery-dl inside a WSL instance. Not usually a problem because I can point to a mount path. However, I've come across an instance where I need to use yt-dlp with gallery-dl, and I don't know how I'd pass my Windows browser's cookies to yt-dlp in the .gallery-dl.conf file.

@mikf
Owner Author

mikf commented Dec 6, 2024

@biggestsonicfan
Not entirely sure if this works in WSL, but try exporting them as Netscape cookies.txt and pass this file's path via cmdline-args or raw-options.

            "cmdline-args": [
                "--cookies", "C:/path/to/cookies.txt"
            ],
            
            "raw-options": {
                "cookiefile": "C:/path/to/cookies.txt"
            }
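
For context, both of these are ytdl downloader options, so a complete config file using the raw-options variant might look like this (the path is a placeholder):

{
    "downloader": {
        "ytdl": {
            "raw-options": {
                "cookiefile": "C:/path/to/cookies.txt"
            }
        }
    }
}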

When using yt-dlp as downloader, it can directly use gallery-dl's cookies via forward-cookies.

@SpiffyChatterbox
I'll try to start working on it with the start of 2025, but no promises.

@nightbrd

nightbrd commented Dec 9, 2024

Hey there, is there any way to change how an enumeration index is put into a file name when using "skip": "enumerate"? E.g. instead of file.1.ext, turn it into file (1).ext?

@mikf
Owner Author

mikf commented Dec 9, 2024

@nightbrd
There isn't. The format for "skip": "enumerate" is currently hardcoded.

@Hrxn
Contributor

Hrxn commented Dec 9, 2024

@nightbrd You can simply use one of the bajillion already existing tools for renaming files, or write a script of your own.

As long as the names are consistent, you can easily turn something like file.1.ext into file (1).ext
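
For example, a minimal Python sketch of such a rename script (the directory path is a placeholder, and it assumes the exact file.N.ext pattern that "skip": "enumerate" produces):

import os
import re

DIRECTORY = "/path/to/downloads"  # placeholder

# matches gallery-dl's enumerate pattern: "file.1.ext" -> ("file", "1", "ext");
# note it would also match names that happen to end in ".<digits>.<ext>"
PATTERN = re.compile(r"^(.+)\.(\d+)\.([^.]+)$")

for entry in os.listdir(DIRECTORY):
    match = PATTERN.match(entry)
    if match:
        name, num, ext = match.groups()
        new_name = "{} ({}).{}".format(name, num, ext)
        os.rename(os.path.join(DIRECTORY, entry),
                  os.path.join(DIRECTORY, new_name))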

@noshii117

@noshii117 Yes, it should be. Make sure you are actually running the latest version of gallery-dl.

Very late, but yes, it's the first thing I did, and it's the same result.

@biggestsonicfan

biggestsonicfan commented Dec 14, 2024

@mikf

When using yt-dlp as downloader, it can directly use gallery-dl's cookies via forward-cookies.

Tried about everything at this point and I am unsure if it's a race condition or something is just not getting passed correctly, but please see attached verbose output.
verbose-ytdlp.txt

The link itself is a publicly shared video on Patreon.

EDIT: Not gonna double post, but I will ask: Is there a way I can run my own automated tests to see if my configuration will give me the results I want, instead of constantly going back and forth with my config file? I'm still trying to figure out how to dump my JSON metadata into "paid-posts" and "unpaid-posts" and categorize the metadata into the corresponding cost tier. I feel like it shouldn't be this hard, but I'm ashamed to have spent as much time as I have trying to get it to work.

@Hrxn
Contributor

Hrxn commented Dec 14, 2024

Is there a way I can run my own automated tests to see if my configuration will give me the results I want, instead of constantly going back and forth with my config file? I'm still trying to figure out how to dump my JSON metadata into "paid-posts" and "unpaid-posts" and categorize the metadata into the corresponding cost tier. I feel like it shouldn't be this hard, but I'm ashamed to have spent as much time as I have trying to get it to work.

Not sure what kind of automated testing exactly you mean here, but gallery-dl has had these --print options for a couple of months now, and they should be able to help with everything related to conditional formatting options etc. (I have not really tried them myself yet, to be honest, but I want to, because they seem very useful.)

I'm not sure if --print implies something like --simulate, but if not it's not a problem to use them together.

@roastme

roastme commented Dec 15, 2024

Is artfight.net supported?

@biggestsonicfan

biggestsonicfan commented Dec 15, 2024

I'm not sure if --print implies something like --simulate, but if not it's not a problem to use them together.

I actually can't get --print to work with --simulate but I think I can use it with --no-download for now, and I do think print will give me the output of what I want. Thanks much!

EDIT: I feel so stupid. Ever since this post, I thought I needed to use locals().get in the filter. Instead isRestricted == False and isRestricted == True are just working a treat.

@rsn-yk

rsn-yk commented Dec 21, 2024

Is it possible to store the Deviation UUID for each download?

As you can only download 10,000 images it would help if I could un-favourite some to download more. To un-fave them I could use curl - https://www.deviantart.com/developers/console/collections/collections_unfave/af7303d7e9023da0bbd6df11c2f38728.

Even the DeviantArt site itself fails to display more than 10,000 - it shows I have 1700 pages, but stops at 417 (at 24 items per page, that's 10,008 images).

@mikf
Owner Author

mikf commented Dec 21, 2024

@rsn-yk
You could use --print-to-file file:deviationid FILENAME or a custom metadata post processor in general to write the UUID of each downloaded file to FILENAME. An archive with {deviationid} as archive-format might also work.
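
As a rough sketch of the archive variant (path is a placeholder):

"deviantart": {
    "archive": "/path/to/deviantart-archive.sqlite3",
    "archive-format": "{deviationid}"
}

The UUID of each downloaded file would then end up as an entry in that SQLite database, which you could query directly.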

@roastme
No, see docs/supportedsites.

@mikf
Owner Author

mikf commented Dec 21, 2024

I'm not sure if --print implies something like --simulate, but if not it's not a problem to use them together.

--print does not imply any other options. It is implemented as a metadata post processor with "filename": "-", and therefore works only for the default DownloadJob. --simulate (SimulationJob), --get-urls (UrlJob), or any other jobs don't run post processors.

@mikf
Owner Author

mikf commented Dec 21, 2024

@biggestsonicfan

Tried about everything at this point and I am unsure if it's a race condition or something is just not getting passed correctly, but please see attached verbose output.

For whatever reason your link downloads without any problems for me:
https://gist.githubusercontent.com/mikf/7b18358c40e4a8051651c14605fbaae1/raw/05b82e50f4acaa29fc47794857be3613f8720c38/patreon_ytdlp_video.log

Maybe because I'm not passing a session_id cookie, so it doesn't do this extra HEAD request before invoking yt-dlp:

urllib3.connectionpool: https://www.patreon.com:443 "HEAD /api/video/387527528/video.m3u8 HTTP/1.1" 302 0
urllib3.connectionpool: Starting new HTTPS connection (1): stream.mux.com:443
urllib3.connectionpool: https://stream.mux.com:443 "HEAD /nimWUWbI5FNH0200g7tzpKbNFd01en1dSk9Mvj7S9WdpRM.m3u8?token=eyJhbGciOiJSUzI1NiIsImtpZCI6Ik5CY3o3Sk5RcUNmdDdWcmo5MWhra2lEY3Vyc2xtRGNmSU1oSFUzallZMDI0IiwidHlwIjoiSldUIn0.eyJzdWIiOiJuaW1XVVdiSTVGTkgwMjAwZzd0enBLYk5GZDAxZW4xZFNrOU12ajdTOVdkcFJNIiwiZXhwIjoxNzM0MzAwMDAwLCJhdWQiOiJ2IiwicGxheWJhY2tfcmVzdHJpY3Rpb25faWQiOiJGUVExRFZGS2dTZEtKSnFQTVg1T1ZBdk0wMnYyZk9vVll5UnVjV1hGUlVUdyJ9.OV3XcWjfz8U5PZyScwoKu18AdIR8fydlPL0BolC9ikCQzFp2UgDqGYPyFep-5vai_CdCpk24TJVUyXEXNqIaBOQ23SxxQucoN17Pk_92FXEGfCAYsrEeAzxEoE7pOb1qTkTAn65CuVo1URv5K9p2dTTjc8X7YKNb1tOpUwUDKNyDdbaAo8Ah_1e4G-APVYEpUfzBIH4QqcK2xHFBxMGtk2Mr4rDkBi0nyf26WUhFJRZojDdTXM0jbfHfHCBgAMd2FbMZYs82jjpsE61hTHpObGo4iZ6uNuEsAPa-Wby-AoiNwYMQ_uQCFDiIB_QEx4Up7k_GZJSIYOhVHovhtl6PFQ HTTP/1.1" 200 0

It is also not forwarding gallery-dl cookies to yt-dlp for you. Try it with -o forward-cookies=1

[downloader.ytdl][debug] Forwarding cookies to yt_dlp.YoutubeDL

Is there a way I can run my own automated tests to see if my configuration will give me the results I want

If your results include post processor files, then --simulate is not an option.

Maybe specifying a different base-directory (-d), disabling archives, changing any absolute paths, and then just letting gallery-dl run would be suitable: -d "/tmp/" -o archive= -O archive=

Ever since this post, I thought I needed to use locals().get in the filter. Instead isRestricted == False and isRestricted == True are just working a treat.

locals().get('isRestricted') should work, it will return None when isRestricted is not defined, but you get its actual value if it is defined.

There have been some changes to how accessing undefined variables in filters works in general, see filters-environment. Instead of raising a NameError exception, it now silently evaluates any filter expression as false whenever it would raise an exception.

isRestricted == True and isRestricted == False

I'd recommend isRestricted and not isRestricted instead as those are more general.
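
As a rough sketch of how such a filter could gate a conditional post processor (the metadata options here are illustrative placeholders, not a ready-made recipe):

    {
        "name": "metadata",
        "event": "post",
        "filter": "not isRestricted",
        "directory": "unpaid-posts"
    }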

@Hrxn
Contributor

Hrxn commented Dec 21, 2024

Okay, so for the record, when doing something like "debugging" your conditional naming settings used in your config, you probably want to use --print together with --no-download
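
For example, something like this (the format string is just an illustration):

gallery-dl --no-download --print "{category} {subcategory} {filename}.{extension}" URL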

@Wiiplay123
Contributor

Is there a way to make the current extractor abort only if it encounters X amount of threads with no new posts? It's downloading them in order from old to new, so it has to go through all the old posts in a thread before finding any new ones.

@mikf
Owner Author

mikf commented Dec 21, 2024

@Wiiplay123
There is -A / --abort:

  -A, --abort N               Stop current extractor run after N consecutive
                              file downloads were skipped

Depending on the site and if it provides date metadata, you could also use something like the following to stop when encountering a file before 2024-12-01:

--filter "date >= datetime(2024, 12, 1) or abort()"

@Wiiplay123
Contributor

I use that for other cases, but it doesn't work here because it's encountering the old files first.
Basically, I want it to act like this:

Thread 1 Post 1 (Old)
Thread 1 Post 2 (New)
Thread 2 Post 1 (Old)
Thread 2 Post 2 (Old)
(Abort here)
Thread 3 Post 1 (Old)

@biggestsonicfan

biggestsonicfan commented Dec 21, 2024

@mikf

locals().get('isRestricted') should work, it will return None when isRestricted is not defined, but you get its actual value if it is defined.

That was my issue, it always returned None, which the filter refused to evaluate so I couldn't get conditional postprocessors to run at all.

It is also not forwarding gallery-dl cookies to yt-dlp for you. Try it with -o forward-cookies=1

That did the trick! So I will use gallery-dl patreon.com/home -o forward-cookies=1 from now on!

EDIT: Actually, is there a global forward-cookies I can use for the extractor config in .gallery-dl.conf?

@mikf
Owner Author

mikf commented Dec 21, 2024

@biggestsonicfan
forward-cookies is a ytdl downloader option. You can enable it there.

{
    "downloader": {
        "ytdl": {
            "forward-cookies": true
        }
    }
}

Actually, forward-cookies is enabled by default since v1.28.0 so you probably have it disabled somewhere in your config file.

@mikf
Owner Author

mikf commented Dec 22, 2024

@Wiiplay123
If I understand your problem correctly, it is possible to achieve something like this with a bunch of python post processors. The following will stop after processing 3 (THREAD_MAX) threads without new files:

config.json

{
    "extractor": {
        "postprocessors": [
            {
                "name": "python",
                "event": "init",
                "function": "/tmp/chan.py:thread_init"
            },
            {
                "name": "python",
                "event": "finalize",
                "function": "/tmp/chan.py:thread_done"
            },
            {
                "name": "python",
                "event": "file",
                "function": "/tmp/chan.py:reset"
            }
        ]
    }
}

chan.py

from gallery_dl import exception

THREAD_MAX = 3  # terminate after this many consecutive threads without new files
THREAD_CNT = 0  # threads processed since the last downloaded file


def reset(_):
    # "file" event: a file was actually downloaded, so reset the counter
    global THREAD_CNT
    THREAD_CNT = 0

def thread_init(_):
    # "init" event: a new thread starts
    global THREAD_CNT
    THREAD_CNT += 1

def thread_done(_):
    # "finalize" event: stop once THREAD_MAX threads in a row yielded no new files
    if THREAD_CNT >= THREAD_MAX:
        print("DONE")
        raise exception.TerminateExtraction()

@Wiiplay123
Contributor

Took me a while to get around to trying it, just made a few changes and it works perfectly! Makes the extractor go a LOT faster.

I added a second reset for a "metadata" event that I added to metadata.py that runs every time metadata is written without skipping, to account for text-only posts. I upgraded a couple of the extractors to work better with text posts. I'll push the changes to my repo when I have time.

@Nightmare-Serene

This comment was marked as spam.

@Hrxn
Contributor

Hrxn commented Dec 29, 2024

^ #6721 please don't post (even more) spam here

@biggestsonicfan

Actually, forward-cookies is enabled by default since v1.28.0 so you probably have it disabled somewhere in your config file.

Ding ding ding! I did, in my downloader settings, oops! All fixed!

@WyohKnott
Contributor

[SubscribeStar.adult] Embeds and attachments not downloading

^ #6721 please don't post (even more) spam here

Fixing it in #6758. It needs a review by owner/contributors.

@tisfyx

tisfyx commented Jan 3, 2025

I have access to a danbooru instance that is running on a custom domain.
Is there a way to use gallery-dl to download from it? Right now it says [gallery-dl][error] Unsupported URL, which makes sense, but I'm pretty sure just using the danbooru extractor on it would work if I could figure out how.

@mikf
Owner Author

mikf commented Jan 3, 2025

@tisfyx
Prefix its URLs with Danbooru: or add an entry for Danbooru to your config file as outlined here: #1658
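
For the prefix variant, that would look something like this (hypothetical domain):

gallery-dl "danbooru:https://danbooru.example.org/posts?tags=example"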

@tisfyx

tisfyx commented Jan 3, 2025

That worked, thank you for the quick reply!

@OutshineIssue

Can someone help me write a script that opens the downloaded image? If one image is downloaded, it should open that image; if multiple images are downloaded, it should open the containing folder. If it's not possible let me know.

@WyohKnott
Contributor

WyohKnott commented Jan 4, 2025

Can someone help me write a script that opens the downloaded image? If one image is downloaded, it should open that image; if multiple images are downloaded, it should open the containing folder. If it's not possible let me know.

Use a postprocessor in your config file:

            "postprocessors": [
                {
                    "name": "exec",
                    "event": "post-after",
                    "command": "mpv --loop-playlist=inf --image-display-duration=5 {_directory}"
                }

or for example

                {
                    "name": "exec",
                    "event": "post-after",
                    "command": "gwenview --slideshow {_directory}"
                }
            ]

The post-after event happens after all files have been downloaded. You can pass any of three parameters:

  • {_path} for the full path to the last file downloaded
  • {_directory} for the path to the directory where files have been downloaded
  • {_filename} for only the filename of the last file downloaded

@WyohKnott
Contributor

Is there a proper way to apply multiple formatters in filenames?

I want to do something like {content!H!g[:120]} but I get an error: FilenameFormatError: Applying filename format string failed (ValueError: expected ':' after conversion specifier)

@mikf
Owner Author

mikf commented Jan 5, 2025

Use the C format specifier, which applies the listed conversions before the format spec given after the slash:

{content:CHg/[:120]}

@purple5pumpkin235

If I want to install using pipx on Ubuntu 24.04, is this the correct install command?:
pipx install gallery-dl

@WyohKnott
Contributor

If I want to install using pipx on Ubuntu 24.04, is this the correct install command?: pipx install gallery-dl

This is not officially supported, but you can do it that way, yes.

@OutshineIssue

OutshineIssue commented Jan 8, 2025


@WyohKnott Here's the code I put together based on yours and some information I found, but I'm running into an error: "'exec' initialization failed: KeyError: 'command'".

{
	"extractor": {
		"instagram": {
			"postprocessors": ["exec"]
		}
	},
	"postprocessor":  {
            "name": "exec",
            "event": "post-after",
            "command": "explorer.exe {_directory}"
        }
}

@mikf
Owner Author

mikf commented Jan 8, 2025

@OutshineIssue

{
    "extractor": {
        "instagram": {
            "postprocessors": ["exec-explorer"]
        }
    },

    "postprocessor":  {
        "exec-explorer": {
            "name"   : "exec",
            "event"  : "post-after",
            "command": "explorer.exe {_directory}"
        }
    }
}
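
(As I understand the config format: entries under "postprocessor" define named post processor configurations, and strings in a "postprocessors" list reference them by name, which is why the unnamed object in the previous snippet was never found and "exec" failed with KeyError: 'command'.)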

@baodrate

re-raising this suggestion (#5262 (comment)) since it might have been missed the first time (feel free to shoot it down though)

could the default config be allowed to be in toml, so the user does not have to specify --config-toml FILE on the command line every time?

i.e. add to gallery_dl.config._default_configs:

* `${XDG_CONFIG_HOME}/gallery-dl/config.toml`

* `~/.config/gallery-dl/config.toml`

(And it would probably make sense to also add the equivalent yaml paths)

@ghbook

ghbook commented Jan 12, 2025

Hi @mikf, how can I use the g: or generic extractor with an input txt file, as in gallery-dl -i <txtfile>?

And is it possible to define postprocessors in an extractor file?

@mikf
Owner Author

mikf commented Jan 19, 2025

@baodrate
I'd like to avoid adding even more possible config file paths to the list if possible, but I guess two more wouldn't be that bad.

so the user does not have to specify --config-toml FILE on the command line every time

What about creating an alias that includes --config-toml?

alias gallery-dl='gallery-dl --config-ignore --config-toml FILE'

@ghbook
Either prefix all URLs in <txtfile> with g: or generic:,

or disable all extractor modules except generic and enable it to be used for all otherwise unsupported URLs:

gallery-dl -o extractor.modules=generic -o extractor.generic.enabled=1 -i <txtfile>

extractor file

What do you mean by that? A file given by --input-file?

@arisboch

How do I download all the replies to a Bluesky post made by the post's author themselves? I can't even manage to download all the replies. Here's the relevant config section:

        "bluesky":
        {
        	"filename": "bluesky {author['handle']} {post_id} {num}.{extension}",
        	"directory": ["bluesky {author['handle']} {post_id}"],
        	"include": ["posts", "replies", "media"],
			"metadata": true,
			"reposts": true,
			"quoted": true
        },

@ghbook

ghbook commented Jan 22, 2025

and is it possible to define postprocessors in extractor file.

extractor file

What do you mean by that? A file given by --input-file?

It's another question, not related to the input file. I was talking about a .py file like reddit.py in the extractor folder. I've never seen postprocessors defined in a .py file along with the directory_fmt and filename_fmt properties; they are always defined in the config.json file. So I have been wondering whether it's possible to define them in the class or its init method. Any example would be helpful.

Also, one last question: are there any helper methods to get the real directory and filename inside the items method? I need to check whether a file already exists in one of my extractors before making a request in its items method; the given URL already has all the keys required. The reason is an API rate limit per day.

@mikf
Owner Author

mikf commented Jan 24, 2025

@arisboch
For post URLs like https://bsky.app/profile/mikf.bsky.social/post/3l46q5glfex27, you can use the depth option to get replies and --filter "user['did'] == author['did']" to filter out any from users other than the post's author.

gallery-dl -o depth=5 -o metadata=1 --filter "user['did'] == author['did']" https://bsky.app/profile/mikf.bsky.social/post/3l46q5glfex27

or, as config options:

    "depth": 50,
    "metadata": true,
    "image-filter": "user['did'] == author['did']"

For "timeline" URLs like https://bsky.app/profile/bsky.app, this is not supported yet.


@ghbook
An extractor's only job is data extraction. It has no concept of a file system, files, directories, etc. and doesn't care how its extracted data is eventually used. There is no builtin way to specify default post processors or to access the paths where files are downloaded to. You should be able to modify the code to add a reference to the current job object to an extractor and check for existing files and access paths using the Job's internals, but this is "officially" not supported.
