[Question] How to put metadata in subdirectory #520

Closed
Defrost4528 opened this issue Dec 19, 2019 · 8 comments

Comments

@Defrost4528

Hello once again! I've been making good use of your program for the past months, and the addition of tweet content to the metadata has been a real lifesaver.

Anyway, I've been trying to figure out the best way to have the metadata files put into a subdirectory. After tinkering with the postprocessors and other config settings with no luck, I eventually just made an external script to crawl all the folders and move all JSON files into a "metadata" subfolder. While it works, it takes a while since I have 800+ folders, and it incurs a lot of useless HDD reads, because most of the metadata has already been moved.
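For readers who want a starting point, here is a minimal sketch of the kind of external script described above. It is not the poster's actual script: the function name `move_json_to_metadata` and the layout (one level of gallery folders under a root) are assumptions made for illustration. It lists only the top level of each folder, so files already inside the "metadata" subfolder are never read again.

```python
import os
import shutil


def move_json_to_metadata(root, subdir="metadata"):
    """Move every *.json directly inside each folder under root into folder/subdir.

    Returns the number of files moved. Only the top level of each folder is
    listed, so anything already inside subdir is left alone.
    """
    moved = 0
    with os.scandir(root) as folders:
        for folder in folders:
            if not folder.is_dir():
                continue
            with os.scandir(folder.path) as entries:
                json_files = [e.path for e in entries
                              if e.is_file() and e.name.lower().endswith(".json")]
            if not json_files:
                continue  # nothing new in this folder, so create nothing
            target = os.path.join(folder.path, subdir)
            os.makedirs(target, exist_ok=True)
            for path in json_files:
                shutil.move(path, os.path.join(target, os.path.basename(path)))
                moved += 1
    return moved
```

Running it a second time on the same root should move nothing, since only newly written JSON files sit at the top level of each folder.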

So, here is my question: What's the best way this can be accomplished in-program?

Thanks!

@Hrxn
Contributor

Hrxn commented Dec 19, 2019

Not sure if I understand that correctly...

What do you mean by "accomplished in-program"? Like by using the exec postprocessor?
(See here: https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#execcommand)

But honestly, I think you've already described the optimal solution here: using a little script. It should be easy enough with Bash, and is definitely easy with PowerShell, which I would use (because I know that by heart). 800 directories does not sound like that much to me, to be honest, and I don't understand why that should take a lot of time. Unless maybe those JSON files you are moving into a sub-directory are actually kinda huge?

@mikf
Owner

mikf commented Dec 19, 2019

Using the exec post processor is actually a really good idea. You can use the final option to run a script after all downloads etc. are done, so it shouldn't cause too much overhead. Something like:

"postprocessors": [
    {
        "name": "metadata",
        …
    },
    {
        "name": "exec",
        "final": true,
        "command": "mkdir -p {}/metadata && mv -t {}/metadata -- *.json"
    }
]

"Unless maybe those JSON files you are moving into a sub-directory are actually kinda huge?"

Moving files within a filesystem costs (almost) nothing, so file size shouldn't be an issue. Going through 800 directories, on the other hand, can take some time if stat needs to be called for all directories and the files inside them. I have a folder with 6k sub-directories and it takes ages to go through them all …
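To illustrate the point in Python (a sketch only; both helper names are hypothetical): a move within one filesystem is a rename that updates directory entries without copying file contents, so the work is dominated by listing directories. `os.scandir` can usually answer `is_file()` from the directory listing itself, which avoids a separate stat call per entry on most platforms.

```python
import os


def same_filesystem(path_a, path_b):
    """Return True if both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def list_json_files(directory):
    """Return the sorted names of *.json files directly inside directory.

    Uses os.scandir, whose DirEntry objects typically carry the file type
    from the directory listing, so no extra stat call is needed per file.
    """
    with os.scandir(directory) as entries:
        return sorted(e.name for e in entries
                      if e.is_file() and e.name.endswith(".json"))
```

When `same_filesystem(src, dst_dir)` is true, `os.rename` (or `shutil.move`, which tries a rename first) does not copy any data, regardless of file size.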

@Defrost4528
Author

By "accomplished in-program", I meant within gallery-dl and its configuration, without having to execute something else manually/externally.

Part of why my script takes a while is probably that it scans for all of the images/videos and then checks, for each one, whether there is an associated JSON file to move to the subfolder. So whether or not it finds a JSON file, it goes through each and every media file. That's useful because it lets me log which files don't have metadata. It's not efficient for simple moving, but frankly it was the easiest way for me to avoid descending into the metadata folder and creating another metadata subfolder inside it while scanning the folders recursively (though there are probably more elegant ways).
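One of the "more elegant ways" could look like the sketch below. It is an illustration rather than the script described above: the function name `sort_metadata`, the extension set, and the assumption that each sidecar file is named `<media file>.json` are all mine. It prunes the metadata folder from `os.walk` in place, so the recursive scan never descends into it, and it returns the media files that have no metadata at all.

```python
import os
import shutil

# Assumed set of media extensions; adjust to whatever you actually download.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".mp4", ".webm"}


def sort_metadata(root, subdir="metadata"):
    """Move each media file's '<name>.json' sidecar into '<folder>/<subdir>'.

    Returns the paths of media files that have no sidecar in either place.
    """
    missing = []
    for folder, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into the metadata folder.
        dirnames[:] = [d for d in dirnames if d != subdir]
        target = os.path.join(folder, subdir)
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in MEDIA_EXTENSIONS:
                continue
            sidecar = name + ".json"  # assumption: sidecar is '<media file>.json'
            src = os.path.join(folder, sidecar)
            dst = os.path.join(target, sidecar)
            if os.path.exists(src):
                os.makedirs(target, exist_ok=True)
                shutil.move(src, dst)
            elif not os.path.exists(dst):
                missing.append(os.path.join(folder, name))
    return missing
```

The returned list can be written to a log file to keep track of media without metadata.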

Anyway, thanks for the input from both of you. The exec post processor does seem to be the best way to do this; I somehow missed it as a possible solution. I did run into some errors trying to set it up, but after updating gallery-dl I got it to work. This is what I came up with:

"postprocessors":
[
	{
		"name": "metadata",
		"mode": "json",
		"extension": "json"
	},
	{
		"name": "exec",
		"command": "(if not exist {}metadata mkdir {}metadata) & (if exist {}*.json move /y {}*.json {}metadata)",
		"final": true
	}
],

It seems to work very well so far, but I'll refrain from closing this until I'm absolutely sure. Thank you for the solution!

@Defrost4528
Author

Zero issues throughout my testing. When a single item fails, though, the final command does not execute, but that's the expected result given the "after all files have been downloaded successfully" wording, and I'm not bothered by it. Thanks again!

@mikf
Owner

mikf commented Jan 5, 2020

Concerning "in-program" solutions: There is now a directory option for the metadata post processor, which might be preferable to running an external script.
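As a rough sketch of what that could look like, the Python snippet below builds a post-processor entry that combines the metadata settings shown earlier in the thread with the directory option, then prints it as a JSON fragment. The subfolder name "metadata" is an illustrative choice, and where the fragment belongs in your own config file depends on your setup, so check the configuration documentation before relying on it.

```python
import json

# Metadata post processor with the "directory" option set.
# "metadata" as the subfolder name is an illustrative choice.
postprocessors = [
    {
        "name": "metadata",
        "mode": "json",
        "directory": "metadata",
    }
]

# Render the entry as a JSON fragment in the style used in this thread.
fragment = json.dumps({"postprocessors": postprocessors}, indent=4)
print(fragment)
```

With this in place, the exec step for moving files afterwards should no longer be needed.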

@wilhelmloof

wilhelmloof commented May 3, 2024

I also wanted to move all metadata files to a subfolder, but with a one-liner that stores all metadata files for a download job in a subfolder (called '_metadata'). I made two solutions: a classic one with --exec and a modern one with -P and -O. It was quite tricky, and I could not find any other solution online; the few examples that existed used config files.

Classic version with --exec

Inspired by @mikf's reply above.

gallery-dl --write-metadata --exec "mkdir -p {_directory}_metadata && gmv -t {_directory}_metadata -- {}.json" <URL>

A slightly more stylish version, which first checks whether the metadata folder already exists and, if so, doesn't run mkdir:

gallery-dl --write-metadata --exec "[ ! -d {_directory}_metadata ] && mkdir -p {_directory}_metadata; gmv -t {_directory}_metadata -- {}.json" <URL>

macOS's native mv command doesn't support -t, so I used gmv from coreutils (installed with Homebrew) instead. To move all .json files in a folder, including previously downloaded JSON files, replace {}.json with {_directory}*.json.

Modern version with -P and -O

gallery-dl --write-metadata -P metadata -O 'directory'='_metadata' <URL>

The syntax follows gallery-dl's man page (-O, --postprocessor-option KEY=VALUE), and the key comes from the postprocessor-options documentation.

To only download metadata files, add --no-download -o skip=false to any of the commands.

--
Adding @mikf to this. Can you have a look and see if everything looks ok?

@mikf
Owner

mikf commented May 9, 2024

gallery-dl --write-metadata -P metadata -O 'directory'='_metadata' <URL>
  • --write-metadata and -P metadata are basically the same
  • no need for quotes around 'directory'='_metadata'
gallery-dl -P metadata -O directory=_metadata <URL>

@wilhelmloof

Thanks for clarifying, @mikf, especially since the documentation on this is not crystal clear.

So -P metadata implies --write-metadata, and thus --write-metadata can be omitted when -P metadata is used.

Gallery-dl is a fantastic tool, I just wish that the documentation was a bit more extensive.
