Save more info in --download-archive #1299

Open
TestPolygon opened this issue Feb 6, 2021 · 8 comments

Comments

@TestPolygon

I would like to save more info in the --download-archive SQLite DB.
Or it probably makes sense to add an additional SQLite DB with a new argument like --info-archive or --metadata-archive.

I want to save the description and tags of the work (other info I can save in the filename), and I don't like how --write-metadata works. It generates too many files, and in fact they contain a lot of unnecessary information.

For example, for Pixiv I can save {user[id]}, {id}, {user[name]}, {date:%Y.%m.%d}, {title} in the filename and last-modified as the file's mtime.
However, I can't save the description and tags. --write-metadata is overkill, and these JSON files create chaos in the folder.
I think the best solution is one file (a database) with the selected information (columns for each {category}), so that I can work with it programmatically later.

@Hrxn
Contributor

Hrxn commented Feb 9, 2021

Well, you can also work programmatically with JSON files in a folder; it might even be easier.

@TestPolygon
Author

TestPolygon commented Feb 10, 2021

Yes, I can write a script that I would run after gallery-dl to concatenate all JSONs into one single file and then remove them.
But it would be my own special solution, suited only to me.
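
A minimal sketch of such a post-run script, assuming the metadata files are the `*.json` sidecars that --write-metadata leaves next to the downloads and that each one holds a single JSON object. The function name `merge_metadata`, the JSON Lines output format and the `_source` key are hypothetical choices made for illustration; they are not part of gallery-dl.

```python
import json
from pathlib import Path


def merge_metadata(folder: Path, output: Path, delete: bool = False) -> int:
    """Collect every *.json sidecar under `folder` into one JSON Lines file."""
    # Build the list up front and skip the output file in case it lives in `folder`.
    sidecars = sorted(
        p for p in folder.rglob("*.json") if p.resolve() != output.resolve()
    )
    with output.open("w", encoding="utf-8") as out:
        for path in sidecars:
            record = json.loads(path.read_text(encoding="utf-8"))
            # Remember which sidecar the record came from (hypothetical key name).
            record["_source"] = path.relative_to(folder).as_posix()
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            if delete:
                path.unlink()  # remove the sidecar once it has been merged
    return len(sidecars)
```

Calling `merge_metadata(Path("gallery-dl"), Path("metadata.jsonl"), delete=True)` would leave one line per downloaded file in `metadata.jsonl` and remove the individual JSON files.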

And this is the problem. --write-metadata is intended only for programmers.

There is no way for users to collect the additional information in a user-friendly way. --write-metadata creates tons of JSONs that are hard to read manually and that add chaos to the folder just by existing.

I propose collecting the info in a DB that can be viewed like a usual Excel file, with https://sqlitebrowser.org/.
So all data would be in a table that a human can scan visually.

For example, in the case of Pixiv I expect to see a table for each artist with columns: id, title, creation date... description, all tags and so on.
It's really useful to be able to easily check the descriptions of all downloaded artworks, because they can contain links to additional materials or maybe just an interesting text that a user wants to read.

Another example: a booru. A user has downloaded some works and decided after some time to upload them to another booru site.
Okay, he has the images... but tags? Oops, there are no tags. He has to go to the original site to copy the tags for each, damn, image. And what if the site is down?

So, this feature should allow users to collect the additional information just by adding a keyword and specifying which info they want to save.

But I think it would be really nice to define some default fields for each extractor that are saved just by adding the keyword when running the program (without manually configuring the settings file).

@TestPolygon
Author

TestPolygon commented Feb 26, 2021

I found something interesting here
postprocessors.metadata
It allows extracting only the required fields from the --write-metadata JSON. It can also look nice if you save the data to an HTML file.
For example:

"postprocessors": [{
        "name": "metadata",
        "mode": "custom",
        "extension": "html",
        "format": "<h4>ID: {id}</h4>\n<br>\n{caption}\n<hr>\n{tags}\n<hr>"
}],

The result:

image

@TestPolygon
Author

TestPolygon commented Feb 26, 2021

But it still creates tons of files among the downloaded images.
Well, if I specify "directory": "metadata", all these files will be in a separate folder, but this folder will be next to the downloaded files.


Request 1 (Metadata root folder)

I think it would be useful to be able to specify a root folder for metadata and --write-metadata files,
in order to have a separate folder with all metadata files in one place.

For example, if I download an image to
WORK_DIR/gallery-dl/pixiv-50306023/84677043_p0.jpg
The metadata file should be saved in:
METADATA_DIR/gallery-dl/pixiv-50306023/84677043_p0.jpg.json

With this setting you can always use --write-metadata and not care about these JSONs until you need them.
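
The mapping requested here can be sketched in a few lines of Python. This only illustrates the intended path logic, not anything gallery-dl does today; `metadata_path`, `work_dir` and `metadata_dir` are hypothetical names.

```python
from pathlib import Path


def metadata_path(image: Path, work_dir: Path, metadata_dir: Path) -> Path:
    """Mirror the image's location under `metadata_dir` and append '.json'."""
    # relative_to() raises ValueError if the image is not inside work_dir.
    relative = image.relative_to(work_dir)
    return metadata_dir / relative.parent / (relative.name + ".json")


target = metadata_path(
    Path("WORK_DIR/gallery-dl/pixiv-50306023/84677043_p0.jpg"),
    Path("WORK_DIR"),
    Path("METADATA_DIR"),
)
print(target.as_posix())  # METADATA_DIR/gallery-dl/pixiv-50306023/84677043_p0.jpg.json
```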


Request 2 (Array quotes)

Also, I think it would be more appropriate if {tags} (in postprocessors) used " instead of ' so that it is valid JSON.


Request 3 (Translated tags [pixiv])

Add support for translated tags.

https://www.pixiv.net/ajax/illust/84677043
image

image


With Req 2 + Req 3:
Instead of
['鉛筆', 'ドローイング', '落書き', 'pencildrawing', 'モノクロ', 'drawing', '人物', 'アナログ', 'portrait', 'sketch']
I should get (for example, {translated_tags})
["pencil", "drawing", "doodle", "pencildrawing", "black&white", "drawing", "character", "traditional", "portrait", "sketch"]

@mikf
Owner

mikf commented Feb 26, 2021

Request 1 (Metadata root folder)

directory can be an absolute path. Or, when using it as a relative path, you can go several levels up first: "directory": "../../../metadata"

Request 2 (Array quotes)

Replace {tags} with [\"{tags:J\", \"}\"] as a workaround.
The __repr__ of a string in Python uses single quotes by default.
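
The quoting difference can be reproduced in plain Python. The sketch below uses a shortened sample tag list and compares the default string form of a list, `json.dumps`, and a manual join that mimics what the workaround format string produces. It does not use gallery-dl itself.

```python
import json

tags = ["鉛筆", "drawing", "sketch"]

# str() of a list uses Python literal syntax, so the strings are single-quoted.
python_style = str(tags)

# json.dumps emits double-quoted strings, i.e. valid JSON.
json_style = json.dumps(tags, ensure_ascii=False)

# Joining with '", "' and wrapping the result in '["' and '"]' mimics the
# workaround; unlike json.dumps it does not escape quotes inside a tag.
joined_style = '["' + '", "'.join(tags) + '"]'

print(python_style)  # ['鉛筆', 'drawing', 'sketch']
print(json_style)    # ["鉛筆", "drawing", "sketch"]
print(joined_style)  # ["鉛筆", "drawing", "sketch"]
```

The joined form only stays valid JSON as long as no tag contains a double quote or a backslash.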

@TestPolygon
Author

TestPolygon commented Feb 26, 2021

directory can be an absolute path.

Request 4 (C:%HOMEPATH% in directory)

This "directory": "C:%HOMEPATH%/metadata", just creates ./%HOMEPATH%/.
If there is no workaround, I think it would be nice to replace C:%HOMEPATH% with the correct home dir path.

UPD.
I have seen that some people used "cookies": "%HOMEPATH%/...".
But this (without C:) does not work; I get: [pixiv][error] Unable to download data: OSError: [Errno 22] Invalid argument: ...


Also, this setting does not apply to --write-metadata. However, it's probably not so important; I can just save all the info with
"mode": "json",

@TestPolygon
Author

TestPolygon commented Sep 14, 2021

Technically, you can just save the entire JSONs in a DB.

For a good presentation, it's possible to use the JSON1 extension to create virtual tables with the required fields based on the main table with JSONs.

It's easy to implement. Just save the JSONs in a DB for now. For example, with --write-metadata-db.

The virtual tables for the presentation can be created later. All the information is already stored in one place.

Also, the virtual tables can be modified/changed multiple times without any problem.
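
A rough sketch of this idea with Python's built-in sqlite3 module, assuming the bundled SQLite has the JSON1 functions available (true for most current builds). It uses an ordinary SQL view in place of the "virtual tables" mentioned above, since a view is enough to expose selected JSON fields as columns. The table name `metadata`, the view name `pixiv_works` and the sample record are made up for illustration.

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
# Main table: one row per file, holding the unmodified metadata JSON.
con.execute("CREATE TABLE metadata (category TEXT NOT NULL, data TEXT NOT NULL)")

# Made-up sample record standing in for one metadata JSON file.
record = {"id": 84677043, "title": "example", "caption": "", "tags": ["drawing", "sketch"]}
con.execute(
    "INSERT INTO metadata (category, data) VALUES (?, ?)",
    ("pixiv", json.dumps(record, ensure_ascii=False)),
)

# The view picks out the wanted fields; it can be dropped and recreated
# at any time without touching the stored JSON.
con.execute(
    """
    CREATE VIEW pixiv_works AS
    SELECT json_extract(data, '$.id')    AS id,
           json_extract(data, '$.title') AS title,
           json_extract(data, '$.tags')  AS tags
    FROM metadata
    WHERE category = 'pixiv'
    """
)

rows = con.execute("SELECT id, title, tags FROM pixiv_works").fetchall()
print(rows)
```

Opened in a tool such as DB Browser for SQLite, the view shows up as a regular table with id, title and tags columns.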

@AyHa1810

AyHa1810 commented Feb 24, 2024

Maybe you could add something like this to your config.json and use a custom DB file instead of the one the archive option creates?

"pixiv": {
    "postprocessors": [{
        "name": "exec",
        "command": [
            "sqlite3",
            "~/gallery-dl/pixiv/pixiv.sqlite3",
            "INSERT OR REPLACE INTO pixiv_gdl (filename, filepath, page_count, rating, tags, title, description) VALUES('{filename}.{extension}', '{user[id]} {user[account]}', '{page_count}', '{rating}', '{tags}', '{title}', '{caption}');"
        ]
    }]
}

where you manually create the pixiv.sqlite3 file with the following SQLite commands beforehand:

CREATE TABLE IF NOT EXISTS pixiv_gdl (
    filename varchar(64)   NOT NULL UNIQUE,
    filepath varchar(64)   NOT NULL,
    page_count integer     NOT NULL,
    rating   varchar(16)   NOT NULL,
    tags     varchar(1024) DEFAULT '[]',
    title    varchar(64),
    description varchar(3072)
);

just a suggestion :P
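
One caveat with building the INSERT statement by string substitution: a title or caption that contains a single quote would break the SQL. The sketch below shows the same insert with bound parameters using Python's sqlite3 module. It assumes the metadata is available as a dict with the keys used in the config above; `save_row` and the sample record are hypothetical, and an in-memory database stands in for pixiv.sqlite3.

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")  # a real setup would open pixiv.sqlite3 instead
con.execute(
    """
    CREATE TABLE IF NOT EXISTS pixiv_gdl (
        filename    varchar(64)   NOT NULL UNIQUE,
        filepath    varchar(64)   NOT NULL,
        page_count  integer       NOT NULL,
        rating      varchar(16)   NOT NULL,
        tags        varchar(1024) DEFAULT '[]',
        title       varchar(64),
        description varchar(3072)
    )
    """
)


def save_row(con: sqlite3.Connection, meta: dict) -> None:
    """Insert or replace one row; sqlite3 handles the quoting of every value."""
    con.execute(
        "INSERT OR REPLACE INTO pixiv_gdl "
        "(filename, filepath, page_count, rating, tags, title, description) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            f"{meta['filename']}.{meta['extension']}",
            f"{meta['user']['id']} {meta['user']['account']}",
            meta["page_count"],
            meta["rating"],
            json.dumps(meta["tags"], ensure_ascii=False),  # store tags as JSON
            meta["title"],
            meta["caption"],
        ),
    )
    con.commit()


# Made-up sample record; note the apostrophe in the title.
sample = {
    "filename": "84677043_p0", "extension": "jpg",
    "user": {"id": 1, "account": "example"},
    "page_count": 1, "rating": "safe",
    "tags": ["drawing", "sketch"],
    "title": "Artist's sketch", "caption": "",
}
save_row(con, sample)
print(con.execute("SELECT filename, title FROM pixiv_gdl").fetchall())
```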
