Save more info in --download-archive #1299

TestPolygon · 2021-02-06T09:36:21Z

I would like to save more info in the --download-archive SQLite DB.
Or maybe it makes sense to add an additional SQLite DB with a new argument like --info-archive or --metadata-archive.

I want to save the description and tags of the work (other info I can save in the filename), and I don't like how --write-metadata works. It generates too many files, and they contain a lot of unnecessary information.

For example, for Pixiv I can save {user[id]}, {id}, {user[name]}, {date:%Y.%m.%d}, {title} in the filename and last-modified as the file's mtime.
However, I can't save the description and tags. --write-metadata is overkill, and these JSON files clutter the folder.
I think the best solution is one file (a database) with the selected information (columns for each {category}), so that I can work with it programmatically later.


Hrxn · 2021-02-09T16:02:06Z

Well, you can also work programmatically with JSON files in a folder; that might even be easier.

TestPolygon · 2021-02-10T18:47:02Z

Yes, I can write a script that I would run after gallery-dl to concatenate all JSONs into one single file and then remove them.
But it would be a custom solution suited only to me.
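A minimal sketch of such a post-run script, assuming the metadata files are plain `*.json` files somewhere under the download folder. The helper name `merge_metadata` and the output file name `metadata_all.json` are hypothetical and not part of gallery-dl:

```python
import json
from pathlib import Path


def merge_metadata(folder, output_name="metadata_all.json", delete=False):
    """Merge every *.json file under `folder` into a single JSON file.

    Hypothetical helper, not part of gallery-dl.
    Returns the number of files merged in this run.
    """
    folder = Path(folder)
    output = folder / output_name

    # Keep entries collected by a previous run, if any.
    merged = {}
    if output.exists():
        merged = json.loads(output.read_text(encoding="utf-8"))

    # Collect the file list first so the output file itself is never merged.
    new_files = [p for p in sorted(folder.rglob("*.json")) if p != output]
    for path in new_files:
        key = path.relative_to(folder).as_posix()
        merged[key] = json.loads(path.read_text(encoding="utf-8"))

    output.write_text(
        json.dumps(merged, ensure_ascii=False, indent=2), encoding="utf-8"
    )

    # Remove the per-file JSONs only after the merged file has been written.
    if delete:
        for path in new_files:
            path.unlink()
    return len(new_files)
```

Calling `merge_metadata("gallery-dl/pixiv", delete=True)` after a download would leave one JSON file keyed by the relative path of each original metadata file.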

And this is the problem: --write-metadata is intended only for programmers.

There is no user-friendly way for users to collect the additional information. --write-metadata creates tons of JSONs that are hard to read manually and that clutter the folder by their mere existence.

I propose collecting the info in a DB that can be viewed like a regular Excel file, with https://sqlitebrowser.org/.
Then all data would be in a table that a human can scan visually.

For example, in the case of Pixiv I expect to see a table for each artist with columns such as id, title, creation date, description, all tags and so on.
It's really useful to be able to easily check the descriptions of all downloaded artworks, because they can contain links to additional materials or maybe just an interesting text that a user wants to read.

Another example: a booru. A user has downloaded some works and after some time decides to upload them to another booru site.
Okay, he has the images... but the tags? Oops, there are no tags. He has to go to the original site and copy the tags for each, damn, image. And what if the site is down?

So this feature should allow users to collect the additional information just by adding a keyword and specifying which info they want to save.

But I think it would be really nice to define some default fields for each extractor, which would be saved just by adding the keyword when running the program (without manually editing the config file).

TestPolygon · 2021-02-26T20:25:20Z

I found something interesting here:
postprocessors.metadata
It allows extracting only the required fields from the --write-metadata JSON. It can also look nice if you save the data to an HTML file.
For example:

"postprocessors": [{
    "name": "metadata",
    "mode": "custom",
    "extension": "html",
    "format": "<h4>ID: {id}</h4>\n<br>\n{caption}\n<hr>\n{tags}\n<hr>"
}],

The result:

TestPolygon · 2021-02-26T20:25:29Z

But it still creates tons of files among the downloaded images.
Well, if I specify "directory": "metadata", all these files will be in a separate folder, but this folder will still be next to the downloaded files.

Request 1 (Metadata root folder)

I think it would be useful to be able to specify a root folder for metadata and --write-metadata files,
in order to have a separate folder with all metadata files in one place.

For example, if I download an image to
WORK_DIR/gallery-dl/pixiv-50306023/84677043_p0.jpg
The metadata file should be saved in:
METADATA_DIR/gallery-dl/pixiv-50306023/84677043_p0.jpg.json

With this setting you can always use --write-metadata and not care about these JSONs until you need them.
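The requested mapping can be sketched in a few lines of Python. This only illustrates the intended path layout; `metadata_path` is a hypothetical helper, not an existing gallery-dl function or option:

```python
from pathlib import Path


def metadata_path(file_path, work_dir, metadata_dir, suffix=".json"):
    """Mirror a downloaded file's location under a separate metadata root.

    Hypothetical helper illustrating Request 1; not part of gallery-dl.
    """
    # Path of the download relative to the download root.
    relative = Path(file_path).relative_to(work_dir)
    # Same sub-folders under the metadata root, with the suffix appended.
    return Path(metadata_dir) / relative.parent / (relative.name + suffix)


print(metadata_path(
    "WORK_DIR/gallery-dl/pixiv-50306023/84677043_p0.jpg",
    "WORK_DIR",
    "METADATA_DIR",
))
```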

Request 2 (Array quotes)

Also, I think it would be more appropriate if {tags} (in postprocessors) used " instead of ' so that the output is valid JSON.

Request 3 (Translated tags [pixiv])

Add support for translated tags.

https://www.pixiv.net/ajax/illust/84677043

With Req 2 + Req 3:
Instead of
['鉛筆', 'ドローイング', '落書き', 'pencildrawing', 'モノクロ', 'drawing', '人物', 'アナログ', 'portrait', 'sketch']
I should get (for example, {translated_tags})
["pencil", "drawing", "doodle", "pencildrawing", "black&white", "drawing", "character", "traditional", "portrait", "sketch"]

mikf · 2021-02-26T21:33:12Z

Request 1 (Metadata root folder)

directory can be an absolute path. Or, when using it as a relative path, you can go several levels up first: "directory": "../../../metadata"

Request 2 (Array quotes)

Replace {tags} with [\"{tags:J\", \"}\"] as a workaround.
The __repr__ of a string in Python uses single quotes by default.
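The quoting difference can be reproduced in plain Python. This only illustrates the behavior and the idea behind the join-based workaround; it is not gallery-dl's own formatting code:

```python
import json

tags = ["pencildrawing", "drawing", "sketch"]

# str() of a list calls repr() on each element, which prefers single quotes.
as_repr = str(tags)

# json.dumps() always emits double-quoted strings, i.e. valid JSON.
as_json = json.dumps(tags, ensure_ascii=False)

# Same idea as the workaround: join the tags with '", "' and wrap in '["' ... '"]'.
as_joined = '["' + '", "'.join(tags) + '"]'

print(as_repr)    # ['pencildrawing', 'drawing', 'sketch']
print(as_json)    # ["pencildrawing", "drawing", "sketch"]
print(as_joined)  # ["pencildrawing", "drawing", "sketch"]
```

Unlike `json.dumps()`, the join-based form does not escape anything, so a tag that itself contains a double quote or a backslash would still produce invalid JSON.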

TestPolygon · 2021-02-26T22:16:12Z

> directory can be an absolute path.

Request 4 (`C:%HOMEPATH%` in `directory`)

This "directory": "C:%HOMEPATH%/metadata" just creates ./%HOMEPATH%/.
If there is no workaround, I think it would be nice to replace C:%HOMEPATH% with the correct home dir path.

UPD.
I have seen that some people use "cookies": "%HOMEPATH%/...".
But this (without C:) does not work; I get: [pixiv][error] Unable to download data: OSError: [Errno 22] Invalid argument: ...

Also, this setting does not apply to --write-metadata. However, that's probably not so important; I can just save all info with
"mode": "json",
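On the path question in Request 4: a rough sketch of what expanding environment variables and `~` in a configured path looks like with the Python standard library. `expand_user_path` is a hypothetical name used for illustration and is not claimed to match gallery-dl's internal code. Note that `os.path.expandvars` handles the `%NAME%` form only on Windows, while `$NAME` and `${NAME}` are handled on all platforms:

```python
import os


def expand_user_path(path):
    """Expand a leading "~" and environment variables in `path`.

    Hypothetical helper for illustration only.
    """
    # expanduser() resolves "~"; expandvars() resolves $NAME / ${NAME}
    # everywhere and additionally %NAME% on Windows.
    return os.path.expandvars(os.path.expanduser(path))


# GDL_DEMO_DIR is a made-up variable used only for this demonstration.
os.environ["GDL_DEMO_DIR"] = "/tmp/demo"
print(expand_user_path("$GDL_DEMO_DIR/metadata"))
print(expand_user_path("~/metadata"))
```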

(#1299)

TestPolygon · 2021-09-14T10:06:54Z

Technically, you can just save the entire JSONs in a DB.

For a readable representation, it's possible to use the JSON1 extension to create virtual tables with the required fields, based on the main table with the JSONs.

It's easy to implement. Just save the JSONs in a DB for now. For example, with --write-metadata-db.

The virtual tables for the representation can be created later. All information is already stored in one place.

Also the virtual tables can be modified/changed multiple times without any problem.
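A minimal sketch of this idea with Python's `sqlite3` module, assuming the SQLite build includes the JSON1 functions (`json_extract`). It uses an ordinary SQL view rather than a virtual table, since a view is enough to present selected fields. The table name `metadata`, the view name `pixiv_view`, and both function names are hypothetical; the field names `title`, `caption` and `tags` follow the Pixiv fields mentioned earlier in this thread:

```python
import json
import sqlite3


def store_metadata(conn, category, item_id, metadata):
    """Store one metadata dict as raw JSON text (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata ("
        " category TEXT NOT NULL,"
        " id TEXT NOT NULL,"
        " json TEXT NOT NULL,"
        " PRIMARY KEY (category, id))"
    )
    conn.execute(
        "INSERT OR REPLACE INTO metadata (category, id, json) VALUES (?, ?, ?)",
        (category, str(item_id), json.dumps(metadata, ensure_ascii=False)),
    )


def create_pixiv_view(conn):
    """(Re)create a human-readable view over the stored JSON."""
    # The view can be dropped and redefined at any time; the raw JSON stays intact.
    conn.execute("DROP VIEW IF EXISTS pixiv_view")
    conn.execute(
        "CREATE VIEW pixiv_view AS"
        " SELECT id,"
        "        json_extract(json, '$.title')   AS title,"
        "        json_extract(json, '$.caption') AS description,"
        "        json_extract(json, '$.tags')    AS tags"
        " FROM metadata WHERE category = 'pixiv'"
    )
```

After `store_metadata(conn, "pixiv", 84677043, {...})` and `create_pixiv_view(conn)`, a `SELECT * FROM pixiv_view` returns one row per work, and the same view can be opened as a table in a tool like DB Browser for SQLite.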

AyHa1810 · 2024-02-24T18:24:10Z

Maybe you could add something like this in your config.json and use a custom DB file instead of what the archive option makes?

"pixiv": {
    "postprocessors": [{
        "name": "exec",
        "command": [
            "sqlite3",
            "~/gallery-dl/pixiv/pixiv.sqlite3",
            "INSERT OR REPLACE INTO pixiv_gdl (filename, filepath, page_count, rating, tags, title, description) VALUES('{filename}.{extension}', '{user['id']} {user['account']}', '{page_count}', '{rating}', '{tags}', '{title}', '{caption}');"
        ]
    }]
}

where you manually create the pixiv.sqlite3 file with the following SQLite commands beforehand:

CREATE TABLE IF NOT EXISTS pixiv_gdl (
    filename varchar(64)   NOT NULL UNIQUE,
    filepath varchar(64)   NOT NULL,
    page_count integer     NOT NULL,
    rating   varchar(16)   NOT NULL,
    tags     varchar(1024) DEFAULT '[]',
    title    varchar(64),
    description varchar(3072)
);

just a suggestion :P
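One thing to watch with building the SQL text from format fields is that a title or caption containing a single quote ends the string literal early and breaks the statement. A sketch of the same insert with bound parameters, assuming the metadata is available as a Python dict with the field names used in the config above; `save_row` is a hypothetical helper and not part of gallery-dl:

```python
import json
import sqlite3

# Same table layout as the CREATE TABLE statement above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pixiv_gdl (
    filename varchar(64)   NOT NULL UNIQUE,
    filepath varchar(64)   NOT NULL,
    page_count integer     NOT NULL,
    rating   varchar(16)   NOT NULL,
    tags     varchar(1024) DEFAULT '[]',
    title    varchar(64),
    description varchar(3072)
);
"""


def save_row(conn, meta):
    """Insert or replace one row; values are bound, never pasted into the SQL."""
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT OR REPLACE INTO pixiv_gdl"
        " (filename, filepath, page_count, rating, tags, title, description)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            f"{meta['filename']}.{meta['extension']}",
            f"{meta['user']['id']} {meta['user']['account']}",
            meta["page_count"],
            meta["rating"],
            # Store tags as a JSON array so they can be parsed back later.
            json.dumps(meta.get("tags", []), ensure_ascii=False),
            meta.get("title"),
            meta.get("caption"),
        ),
    )
```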

mikf added the feature-request label Feb 10, 2021

mikf added a commit that referenced this issue Mar 1, 2021

[postprocessor:metadata] call expand_path() on custom paths

1bd3d7c

(#1299)

TestPolygon mentioned this issue Mar 4, 2021

[pixiv] Translated tags #1354

Closed

mikf mentioned this issue Jan 26, 2024

[Kemono] How to add "mode json" to an archive? #5100

Closed
