tags in filename for sankaku or other booru host #94

Closed
wankio opened this issue Jul 11, 2018 · 25 comments

@wankio
Contributor

wankio commented Jul 11, 2018

1, host_id_rawfilename
Can it change to host_id_tags ? Because i don't see option in config file and filename already limited to 255char ?

2, does it have a link history to avoid downloading duplicate files? like ripme

3, can it have filename format like below software ?
https://github.com/Nandaka/DanbooruDownloader

thank 👍

@mikf
Owner

mikf commented Jul 11, 2018

1, host_id_rawfilename
Can it change to host_id_tags ? Because i don't see option in config file and filename already limited to 255char ?
...
3, can it have filename format like below software ?

You can configure the output filename and directory with the extractor.filename and extractor.directory options. To change the filename format for sankaku to "host_id_tags", you would put something like this in your config file:

{
  "extractor": {
    "sankaku": {
      "filename": "{category}_{id}_{tags}.{extension}"
    }
  }
}
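
A minimal sketch of how such a format string expands, assuming made-up metadata values. gallery-dl's real formatter does more than plain `str.format`; this only shows the idea of replacement fields:

```python
# Hypothetical metadata for one post; real field names come from `gallery-dl -K URL`.
meta = {
    "category": "sankaku",
    "id": 7024858,
    "tags": "chan_co 1girl solo",
    "extension": "jpg",
}

filename_fmt = "{category}_{id}_{tags}.{extension}"

# Plain str.format is enough to show how replacement fields are filled in.
filename = filename_fmt.format(**meta)
print(filename)
```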

2, does it have a link history to avoid downloading duplicate files? like ripme

gallery-dl skips downloads for files that already exist and there is also the archive option (also available with the --download-archive command-line switch)
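
The idea behind a download archive can be sketched in a few lines of Python. This is an illustration only: the table layout, the key format and the helper names are assumptions, not gallery-dl's actual implementation.

```python
import sqlite3

def open_archive(path=":memory:"):
    """Open (or create) a tiny archive database holding one key per downloaded file."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)")
    return con

def already_downloaded(con, key):
    """Return True if `key` was recorded by an earlier download."""
    row = con.execute("SELECT 1 FROM archive WHERE entry = ?", (key,)).fetchone()
    return row is not None

def record(con, key):
    """Remember `key`; INSERT OR IGNORE keeps repeated calls harmless."""
    con.execute("INSERT OR IGNORE INTO archive (entry) VALUES (?)", (key,))
    con.commit()

con = open_archive()
# The third key repeats the first one and is therefore skipped.
for key in ["sankaku7024858", "sankaku7024859", "sankaku7024858"]:
    if already_downloaded(con, key):
        print("skip", key)
    else:
        print("download", key)
        record(con, key)
```

Unlike the "file already exists" check, an archive like this keeps working after files have been moved or renamed.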

@wankio
Contributor Author

wankio commented Jul 11, 2018

oh thank, i will try that :)

update :
in config.json
"sankaku":
{
"username": null,
"password": null,
"wait-min": 2.5,
"wait-max": 5.0,
"filename": "{category}_{id}_{tags}.{extension}"
},
it have Errno 22 Invalid argument

@mikf
Owner

mikf commented Jul 11, 2018

The config snippet you posted looks fine and should work.

Could you post the whole output when you run gallery-dl with the --verbose option? It would be helpful to know where exactly this exception occurs.

@wankio
Contributor Author

wankio commented Jul 11, 2018

I:\DOWNLOADS\Command tools>gallery-dl https://chan.sankakucomplex.com/?tags=chan_co --verbose
[gallery-dl][debug] Version 1.4.2
[gallery-dl][debug] Python 3.4.4 - Windows-10-10.0.17134
[gallery-dl][debug] requests 2.19.1 - urllib3 1.23
[gallery-dl][debug] Starting DownloadJob for 'https://chan.sankakucomplex.com/?tags=chan_co'
[sankaku][debug] Using SankakuTagExtractor for 'https://chan.sankakucomplex.com/?tags=chan_co'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): chan.sankakucomplex.com:443
[urllib3.connectionpool][debug] https://chan.sankakucomplex.com:443 "GET /?tags=chan_co&page=1 HTTP/1.1" 200 None
[urllib3.connectionpool][debug] https://chan.sankakucomplex.com:443 "GET /post/show/7024858 HTTP/1.1" 200 None
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): cs.sankakucomplex.com:443
[urllib3.connectionpool][debug] https://cs.sankakucomplex.com:443 "GET /data/64/bf/64bf0aa8829e737468e9a0a229ad0166.jpg?e=1531388877&m=Y6qa7KMsjcFbb6NBDTI6pQ HTTP/1.1" 200 695172
  .\gallery-dl\Chan.Sankaku\chan_co\7024858_2018-07-11 03... hair, white bikini, white gloves, white swimsuit, wink.jpg
[sankaku][error] Unable to download data: [Errno 22] Invalid argument: '\\\\?\\I:\\DOWNLOADS\\Command tools\\gallery-dl\\Chan.Sankaku\\chan_co\\7024858_2018-07-11 03_45_fate (series), fate_grand order, bb (fate), chan co, simple background, 1_1 aspect ratio, 1girl, asymmetrical hair, bangs, bikini, black choker, breasts, choker, clavicle, cleavage, ;d, eyebrows visible through hair, female, front-tie bikini, front-tie top, gloves, hair ornament, hair ribbon, hand on hip, hand up, large breasts, long hair, long sleeves, looking at viewer, megane, navel, one eye closed, open mouth, pointer, ponytail, purple eyes, purple hair, red ribbon, ribbon, rimless eyewear, side ponytail, side-tie bikini, smile, solo, star, swimsuit, tied hair, very long hair, white bikini, white gloves, white swimsuit, wink.jpg.part'

My new config

"filename": "{id}_{created_at}_{tags}.{extension}",
            "directory":["Chan.Sankaku","{search_tags}"],
            "archive": "./gallery-dl/archive-chan.sankaku.sqlite3"

@mikf
Owner

mikf commented Jul 11, 2018

OK, that filename is way too long (670 characters) and there is currently, as also noted in #92, no way to prevent that.

I guess too long filenames could just be cut short to fit into the 255 character limit, but a more configurable approach (like string slicing for format string replacement fields) would be nice as well. I'll think of something ...

And, by the way: Python, at least on Linux, recognizes long filenames: OSError: [Errno 36] File name too long, so I wasn't quite sure how this error came to be. But on Windows you either get [Errno 2] No such file or directory or [Errno 22] Invalid argument.
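
The "just cut it short" idea could look roughly like the following Python sketch. The helper name and the 255 limit are assumptions for illustration; the real limit depends on the filesystem (and may be counted in bytes rather than characters), and a temporary suffix such as .part would need extra room.

```python
import os

MAX_NAME = 255  # common per-component limit; the exact value depends on the filesystem

def shorten(filename, limit=MAX_NAME):
    """Cut the stem of `filename` so that stem + extension fits in `limit` characters."""
    stem, ext = os.path.splitext(filename)
    keep = max(limit - len(ext), 0)
    return stem[:keep] + ext

# A deliberately oversized name, similar in spirit to the one in the error above.
long_name = "7024858_" + "tag, " * 200 + "wink.jpg"
short_name = shorten(long_name)
print(len(long_name), "->", len(short_name))
```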

@wankio
Contributor Author

wankio commented Jul 11, 2018

that's what i'm thinking...the filename is too long because we can't limit how many tags are added to the filename...anyway, thanks :)

and can it support these format too ?

- %provider% 	= provider Name
- %id% 		= Image ID
- %tags% 	= Image Tags
- %rating% 	= Image Rating
- %md5% 	= MD5 Hash
- %artist% 	= Artist Tag
- %copyright% 	= Copyright Tag
- %character% 	= Character Tag
- %circle% 	= Circle Tag, yande.re extension
- %faults% 	= Faults Tag, yande.re extension
- %originalFilename% = Original Filename
- %searchtag% 	= Search tag

@mikf
Owner

mikf commented Jul 11, 2018

All of these fields are already available, but under different names.

  • %provider% -> {category}
  • %id% -> {id}
  • %tags% -> {tags} (or {tag_string} on danbooru)
  • %rating% -> {score}
  • %originalFilename% -> {name}.{extension}

and so on. The exact names depend on the booru board in question, as gallery-dl is just using the API responses without much modification. Take a look at the output with -K to get a complete list of replacement field names.

To enable {tags_artist}, {tags_character} and so on, you need to set extractor.*.tags to true.
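
The mapping above can be sketched as a small translation helper in Python. `FIELD_MAP` and `translate` are hypothetical names, and the table only covers the fields listed in this comment; exact names still depend on the site, so check them with -K.

```python
import re

# Mapping taken from the list above; anything not listed is passed through unchanged.
# Note that %originalFilename% corresponds to {name}.{extension}; only {name} is mapped here.
FIELD_MAP = {
    "provider": "category",
    "id": "id",
    "tags": "tags",
    "rating": "score",
    "originalFilename": "name",
}

def translate(template):
    """Rewrite %name% placeholders into {name} replacement fields."""
    def repl(match):
        name = match.group(1)
        return "{" + FIELD_MAP.get(name, name) + "}"
    return re.sub(r"%(\w+)%", repl, template)

print(translate("%provider%_%id%_%tags%"))
```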

@wankio
Contributor Author

wankio commented Jul 12, 2018

  1. so after you add option to prevent long filename, i just need add tags:true in extractor {sankaku..} to enable artist/character ?

  2. Can gallery-dl use this search_tags ? : [tags]+date:<=yyyy.mm.dd
    because after 1000 results are downloaded you can't download any more...so you need to add +date:<=yyyy.mm.dd after the tag to download more than 1000 results. yyyy.mm.dd is created_at, i think

to compare with danbooru downloader and other, i think gallery-dl is better
1 - low memory usage (i think because it only uses one thread instead of multiple threads)
2 - archive (skips already downloaded ids) (ripme has it but danbooru downloader and others don't)
3 - batch download from pastebin (ripme can rip from the clipboard, danbooru downloader can't)

mikf added a commit that referenced this issue Jul 14, 2018
@mikf
Owner

mikf commented Jul 14, 2018

so after you add option to prevent long filename, i just need add tags:true in extractor {sankaku..} to enable artist/character ?

Yes, but it would be easier to enable this option for all boorus by just setting extractor.tags to true. Otherwise you would have to enable it for each site individually, i.e. extractor.sankaku.tags, extractor.gelbooru.tags, and so on.
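
A rough Python sketch of that fallback, assuming a lookup order of site-specific value first and general extractor value second. The `lookup` helper and the sample config are made up for illustration and are not gallery-dl's own code.

```python
config = {
    "extractor": {
        "tags": True,                   # applies to every extractor
        "sankaku": {"wait-min": 2.5},   # no site-specific "tags" entry
        "gelbooru": {"tags": False},    # site-specific override
    }
}

def lookup(config, category, key, default=None):
    """Prefer extractor.<category>.<key>, then extractor.<key>, then `default`."""
    extractor = config.get("extractor", {})
    site = extractor.get(category, {})
    if key in site:
        return site[key]
    return extractor.get(key, default)

print(lookup(config, "sankaku", "tags"))
print(lookup(config, "gelbooru", "tags"))
```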

Concerning filename lengths: you can now (since 8fe9056) slice values in format strings.
{tags[:200]} would limit it to 200 characters max - everything after that will be cut off.
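
The effect is the same as ordinary Python string slicing, as this small sketch shows. The tag list is made up, and the slice is applied by hand here instead of inside the format string; note that a plain slice may cut the last tag in half.

```python
# A made-up, very long tag string.
tags = ", ".join(["white bikini", "white gloves", "wink"] * 30)

# Ordinary Python slicing: keep at most the first 200 characters.
short_tags = tags[:200]

filename = "{id}_{tags}.{extension}".format(id=7024858, tags=short_tags, extension="jpg")
print(len(tags), len(short_tags), len(filename))
```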

Can gallery-dl use this search_tags ? : [tags]+date:<=yyyy.mm.dd
because after 1000 results are downloaded you can't download any more...so you need to add +date:<=yyyy.mm.dd after the tag to download more than 1000 results. yyyy.mm.dd is created_at, i think

It can, but that's not necessary if you want to go past 1000 results / page 50. You don't even need to provide username and password if you want to go past page 25. Being logged in only lets you use more than 5 tags at once and allows you to jump to higher page numbers faster (with --range 800-, for example)

@ghost

ghost commented Jul 21, 2018

[danbooru][error] An unexpected error occurred: AttributeError - 'list' object has no attribute 'startswith'.

Edit: this is my first post here. am i doing it right?

@mikf
Owner

mikf commented Jul 21, 2018

You should open a new issue, post the URL in question and, if possible, the complete error output with --verbose.

@wankio
Contributor Author

wankio commented Jul 22, 2018

ok i will test it soon :)

@wankio
Contributor Author

wankio commented Jul 29, 2018

Yes, but it would be easier to enable this option for all boorus by just setting extractor.tags to true. Otherwise you would have to enable it for each site individually, i.e. extractor.sankaku.tags, extractor.gelbooru.tags, and so on.

Concerning filename lengths: you can now (since 8fe9056) slice values in format strings.
{tags[:200]} would limit it to 200 characters max - everything after that will be cut off.

     "sankaku":
        {
            "username": null,
            "password": null,
            "wait-min": 2.5,
            "wait-max": 5.0,
            "filename": "{tags_artist}_{tags[:200]}_{id}_{created_at}_.{extension}",
            "directory":["Chan.Sankaku","{search_tags}"],
            "tags": true      
        },

[sankaku][error] Applying filename format string failed: TypeError: string indices must be integers

Even when i don't set {tags}, gallery-dl still names the file {id}_{created_at}_.{extension} instead of {tags_artist}_{id}_{created_at}_.{extension}

It can, but that's not necessary if you want to go past 1000 results / page 50. You don't even need to provide username and password if you want to go past page 25. Being logged in only lets you use more than 5 tags at once and allows you to jump to higher page numbers faster (with --range 800-, for example)

so if i input tags have higher than 1000result, it will keep downloading until have nothing to download ?

@mikf
Owner

mikf commented Jul 29, 2018

[sankaku][error] Applying filename format string failed: TypeError: string indices must be integers

Even when i don't set {tags}, gallery-dl still names the file {id}_{created_at}_.{extension} instead of {tags_artist}_{id}_{created_at}_.{extension}

You are using version 1.4.2 and not the latest git snapshot. The {tags[:200]} thing and the tags option for sankaku haven't been "officially" released yet. Do a pip install --upgrade https://github.com/mikf/gallery-dl/archive/master.zip and try again.

so if i input tags have higher than 1000result, it will keep downloading until have nothing to download ?

Yes, it only stops after downloading all search results, but you can set a custom upper limit with, again, the --range option.

@wankio
Contributor Author

wankio commented Jul 29, 2018

oh nice ty, installed python version and it worked :)

i just test host local file and using r:link to batch download, wow it work too :)

mikf added a commit that referenced this issue Jul 29, 2018
The L option allows for the contents of a format field to be replaced
with <replacement> if its length is greater than <maxlen>.

Example:
{f:L5/too long/} -> "foo"      (if "f" is "foo")
                 -> "too long" (if "f" is "foobar")

(#92) (#94)
@mikf
Owner

mikf commented Jul 29, 2018

host local file and using r:link to batch download

  -i, --input-file FILE     Download URLs found in FILE

And to quote myself from the other issue:
You can now use the L format specifier to set a replacement if the format field value is too long. For example {tags:L100/too many tags/} (e0dd8df).
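
In plain Python terms, the L specifier behaves roughly like the helper below. `apply_maxlen` is a hypothetical name; the two calls mirror the example from the commit message.

```python
def apply_maxlen(value, maxlen, replacement):
    """Return `replacement` if `value` is longer than `maxlen`, else `value` unchanged."""
    return replacement if len(value) > maxlen else value

# Equivalent of {f:L5/too long/} for two different values of "f".
print(apply_maxlen("foo", 5, "too long"))
print(apply_maxlen("foobar", 5, "too long"))
```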

@mikf mikf closed this as completed Jul 29, 2018
@wankio
Contributor Author

wankio commented Jul 29, 2018

thank, so i need update gallery-dl again ?

@mikf
Owner

mikf commented Jul 29, 2018

Only if you want to use the L format specifier feature.

@wankio
Contributor Author

wankio commented Aug 1, 2018

  • oh today it stop working in 3 hours....no error, just stop download. Command Window still processing but it dont download any new link in 3hours (checked website, still no error)

  • and with archive option in sankaku extractor, why i feel it so slow to check downloaded link. Wait Min/Max 2/5 but sometime it wait 8-10 or maybe 20+ seconds to just check files

@mikf
Owner

mikf commented Aug 1, 2018

oh today it stop working in 3 hours....no error, just stop download.

Hmm, there is a slim possibility that an HTTP request "gets stuck" and the client waits forever for a reply from the remote server. Some HTTP requests sent by gallery-dl - for some reason - don't have a timeout, so it probably happened with one of those. Fixing this should be easy. In the meantime: Ctrl+C and try again.

why i feel it so slow to check downloaded link

Because it has to get download URL and metadata before it can check if a file has already been downloaded (same as youtube-dl). It doesn't help that Sankaku is incredibly slow itself, so you have to wait 2-5 seconds before each HTTP request (to avoid 429 Too Many Requests errors) and then you have to wait for the request itself to finish, which might take another 5 seconds.
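
The waiting described here amounts to something like this Python sketch. The function names are hypothetical, and `fetch` and `sleep` are passed in so the example stays free of real network calls; the defaults mirror the wait-min/wait-max values from the config earlier in this thread.

```python
import random
import time

def pick_delay(wait_min=2.5, wait_max=5.0):
    """Choose a random pause (in seconds) between wait_min and wait_max."""
    return random.uniform(wait_min, wait_max)

def polite_get(fetch, url, wait_min=2.5, wait_max=5.0, sleep=time.sleep):
    """Pause for a random delay, then call `fetch(url)` and return its result."""
    sleep(pick_delay(wait_min, wait_max))
    return fetch(url)
```

With one such pause per request, plus the time the request itself takes, checking a few hundred already downloaded posts adds up quickly.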

When using sankaku stuff, you should really use the --range command-line option when necessary, as it allows the extractor to quickly jump ahead. gallery-dl --range 250- URL... is going to immediately jump to image nr. 250 and start from there.

@wankio
Contributor Author

wankio commented Aug 2, 2018

yeah...it's easy to fix with --range you told me

Being logged in only lets you use more than 5 tags at once and allows you to jump to higher page numbers faster (with --range 800-, for example)

5tags at once, you mean 5 tags combined : ?tags=dynasty_warriors brown_hair china_dress female shoes right ?

Because it has to get download URL and metadata before it can check if a file has already been downloaded (same as youtube-dl). It doesn't help that Sankaku is incredibly slow itself, so you have to wait 2-5 seconds before each HTTP request (to avoid 429 Too Many Requests errors) and then you have to wait for the request itself to finish, which might take another 5 seconds.

sometime it wait 15-20seconds is normal ?

When using sankaku stuff, you should really use the --range command-line option when necessary, as it allows the extractor to quickly jump ahead. gallery-dl --range 250- URL... is going to immediately jump to image nr. 250 and start from there.

so i need to count downloaded files and compare with tags(totalresult) to know exactly range i need to put in right ?

It should have feature to skipped tags once it reach downloaded files (so it just only download newer pictures and stopped once it reach downloaded files if extractor archive option enabled)

@mikf
Owner

mikf commented Aug 2, 2018

yeah...it's easy to fix with --range you told me

That is not what I meant. I wanted to say "It's easy for me to add a timeout to regular HTTP requests, so it doesn't get stuck anymore" -> 68d6033

5tags at once, you mean 5 tags combined : ?tags=dynasty_warriors brown_hair china_dress female shoes right ?

Right.

sometime it wait 15-20seconds is normal ?

Not really, no. It might be the case that the wait-min/-max default values are too low and you get 429 Too Many Requests responses from sankaku. In that case gallery-dl retries the original request after waiting for a bit, but it can take quite a bit of time until sankaku sends a normal response.

You can enable verbose output (-v) to see what goes on behind the scenes. If you encounter anything 429 related, increase wait-min/-max until this doesn't happen anymore.
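
Why a 429 can turn a 5 second wait into 20 seconds is easier to see in a sketch. The retry policy below (growing pauses, 5 retries) is an assumption chosen for illustration, not gallery-dl's actual behaviour, and `fetch` is a hypothetical callable returning (status_code, body).

```python
import time

def fetch_with_retry(fetch, url, retries=5, wait=5.0, sleep=time.sleep):
    """Call `fetch(url)` until it stops returning status 429 or `retries` runs out."""
    for attempt in range(retries + 1):
        status, body = fetch(url)
        if status != 429:
            return status, body
        # Wait longer after each rejected attempt before trying again.
        sleep(wait * (attempt + 1))
    # Every attempt was rejected; hand back the last response.
    return status, body
```

Two rejected attempts already add 15 seconds of waiting under these assumptions, which is in the range reported above.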

so i need to count downloaded files and compare with tags(totalresult) to know exactly range i need to put in right ?

Your computer can count them for you:
d
... and you don't need the exact range, the start index is enough.
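
One way to let the computer do the counting, sketched in Python. The directory path in the comment is hypothetical, and leftover .part files would be counted as well, so treat the result as an approximate start index.

```python
import os

def count_files(directory):
    """Count regular files directly inside `directory` (subfolders are not descended into)."""
    return sum(
        1 for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )

# Example (hypothetical path): start the next run where the last one stopped.
# n = count_files("gallery-dl/Chan.Sankaku/chan_co")
# print("--range {}-".format(n + 1))
```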

--range 200-300 will download anything from 200 to 300, but you can omit the end index (--range 200-) to download from 200 to the end or the start index to download up to 300 (--range -300).
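
The three forms can be sketched as a small parser. This is an illustration, not gallery-dl's own code; it assumes 1-based indices and an inclusive end, and the single-number case is an extra assumption.

```python
def parse_range(spec):
    """Turn '200-300', '200-' or '-300' into (start, end); a missing side becomes None."""
    start, sep, end = spec.partition("-")
    if not sep:
        # Assumption: a single number selects exactly that index.
        return int(spec), int(spec)
    return (int(start) if start else None, int(end) if end else None)

def in_range(index, spec):
    """Check whether a 1-based `index` falls inside the (assumed inclusive) range."""
    start, end = parse_range(spec)
    return (start is None or index >= start) and (end is None or index <= end)

print(parse_range("200-300"), parse_range("200-"), parse_range("-300"))
```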

It should have feature to skipped tags once it reach downloaded files (so it just only download newer pictures and stopped once it reach downloaded files if extractor archive option enabled)

  --abort-on-skip           Abort extractor run if a file download would
                            normally be skipped, i.e. if a file with the same
                            filename already exists

or the extractor.skip option
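
The "stop at the first already downloaded file" behaviour can be sketched as follows. A set of post IDs stands in for the existing files here; `download_new` is a hypothetical helper that assumes results arrive newest first.

```python
def download_new(posts, archive):
    """Walk `posts` newest-first and stop at the first one already in `archive`."""
    fetched = []
    for post_id in posts:
        if post_id in archive:
            # Everything after this point is assumed to be older and already downloaded.
            break
        fetched.append(post_id)
        archive.add(post_id)
    return fetched

seen = {1, 2, 3}
print(download_new([5, 4, 3, 2, 1], seen))
```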

@wankio
Contributor Author

wankio commented Aug 5, 2018

thank you

  • can we have a option to download sample if original file dimension is too big, depend on width or height ?
  • some files have 9000-10000px width , if we limit maximum width(3000 maybe) it will download sample instead

@mikf
Owner

mikf commented Aug 5, 2018

Not going to happen.
You can download the original and then down-sample it yourself, or ignore it with --filter.
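
A filter of this kind boils down to evaluating an expression against each post's metadata, roughly as in the sketch below. The `width` and `height` field names are assumptions (check the real ones with -K), and `keep` is a hypothetical helper.

```python
def keep(meta, expression):
    """Evaluate a Python expression against a post's metadata; falsy means 'ignore'."""
    # eval() on your own expression is fine for a personal script, not for untrusted input.
    return bool(eval(expression, {"__builtins__": {}}, dict(meta)))

# Made-up posts: one oversized image and one of ordinary size.
posts = [
    {"id": 1, "width": 9600, "height": 5400},
    {"id": 2, "width": 1920, "height": 1080},
]
kept = [p["id"] for p in posts if keep(p, "width <= 3000")]
print(kept)
```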

You should also open a new issue if you want to suggest a new feature. This one here is closed for a reason.

@wankio
Contributor Author

wankio commented Aug 5, 2018

ok thank
