Request: Patreon skip duplicate files && implement a function similar to "chapter-range" #590
Assuming you are currently using something like
Speaking of which, this post processor is meant to compare an already downloaded version of a file with a potentially new one, and replace or enumerate it if it changed, for example because the extractor in question was improved and now provides higher-quality images. That's why you can't (or at least shouldn't) skip already downloaded files; otherwise there would be nothing to compare the new file against.
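To make the compare-then-replace-or-enumerate idea concrete, here is a minimal sketch. It is not gallery-dl's implementation: the function names `compare_and_store`, `next_free_name` and `same_content` are hypothetical, the byte-for-byte comparison is an assumption (the real post processor may compare differently), and the `1.png -> 1.1.png` numbering scheme is made up for illustration.

```python
import os


def same_content(a: str, b: str, chunk: int = 1 << 16) -> bool:
    """Byte-for-byte comparison of two files (assumed comparison strategy)."""
    if os.path.getsize(a) != os.path.getsize(b):
        return False
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca, cb = fa.read(chunk), fb.read(chunk)
            if ca != cb:
                return False
            if not ca:  # both files exhausted at the same time
                return True


def next_free_name(path: str) -> str:
    """Hypothetical enumeration scheme: 1.png -> 1.1.png, 1.2.png, ..."""
    root, ext = os.path.splitext(path)
    n = 1
    while os.path.exists(f"{root}.{n}{ext}"):
        n += 1
    return f"{root}.{n}{ext}"


def compare_and_store(new_file: str, target: str, action: str = "enumerate") -> str:
    """Move a freshly downloaded file into place and return its final path."""
    if not os.path.exists(target):
        os.replace(new_file, target)      # nothing to compare against yet
        return target
    if same_content(new_file, target):
        os.remove(new_file)               # identical: keep the old copy
        return target
    if action == "replace":
        os.replace(new_file, target)      # changed: overwrite the old copy
        return target
    if action == "enumerate":
        dest = next_free_name(target)     # changed: keep both copies
        os.replace(new_file, dest)
        return dest
    raise ValueError(f"unknown action: {action!r}")
```

Note that the sketch only compares against `target` itself, not against earlier numbered copies. So if two different images share the name `1.png`, every run after the first adds yet another numbered copy of the second one, which is why this kind of comparison does not replace an archive.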
Oh, this sort of works, but it still downloads duplicate files, which I wish there was a way to avoid. I know the post processor only works within the first session, but that's usually enough, because duplicated files tend to occur within a single post. I know this will still keep duplicates that were posted at different times in separate posts, but it's still a much better alternative than downloading the same file twice per session.
Turns out all download URLs have a hash digest in them:
109f6c8 uses those to (hopefully) filter out and ignore duplicates. It also restructures quite a bit of the way files are extracted, so I would appreciate it if you could test whether everything works as it should. (I don't have any Patreon subscriptions of my own and am relying on creators who make their content available for free.)
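A minimal sketch of deduplicating by the digest embedded in each download URL. The actual Patreon URL layout is not shown here, so the regex below is an assumption (a 32-character lowercase hex path segment), and `digest_of` / `unique_files` are hypothetical names, not functions from 109f6c8. Like the approach described above, the `seen` set only lives for one run.

```python
import re
from typing import Iterable, Iterator, Optional

# Assumption: the digest is a 32-character lowercase hex path segment.
# The real Patreon URL layout may differ.
_DIGEST_RE = re.compile(r"/([0-9a-f]{32})(?=[/?]|$)")


def digest_of(url: str) -> Optional[str]:
    """Return the hash digest embedded in a download URL, if any."""
    match = _DIGEST_RE.search(url)
    return match.group(1) if match else None


def unique_files(urls: Iterable[str]) -> Iterator[str]:
    """Yield each URL whose digest has not been seen earlier in this run."""
    seen = set()
    for url in urls:
        key = digest_of(url) or url  # no digest found: fall back to the full URL
        if key not in seen:
            seen.add(key)
            yield url
```

Keying on the digest rather than the filename means an image that appears both in the post body and as an attachment is fetched once, while several different files that are all named `1.png` are still kept apart.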
Hello! Thank you so very much for working on my request, I really appreciate it! So I ran this with 1.13.0-dev
And all the tests showed the same behaviour as the last stable version, so no changes. Let me know if I can adjust the config file to try something else! And thanks once again for your time.
Hmm, 109f6c8 should at least solve "Scenario 1)", i.e. it shouldn't download duplicates anymore. Are you sure this didn't change? I've added some debug logging messages for duplicate files in b9c574b. Could you try downloading from a post with duplicate files in post/body and attachments while using And for "Scenario 2)", you should either change the filename format string to either include You definitely shouldn't be using the
I updated to the latest master again and ran the same command with 1.13.0-dev
There is another, unrelated problem: I can't get the -v flag to work. It doesn't do anything on the latest master. I tried running the same command on my laptop, which didn't have the updated gallery-dl, and it showed the expected verbose output, but after upgrading to the master it stopped showing it.
Hello! So I know there are workarounds for this, but none of them are perfect.
What I want to do is download all files from a Patreon creator but skip duplicates.
Scenario 1)
A lot of artists upload their pictures to the Patreon gallery post/body and ALSO add them as attachments.
Scenario 2)
Sometimes the body of the post (aka content) contains several images that are all named 1.png (not sure if this is a global thing or specific to the handful of creators I follow).
The default behaviour seems to be that if a file with the same filename already exists, it gets skipped. The problem is that in scenario 1 it downloads those files as duplicates, and in scenario 2, following the unique-filename behaviour, it only downloads the first 1.png and skips the rest, even if they are different, so unique files are not downloaded.
To fix that, I set up my configuration file to use the compare post processor, which requires skip = false and part = true to work, and for compare.action I use the "enumerate" setting.
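For reference, a sketch of what the setup described above might look like, written as a Python dict and serialized to gallery-dl's JSON config format. The option placement ("part" under "downloader", "skip" and "postprocessors" under the extractor) is my reading of the configuration docs and should be double-checked there; the archive option is left out because this setup disables skipping anyway.

```python
import json

# Sketch of the config described in the post; verify option placement
# against gallery-dl's configuration documentation.
config = {
    "extractor": {
        "patreon": {
            # skip must be off so already-downloaded files reach the compare step
            "skip": False,
            "postprocessors": [
                {"name": "compare", "action": "enumerate"},
            ],
        },
    },
    "downloader": {
        "part": True,  # the post says compare needs this enabled
    },
}

print(json.dumps(config, indent=4))
```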
Now this is perfect: I get to keep the original filenames, it skips duplicated posts and makes sure 100% of the files get downloaded, it renames files with conflicting filenames, and it skips duplicate files.
The problem is that setting skip to false prevents the archive from working, so the next time I run the program it starts creating new enumerated duplicates of the files that already exist.
I think if the skip and compare functionality could be merged specifically for the Patreon extractor, it would allow downloading the files without sacrificing the archive functionality, which would be very handy.
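A rough illustration of one way such a merge could work: remember file digests across runs in a small SQLite table, so a file whose digest was recorded in an earlier session is skipped before download, while files with unseen digests still go through compare. `DigestArchive` and its table layout are hypothetical and not gallery-dl's actual archive format.

```python
import sqlite3


class DigestArchive:
    """Hypothetical persistent record of file digests already downloaded."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS seen (digest TEXT PRIMARY KEY)")
        self.db.commit()

    def check_and_add(self, digest: str) -> bool:
        """Record the digest; return True only if it was not recorded before."""
        cur = self.db.execute(
            "INSERT OR IGNORE INTO seen (digest) VALUES (?)", (digest,)
        )
        self.db.commit()
        return cur.rowcount == 1  # 0 rows changed means it was already there

    def close(self) -> None:
        self.db.close()
```

Because the table lives in a file, the "already seen" state survives between sessions, which is the part a per-run comparison cannot provide.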
Also, I don't know how difficult this would be, but it would be great if chapter-range could be implemented for Patreon, so it only downloads from the newest posts instead of going through the whole thing. range works, but it's unreliable, because some galleries contain more pictures than others.
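To show why a file-level limit is unreliable here, a small sketch comparing "first N files" (roughly what range does) with "all files of the newest N posts" (what the requested chapter-range would do). The `(post id, file names)` data shape and both function names are hypothetical.

```python
from itertools import islice
from typing import List, Tuple

# Hypothetical data shape: (post id, file names), newest post first.
Post = Tuple[str, List[str]]


def first_n_files(posts: List[Post], n: int) -> List[str]:
    """File-level limit: stops after n files, even in the middle of a post."""
    every_file = (f"{pid}/{name}" for pid, names in posts for name in names)
    return list(islice(every_file, n))


def first_n_posts(posts: List[Post], n: int) -> List[str]:
    """Post-level limit: every file of the newest n posts."""
    return [f"{pid}/{name}" for pid, names in posts[:n] for name in names]


posts = [("p3", ["1.png", "2.png", "3.png"]), ("p2", ["1.png"]), ("p1", ["1.png", "2.png"])]
print(first_n_files(posts, 2))  # cuts post p3 in half
print(first_n_posts(posts, 2))  # all of p3 and p2
```

With a file-level limit, the number of posts covered depends on how many images each post happens to have; a post-level limit always covers whole posts.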
Thanks so much!