[deviantart] Changed keywords, or API query? #35

Closed
Hrxn opened this issue Aug 11, 2017 · 9 comments

@Hrxn
Contributor

Hrxn commented Aug 11, 2017

OS: Windows 10 x64 [Version 10.0.15063]
Python: 3.6.1 amd64
gallery-dl: git master

PS F:\> gallery-dl --verbose "http://bentanart.deviantart.com/favourites/"
[gallery-dl][debug] Starting DownloadJob for 'http://bentanart.deviantart.com/favourites/'
[deviantart][debug] Using DeviantartFavoriteExtractor for http://bentanart.deviantart.com/favourites/
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.deviantart.com
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/collections/folders?username=bentanart&offset=0&limit=50&mature_content=true HTTP/1.1" 200 274
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/collections/3A2F3B70-6714-52AF-8D16-2BAD70BB6809?username=bentanart&offset=0&limit=24&mature_content=true HTTP/1.1" 200 6971
[deviantart][error] An unexpected error occurred: KeyError - 'collection'. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[deviantart][debug] Traceback
Traceback (most recent call last):
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\job.py", line 45, in run
    self.dispatch(msg)
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\job.py", line 78, in dispatch
    self.handle_directory(msg[1])
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\job.py", line 130, in handle_directory
    self.pathfmt.set_directory(keywords)
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\util.py", line 188, in set_directory
    for segment in self.directory_fmt
  File "c:\users\hrxn\appdata\local\programs\python\python36\lib\site-packages\gallery_dl\util.py", line 188, in <listcomp>
    for segment in self.directory_fmt
KeyError: 'collection'
PS F:\>

(Same error for different .deviantart.com/favourites URLs)

Error appears when setting directory, i.e. in extractor.deviantart.favorite.directory
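The failure can be reproduced outside gallery-dl. This is a minimal sketch, assuming the directory segments are expanded with Python's standard string formatting (the traceback only shows a list comprehension over `self.directory_fmt`, so the `format_map` call is an assumption) and using made-up metadata:

```python
# Hypothetical metadata: a top-level "username" exists,
# but there is no "collection" dictionary.
keywords = {
    "username": "bentanart",
    "author": {"username": "someartist"},
}

directory_fmt = ["DeviantArt", "Favorites",
                 "{collection[owner]}", "{collection[title]}"]


def expand(segments, kwdict):
    """Expand every format-string segment with the given metadata."""
    return [segment.format_map(kwdict) for segment in segments]


try:
    result = expand(directory_fmt, keywords)
except KeyError as exc:
    # The missing top-level key is what gets reported.
    result = "KeyError: {}".format(exc)

print(result)
```

Any format string that only references keys present in the metadata, such as `{username}` here, expands without error.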

I tested it with --ignore-config, which seemed to work.

So it has to be something in my config, here the part for DeviantArt:

"deviantart":
        {
            "gallery":
            {
                "directory": ["DeviantArt", "Galleries", "{author[username]}"],
                "filename": "{index}_{title}.{extension}"
            },
            "favorite":
            {
                "directory": ["DeviantArt", "Favorites", "{collection[owner]}", "{collection[title]}"],
                "filename": "{index}_{title}_by_{author[username]}.{extension}"
            },
            "deviation":
            {
                "directory": ["DeviantArt", "Deviations"],
                "filename": "{index}_{title}_by_{author[username]}-({author[urlname]}).{extension}"
            },
            "folder":
            {
                "directory": ["DeviantArt", "Folders", "{folder[owner]}", "{folder[title]}"],
                "filename": "{index}_{title}_by_{author[username]}.{extension}"
            },
            "collection":
            {
                "directory": ["DeviantArt", "Collections", "{collection[owner]}", "{collection[title]}"],
                "filename": "{index}_{title}_by_{author[username]}.{extension}"
            },
            "mature": true
        },

(Without the part for Journals...)

Not sure if I get this right..

Well, the Favorites-subextractor should be used, and that's the same folder structure setting I've used in the past.

Not sure. Did something change with their API? Or did some recent commit change the endpoint used, and I missed it somehow?

@Hrxn Hrxn changed the title [deviantart] [deviantart] Changed keywords, or API query? Aug 11, 2017
@mikf
Owner

mikf commented Aug 11, 2017

You should either change the favorite directory-format to ["DeviantArt", "Favorites", "{username}"] or set the flat option to false to use the collection extractor.

This change happened during the time we talked about default paths and general consistency in #26 (af9bd17).
The collection extractor deals with single favorite-collections and the favorite extractor with all favorites from all collections, which it downloads either into one directory controlled by favorite.directory, or it transfers its work to the collection extractor, which then uses collection.directory.
It therefore didn't really make sense to have a collection-named dictionary in the metadata of the favorite extractor, as this one doesn't distinguish between different collection-folders and all useful information is already contained in the username value. (The collection key is manually added and doesn't depend on anything API related)

I can re-add this key if you think it is useful, but {collection[title]} and {collection[index]} will always be "All" and 0, at least when using the favorite extractor with flat set to true.
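The two modes can be pictured roughly like this. It is only an illustrative sketch, not gallery-dl's actual code: `favorite_targets` is a hypothetical helper, and the directory layouts are the ones mentioned in this issue.

```python
def favorite_targets(username, collections, flat=True):
    """Return (extractor, directory) pairs a favorites job would end up using.

    Hypothetical helper for illustration only.
    """
    if flat:
        # Everything goes into one directory, controlled by favorite.directory.
        return [("favorite", ["DeviantArt", "Favorites", username])]
    # Otherwise each collection is handed over to the collection extractor,
    # which uses collection.directory instead.
    return [
        ("collection", ["DeviantArt", "Collections", username, title])
        for title in collections
    ]


print(favorite_targets("bentanart", ["Featured", "Landscapes"], flat=True))
print(favorite_targets("bentanart", ["Featured", "Landscapes"], flat=False))
```

With flat set to true there is no per-collection information left to put into a path, which is why a `{collection[title]}` segment has nothing meaningful to expand to.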

@Hrxn
Contributor Author

Hrxn commented Aug 12, 2017

Ah, that delegating thing again...

I guess I didn't notice this issue earlier in this case because all I've used gallery-dl with recently were group URLs on DeviantArt.

The old behaviour of the favorites sub-extractor is now used in the collection sub-extractor? Or do I misremember something here? And what about that certain /all API endpoint? Somehow being gallery only? Guess I'm getting old.

I can re-add this key if you think it is useful, but {collection[title]} and {collection[index]} will always be "All" and 0, at least when using the favorite extractor with flat set to true.

Nah, I think this wouldn't make any sense.

The collection extractor deals with single favorite-collections and the favorite extractor with all favorites from all collections, which it downloads either into one directory controlled by favorite.directory, or it transfers its work to the collection extractor, which then uses collection.directory.

Yes, this is what's determined by flat being set to false (latter) or true (former)?

(The collection key is manually added and doesn't depend on anything API related)

In which case does anything get added manually now?


Well, I have an idea here, I'm trying to describe what I actually want to achieve:

I have a DeviantArt root directory (as part of the base-directory, obviously), and then a directory for every sub-category there, as currently specified by my config above:

  • Galleries
  • Favorites
  • Deviations
  • Folders
  • Collections

Which does not really make any sense, when I think about it now.
Two things to consider:
Are 'Collections' always used in conjunction with 'Favorites'?
Are 'Folders' always used in conjunction with 'Galleries'?

Because then they actually belong inside their respective counterparts, and I change my directory settings accordingly.

And, because I couldn't wrap my mind around using flat=false yet: Is there anything else I need to be aware of, or are there any other downsides in using it?

Edit:

A little background info for better understanding: Of course I've thought about using flat, but two things kept me from adding it into my config so far: Many accounts on DeviantArt don't use folders/collections at all, or only very sporadically, so using this option would not make much sense. And secondly, some accounts are sprinkled with folders/collections that barely contain anything, and I strongly want to avoid ending up with directories in my local tree here that only contain very few items, if at all..

@mikf
Owner

mikf commented Aug 12, 2017

The old behaviour of the favorites sub-extractor is now used in the collection sub-extractor? Or do I misremember something here? And what about that certain /all API endpoint? Somehow being gallery only? Guess I'm getting old.

The old behavior of the favorites extractor did indeed change. It got split in two:

The old favorite extractor used to combine these two functions, but this newer structure produces better code and has nice similarities with how galleries are handled:

(I hope you can see the similarities between Gallery- and FavoriteExtractor and their respective "smaller" versions as Folder- and CollectionExtractor)

Are 'Collections' always used in conjunction with 'Favorites'?
Are 'Folders' always used in conjunction with 'Galleries'?

No, they aren't. The Gallery- and FolderExtractors only use their "smaller versions" if the flat option is set to false or if it's a group-gallery; otherwise they just produce a flat list of all images without using any other extractors.

Because then they actually belong inside their respective counterparts, and I change my directory settings accordingly.

Well, they kind of do belong together. The default paths put collections/folders into the same directory as Favorite-/Gallery-Extractors do.

And, because I couldn't wrap my mind around using flat=false yet: Is there anything else I need to be aware of, or are there any other downsides in using it?

As previously noted: two collections/folders can contain the same image twice, so you would be downloading duplicate images. I don't think there are any other downsides.

I should state that there is no need to use the flat option, but you talked about preserving the folder-structure of a gallery and this option basically allows you to do just that. Instead of adding this to your config, you could also use -o as command-line argument to enable/disable the option for special cases: gallery-dl -o flat=false http://rosuuri.deviantart.com/.

@Hrxn
Contributor Author

Hrxn commented Aug 14, 2017

Are 'Collections' always used in conjunction with 'Favorites'?
Are 'Folders' always used in conjunction with 'Galleries'?

No, they aren't. The Gallery- and FolderExtractors only use their "smaller versions" if the flat option is set to false or if it's a group-gallery; otherwise they just produce a flat list of all images without using any other extractors.

Yes, I got that, this was more about how DeviantArt itself handles this distinction. Like, for example, when browsing, interacting or whatever with the website, you will only encounter 'collections' if you're looking at a 'favorites' section, and 'folders' only as part of a gallery etc., and these two never get mixed. In short, whether DeviantArt is consistent in this regard.

As previously noted: two collections/folders can contain the same image twice, so you would be downloading duplicate images. I don't think there are any other downsides.

Right, I remember. But duplicates are less of a problem, in my opinion, because no matter the OS, there are more than enough tools to deal with that. I have not done that yet, but it should be perfectly possible to use such a program and turn duplicate files into symlinks or hardlinks or whatever to clean up wasted storage capacity while keeping the entire functionality of gallery-dl, making it possible to skip files it already found.
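A rough sketch of such a clean-up step, using only the Python standard library. This is not something gallery-dl provides; `hardlink_duplicates` is a hypothetical helper, and it assumes all files live on one filesystem that supports hard links.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hardlink_duplicates(root):
    """Replace byte-identical files below 'root' with hard links.

    Returns the number of files that were replaced.
    """
    seen = {}  # digest -> first path found with that content
    replaced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            original = seen.setdefault(file_digest(path), path)
            if original == path or os.path.samefile(original, path):
                continue  # first occurrence, or already linked
            tmp = path + ".dedupe-tmp"
            os.link(original, tmp)  # create the link next to the duplicate
            os.replace(tmp, path)   # then swap it in for the duplicate
            replaced += 1
    return replaced
```

Every file keeps its original name and location, so a later run can still find it and skip the download; only the storage behind identical files is shared.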

I should have been more clear with my question, I was more concerned about really getting all possible items to be honest, just like the /all endpoint does (Well, as the name indicates, I assume. I'm aware that this itself does not guarantee anything, but well, you get my point).

I should state that there is no need to use the flat option, but you talked about preserving the folder-structure of a gallery and this option basically allows you to do just that. Instead of adding this to your config, you could also use -o as command-line argument to enable/disable the option for special cases: gallery-dl -o flat=false http://rosuuri.deviantart.com/.

Yes, that's the thing. Many accounts on DeviantArt don't use a proper structure at all, some do but only inconsistently, and a few actually have a good folder structure that would make sense to keep. The really interesting question is: would it be possible to use -o flat=false (or any other option) together with the input file option -i, or would that require changing too much of the code? This would enable using different options for each URL in the input batch list, not just for DeviantArt but for any other site where it makes sense.

Sorry if this is a bit too much of a "but muh workflow!!" question 😄

@mikf
Owner

mikf commented Aug 14, 2017

... you will only encounter 'collections' if you're looking at a 'favorites' section, and 'folders' only as part of a gallery etc., and these two never get mixed. In short, whether DeviantArt is consistent in this regard.

I'm pretty sure that it is. There are the /gallery/folders and /collections/folders API endpoints, and the output of each always corresponds to what you see on the corresponding "Gallery" and "Favourites" pages on DeviantArt itself. It is also impossible to use a gallery-folder ID as a collection ID and vice versa, so both of them are handled as separate entities.

I should have been more clear with my question, I was more concerned about really getting all possible items to be honest.

The GalleryExtractor using gallery/all definitively gets all images in a user's gallery (your previous tests pretty much confirmed that, didn't they), but going through all gallery-folders surprisingly doesn't give the same result:

# using gallery/all
$ gallery-dl -g http://rosuuri.deviantart.com/ | sort | uniq | wc -l
386
# going through all gallery-folders
$ gallery-dl -o flat=false -gg http://rosuuri.deviantart.com/ | sort | uniq | wc -l
366

There are for some reason 20 images missing, even though the total amount of returned URLs is ca. 750. Using flat=false might not be the ideal way - currently. It might not be easy to implement (meaning I have currently no idea how to), but it might be possible to somehow get the missing 20 and still have a folder-structure.
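To see which URLs are actually missing rather than just how many, the two listings can be compared as sets. A small sketch, assuming the output of each command above was saved as text with one URL per line; the sample data below is made up:

```python
def unique_lines(text):
    """Return the set of non-empty, stripped lines (like `sort | uniq`)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def missing_urls(all_output, folder_output):
    """URLs listed by the gallery/all run but by none of the folder runs."""
    return sorted(unique_lines(all_output) - unique_lines(folder_output))


# Made-up sample output standing in for the two commands above.
all_output = """\
https://example.org/img/1.jpg
https://example.org/img/2.jpg
https://example.org/img/3.jpg
"""
folder_output = """\
https://example.org/img/1.jpg
https://example.org/img/3.jpg
https://example.org/img/1.jpg
"""

print(missing_urls(all_output, folder_output))
```

The duplicate entry in the folder listing mirrors the case where one image sits in several folders; it does not affect the result.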

Concerning Favourites: there is sadly no favourites/all so going through all collection-folders is all there is. I don't know if this method has the same issue of missing some items as it had for gallery-folders, but there is no easy way to test that and no other way getting these images (except manually crawling the DeviantArt website)

The really interesting question is: would it be possible to use -o flat=false (or any other option) together with the input file option -i, or would that require changing too much of the code? This would enable using different options for each URL in the input batch list, not just for DeviantArt but for any other site where it makes sense.

That sounds like an interesting proposition, but you can do something like this manually:

  • group your input file according to the options they need
  • write a shell-/batch-script that calls gallery-dl multiple times for each input-file/options combination
gallery-dl -i files01 -c config01.cfg
gallery-dl -i files02 -c config02.cfg
gallery-dl -i files03 -o opt1=val1 -o opt2=val2
...

If that doesn't fit your needs, open another issue for that and we can discuss it there.
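For anyone who prefers Python over a shell or batch script, the same idea could look roughly like this. The file names and option values are the placeholders from the example above, and `build_commands` and `run_all` are hypothetical helpers, not part of gallery-dl:

```python
import subprocess

# Each entry pairs an input file with the extra command-line
# arguments its URLs need (placeholder names).
GROUPS = [
    ("files01", ["-c", "config01.cfg"]),
    ("files02", ["-c", "config02.cfg"]),
    ("files03", ["-o", "opt1=val1", "-o", "opt2=val2"]),
]


def build_commands(groups, executable="gallery-dl"):
    """Return one argument list per input-file/options combination."""
    return [[executable, "-i", input_file] + list(options)
            for input_file, options in groups]


def run_all(groups):
    """Run gallery-dl once per group and collect the exit codes."""
    return [subprocess.call(cmd) for cmd in build_commands(groups)]


if __name__ == "__main__":
    # Only print the commands here; call run_all(GROUPS) to execute them.
    for cmd in build_commands(GROUPS):
        print(" ".join(cmd))
```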

Sorry if this is a bit too much of a "but muh workflow!!" question 😄

No, that's fine. Don't worry about it.

@Hrxn
Contributor Author

Hrxn commented Aug 14, 2017

[..] It is also impossible to use a gallery-folder ID as a collection ID and vice versa, so both of them are handled as separate entities.

Yes, exactly this, good to know. Thanks.

The GalleryExtractor using gallery/all definitively gets all images in a user's gallery (your previous tests pretty much confirmed that, didn't they), but going through all gallery-folders surprisingly doesn't give the same result:

Yes, that is true. From what I've seen so far, the result of good ol' GalleryExtractor always matched the number of items shown on the stats page (<user>.deviantart.com/stats/gallery/).

Too bad that the folder extraction method doesn't always return the same result, although I have a suspicion on the reason why that might be. Presumably, there is only one base gallery for every account, every submission by the user to the site ends up there, giving a linear or flat gallery of all deviation-items. The user can now create folders, grouping together single items at will, and present these sets together to the visitors on the site. So, you can basically think of gallery folders as some sort of presentation layer on top of the full user gallery. Basically just a view into the gallery, and the folder analogy doesn't hold up here in comparison with a file system, for example. The difference in resulting items we can observe now simply comes from the deviation-items that did not yet have the fortune of being selected for presentation by the user in that specific way.

Concerning Favourites: there is sadly no favourites/all so going through all collection-folders is all there is. I don't know if this method has the same issue of missing some items as it had for gallery-folders, but there is no easy way to test that and no other way getting these images (except manually crawling the DeviantArt website)

Yes, agreed. On the other hand, we've already seen this discrepancy in the results of items from Favorites in the earlier tests. I guess this is caused by the lack of a real favorites gallery that actually contains any items. A user on the website adds items to the account's favorites section by a simple click (or drag-and-drop), and what actually gets created is just a simple reference to the item in the gallery of the user account that uploaded it in the first place. If the original uploader now moves, deletes or restricts access in any other way to the deviation-item in question, the reference in the favorites section of our other user still exists and does not get updated, creating this mismatch. This behavior can be observed on many other websites, DeviantArt is not the only "offender" here.

That sounds like an interesting proposition, but you can do something like this manually:

  • group your input file according to the options they need
  • write a shell-/batch-script that calls gallery-dl multiple times for each input-file/options combination

Absolutely right, that would work just as well, and I've used this "trick" (well, actually it's not, just basic scripting) before, although not with gallery-dl, I think. This is just about increasing convenience a bit.
I can explain how I usually do this sort of stuff, even before I discovered the great gallery-dl.

  1. Surf the web
  2. Find something that intrigues me
  3. Put that URL into a queue file
    If I now want to add specific options (usually it's just setting the output path), I add them directly on the spot. That avoids having to come back to a queue file days or more later, no longer remembering every single URL and having to open them again in a browser.

But as I said, this can be disregarded, because it really is just a minor convenience thing. All options can still be added in the same step, and all that is missing now can be done with simple text-editing, like just putting gallery-dl at the beginning of every line.
And that is the stuff I'm somewhat familiar with, dealing with texts and poking them with regular expressions. 😄

Sorry for my walls of text again, but I'm not sure about opening a new issue for this, because I'd consider it rather low priority. So it's up to you, I guess. I definitely don't know enough about Python to judge how much work it would take to get from here to there: that is, having -i read every line of the input file not just as a URL but as a URL plus optional parameters, and basically doing this for every line: gallery-dl [OPTIONS] URL.
If this can't be done in a simple and straightforward way, because the option/parameter parsing and checking suddenly gets a lot more complicated, then maybe better forget it for now.

Edit:

Not directly related, but might be relevant nonetheless:
https://danlev.deviantart.com/journal/DeviantArt-Is-Switching-To-HTTPS-697996906

@mikf
Owner

mikf commented Aug 18, 2017

My initial ideas about this input-file feature were a lot more complicated and general, and involved JSON data structures and the like, which is why I suggested opening a separate issue ... I like over-complicating things, it seems.

Nonetheless, (re-)parsing command-line options at the point in the program-flow where results of the -i option are considered isn't really possible with how gallery-dl is structured right now, so I'm not going to implement this into gallery-dl itself, but you might want to check out this script: https://gist.github.com/mikf/b147134411e5295e5351f800a1cb6563

Give it an input-file as argument and it should do what you described. Possible input-file format would be, for example

URL -o flat=false
--ignore-config -c /tmp/cfg.txt URL2
# comment

URL3

You might need to adjust the "python" value of the GALLERYDL list to its full path.
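The core idea behind such a wrapper can be sketched in a few lines. This is not the linked script, which isn't reproduced here, but a simplified illustration that assumes each line is split the way a POSIX shell would split it; `parse_input_file` is a hypothetical name:

```python
import shlex


def parse_input_file(lines):
    """Turn input-file lines into per-line argument lists.

    Blank lines and lines starting with '#' are skipped; everything else
    is split like a shell command line, so options and the URL may come
    in any order.
    """
    result = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        result.append(shlex.split(line))
    return result


sample = """\
URL -o flat=false
--ignore-config -c /tmp/cfg.txt URL2
# comment

URL3
"""

for args in parse_input_file(sample.splitlines()):
    # Each list would be appended to a separate gallery-dl invocation.
    print(["gallery-dl"] + args)
```

Note that `shlex.split` follows POSIX rules by default, so backslashes in Windows paths would have to be quoted or doubled in the input file.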

@Hrxn
Contributor Author

Hrxn commented Aug 21, 2017

Okay, I agree, then it's best to forget this for now, until some more pressing use case comes up.

And the script you linked addresses this issue here just fine, so thank you for that!


I've now changed my config to this:

"deviantart":
        {
            "gallery":
            {
                "directory": ["DeviantArt", "Galleries", "{author[username]}"],
                "filename": "{index}_{title}.{extension}"
            },
            "favorite":
            {
                "directory": ["DeviantArt", "Favorites", "{username}"],
                "filename": "{index}_{title}_by_{author[username]}.{extension}"
            },
            "deviation":
            {
                "directory": ["DeviantArt", "Deviations", "{author[username]}"],
                "filename": "{index}_{title}_by_{author[username]}-({author[urlname]}).{extension}"
            },
            "folder":
            {
                "directory": ["DeviantArt", "Galleries", "{folder[owner]}", "Folders", "{folder[title]}"],
                "filename": "{index}_{title}.{extension}"
            },
            "collection":
            {
                "directory": ["DeviantArt", "Favorites", "{collection[owner]}", "Collections", "{collection[title]}"],
                "filename": "{index}_{title}_by_{author[username]}.{extension}"
            },
            "mature": true
        },

Maybe someone else will also see it as being of any use...

It relies on {username}, {author[username]}, {folder[owner]}, and {collection[owner]} always being the same in the end, but from what I've seen so far this seems to be the case.

The result should look something like this, with (example) being a user account on DeviantArt.

X:\Test\DeviantArt
├───Deviations
│   └───(example)
├───Favorites
│   └───(example)
│       └───Collections
│           └───(collection)
└───Galleries
    └───(example)
        └───Folders
            └───(folder)
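The mapping from the directory settings to this tree can be checked with a few lines of Python. A sketch only: the formats are copied from the config above, the metadata values are made up, and expanding segments with `format_map` is an assumption about how the format strings behave:

```python
import posixpath

# Directory formats copied from the config above.
DIRECTORY_FORMATS = {
    "gallery": ["DeviantArt", "Galleries", "{author[username]}"],
    "favorite": ["DeviantArt", "Favorites", "{username}"],
    "deviation": ["DeviantArt", "Deviations", "{author[username]}"],
    "folder": ["DeviantArt", "Galleries", "{folder[owner]}",
               "Folders", "{folder[title]}"],
    "collection": ["DeviantArt", "Favorites", "{collection[owner]}",
                   "Collections", "{collection[title]}"],
}

# Made-up metadata for a single account named "example".
SAMPLE = {
    "username": "example",
    "author": {"username": "example"},
    "folder": {"owner": "example", "title": "Sketches"},
    "collection": {"owner": "example", "title": "Landscapes"},
}


def render(subcategory, kwdict=SAMPLE):
    """Expand one directory format into a relative path."""
    segments = [s.format_map(kwdict) for s in DIRECTORY_FORMATS[subcategory]]
    return posixpath.join(*segments)


for name in DIRECTORY_FORMATS:
    print("{:<10} {}".format(name, render(name)))
```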

@Hrxn Hrxn closed this as completed Aug 21, 2017
@Hrxn
Contributor Author

Hrxn commented Aug 24, 2017

Yeah, almost forgot:
{username}, {author[username]}, {folder[owner]}, and {collection[owner]} might not really work in the exact same fashion on all platforms.

As far as I know, always using {author[urlname]} instead of {author[username]} seems to be the better choice.

Just in case someone else stumbles upon this here...
