[deviantart] Add support for Groups galleries #26

Closed
Hrxn opened this issue Jul 3, 2017 · 23 comments

Comments

@Hrxn
Contributor

Hrxn commented Jul 3, 2017

gallery-dl --version

0.9.1-dev

OS: Windows 10 CU x64
Python 3.6.1 x64

Found something on DeviantArt (again) 😄
Groups..

1: Home URL of a group
http://cgpinups.deviantart.com/

PS D:\Stuff> gallery-dl -v http://cgpinups.deviantart.com/
[gallery-dl][debug] Starting DownloadJob for 'http://cgpinups.deviantart.com/'
[deviantart][debug] Using DeviantartGalleryExtractor for http://cgpinups.deviantart.com/
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.deviantart.com
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/all?username=cgpinups&offset=0&limit=10&mature_content=true HTTP/1.1" 200 70
PS D:\Stuff>

2: Gallery URL of group
http://cgpinups.deviantart.com/gallery/

Same result as above.

3: Gallery Folders in the Group (The actual galleries, so to speak) (One example of them)
http://cgpinups.deviantart.com/gallery/25871850/Fantasy-and-Sci-Fi

[gallery-dl][error] No suitable extractor found for 'http://cgpinups.deviantart.com/gallery/25871850/Fantasy-and-Sci-Fi'

I think the question is how the API handles this stuff.
It doesn't make sense to me right now, but I guess this is related to the difference between Gallery and Favourites. If this is the same distinction by the API...

For example, a user's gallery has Gallery Folders (just as a group), while the user's favourites has collections.

Gallery folders also don't work.
Example:

PS D:\Stuff> gallery-dl "http://arsenixc.deviantart.com/gallery/11314091/Backgrounds"
[gallery-dl][error] No suitable extractor found for 'http://arsenixc.deviantart.com/gallery/11314091/Backgrounds'
PS D:\Stuff>

No point in having two separate issues here, I think. Depends on the API results, I guess.

@mikf
Owner

mikf commented Jul 3, 2017

I've investigated this a bit and came to the conclusion that groups behave in exactly the same manner as regular users do, with one major exception: You can't get all Deviation-objects of a group in the same way as you can for a user, i.e. gallery/all returns nothing. Everything else works the same (gallery folders, favorites, journals). There also seems to be no way to differentiate between groups and users based on their URL or otherwise.

I only see two possible "solutions" for this, but I don't really like either of them. Maybe you have a better idea?

  1. Leave the GalleryExtractor as it is and therefore don't return a result for a group's gallery. Gallery folders should work just fine.
  2. Change the GalleryExtractor to get its results not by using gallery/all but by iterating over all the gallery folders, which works for groups as well as users. This might not even get all the images of a user all the time and it is definitely going to change the directory structure unless I change something there as well.

Another idea that comes to mind as I am writing this:

  • Check if the gallery/all result is empty:
    • No -> normal user
    • Yes -> possible group, do the thing mentioned in point 2.

It's just that checking for an empty result (in this case an iterator) is quite messy.
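
One way to make the emptiness check less messy is to pull the first element and chain it back in front of the rest. The following is only a sketch: `peek` and `fake_gallery_all` are hypothetical names, and the generator merely stands in for whatever the gallery/all call returns.

```python
import itertools


def peek(iterable):
    """Return (is_empty, iterator) without losing the first element."""
    iterator = iter(iterable)
    try:
        first = next(iterator)
    except StopIteration:
        return True, iter(())
    # Put the consumed element back in front of the remaining ones.
    return False, itertools.chain([first], iterator)


def fake_gallery_all(items):
    """Hypothetical stand-in for a paginated gallery/all generator."""
    yield from items


empty, deviations = peek(fake_gallery_all([]))
print(empty)

empty, deviations = peek(fake_gallery_all(["a", "b"]))
print(empty, list(deviations))
```

The caller gets back a usable iterator in both cases, so the "empty means possible group" branch can be decided before anything is consumed.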

mikf added a commit that referenced this issue Jul 3, 2017
The code for this and the available metadata is probably going
to change again. This extractor is very similar to the favorite-
extractor, so they might be "combined" or something like that.
@Hrxn
Contributor Author

Hrxn commented Jul 4, 2017

There also seems to be no way to differentiate between groups and users based on their URL or otherwise.

Good point. The URL scheme is exactly the same, and except for profile-related information, i.e. personal data and stats etc. they are basically the same. Minus this difference in gallery/all...

I noticed something else, but it's the same underlying problem, as far as I understand.

Using this profile as example again: http://arsenixc.deviantart.com/

And its Favourites and Gallery and what gallery-dl returns

  1. Favourites
    http://arsenixc.deviantart.com/favourites/
    -> Results in the collection "{collection[owner]} - {collection[title]}", i.e. (username) - Featured
    http://arsenixc.deviantart.com/favourites/?catpath=/
    -> Results in the collection "{collection[owner]} - {collection[title]}", i.e. (username) - All
  2. Gallery
    http://arsenixc.deviantart.com/gallery/
    -> Results in "{author[username]}", i.e. single directory (username) with all deviation-objects (hopefully)
    http://arsenixc.deviantart.com/gallery/?catpath=/
    -> Results in error [gallery-dl][error] No suitable extractor found for [...]

Little side note:
http://arsenixc.deviantart.com/ and http://arsenixc.deviantart.com/gallery/ result in the exact same behaviour, that is identical API query, but I guess this is by design, to improve the usability of gallery-dl a bit, right?

So when it comes to the API, the difference between user and group is pretty irrelevant, at least for the purposes of gallery-dl, but DeviantArt makes a distinction between Gallery and Collection and uses different API endpoints, /gallery/folders, /collections/folders, and /gallery/all, which only exists for Gallery and not Collection.

Well, at first sight, not relying on gallery/all seems to make sense, as long as the two other endpoints work in the same manner.

By the way, I'm pretty sure that I've seen it on a few profiles: A gallery folder that also contains folders.
What does the API return here, normally just an array of deviation-ids, right? So the structure below this first level is not preserved, right?
I will post an example link if I find such a profile again.

Another thing from the API documentation: GET /user/profile/{username}

I'm not sure, but couldn't this be used as a check to differentiate between users and groups?
And more, as listed in the section Parameters, you can optionally request ext_collections and ext_galleries I think, and as listed under Response, get an array of collection ids and gallery ids, maybe this helps?

@Hrxn
Contributor Author

Hrxn commented Jul 4, 2017

Another thing, if I read it right /gallery/all always returns a flat array of deviation ids, hence any gallery structure, i.e. different folders in Gallery can never be preserved this way. Counts as contra, I'd say.

Okay, I think I have a small suggestion here:
We should first agree on reasonable defaults. What would the average gallery-dl user expect, what are his intentions, getting all content, some content, specific content?

It would be nice if some others could chime in here, to get some different opinions, maybe. Wouldn't do any harm at least.

Well, I'll start with the first point:

§1
Allow the distinction between a user's deviations and favorited deviations on their profile. Never get both.

This means that, in case our hypothetical user wants both, gallery-dl has to be fed twice. With different URLs.
Example:

gallery-dl http://arsenixc.deviantart.com/gallery/
gallery-dl http://arsenixc.deviantart.com/favourites/

mikf added a commit that referenced this issue Jul 6, 2017
They previously weren't supported for galleries and journals.

This also increases the 'limit' parameter for API calls to its
respective maximum.
@mikf
Owner

mikf commented Jul 8, 2017

Missing support for ?catpath=/ URLs should be fixed.
The "weird" behavior for user.deviantart.com is just a leftover from when the deviantart module only had 1 extractor and didn't even use the API. Back then one could only get all images of a user's gallery and providing this URL as "shortcut" seemed appropriate.

Well, at first sight, not relying on gallery/all seems to make sense, as long as the two other endpoints work in the same manner.

I tried this for a couple of users and just using the contents of their gallery-folders didn't work all the time. Without the "Featured" folder you can get fewer than their actual Deviation count; with the "Featured" folder you get more than that (duplicates, probably).
For a user with 300 Deviations I got 200 without and 400 with. So you either get only a subset or duplicates.

A gallery folder that also contains folders

I've found an example of that: http://rachychan.deviantart.com/gallery/46449774/Pokemon
This sub-folder has another folder in it. Seems that an account needs Core Membership to be able to create folders inside of folders.
It is technically possible to preserve the given folder structure: each folder object in an API response has a "parentid" member that refers to its parent folder, so one could reconstruct the whole directory tree. It's just that there is no good way of implementing variable-depth directory paths with gallery-dl's infrastructure.
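
To illustrate the reconstruction idea, here is a minimal sketch that follows the "parentid" links upwards to build a path for every folder. Only "parentid" is taken from the description above; the "folderid" and "name" keys, the sample data and the `folder_paths` helper are assumptions made for this example, not gallery-dl code.

```python
def folder_paths(folders):
    """Map each folder id to its full path, following 'parentid' links."""
    by_id = {f["folderid"]: f for f in folders}

    def path_of(folder_id, seen=()):
        folder = by_id[folder_id]
        parent = folder.get("parentid")
        visited = seen + (folder_id,)
        # Stop at top-level folders, unknown parents, or cycles.
        if parent is None or parent not in by_id or parent in visited:
            return [folder["name"]]
        return path_of(parent, visited) + [folder["name"]]

    return {fid: "/".join(path_of(fid)) for fid in by_id}


# Hypothetical folder objects; a real API response may look different.
folders = [
    {"folderid": "A", "name": "Featured", "parentid": None},
    {"folderid": "B", "name": "Pokemon", "parentid": None},
    {"folderid": "C", "name": "Sketches", "parentid": "B"},
]
print(folder_paths(folders))
```

This only computes the paths; mapping them onto variable-depth directory formats is the part that remains hard.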

Another thing from the API documentation: GET /user/profile/{username}

Doesn't work for groups:

Request
GET https://www.deviantart.com/api/v1/oauth2/user/profile/cgpinups

Result
{
    "error": "invalid_request",
    "error_description": "user not found.",
    "error_code": 2,
    "status": "error"
}

There are actually undocumented API endpoints for groups that don't really work: wix-incubator/DeviantArt-API#122

We should first agree on reasonable defaults

That, and consistent behavior, are quite important, I think. For that reason I tried to compile a list of all possible URLs for a Deviantart user:

| #  | URL | Status |
|----|-----|--------|
| 1  | user.deviantart.com/ | Currently the same as (2). Should probably do all of (2), (7) and (10) |
| 2  | user.deviantart.com/gallery/ | All deviations/images in [user]'s gallery |
| 3  | user.deviantart.com/gallery/?catpath=/ | Same as (2). |
| 4  | user.deviantart.com/gallery/?catpath=/some/category/path | No way of filtering these per API, might be possible to do the filtering manually |
| 5  | user.deviantart.com/gallery/?catpath=scraps | Not possible to get these per API, as far as I can tell |
| 6  | user.deviantart.com/gallery/123/folder-name | All deviations in this gallery folder |
| 7  | user.deviantart.com/favourites/ | The contents of the "Featured" folder of [user]'s favorites. Inconsistent with (2) and (10), all of these should do the same. |
| 8  | user.deviantart.com/favourites/?catpath=/ | All favorites of [user]. This is done by retrieving the contents of all favorite folders of that user. There is no actual API endpoint for this. |
| 9  | user.deviantart.com/favourites/123/folder-name | All favorites in that specific folder/collection |
| 10 | user.deviantart.com/journal/ | All of [user]'s journals |
| 11 | user.deviantart.com/blog/ | Same as (10) (Groups use /blog, Users use /journal) |
| 12 | user.deviantart.com/journal/?catpath=/ | Same as (10) |
| 13 | user.deviantart.com/journal/name-123 | That specific journal entry. Should go in the same directory as (12) |
| 14 | user.deviantart.com/art/name-123 | That specific deviation. Should go in the same directory as the items of (3) |

My suggestion would be that (2), (7) and (10) provide the contents of all the available (sub)folders (just "Featured" for journals) and put them in appropriately named directories:

deviantart/
    gallery/
        Featured/
            image1.jpg
            ...
        Folder 1/
            image123.jpg
            ...
    favourites/
        Featured/
        Collection 1/
    journal/

(3), (8) and (12) would just produce a flat list:

deviantart/
    gallery/
        image1.jpg
        ...
    favourites/
        image123.jpg
        ...

Not quite sure what to do, but the way things currently work is not consistent and defies expectations quite a bit.
Another example of this are usernames: sometimes the value of author[username] gets used, which is included in the actual API response and may contain capital letters, and sometimes the collection[owner] value gets used, which is just the lowercase username taken from the URL.

@Hrxn
Contributor Author

Hrxn commented Jul 9, 2017

Another thing from the API documentation: GET /user/profile/{username}

Doesn't work for groups:

Not too surprising, probably. But as a check to differentiate between users and groups?
Instead of:

Check if the gallery/all result is empty: [...]
It's just that checking for an empty result (in this case an iterator) is quite messy.


But the most important part is (6) (All deviations in this gallery folder), as long as this also works on group folders, because that's probably all that is relevant here. And using multiple URLs for the distinct group folders isn't asking too much, I'd say.

Without the "Featured" folder you can get fewer than their actual Deviation count; with the "Featured" folder you get more than that (duplicates, probably).
For a user with 300 Deviations I got 200 without and 400 with. So you either get only a subset or duplicates.

Again, not too surprising, this is what I saw with some profiles on the web site. DeviantArt allows its users to create any folder structure and then to add any deviation-objects at will to each of them. Now using gallery-dl to download all folders there then naturally results in duplicates.
But I think we should definitely err on the side of duplicates instead of missing some items.
Because 1. most DeviantArt profiles aren't actually that big at all and 2. deduplication can be done on the user's side, lots of programs to choose from here should someone feel this to be necessary.
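
For anyone who would rather drop duplicates while collecting than clean up afterwards, a small sketch of the idea follows. It assumes every deviation object carries a unique "deviationid" key (the name is borrowed from the /deviation/{deviationid} endpoint; the data and the `unique_deviations` helper are made up for illustration).

```python
def unique_deviations(folders):
    """Yield each deviation once, even if it appears in several folders."""
    seen = set()
    for folder in folders:
        for deviation in folder:
            key = deviation["deviationid"]
            if key in seen:
                continue  # already yielded from an earlier folder
            seen.add(key)
            yield deviation


# Hypothetical folder contents: "Featured" overlaps with "Folder 1".
featured = [{"deviationid": "d1"}, {"deviationid": "d2"}]
folder_1 = [{"deviationid": "d2"}, {"deviationid": "d3"}]

print([d["deviationid"] for d in unique_deviations([featured, folder_1])])
```

The trade-off is that a deviation then only lands in the first folder it was seen in, so the folder structure on disk no longer mirrors the site exactly.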

But the gallery/all endpoint always gets the exactly right amount of deviations, although without any folder structure, right?

I agree on your suggestion in regard to points (2), (7) and (10) vs. (3), (8) and (12).
Makes sense to me, and making use of the distinction between user.deviantart.com/gallery/ and user.deviantart.com/gallery/?catpath=/ is the most straightforward way I can think of right now.

With regard to (4) and (5):
Using the extraction for folders (6) doesn't work here?
I always assumed this would just be another folder, at least in the case of (5).

What initially made me think about requiring a consensus for the defaults in the first place is the inconsistency you mention in (7), my first thought was that maybe using user.deviantart.com/favourites/ as a "shortcut" to user.deviantart.com/favourites/?catpath=/ would be a good idea, because that is what a hypothetical gallery-dl user was trying to achieve anyway.
But if (7) can be fixed, even better, this way we could do it without "rewriting" any input URLs to different endpoints. I had not thought about your suggestion including (10) (Journals) with (1) and (7) together before, but I agree this would make sense, for the sake of completeness. Else, telling our hypothetical users to use explicit URLs for journal retrieval seems also valid to me. But okay, bundling this all together in a neat and tidy package seems like the more elegant solution.

A small remark to (13) and (14): Nothing wrong with that suggestion, but alternatively a separate category directory could be used here instead, directly below the top level, i.e. deviantart\selection or something, because I think that if someone (again, our hypothetical user) makes the extra step to invoke gallery-dl on a URL to one specific element, it will be done on purpose, I'd assume.
Best to use whatever is easier to implement 😉

Another example of this are usernames: sometimes the value of author[username] gets used, which is included in the actual API response and may contain capital letters, and sometimes the collection[owner] value gets used, which is just the lowercase username taken from the URL.

collection[owner] is used in the context of favorites, right? Probably not a problem, because I'd think that this would mostly make sense as a keyword for use in filenames and nothing else.

@mikf
Owner

mikf commented Jul 10, 2017

I tried to adjust the default directories to a more sane default and make them work the same for both users and groups (af9bd17). It's not exactly the same as I suggested, but I think I like this version more.

| URL | Directory |
|-----|-----------|
| user.deviantart.com/ | user/ |
| user.deviantart.com/gallery/ | user/ |
| user.deviantart.com/gallery/?catpath=/ | user/ |
| user.deviantart.com/gallery/123/folder | user/folder/ |
| user.deviantart.com/favourites/ | user/Favourites/ |
| user.deviantart.com/favourites/?catpath=/ | user/Favourites/ |
| user.deviantart.com/favourites/123/folder | user/Favourites/folder/ |
| user.deviantart.com/journal/ | user/Journal/ |
| user.deviantart.com/journal?catpath=/ | user/Journal/ |
| user.deviantart.com/art/name-123 | user/ |

So user.deviantart.com/(gallery|favourites|journal)/ (with or without ?catpath=/) produces a flat list of all deviation objects in that respective category. Gallery-folders and favourite-collections get their own sub-directory. Putting single deviations into their own directory seemed kind of weird when I tried that in practice, so they are just going into their owners directory.
This doesn't really preserve any given folder structure like I suggested it would, but I just wanted to get groups and a general directory structure going.

Everything, except getting all images of a gallery (e.g. user.deviantart.com/gallery/), works for users as well as groups, but that can be worked around by feeding all the different gallery-folder URLs into gallery-dl, like you suggested.


But the gallery/all endpoint always gets the exactly right amount of deviations, although without any folder structure, right?

That is correct. Adding folder-structure information for this might be tricky on DeviantArt's end since images can be in multiple folders at once.

With regard to (4) and (5):
Using the extraction for folders (6) doesn't work here?
I always assumed this would just be another folder, at least in the case of (5).

There is only a folder entry for "Featured" (and all the other user-created folders), but not for "All", "Scraps" or any special Category Path. Someone on the DeviantArt-API issue tracker mentioned that the "Scraps" entries were previously included in the output of gallery/all, but not anymore.

Another example of this are usernames: sometimes the value of author[username] gets used, which is included in the actual API response and may contain capital letters, and sometimes the collection[owner] value gets used, which is just the lowercase username taken from the URL.

collection[owner] is used in the context of favorites, right? Probably not a problem, because I'd think that this would mostly make sense as a keyword for use in filenames and nothing else.

Sorry for not explaining this well enough, but that is not exactly what I meant. I also forgot that this isn't an issue on Windows, so you probably haven't noticed the problem with this.
Paths on POSIX systems are case-sensitive. AbC and abc refer to two different paths, so using a mixed-case and a lower-case value for the same directory name will result in two directories being created instead of just one as intended. (Paths on Windows systems are case-insensitive: AbC and abc refer to the same thing)
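
The difference can be shown with Python's pathlib, whose pure path classes compare names according to each platform's rules. The `user_directory` helper is hypothetical and only illustrates the "always use one casing" idea; it is not how gallery-dl builds its paths.

```python
from pathlib import PurePosixPath, PureWindowsPath


def user_directory(username):
    """Hypothetical helper: always use the lowercase name for directories."""
    return username.lower()


# POSIX path comparison is case-sensitive, Windows path comparison is not.
posix_same = PurePosixPath("AbC") == PurePosixPath("abc")
windows_same = PureWindowsPath("AbC") == PureWindowsPath("abc")
print(posix_same, windows_same)

# Normalizing first gives one directory name on either kind of system.
print(user_directory("AbC") == user_directory("abc"))
```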

mikf added a commit that referenced this issue Jul 12, 2017
For groups the 'GalleryExtractor' collects all gallery-folder URLs
and defers its work to the 'FolderExtractor'.
@Hrxn
Contributor Author

Hrxn commented Jul 15, 2017

Everything, except getting all images of a gallery (e.g. user.deviantart.com/gallery/), works for users as well as groups, but that can be worked around by feeding all the different gallery-folder URLs into gallery-dl, like you suggested.

Very good news. So the default directory structure would be like this?

| URL | Directory |
|-----|-----------|
| user.deviantart.com/ | user/ |
| group.deviantart.com/ | group/ |

But without any deviation-items in the group/ top-level directory itself, instead in gallery-folders one level below?

There is only a folder entry for "Featured" (and all the other user-created folders), but not for "All", "Scraps" or any special Category Path. Someone on the DeviantArt-API issue tracker mentioned that the "Scraps" entries were previously included in the output of gallery/all, but not anymore.

Ah, okay. I assumed that ...?catpath=/ would be the "All" folder, basically, and work along the lines of the other (user) folders.
(Same for "Scraps")
But hey, as long as they work by using their URLs explicitly with gallery-dl, fine.


| URL | Subcategory | Extractor Class | API Endpoint |
|-----|-------------|-----------------|--------------|
| user.deviantart.com/ | gallery | DeviantartGalleryExtractor(..) | gallery/all * |
| user.deviantart.com/gallery/ | gallery | DeviantartGalleryExtractor(..) | gallery/all * |
| user.deviantart.com/gallery/?catpath=/ | gallery | DeviantartGalleryExtractor(..) | gallery/all * |
| user.deviantart.com/gallery/123/folder | folder | DeviantartFolderExtractor(..) | /gallery/{folderid} |
| user.deviantart.com/favourites/ | favorite | DeviantartFavoriteExtractor(..) | /collections/{folderid} |
| user.deviantart.com/favourites/?catpath=/ | favorite | DeviantartFavoriteExtractor(..) | /collections/{folderid} |
| user.deviantart.com/favourites/123/folder | collection | DeviantartCollectionExtractor(..) | /collections/{folderid} |
| user.deviantart.com/journal/ | journal | DeviantartJournalExtractor(..) | /browse/user/journals |
| user.deviantart.com/journal?catpath=/ | journal | DeviantartJournalExtractor(..) | /browse/user/journals |
| user.deviantart.com/art/name-123 | deviation | DeviantartDeviationExtractor(..) | /deviation/{deviationid} |
| user.deviantart.com/journal/name-123 | deviation | DeviantartDeviationExtractor(..) | /deviation/{deviationid} |

* : Depends on your setting of deviantart.flat.
Note 1: The base class getting extended (DeviantartExtractor) is omitted.
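
The URL-to-subcategory mapping in this table can be sketched as a list of regular expressions that are tried in order. These patterns are written for this example only and are not the ones gallery-dl actually uses; `ROUTES` and `subcategory` are hypothetical names.

```python
import re

# (pattern for the part after ".deviantart.com", subcategory)
ROUTES = [
    (re.compile(r"/gallery/\d+/[^/?]+"), "folder"),
    (re.compile(r"/favourites/\d+/[^/?]+"), "collection"),
    (re.compile(r"/(?:art|journal)/[^/?]+-\d+"), "deviation"),
    (re.compile(r"/favourites/?(?:\?catpath=/)?"), "favorite"),
    (re.compile(r"/(?:journal|blog)/?(?:\?catpath=/)?"), "journal"),
    (re.compile(r"/?(?:gallery/?(?:\?catpath=/)?)?"), "gallery"),
]


def subcategory(url):
    """Return the subcategory for a URL, or None if nothing matches."""
    path = url.split(".deviantart.com", 1)[-1]
    for pattern, name in ROUTES:
        # fullmatch() keeps unsupported catpath values from matching.
        if pattern.fullmatch(path):
            return name
    return None


for url in ("http://user.deviantart.com/",
            "http://user.deviantart.com/gallery/123/folder",
            "http://user.deviantart.com/gallery/?catpath=scraps"):
    print(url, "->", subcategory(url))
```

Because users and groups share the same URL scheme, a lookup like this can only pick the subcategory; telling a group from a user needs extra information from the API.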

Damn, Github's columns not wide enough. I think I should split the second half of the table into a separate table below.

Let me know if anything is not correct, I'll update it accordingly.

@mikf
Owner

mikf commented Jul 15, 2017

Everything in your table seems to be correct, as is your assumption about the directory structure (a flat structure for group-images doesn't exist and doesn't make too much sense either).
There is one more not so important URL-type, though: group.deviantart.com/blog/, which is exactly the same as user.deviantart.com/journal/, but for groups.

I've also added the deviantart.flat option, which allows you to choose between a flat directory structure (user/) or sub-directories (user/folder-name/) for the output of the gallery- and favorite-extractors.

Regarding the initial topic of this issue: everything that works for users now also works for groups, even group.deviantart.com/gallery/.

@Hrxn
Contributor Author

Hrxn commented Jul 15, 2017

Good to hear, will do some tests in the next days..

So the gallery/all endpoint will only be used with deviantart.flat set to true (default), otherwise
it's the two-step approach: 1. gallery/folders -> flat array of folder ids -> 2. /gallery/{folderid}?

Basically the same approach as with all endpoints in the table above that use a {folderid}

@mikf
Owner

mikf commented Jul 15, 2017

Yes, pretty much that.
I should mention that group-galleries will always be handled by the /gallery/folders -> /gallery/{folderid} approach, regardless of your deviantart.flat setting.

@Hrxn
Contributor Author

Hrxn commented Jul 15, 2017

With the two-step method, one sub-extractor delegating the tasked URL to another sub-extractor, which format settings get used, if specified (Directory, Filename)? Always the last one (i.e. 2nd sub-extractor)?

@mikf
Owner

mikf commented Jul 15, 2017

Yes, always the last one.
In this case the gallery and favorite extractors are delegating their work to the folder and collection extractors respectively, so you have to set extractor.deviantart.folder.directory or .collection.directory.

You can also manually reproduce the "delegation" by using the -g and -i option:

# write gallery-folder URLs to file
$ gallery-dl -g http://adoptik.deviantart.com/ > urls

# download their images
$ gallery-dl -i urls

gallery-dl basically does the same thing internally.

@Hrxn
Contributor Author

Hrxn commented Aug 4, 2017

Uh, almost forgot. I made a test run a few days ago with the initial example (http://cgpinups.deviantart.com/), everything seemed to work fine, until it threw an error after ~ 20 GB because my drive ran out of space :/

But I'll count it as a success anyway 😄

One little thing, but I guess this again is the API response difference seen before:
gallery-dl --list-keywords http://cgpinups.deviantart.com returns nothing, for example.

Another question: What is the keyword difference for author[urlname] and author[username]?
Because so far I've encountered no example, it's always the same (minus the letter case, sometimes).

Some accounts also have names in other languages (Example: http://arsenixc.deviantart.com/), as seen in the info box in the upper right corner or the title of the page (e.g. <title>arsenixc (&#12450;&#12523;&#12473;) | DeviantArt</title>)

Not sure how the API handles this..
Anyway, I just realized it would probably be a bad idea. Better to forget this..

@mikf
Owner

mikf commented Aug 5, 2017

gallery-dl --list-keywords http://cgpinups.deviantart.com returns nothing, for example.

This happens every time an extractor offloads its work to other extractors and doesn't provide any metadata itself. Some sort of info-message would be appropriate here, I think.

What is the keyword difference for author[urlname] and author[username]? ... it's always the same (minus the letter case, sometimes).

That is exactly it: the letter case. username is the mixed-case value returned by the API, urlname is the lowercase version of that and also the username variant used in URLs (arsenixc for http://arsenixc.deviantart.com/)
This is just for consistency's sake to always have a lowercase version of a username available, as Python's default format strings don't allow you to convert an input string to lowercase.
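
A short sketch of why a precomputed lowercase key is handy: str.format() can index into a dictionary with `{author[urlname]}`, but its format spec has no option for lowercasing a string. The `with_urlname` helper and the sample username are assumptions for illustration.

```python
def with_urlname(author):
    """Return a copy of the metadata with a lowercase 'urlname' added."""
    result = dict(author)
    result["urlname"] = author["username"].lower()
    return result


# Hypothetical mixed-case username as an API might return it.
author = with_urlname({"username": "SomeArtist"})

# Both variants are then available to directory and filename formats.
print("{author[username]}".format(author=author))
print("{author[urlname]}".format(author=author))
```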

Some accounts also have names in other languages (Example: http://arsenixc.deviantart.com/)

The API says that this user's username is just arsenixc and アルス/&#12450;&#12523;&#12473; is his real name:

{
    "user": {
        "userid": "E9A92126-5927-365C-80D9-3D0E1783790F",
        "username": "arsenixc",
        "usericon": "http://a.deviantart.net/avatars/a/r/arsenixc.jpg?5",
        "type": "regular"
    },
    "profile_url": "http://arsenixc.deviantart.com",
    "real_name": "アルス",
    "tagline": "~~~",
}

@Hrxn
Contributor Author

Hrxn commented Aug 5, 2017

Yup, "real_name", that's what I meant..

I thought for a moment that this could eventually be useful as a keyword, because file systems should not be an issue anymore with lack of Unicode support, but I later realized that many accounts on DeviantArt don't use this real name at all, so this would be pretty pointless anyway.

@Hrxn
Contributor Author

Hrxn commented Aug 21, 2017

Thinking about directory structures, I remembered this bit here:

Another thing from the API documentation: GET /user/profile/{username}

Doesn't work for groups:

GET https://www.deviantart.com/api/v1/oauth2/user/profile/cgpinups

Result
{
    "error": "invalid_request",
    "error_description": "user not found.",
    "error_code": 2,
    "status": "error"
}

So, for a group the result is an error with error_description = user not found. and an error_code..
If this is the case for all groups, and every legit (resp. "real") user returns proper profile information, wouldn't this be an easy and straightforward way to reliably tell users and groups apart?

@mikf
Owner

mikf commented Aug 21, 2017

That is what I have been using ever since thinking about this again because of your comment here.
I initially didn't want to use /user/profile/{username} since it involves another API call and doesn't differentiate between groups and non-existing users, but in the end it resulted in nice and clean code to solve the actual problem, so thank you for this suggestion.
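
As a rough illustration of the check, the two responses quoted in this thread can be told apart by the presence of the "error" member. The `is_regular_user` function is a hypothetical name, works on already decoded JSON, and shares the limitation mentioned above: a group and a non-existing user look the same.

```python
def is_regular_user(profile_response):
    """True if a decoded /user/profile response describes a user."""
    return "error" not in profile_response


# Response for a group, as quoted earlier in this thread.
group_response = {
    "error": "invalid_request",
    "error_description": "user not found.",
    "error_code": 2,
    "status": "error",
}

# Shortened response for a regular user, also taken from this thread.
user_response = {
    "user": {"username": "arsenixc", "type": "regular"},
    "profile_url": "http://arsenixc.deviantart.com",
}

print(is_regular_user(group_response), is_regular_user(user_response))
```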

@Hrxn
Contributor Author

Hrxn commented Aug 21, 2017

Understood. If my rudimentary knowledge of Python doesn't fail me again, the check is called from here in the GalleryExtractor sub-extractor:

def deviations(self):
    if self.flat and self.api.user_profile(self.user):
        return self.api.gallery_all(self.user, self.offset)
    else:
        folders = self.api.gallery_folders(self.user)
        return self._folder_urls(folders, "gallery")

Line 177, to be exact. And gallery-dl always ends up in GalleryExtractor in this case, because Users and Groups have the exact same URL scheme, as already documented earlier in this thread here I think.

And in _folder_urls, folder URLs returned by the API get collected and then used with urlfmt = "https://{}.deviantart.com/{}/0/{}" (a format string, from what I understand) to rebuild complete, valid and specific URLs, which then get re-used by gallery-dl and the normal URL handling and matching logic. That is this "delegation", which already came up in one of the threads here, but I wonder, and I'm not sure if this is something specific to Python, if it would also be possible to delegate by calling a sub-extractor class from inside another sub-extractor class? If it wasn't obvious before that I don't know how to properly use classes in Python, it now definitely is..

What I'm trying to get at, basically, is right now the output options for groups cannot be changed, they use the setting for directory etc. just like Gallery or Folder, so they always end up as part of the /user directory structure, with default settings.


PS.
That code embedding thing above into comments seems to be new..
👍 to GitHub.

@mikf
Owner

mikf commented Aug 22, 2017

but I wonder, and I'm not sure if this is something specific to Python

No, this is just part of the extractor-"infrastructure" used by gallery-dl and independent of the language used. The important line for URL-delegation is line 40: (deviation, in this case, contains a folder- or collection-URL)

if isinstance(deviation, str):
    yield Message.Queue, deviation
    continue

... which then ends up here ...

def handle_queue(self, url):
    try:
        DownloadJob(url).run()
    except exception.NoExtractorError:
        self._write_unsupported(url)

... and starts a new download job using another sub-extractor.
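
The flow described here can be reduced to a toy model: one extractor yields queue messages containing URLs, and the job starts a fresh lookup for each of them. Everything in this sketch (the message constants, the two generator functions, `find_extractor`, `run_job` and the example.org URLs) is made up for illustration and is much simpler than gallery-dl's real classes.

```python
URL, QUEUE = "url", "queue"


def gallery_extractor(url):
    """Delegating extractor: yields folder URLs instead of files."""
    for folder_id in (1, 2):
        yield QUEUE, "{}folder/{}".format(url, folder_id)


def folder_extractor(url):
    """Leaf extractor: yields a downloadable file URL."""
    yield URL, url + "/image.jpg"


def find_extractor(url):
    """Pick an extractor purely from the URL, like normal URL matching."""
    return folder_extractor if "/folder/" in url else gallery_extractor


def run_job(url, downloaded):
    """Process all messages; queued URLs start a nested job."""
    for kind, value in find_extractor(url)(url):
        if kind == QUEUE:
            run_job(value, downloaded)  # new job, own extractor and settings
        else:
            downloaded.append(value)
    return downloaded


print(run_job("https://example.org/gallery/", []))
```

Since the nested job selects its extractor from the URL alone, it also picks up that extractor's own configuration, which matches the "always the last one" behavior mentioned earlier.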

if it would also be possible to delegate by calling a sub-extractor class from inside another sub-extractor class?

That would actually be possible, but it would use the settings and formats of the "parent"-extractor, which might or might not be useful.

To address your actual issue, i.e. the possibility to configure different paths for users and groups: Directory- and filename-formats as well as (sub)category values are bound to the extractor class being used, but I could create group-specific classes and "delegate" (yes, again) from the main classes to these new ones (GalleryExtractor to GroupGalleryExtractor). This would allow for different configuration options to be a possibility.

@mikf mikf reopened this Aug 22, 2017
mikf added a commit that referenced this issue Aug 22, 2017
This is done by prepending "group-" to an extractor's subcategory
if the URL belongs to a group ("folder" becomes "group-folder" and
so on). This changes the configuration-path being used and is also
reflected in the output of '--list-keywords'.
@mikf
Owner

mikf commented Aug 22, 2017

Ok, I went with a different approach which yields the same results: subcategory values for groups now get prepended with "group-", which allows you to set different output options for group-related URLs.

edit:
to give an example: you can set deviantart.folder.directory for user-based gallery folders and deviantart.group-folder.directory for group-based ones.
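
A minimal sketch of how such a prefix could select a different configuration section. The nested dictionary, `subcategory_for` and `directory_setting` are assumptions for this example; gallery-dl's actual config lookup is not shown in this thread.

```python
# Hypothetical excerpt of a configuration with separate group settings.
CONFIG = {
    "deviantart": {
        "folder": {"directory": ["DeviantArt", "Galleries"]},
        "group-folder": {"directory": ["DeviantArt", "Groups"]},
    }
}


def subcategory_for(base, is_group):
    """Prepend 'group-' to the subcategory for group URLs."""
    return "group-" + base if is_group else base


def directory_setting(config, category, base, is_group):
    """Look up the 'directory' option for the effective subcategory."""
    section = config.get(category, {})
    options = section.get(subcategory_for(base, is_group), {})
    return options.get("directory")


print(directory_setting(CONFIG, "deviantart", "folder", False))
print(directory_setting(CONFIG, "deviantart", "folder", True))
```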

@Hrxn
Contributor Author

Hrxn commented Aug 24, 2017

Good idea, thanks for changing this.

Changed my config to this now:

        "deviantart":
        {
            "gallery":
            {
                "directory": ["DeviantArt", "Galleries", "{author[urlname]}"],
                "filename": "{index}_{title}.{extension}"
            },
            "favorite":
            {
                "directory": ["DeviantArt", "Favorites", "{username}"],
                "filename": "{index}_{title}_by_{author[username]}.{extension}"
            },
            "deviation":
            {
                "directory": ["DeviantArt", "Deviations", "{author[username]}"],
                "filename": "{index}_{title}_by_{author[username]}.{extension}"
            },
            "folder":
            {
                "directory": ["DeviantArt", "Galleries", "{folder[owner]}", "Folders", "{folder[title]}"],
                "filename": "{index}_{title}.{extension}"
            },
            "group-folder":
            {
                "directory": ["DeviantArt", "Groups", "{folder[owner]}", "Folders", "{folder[title]}"],
                "filename": "{index}_{title}.{extension}"
            },
            "collection":
            {
                "directory": ["DeviantArt", "Favorites", "{collection[owner]}", "Collections", "{collection[title]}"],
                "filename": "{index}_{title}_by_{author[username]}.{extension}"
            },
   

(Clipped a bit at the end)

Just to be clear: this only affects two possible subcategories that can now be prefixed with group-, right?
folder and journal

@Hrxn Hrxn closed this as completed Aug 24, 2017
@mikf
Owner

mikf commented Aug 24, 2017

It also affects favorite, collection and (theoretically) gallery, so everything except deviation (there are no group-specific single deviations, as far as I'm aware).

This change causes the username part of the initial URL (arsenixc for http://arsenixc.deviantart.com/) to be looked up via /user/profile/{username} in all extractors, before anything else, to decide between user and group.
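A minimal sketch of that user-or-group decision, with loud caveats: `fetch_profile` is a hypothetical callable standing in for whatever performs the /user/profile/{username} request, and the rule "no profile means group" is an assumption made for illustration, not confirmed behaviour of gallery-dl or the DeviantArt API.

```python
def detect_owner_type(username, fetch_profile):
    """Classify a name as "user" or "group".

    Assumption (not confirmed by the source): fetch_profile returns a
    profile dict for regular users and None for anything else.
    """
    profile = fetch_profile(username)
    return "user" if profile else "group"


# Fake lookup table so the example runs without any network access.
KNOWN_USERS = {"arsenixc": {"username": "arsenixc"}}


def fake_fetch_profile(username):
    return KNOWN_USERS.get(username)
```

Passing the lookup in as a parameter keeps the decision testable without touching the network.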

@Hrxn
Contributor Author

Hrxn commented Aug 26, 2017

Okay, changed it again.

The full deviantart part from my config:

        "deviantart":
        {
            "gallery":
            {
                "directory": ["DeviantArt", "Galleries", "{author[urlname]}"],
                "filename": "{index}_{title}.{extension}"
            },
            "folder":
            {
                "directory": ["DeviantArt", "Galleries", "{folder[owner]}", "Folders", "{folder[title]}"],
                "filename": "{index}_{title}.{extension}"
            },
            "favorite":
            {
                "directory": ["DeviantArt", "Favorites", "{username}"],
                "filename": "{index}_{title}_by_{author[username]}.{extension}"
            },
            "collection":
            {
                "directory": ["DeviantArt", "Favorites", "{collection[owner]}", "Collections", "{collection[title]}"],
                "filename": "{index}_{title}_by_{author[username]}.{extension}"
            },
            "journal":
            {
                "directory": ["DeviantArt", "Journals", "{username}"],
                "filename": "{index}_{title}.{extension}"
            },
            "group-gallery":
            {
                "directory": ["DeviantArt", "Groups", "{username}"],
                "filename": "{index}_{title}_by_{author[username]}.{extension}"
            },
            "group-folder":
            {
                "directory": ["DeviantArt", "Groups", "{folder[owner]}", "Folders", "{folder[title]}"],
                "filename": "{index}_{title}_by_{author[username]}.{extension}"
            },
            "group-favorite":
            {
                "directory": ["DeviantArt", "Groups", "{username}", "Favorites"],
                "filename": "{index}_{title}_by_{author[username]}.{extension}"
            },
            "group-collection":
            {
                "directory": ["DeviantArt", "Groups", "{collection[owner]}", "Favorites", "{collection[title]}"],
                "filename": "{index}_{title}_by_{author[username]}.{extension}"
            },
            "group-journal":
            {
                "directory": ["DeviantArt", "Groups", "{username}", "Journals"],
                "filename": "{index}_{title}.{extension}"
            },
            "deviation":
            {
                "directory": ["DeviantArt", "Deviations", "{category_path}"],
                "filename": "{index}_{title}_by_{author[username]}.{extension}"
            },
            "flat": true,
            "mature": true
        },

I think this should do it.. 😄
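For anyone wondering how such format strings turn into a path, here is a small sketch using Python's built-in `str.format_map`. The metadata values are invented, `build_path` is a hypothetical helper, and gallery-dl's own formatting code may well behave differently (for example regarding path sanitizing).

```python
def build_path(directory, filename, metadata):
    """Expand each directory segment and the filename, then join with "/"."""
    parts = [segment.format_map(metadata) for segment in directory]
    parts.append(filename.format_map(metadata))
    return "/".join(parts)


# Invented metadata, shaped like the fields used in the config above.
metadata = {
    "index": 123,
    "title": "Sunset",
    "extension": "jpg",
    "author": {"username": "alice"},
    "folder": {"owner": "cgpinups", "title": "Fantasy-and-Sci-Fi"},
}

path = build_path(
    ["DeviantArt", "Groups", "{folder[owner]}", "Folders", "{folder[title]}"],
    "{index}_{title}_by_{author[username]}.{extension}",
    metadata,
)
```

Fields like `{folder[owner]}` work here because Python's format syntax allows indexing into a dictionary inside the replacement field.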
