[deviantart] add extractor for status updates #3541

Merged: 4 commits, merged on Jan 23, 2023

ClosedPort22
Contributor

Partially resolves #3539

1/3 done

@ClosedPort22
Contributor Author

Well, this is even more complicated than I thought, so I think I'll call it a day. As I explained in #3539 (comment), the /user/profile/posts/ endpoint does seem to replace /browse/user/journals content-wise, and it's very efficient in terms of API rate limiting. Plus, when used with the expand parameter, it's even possible to extract embedded images as (stashed) deviations. The major downside though is that this endpoint apparently returns the content in JSON format, so in order to construct a human-readable HTML document, a specialized parser must be written from scratch.
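The comment doesn't show the JSON that the endpoint returns. As a rough sketch of what such a parser involves, the following assumes a ProseMirror-style tree in which every node has a `type`, container nodes list their children under `content`, and text nodes carry `text` plus optional `marks`. These field and type names are assumptions for illustration, not DeviantArt's documented format.

```python
import html
from typing import Any, Dict

# Assumed node/mark names; DeviantArt's real schema may differ.
BLOCK_TAGS = {"paragraph": "p", "heading": "h2",
              "bulletList": "ul", "listItem": "li"}
MARK_TAGS = {"bold": "strong", "italic": "em"}


def render_node(node: Dict[str, Any]) -> str:
    """Recursively render one node of the assumed JSON tree as HTML."""
    if node.get("type") == "text":
        out = html.escape(node.get("text", ""))
        # Wrap the escaped text in one tag per known formatting mark.
        for mark in node.get("marks") or ():
            tag = MARK_TAGS.get(mark.get("type"))
            if tag:
                out = "<{0}>{1}</{0}>".format(tag, out)
        return out
    # Container node: render children first, then wrap if the type is known.
    inner = "".join(render_node(child)
                    for child in node.get("content") or ())
    tag = BLOCK_TAGS.get(node.get("type"))
    return "<{0}>{1}</{0}>".format(tag, inner) if tag else inner
```

Unknown node types fall through to their rendered children, so unsupported content degrades to plain text instead of being dropped.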

@ClosedPort22 marked this pull request as ready for review January 18, 2023 07:42
extract user status updates using the '/user/statuses/' endpoint
- recursively yield statuses
- ignore items with missing or unexpected field(s)
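The commit's "recursively yield statuses" and "ignore items with missing or unexpected field(s)" can be sketched as a generator. This assumes each entry in a status's `items` list may carry a shared `deviation` and/or a shared `status` object; those key names, and the choice to yield the containing status last, are assumptions for illustration and not necessarily what the extractor does.

```python
from typing import Any, Dict, Iterator


def iter_status(status: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield shared deviations and (recursively) shared statuses,
    then the status itself."""
    # "items" may be absent or null, so fall back to an empty tuple.
    for item in status.get("items") or ():
        if not isinstance(item, dict):
            continue  # skip items of an unexpected type
        deviation = item.get("deviation")
        if isinstance(deviation, dict):
            yield deviation
        shared = item.get("status")
        if isinstance(shared, dict):
            yield from iter_status(shared)
    yield status
```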
@ClosedPort22 marked this pull request as draft January 20, 2023 11:40
@ClosedPort22 marked this pull request as ready for review January 20, 2023 13:16
Comment on lines +158 to +174
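```python
from gallery_dl import text, util


# Shown as a standalone function for readability (hypothetical name); in the
# PR this logic lives in a method of the parent DeviantArt extractor.
def set_index(deviation):
    """Fill in deviation["index"] and deviation["index_base36"]."""
    alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
    if "index" not in deviation:
        try:
            if deviation["url"].startswith("https://sta.sh"):
                # sta.sh items: take the base-36 ID from the file name
                filename = deviation["content"]["src"].split("/")[5]
                deviation["index_base36"] = filename.partition("-")[0][1:]
                deviation["index"] = \
                    util.bdecode(deviation["index_base36"], alphabet)
            else:
                # regular deviations: numeric ID is the URL's last "-" part
                deviation["index"] = text.parse_int(
                    deviation["url"].rpartition("-")[2])
        except KeyError:
            # "url" or "content" is missing
            deviation["index"] = 0
            deviation["index_base36"] = "0"
    if "index_base36" not in deviation:
        deviation["index_base36"] = \
            util.bencode(deviation["index"], alphabet)
```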
Owner

Why is this part of this PR? From what I can tell, this is not at all necessary for status updates.

Contributor Author

This is used to handle shared sta.sh deviations in status updates (see #3539 (comment)). I moved the code to the parent extractor because it's also possible to extract embedded sta.sh deviations in journals, and I plan to add this feature in the future.
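The `index`/`index_base36` pair is the same number written in base 10 and in base 36 over the alphabet `0123456789abcdefghijklmnopqrstuvwxyz`. The following stand-alone sketch re-implements the conversion instead of calling `gallery_dl.util.bencode`/`bdecode`, so edge-case behavior (for example, what `0` encodes to) may differ from the library's. `index_from_url` is a hypothetical stand-in for the `text.parse_int(url.rpartition("-")[2])` step, and the example URL is made up.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def bencode(num: int, alphabet: str = ALPHABET) -> str:
    """Encode a non-negative integer using len(alphabet) as the base."""
    if num == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while num:
        num, rem = divmod(num, base)
        digits.append(alphabet[rem])
    return "".join(reversed(digits))


def bdecode(data: str, alphabet: str = ALPHABET) -> int:
    """Inverse of bencode: interpret data as digits drawn from alphabet."""
    base = len(alphabet)
    num = 0
    for char in data:
        num = num * base + alphabet.index(char)
    return num


def index_from_url(url: str) -> int:
    """Numeric ID from the last '-'-separated part of a URL, 0 if none."""
    tail = url.rpartition("-")[2]
    return int(tail) if tail.isdecimal() else 0


print(bencode(1295), bdecode("zz"))
print(index_from_url("https://www.deviantart.com/artist/art/Some-Title-123456789"))
```

Since 1295 = 35 × 36 + 35, it encodes to `zz`; the first line prints `zz 1295` and the second `123456789`.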

gallery_dl/extractor/deviantart.py: two further review threads (outdated, resolved)
Comment on lines +1255 to +1257
```python
def comments(self, id, target, offset=0):
    """Fetch comments posted on a target"""
    endpoint = "/comments/{}/{}".format(target, id)
    # ... (rest of the method lies outside the lines quoted in this review)
```
Owner

I'd have preferred to have one DeviantartOAuthAPI method per API endpoint, but I guess this works as well.

Contributor Author

This method can also be used to fetch comments posted on a user's profile.
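Because only the `target` segment of `/comments/{}/{}` changes, one method can serve deviation, profile, and status comments. A paginated version might look like this sketch. It assumes DeviantArt-style responses with `thread`, `has_more`, and `next_offset` fields and an assumed page `limit` of 50; `fetch` is a hypothetical stand-in for the authenticated API request, and none of this reproduces `DeviantartOAuthAPI` itself.

```python
from typing import Any, Callable, Dict, Iterator

# fetch(endpoint, params) -> parsed JSON response; a hypothetical stand-in
# for the authenticated API call.
Fetch = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_comments(fetch: Fetch, target: str, target_id: str,
                  offset: int = 0, limit: int = 50
                  ) -> Iterator[Dict[str, Any]]:
    """Yield comments on a deviation, profile, status, ... page by page."""
    endpoint = "/comments/{}/{}".format(target, target_id)
    while True:
        data = fetch(endpoint, {"offset": offset, "limit": limit})
        yield from data.get("thread") or ()
        if not data.get("has_more"):
            return
        offset = data["next_offset"]
```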

- relax regex pattern
- handle invalid 'items' field
- add a test for shared sta.sh item

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
```diff
@@ -802,7 +809,7 @@ def deviations(self):
             yield from self.status(status)
 
     def status(self, status):
-        for item in status.get("items", ()): # do not trust is_share
+        for item in status.get("items") or (): # do not trust is_share
```
Contributor

@rautamiekka, Jan 21, 2023

What's the point of this change?

Owner

See #3541 (comment)

This crashes (TypeError: 'NoneType' object is not iterable):

```python
status = {"items": None}
for item in status.get("items", ()):
    pass
```

This does not:

```python
status = {"items": None}
for item in status.get("items") or ():
    pass
```

Contributor

Do I even wanna know why the Py core devs thought this was a good idea? Surely they coulda included the `or ...` functionality in `.get`.
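The behavior in question is how `dict.get(key, default)` is specified: the default is used only when the key is absent, and a key that is present with the value `None` simply returns `None`. The checks below demonstrate this; `items_of` is a hypothetical helper name, not part of gallery-dl.

```python
from typing import Any, Iterable, Mapping

status = {"items": None}

# dict.get falls back to its default only when the key is *absent* ...
assert {}.get("items", ()) == ()
# ... so a key that is present but maps to None returns None.
assert status.get("items", ()) is None

# "or ()" also replaces a present-but-falsy value (None, [], "", 0).
assert (status.get("items") or ()) == ()


def items_of(mapping: Mapping[str, Any], key: str = "items") -> Iterable[Any]:
    """Hypothetical helper: mapping[key] if truthy, else an empty tuple."""
    return mapping.get(key) or ()
```

Note that `or ()` also discards other falsy values such as `[]` or `0`. For an `items` list that is harmless, since an empty list and an empty tuple both iterate zero times.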

@mikf merged commit caae8fe into mikf:master Jan 23, 2023
@ClosedPort22 deleted the da-status branch January 23, 2023 14:02
Development: merging this pull request may close the issue "Deviantart | Gallery-dl only downloading Journals and not Polls, Status Updates, from User Posts".
3 participants