sync from upstream #2

Merged
mo-han merged 95 commits into mo-han:master
Jul 16, 2020

Conversation

mo-han
Owner

@mo-han mo-han commented Jul 16, 2020

No description provided.

mikf and others added 30 commits May 28, 2020 01:51
- don't run Instagram tests on Travis anymore
- replace Twitter test because timeline was made private
- update Hiperdex domain to '.com' (again ...)
… and prepare for more potential extractors
* Add instagram metadata: post_pageurl, post_tags

Add the following metadata for instagram:
- post_pageurl: json string with url of the post page
- post_tags: json array with instagram tags extracted from the post description

* Oops: rename post_tags to tags for --write-tags

This way, --write-tags will pick up the post tags.

* Rename to post_url, improve regex

* Add post_url and tags to tests

* Remove duplicate tags and sort them

* Bugfix: don't create empty tag lists

* Metadata: add location

* Metadata: add tagged_users for each media

* Move self._find_tags() to base class

* Make flake happy
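
A minimal sketch of the tag handling described in the list above (extract tags from the post description, remove duplicates, sort, and skip empty lists). The regex and the names `find_tags` and `add_tags` are illustrative assumptions, not the extractor's actual code:

```python
import re

# Hypothetical pattern: a '#' followed by word characters.
# The real extractor's regex may differ.
HASHTAG = re.compile(r"#(\w+)")


def find_tags(description):
    """Return the unique hashtags found in a post description, sorted."""
    return sorted(set(HASHTAG.findall(description or "")))


def add_tags(metadata, description):
    """Set metadata['tags'] only when at least one tag was found."""
    tags = find_tags(description)
    if tags:
        # Using the key 'tags' (not 'post_tags') lets --write-tags pick it up.
        metadata["tags"] = tags
    return metadata
```

Leaving the `tags` key out entirely when nothing is found keeps empty lists out of the metadata.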
* rectify the code of `join_title` and make some minor fixes.

* + hentainexus self.data

* fixed: call staticmethod join_title with data
Everything except logging in with username & password and TwitPic
embeds should be working again.

Metadata per Tweet is massively different than before (mostly raw API
responses - might need some cleaning up) and the default 'archive_fmt'
changed.
The text content of each tweet is always available as 'full_text'
- add 'date' field
- remove 'entities' and 'extended_entities'
- don't include 'focus_fields' from 'original_info'
- remove useless clutter by creating new tweet-data dicts instead of
  reusing the original Tweet objects
- rename fields to how they were named before
  ('id_str' -> 'tweet_id', etc.)
- only include 'author' if it would differ from 'user'
- restore 'archive_fmt'
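
The cleanup listed above could look roughly like the following sketch. It assumes a raw Tweet object shaped like a Twitter API response (`id_str`, `created_at`, `full_text`); the function name `build_tweet_data` and the exact set of copied fields are hypothetical:

```python
from datetime import datetime

# Assumed timestamp layout of 'created_at' in the raw API response,
# e.g. "Wed Oct 10 20:19:24 +0000 2018".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def build_tweet_data(tweet, user, author=None):
    """Build a new, smaller dict instead of reusing the raw Tweet object."""
    data = {
        # Renamed back to the old field name ('id_str' -> 'tweet_id').
        "tweet_id": tweet["id_str"],
        "date": datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT),
        "full_text": tweet["full_text"],
        "user": user,
    }
    # Only include 'author' if it would differ from 'user'.
    if author is not None and author != user:
        data["author"] = author
    # 'entities' and 'extended_entities' are never copied over.
    return data
```

Because only selected keys are copied, any clutter in the raw response stays out of the resulting metadata by construction.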
A 'keyarg=1' argument to the memcache decorator would have worked as
well, but keeping the user object in memory isn't useful for the vast
majority of use cases and only wastes space.

(closes #817)
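
To illustrate the trade-off mentioned above, here is a simplified stand-in for an in-memory cache decorator keyed by one positional argument. It is not the project's actual `memcache` implementation; the `Client` class and `user_by_name` method are hypothetical:

```python
import functools


def memcache(keyarg=None):
    """Simplified stand-in: cache results in memory, keyed by args[keyarg]."""
    def decorator(func):
        store = {}

        @functools.wraps(func)
        def wrapper(*args):
            # Without 'keyarg', every call shares one cache slot.
            key = args[keyarg] if keyarg is not None else ""
            if key not in store:
                store[key] = func(*args)
            return store[key]
        return wrapper
    return decorator


calls = []  # records which lookups actually ran


class Client:
    # keyarg=1 because args[0] is 'self' and args[1] is the name.
    @memcache(keyarg=1)
    def user_by_name(self, name):
        calls.append(name)
        return {"screen_name": name}
```

With `keyarg=1`, each distinct name gets its own entry, so lookups for different users no longer return the first cached result. The cost is that every looked-up user object stays in `store` for the lifetime of the process.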
mikf and others added 29 commits June 25, 2020 19:11
This is enabled by default and will recursively go through all
(sub)folders in an artist's gallery.

The old method of using "Latest Updates" lists can be restored by
disabling this option.
This prevents pathfmt.filename from potentially being empty.
Only call os.makedirs() right before a file is downloaded,
and not immediately for every Directory message.
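
A rough sketch of this deferred directory creation, assuming a hypothetical `LazyDirectory` helper (the class and method names are illustrative, not the project's path-handling code):

```python
import os


class LazyDirectory:
    """Remember the target directory, but create it only when needed."""

    def __init__(self):
        self.path = None
        self.created = False

    def set_directory(self, path):
        # Called for every Directory message: no filesystem access here.
        self.path = path
        self.created = False

    def open_for_download(self, filename):
        # Called right before a file is written: create the directory now.
        if not self.created:
            os.makedirs(self.path, exist_ok=True)
            self.created = True
        return open(os.path.join(self.path, filename), "wb")
```

One effect of this ordering is that a Directory message whose files are all skipped never leaves an empty directory behind.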
instead of expecting a URL and trying to complete it.
The relatively new v2 challenges aren't supported (*), but retrying
often enough may yield a v1 challenge which can be solved.

(*) and probably never will. They are far too complicated to do without
a real browser.
The reported filename of the 'postfile' entry of each post may differ
from the corresponding entry in the list of images or attachments,
and be outright "wrong".
… instead of just '/t/unmuted/'
Please note that URLs are only "translated"; all requests are still
made via the Twitter API.
@mo-han mo-han merged commit 2486e0f into mo-han:master Jul 16, 2020
6 participants