sync from upstream #2

mo-han · 2020-07-16T06:54:23Z


* Add instagram metadata: post_pageurl, post_tags

  Add the following metadata for Instagram:
  - post_pageurl: JSON string with the URL of the post page
  - post_tags: JSON array with Instagram tags extracted from the post description
* Oops: rename post_tags to tags for --write-tags

  This way, --write-tags will pick up the post tags.
* Rename to post_url, improve regex
* Add post_url and tags to tests
* Remove duplicate tags and sort them
* Bugfix: don't create empty tag lists
* Metadata: add location
* Metadata: add tagged_users for each media
* Move self._find_tags() to base class
* Make flake happy
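The tag handling described above (extract tags from the post description, drop duplicates, sort them, and avoid empty tag lists) can be sketched roughly as follows. The name `find_tags` echoes `self._find_tags()` from the commit, but the regex and the "return None instead of an empty list" convention are assumptions, not the extractor's actual code.

```python
import re

# Hypothetical pattern: a '#' followed by word characters.
# The regex used by the real extractor may differ.
TAG_RE = re.compile(r"#(\w+)")


def find_tags(text):
    """Return sorted, de-duplicated hashtags from 'text', or None if none."""
    tags = set(TAG_RE.findall(text or ""))
    # Don't create empty tag lists: return None instead of [].
    return sorted(tags) if tags else None


print(find_tags("sunset #beach #travel #beach"))  # ['beach', 'travel']
print(find_tags("no tags here"))                   # None
```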

Everything except logging in with username & password and TwitPic embeds should be working again. Metadata per Tweet is massively different from before (mostly raw API responses, which might need some cleaning up), and the default 'archive_fmt' has changed.

- remove useless clutter by creating new tweet-data dicts instead of reusing the original Tweet objects
- rename fields to how they were named before ('id_str' -> 'tweet_id', etc.)
- only include 'author' if it would differ from 'user'
- restore 'archive_fmt'
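A minimal sketch of building a fresh metadata dict instead of reusing the raw Tweet object. `build_tweet_data` is hypothetical; only the renamed 'tweet_id' field, the conditional 'author' field and 'full_text' come from these commit messages, while deriving 'date' from Twitter's `created_at` string (format `Wed Oct 10 20:19:24 +0000 2018`) is an assumption.

```python
from datetime import datetime


def build_tweet_data(tweet, user, author):
    """Build a new metadata dict from a raw Tweet object (hypothetical)."""
    data = {
        # 'id_str' is renamed back to 'tweet_id'
        "tweet_id": int(tweet["id_str"]),
        # assumption: 'date' is parsed from the 'created_at' timestamp
        "date": datetime.strptime(
            tweet["created_at"], "%a %b %d %H:%M:%S %z %Y"),
        "full_text": tweet["full_text"],
        "user": user,
    }
    # only include 'author' if it differs from 'user'
    if author != user:
        data["author"] = author
    return data


raw = {
    "id_str": "1050118621198921728",
    "created_at": "Wed Oct 10 20:19:24 +0000 2018",
    "full_text": "hello world",
}
user = {"screen_name": "example_user"}
data = build_tweet_data(raw, user, user)
print(data["tweet_id"], "author" in data)  # 1050118621198921728 False
```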


mikf and others added 30 commits May 28, 2020 01:51

fix internal links in configuration.rst

b489f4d

update extractor test results

45baa13

- don't run Instagram tests on Travis anymore
- replace Twitter test because timeline was made private
- update Hiperdex domain to '.com' (again ...)

[redgifs] fix extraction (#724)

275ccee

… and prepare for more potential extractors

use %APPDATA%\gallery-dl for config/cache on Windows

da22ea8
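As a small illustration of the Windows location, the hypothetical helper below joins `%APPDATA%` with `gallery-dl`. It uses `ntpath` so the backslash-separated result is identical on any platform running the sketch; how gallery-dl itself resolves this path, and any fallback when `APPDATA` is unset, is not shown.

```python
import ntpath
import os


def windows_data_dir(environ=None):
    """Return %APPDATA%\\gallery-dl, the Windows config/cache location.

    Hypothetical helper, not gallery-dl's own code.
    """
    environ = os.environ if environ is None else environ
    # ntpath always joins with backslashes, even on non-Windows hosts
    return ntpath.join(environ["APPDATA"], "gallery-dl")


print(windows_data_dir({"APPDATA": r"C:\Users\me\AppData\Roaming"}))
# C:\Users\me\AppData\Roaming\gallery-dl
```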

hentainexus.py minor fix (#787)

a4e3d40

* rectify code of `join_title`, some minor fixes
* + hentainexus self.data
* fixed: call staticmethod join_title with data

[instagram] simplify code & complete tests (#743)

a63682a

[hentainexus] fix flake8 issues (#787)

2bff8dd

[instagram] update 'query_hash' values

a32aea4

[instagram] disable login with username&password (#756)

3e0848a

add section about cookies to README.rst

c4d06a8

[instagram] fix and re-enable login with username&password

0f459f3

This reverts commit 3e0848a. (#756, #771, #797, #803) https://github.com/althonos/InsaLooter/issues/287#issuecomment-630456522

update output of 'oauth:…' (#616)

864f422

update extractor test results

3bad157

release version 1.14.0

f1ef908

[twitter] login using the mobile nojs login page

bd0f214

[twitter] restore TwitPic support

2132e54

[twitter] remove 'content' option

0138e9c

The text content of each tweet is always available as 'full_text'

[deviantart] also search journals for sta.sh links (#712)

41d0316

when 'extra' is enabled

[twitter] skip unavailable tweets

655c98c

[twitter] small metadata cleanup

3eed5f5

- add 'date' field
- remove 'entities' and 'extended_entities'
- don't include 'focus_fields' from 'original_info'

[sensescans] use https://

4aea513

[deviantart] don't add journal text to description (#712)

c6c06c4

implement a general 'delete_items()' function

1fcf938

[twitter] improve pagination

d769bb4

[nhentai] fix extraction (closes #819)

83b7bd0

[twitter] add 'reply_to' metadata to replies

4442dfe

[twitter] don't cache results of 'user_by_screen_name()'

036a409

A 'keyarg=1' argument to the memcache decorator would have worked as well, but keeping the user object in memory isn't useful for the vast majority of use cases and only wastes space. (closes #817)
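For context, a `keyarg` argument to a memoizing decorator selects which positional argument becomes the cache key. The decorator below is a hypothetical stand-in, not gallery-dl's actual `memcache` implementation; it only shows that `keyarg=1` keys a method on its first argument after `self`, and that every cached user object then stays in memory for as long as the cache lives.

```python
import functools


def memcache(keyarg=None):
    """Minimal in-memory cache decorator (hypothetical sketch).

    With keyarg=N, only the N-th positional argument is the cache key;
    without it, the whole tuple of positional arguments is the key.
    Keyword arguments are not supported in this sketch.
    """
    def decorator(func):
        cache = {}

        @functools.wraps(func)
        def wrapper(*args):
            key = args[keyarg] if keyarg is not None else args
            if key not in cache:
                cache[key] = func(*args)
            return cache[key]
        return wrapper
    return decorator


class API:
    calls = 0

    @memcache(keyarg=1)
    def user_by_screen_name(self, screen_name):
        type(self).calls += 1  # count real lookups
        return {"screen_name": screen_name}


first, second = API(), API()
first.user_by_screen_name("example_user")
second.user_by_screen_name("example_user")
print(API.calls)  # 1: both instances share the entry for "example_user"
```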

mikf and others added 29 commits June 25, 2020 19:11

update extractor test results

0cac14c

Revert "[kissmanga] workaround for CAPTCHAs (#818)"

699062b

This reverts commit 4cf3d54.

[aryion] add 'recursive' option (fixes #832)

f1ddbff

This is enabled by default and will recursively go through all (sub)folders in an artist's gallery. The old method of using "Latest Updates" lists can be restored by disabling this option.
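A minimal sketch of the recursive traversal, assuming a hypothetical in-memory folder tree of the form `{"posts": [...], "folders": [...]}`. The real extractor has to fetch each folder page over HTTP, which is omitted here.

```python
def walk_gallery(folder):
    """Yield every post in 'folder' and, recursively, in all subfolders."""
    yield from folder.get("posts", ())
    for sub in folder.get("folders", ()):
        yield from walk_gallery(sub)


tree = {
    "posts": ["a1"],
    "folders": [
        {"posts": ["b1", "b2"], "folders": [{"posts": ["c1"]}]},
        {"posts": ["d1"]},
    ],
}
print(list(walk_gallery(tree)))  # ['a1', 'b1', 'b2', 'c1', 'd1']
```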

update CHANGELOG before building sdist and wheel packages

e62ebb4

release version 1.14.2

4f16fd3

[subscribestar] add 'user' and 'post' extractors (#852)

821524e

add zsh completion script (#150)

d0cd86e

set pseudo extension for Metadata messages (#865)

d5bfb0b

This prevents pathfmt.filename from potentially being empty.

defer directory creation (fixes #722)

4d8b3e4

Only call os.makedirs() right before a file is downloaded, not immediately for every Directory message.
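The deferred approach can be sketched as an object that only records the path when a Directory message arrives and calls `os.makedirs()` right before the first file is opened. `DeferredDirectory` and its methods are hypothetical names, not gallery-dl's actual path-handling API.

```python
import os
import tempfile


class DeferredDirectory:
    """Remember a target directory and create it only when needed."""

    def __init__(self):
        self.path = None
        self._created = False

    def set_directory(self, path):
        # Called for every Directory message: no filesystem access here.
        self.path = path
        self._created = False

    def open_file(self, filename):
        # Create the directory lazily, just before a file is written.
        if not self._created:
            os.makedirs(self.path, exist_ok=True)
            self._created = True
        return open(os.path.join(self.path, filename), "wb")


base = tempfile.mkdtemp()
target = os.path.join(base, "gallery", "artist")
d = DeferredDirectory()
d.set_directory(target)
print(os.path.exists(target))  # False
with d.open_file("image.jpg") as fp:
    fp.write(b"data")
print(os.path.isfile(os.path.join(target, "image.jpg")))  # True
```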

[8muses] support 'comics.8muses.com' URLs

c28db7a

let zsh completion immediately suggest cmdline options

74494b4

instead of expecting a URL and trying to complete it.

[twitter] improve error message formatting

6e2af9a

prevent unhandled exception on Cloudflare challenges (#868)

dbf841e

The relatively new v2 challenges aren't supported (*), but retrying often enough may yield a v1 challenge, which can be solved. (*) ... and probably never will be. They are far too complicated to solve without a real browser.
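The retry idea can be sketched as a bounded loop. `fetch` and its `(version, page)` return shape are hypothetical stand-ins: the real code has to detect the challenge version from Cloudflare's response, which this sketch does not model, and returning None stands in for however the extractor actually reports failure.

```python
def fetch_solvable_challenge(fetch, max_tries=5):
    """Re-request a page until a solvable (v1) challenge comes back.

    v2 challenges are treated as unsolvable and simply retried.
    Returns the page, or None if no v1 challenge was seen.
    """
    for _ in range(max_tries):
        version, page = fetch()
        if version == 1:
            return page
    return None


responses = iter([(2, "v2 page"), (2, "v2 page"), (1, "v1 page")])
print(fetch_solvable_challenge(lambda: next(responses)))  # v1 page
```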

[patreon] yield images and attachments before postfiles (#871)

f1344fe

The reported filename of the 'postfile' entry of each post may differ from the corresponding entry in the list of images or attachments, and be outright "wrong".

[redgifs] support gifsdeliverynetwork.com URLs (#874)

3424fb9

[reddit] limit title length in default filenames (#873)

94a08f0

[reddit] fix AttributeError when using 'recursion' (fixes #879)

5a6e750

[subscribestar] use current date instead of hard-coded '2020' (#852)

f5c9f1d

[imgur] support all '/t/...' URLs (closes #880)

27d163a

… instead of just '/t/unmuted/'
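To illustrate the difference, the two hypothetical patterns below contrast matching only '/t/unmuted' with accepting any tag name after '/t/'. They are not the extractor's actual patterns.

```python
import re

# Hypothetical patterns, for illustration only.
OLD_PATTERN = re.compile(r"(?:https?://)?(?:www\.)?imgur\.com/t/unmuted/?$")
NEW_PATTERN = re.compile(r"(?:https?://)?(?:www\.)?imgur\.com/t/([^/?#]+)")

for url in ("https://imgur.com/t/unmuted", "https://imgur.com/t/cats"):
    print(url, bool(OLD_PATTERN.match(url)), bool(NEW_PATTERN.match(url)))
```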

[twitter] add debug messages for all skipped Tweets (#867)

3855d0d

[artstation] add 'following' extractor (closes #888)

d594977

[khinsider] add 'format' option (closes #840)

cb0132e

[mangakakalot] Added extractors for MangaKakalot (#876)

7dfdcc3

[mangakakalot] update URL patterns, fix flake8 errors (#876)

9cd1bc6

update extractor test results

c51fbd7

[newgrounds] fix favorites extraction

e17d4f4

[twitter] add support for nitter.net URLs in pattern (#890)

86e5a05

Please note that URLs are only "translated"; all requests are still always made via the Twitter API.
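A rough sketch of what "translating" a URL means here: one pattern accepts both hosts and extracts the same screen name, which would then be used in Twitter API requests, so nitter.net itself is never contacted. The pattern and `user_from_url` are hypothetical, not the extractor's real code.

```python
import re

# Hypothetical pattern accepting both twitter.com and nitter.net hosts.
USER_PATTERN = re.compile(
    r"(?:https?://)?(?:www\.|mobile\.)?(?:twitter\.com|nitter\.net)"
    r"/([^/?#]+)/?(?:$|[?#])")


def user_from_url(url):
    """Return the screen name from a Twitter or Nitter URL, or None."""
    match = USER_PATTERN.match(url)
    return match.group(1) if match else None


print(user_from_url("https://nitter.net/example_user"))   # example_user
print(user_from_url("https://twitter.com/example_user"))  # example_user
```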

mention optional auth access for furaffinity (#893)

51b49a3

Document Extractor Prefixing (#892)

9339ac0

mo-han merged commit 2486e0f into mo-han:master Jul 16, 2020
