[twitter] 404 not found on trying to fully download #3522

Closed
IdiotNamedRaz opened this issue Jan 11, 2023 · 16 comments

Comments

@IdiotNamedRaz

When using gallery-dl to scrape a Twitter account with gallery-dl https://twitter.com/USER, it reaches a certain point and then just fails with [twitter][error] 404 Not Found (Sorry, that page does not exist). By contrast, gallery-dl https://twitter.com/USER/tweets and gallery-dl https://twitter.com/USER/media still work, but (as expected) stop before the full gallery is downloaded for accounts with a large number of tweets.

Example with the --verbose flag:

gallery-dl https://twitter.com/ESPER995 --verbose --no-download
[gallery-dl][debug] Version 1.24.4
[gallery-dl][debug] Python 3.9.7 - Windows-10-10.0.19044-SP0
[gallery-dl][debug] requests 2.26.0 - urllib3 1.26.7
[gallery-dl][debug] Configuration Files ['%USERPROFILE%\\gallery-dl.conf']
[gallery-dl][debug] Starting DownloadJob for 'https://twitter.com/ESPER995'
[twitter][debug] Using TwitterTimelineExtractor for 'https://twitter.com/ESPER995'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): twitter.com:443
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/graphql/7mjxD3-C6BxitPMVQ6w0-Q/UserByScreenName?variables=%7B%22screen_name%22%3A%22ESPER995%22%2C%22withSafetyModeUserFields%22%3Atrue%2C%22withSuperFollowsUserFields%22%3Atrue%7D HTTP/1.1" 200 921
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/graphql/nRybED9kRbN-TOWioHq1ng/UserMedia?variables=%7B%22userId%22%3A%222280615006%22%2C%22count%22%3A100%2C%22includePromotedContent%22%3Afalse%2C%22withSuperFollowsUserFields%22%3Atrue%2C%22withBirdwatchPivots%22%3Afalse%2C%22withDownvotePerspective%22%3Afalse%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%2C%22withSuperFollowsTweetFields%22%3Atrue%2C%22withClientEventToken%22%3Afalse%2C%22withBirdwatchNotes%22%3Afalse%2C%22withVoice%22%3Atrue%2C%22withV2Timeline%22%3Afalse%2C%22__fs_interactive_text%22%3Afalse%2C%22__fs_dont_mention_me_view_api_enabled%22%3Afalse%7D HTTP/1.1" 200 38424
...
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/graphql/nRybED9kRbN-TOWioHq1ng/UserMedia?variables=%7B%22userId%22%3A%222280615006%22%2C%22count%22%3A100%2C%22includePromotedContent%22%3Afalse%2C%22withSuperFollowsUserFields%22%3Atrue%2C%22withBirdwatchPivots%22%3Afalse%2C%22withDownvotePerspective%22%3Afalse%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%2C%22withSuperFollowsTweetFields%22%3Atrue%2C%22withClientEventToken%22%3Afalse%2C%22withBirdwatchNotes%22%3Afalse%2C%22withVoice%22%3Atrue%2C%22withV2Timeline%22%3Afalse%2C%22__fs_interactive_text%22%3Afalse%2C%22__fs_dont_mention_me_view_api_enabled%22%3Afalse%2C%22cursor%22%3A%22HBaEwLm5qpG4mCcAAA%3D%3D%22%7D HTTP/1.1" 200 301
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&include_ext_has_nft_avatar=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&include_ext_sensitive_media_warning=true&send_error_codes=true&simple_quoted_tweet=true&count=100&ext=mediaStats%2ChighlightedLabel%2ChasNftAvatar%2CvoiceInfo%2CsuperFollowMetadata&q=from%3AESPER995+max_id%3A1412001926238121986+filter%3Alinks&tweet_search_mode=live&query_source=typed_query&pc=1&spelling_corrections=1 HTTP/1.1" 404 92
[twitter][error] 404 Not Found (Sorry, that page does not exist)
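In the log, the GraphQL requests all succeed; the 404 comes from the last request, to the older twitter.com/i/api/2/search/adaptive.json endpoint, which the timeline extractor appears to fall back on once the UserMedia pages run out. Decoding that request's query string makes it easier to read. A minimal sketch with Python's standard urllib.parse, using a shortened copy of the URL from the log (most of the include_* parameters are omitted):

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the failing request from the log above;
# most include_* parameters are left out for readability.
url = (
    "https://twitter.com/i/api/2/search/adaptive.json"
    "?tweet_mode=extended&count=100"
    "&q=from%3AESPER995+max_id%3A1412001926238121986+filter%3Alinks"
    "&tweet_search_mode=live&query_source=typed_query"
)

parts = urlsplit(url)
params = parse_qs(parts.query)  # decodes %3A -> ":" and "+" -> " "

print(parts.netloc + parts.path)
print(params["q"][0])
print(params["tweet_search_mode"][0])
```

The decoded q is an ordinary Twitter search query (from:ESPER995 max_id:1412001926238121986 filter:links), so it is this search request, not the user lookup, that gets rejected.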
@ClosedPort22
Contributor

Looks like they're finally moving on from the old search API.

@Hecatom

Hecatom commented Jan 12, 2023

Came to report that.
I usually download media using "https://twitter.com/search?filter:media&q=from:user" as the query, and have done so since I started using the app, because otherwise it missed a lot of posts. Now it isn't working at all and gives me this error for any query:

[twitter][error] 404 Not Found (Sorry, that page does not exist)

@ClosedPort22
Contributor

ClosedPort22 commented Jan 12, 2023

Twitter changed the API endpoint from twitter.com/i/api/2/search/adaptive.json to api.twitter.com/2/search/adaptive.json. They also now use api.twitter.com/graphql/[\w-]+/<endpoint> instead of twitter.com/i/api/graphql/[\w-]+/<endpoint>, so they may remove these old endpoints in the future as well.

Whew, that was easier than I thought it would be.
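Both changes amount to swapping the URL prefix twitter.com/i/api/ for api.twitter.com/. A minimal sketch of that mapping in Python; the helper name to_new_root is hypothetical and this is not gallery-dl's actual code:

```python
import re

# Old web-frontend prefix and the new API root described above.
OLD_ROOT = re.compile(r"^https://twitter\.com/i/api/")
NEW_ROOT = "https://api.twitter.com/"


def to_new_root(url: str) -> str:
    """Rewrite a twitter.com/i/api/... URL to the api.twitter.com/... form."""
    return OLD_ROOT.sub(NEW_ROOT, url, count=1)


# Both endpoint families follow the same pattern:
#   twitter.com/i/api/2/search/adaptive.json  -> api.twitter.com/2/search/adaptive.json
#   twitter.com/i/api/graphql/<id>/<endpoint> -> api.twitter.com/graphql/<id>/<endpoint>
print(to_new_root("https://twitter.com/i/api/2/search/adaptive.json"))
print(to_new_root(
    "https://twitter.com/i/api/graphql/7mjxD3-C6BxitPMVQ6w0-Q/UserByScreenName"))
```

URLs that do not start with the old prefix are returned unchanged.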

@nobody613

Are you sure? I changed it a long time ago, and now, after this update:
https://twittercommunity.com/t/announcing-the-deprecation-of-v1-1-statuses-filter-endpoint/182960

It doesn't work for me; I've been racking my brain over it.

@nobody613

NOTE:

  1. api.twitter.com/graphql/[\w-]+/ - works
  2. api.twitter.com/2/search/adaptive.json - does not work
  3. These are my params, for example:
    [('include_can_media_tag', '1'), ('include_ext_alt_text', 'true'), ('include_quote_count', 'true'), ('include_reply_count', '1'), ('tweet_mode', 'extended'), ('include_entities', 'true'), ('include_user_entities', 'true'), ('include_ext_media_availability', 'true'), ('send_error_codes', 'true'), ('simple_quoted_tweet', 'true'), ('count', 100), ('cursor', '-1'), ('spelling_corrections', '1'), ('ext', 'mediaStats%2ChighlightedLabel'), ('tweet_search_mode', 'live'), ('f', 'tweets'), ('q', 'geocode:LAT*.LON*,10km')]
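One detail worth checking in a list like this, assuming it is passed as the params argument to requests (which builds the query string with urllib.parse.urlencode): the ext value is already percent-encoded, so its % gets encoded a second time. A minimal standard-library sketch using a few entries from the list above:

```python
from urllib.parse import urlencode

# A few entries copied from the parameter list above.
params = [
    ("count", 100),
    ("ext", "mediaStats%2ChighlightedLabel"),  # already percent-encoded
    ("tweet_search_mode", "live"),
]

# urlencode() percent-encodes every value, so "%2C" turns into "%252C".
print(urlencode(params))

# Passing the raw, unencoded value avoids the double encoding.
fixed = [(k, "mediaStats,highlightedLabel" if k == "ext" else v) for k, v in params]
print(urlencode(fixed))
```

Whether the server tolerates the double-encoded value is a separate question; this only shows what ends up on the wire.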

@ClosedPort22
Contributor

ClosedPort22 commented Jan 12, 2023

2. api.twitter.com/2/search/adaptive.json - does not work

Twitter blocked the default TLS fingerprint of the requests library for this particular endpoint. This is currently bypassable using -o browser=X or -o tls12=false, but I don't think this is good news, because more restrictions on the web frontend API might ensue.

Also, they added a bunch of other query parameters and removed tweet_search_mode. I don't know why, but if this parameter is sent, the API responds with 401 Unauthorized (Could not authenticate you.).

@mikf
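Dropping the parameter before the request is sent is a simple filter over the (key, value) pairs. A minimal sketch; drop_params is a hypothetical helper, and the sample pairs are taken from the log and parameter list above:

```python
def drop_params(params, unwanted=("tweet_search_mode",)):
    """Return a copy of a (key, value) list without the unwanted keys."""
    return [(k, v) for k, v in params if k not in unwanted]


params = [
    ("q", "from:ESPER995 filter:links"),
    ("count", 100),
    ("tweet_search_mode", "live"),  # reportedly triggers 401 when sent
    ("query_source", "typed_query"),
]
print(drop_params(params))
```

Keeping the list of unwanted keys as a parameter makes it easy to strip further parameters if the API changes again.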

@nobody613

nobody613 commented Jan 12, 2023

Thanks, I will try to build this:
adapter = TlsAdapter(ssl.OP_NO_TLSv1 | ssl.OP_NO_TLSv1_1)
self._session = requests.Session()
self._session.mount("https://", adapter)

In the meantime it's not working, but I'm still trying.
Maybe it doesn't work because I use Windows?
Maybe I need to use it with OpenSSL on Linux?
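Note that the snippet above disables TLS 1.0 and 1.1, whereas the workaround suggested earlier, tls12=false, disables TLS 1.2. A minimal standard-library sketch of the latter idea, assuming the goal is simply a client context that only offers TLS 1.3 (an illustration, not gallery-dl's implementation; make_context is a hypothetical name):

```python
import ssl
import urllib.request


def make_context(tls12: bool = True) -> ssl.SSLContext:
    """Build a client SSL context; with tls12=False only TLS 1.3 is offered."""
    ctx = ssl.create_default_context()
    if not tls12:
        # Changing the offered protocol versions changes the ClientHello,
        # and with it the TLS fingerprint the server sees.
        ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


ctx = make_context(tls12=False)
# The context can be used with urllib; opener.open(url) would then only
# negotiate TLS 1.3.
opener = urllib.request.build_opener(urllib.request.HTTPSHandler(context=ctx))
print(ctx.minimum_version)
```

With requests, the same kind of context has to be handed to urllib3 through a custom HTTPAdapter, which is what a TlsAdapter class like the one above would typically do.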

@nobody613

Would using a fake user agent and a different time zone help?

@ClosedPort22
Contributor

No, gallery-dl already uses the UA of Firefox by default.

@ClosedPort22
Contributor

They actually haven't removed twitter.com/i/api/2/search/adaptive.json yet; the 404 error was solely caused by TLS fingerprinting.

Some third-party Twitter clients are facing issues as well, but I don't know if they also use the web API.

@mikf
Owner

mikf commented Jan 13, 2023

I've only ever gotten the 401 Unauthorized (Could not authenticate you.) error, which is fixed by removing the tweet_search_mode query parameter. When logged in, I can access the search API endpoint even with that parameter still present.

Should I set browser = "firefox" by default, even though it is not necessary in general?

@mikf
Owner

mikf commented Jan 13, 2023

Regarding TLS fingerprinting: There is a ciphers option, which right now is undocumented for some reason.

custom_ciphers = self.config("ciphers")

https://github.com/mikf/gallery-dl/blob/e1a12761d7a0234eccc66f347ccd9987f23d9080/docs/configuration.rst#ciphers

edit: 4e86aaa
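The ciphers option takes a cipher string. As a rough illustration of what such a string does, using only Python's ssl module (the cipher string below is an arbitrary example, not a recommended value for Twitter): restricting the TLS 1.2 cipher list changes which ciphers the client offers, which is another way to alter the fingerprint.

```python
import ssl

ctx = ssl.create_default_context()
ctx.set_ciphers("ECDHE+AESGCM")  # arbitrary example OpenSSL cipher string

names = [c["name"] for c in ctx.get_ciphers()]
# TLS 1.3 suites (TLS_AES_...) are configured separately by OpenSSL and
# are not affected by set_ciphers(), so they still show up here.
for name in names:
    print(name)
```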

@ClosedPort22
Contributor

I've only ever gotten the 401 Unauthorized (Could not authenticate you.) error, which is fixed by removing the tweet_search_mode query parameter. When logged in, I can access the search API endpoint even with that parameter still present.

Looks like the TLS fingerprint differs depending on the operating system used. Both @nobody613 and I used Windows.

Should I set browser = "firefox" by default, even though it is not necessary in general?

tls12=false works at the moment. IMO it's the more lightweight and minimal option (though I suspect altering any of the TLS parameters would have the same effect).
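To avoid passing -o on every run, the setting can go into the configuration file instead (the log above shows gallery-dl reading %USERPROFILE%\gallery-dl.conf). A minimal sketch that prints such a fragment as JSON; placing tls12 under extractor.twitter rather than directly under extractor is an assumption that this option is looked up per extractor like other extractor settings:

```python
import json

# Hypothetical config fragment: disable TLS 1.2 for the twitter extractor
# only, similar in intent to passing -o tls12=false on the command line.
config = {
    "extractor": {
        "twitter": {
            "tls12": False,
        }
    }
}

print(json.dumps(config, indent=4))
```

Scoping the option to twitter would leave other sites on the default TLS settings.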

mikf added a commit that referenced this issue Jan 13, 2023
and update API root and general query parameters
@Wiiplay123
Contributor

Now that TLS 1.2 is disabled, I'm getting this error instead of 404 when I try to download videos from searches (it works fine every other time, like for users or direct tweet links):

[downloader.http][warning] HTTPSConnectionPool(host='video.twimg.com', port=443): Max retries exceeded with url: /ext_tw_video/1611552416285032449/pu/vid/1280x530/tk8uo1NkKDfRsAQb.mp4?tag=12 (Caused by SSLError(SSLError(1, '[SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:997)'))) (1/5)

Images work fine, it's just videos.

@ClosedPort22
Contributor

ClosedPort22 commented Jan 14, 2023

My bad, it's because video.twimg.com doesn't support TLS 1.3 yet, which is weird since both pbs.twimg.com and video.twimg.com utilize Fastly's CDN infrastructure.

For the time being, you can either re-enable TLS 1.2 (-o tls12=true) AND use -o browser=firefox, or edit your hosts file to force the video domain to connect to the same IP address as pbs.twimg.com.

Maybe it's time to add a tls13 option...

@Kikimaru

I'm still seeing success using
--cookies-from-browser BROWSERNAME


7 participants