pornhub gif (actually short webm video) download from (https://www.pornhub.com/gif/) #31176
Comments
Please:
The page seen by yt-dl has these video elements:
...
<meta name="twitter:player:stream" content="https://dl.phncdn.com/pics/gifs/038/435/321/38435321a.webm">
<meta name="twitter:player:stream:content_type" content="video/webm">
<meta name="twitter:player:stream" content="https://dl.phncdn.com/pics/gifs/038/435/321/38435321a.mp4">
<meta name="twitter:player:stream:content_type" content="video/mp4">
<meta name="twitter:player:width" content="1280">
<meta name="twitter:player:height" content="720">
...
<script type="application/ld+json">
{
"@context": "http://schema.org/",
"@type": "VideoObject",
"name": "leolulu intro 1",
"description": "Check out leolulu intro 1 porn gif with Leolulu, Threesome from video We were just trying to shoot a morning sex scene in the kitchen... Amateur Couple LeoLulu on Pornhub.com",
"contentUrl": "https://dl.phncdn.com/pics/gifs/038/435/321/38435321a.webm",
"thumbnailUrl": "https://dl.phncdn.com/gif/38435321.gif",
"uploadDate": "2021-11-22"
}
...
<div
id="js-gifToWebm"
class="centerImage notModal"
data-gif="https://dl.phncdn.com/gif/38435321.gif"
data-mp4="https://dl.phncdn.com/pics/gifs/038/435/321/38435321a.mp4"
data-webm="https://dl.phncdn.com/pics/gifs/038/435/321/38435321a.webm"
data-gif-title="leolulu intro 1"
data-fallback="https://dl.phncdn.com/pics/gifs/038/435/321/38435321a.mp4"
>
That's 3 instances of the .mp4, 3 of the target .webm, and 2 of the .gif.
First we need to prevent the wrong extractor from running by changing the URL pattern at l.636 of class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE):
- _VALID_URL = r'https?://(?:[^/]+\.)?%s/(?P<id>(?:[^/]+/)*[^/?#&]+)' % PornHubBaseIE._PORNHUB_HOST_RE
+ _VALID_URL = r'https?://(?:[^/]+\.)?%s/(?!playlist/|gif/)(?P<id>(?:[^/]+/)*[^/?#&]+)' % PornHubBaseIE._PORNHUB_HOST_RE
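The effect of the added negative lookahead can be checked in isolation. This is only a sketch: `HOST_RE` is a simplified stand-in for `PornHubBaseIE._PORNHUB_HOST_RE` (the real expression is assumed to cover more hosts), `claims` is a hypothetical helper, and the list-style URL is illustrative.

```python
import re

# Simplified stand-in for PornHubBaseIE._PORNHUB_HOST_RE (assumption: the real
# expression matches more hosts than this one).
HOST_RE = r'pornhub\.com'

# The two patterns from the diff above, with the stand-in host substituted.
OLD_VALID_URL = r'https?://(?:[^/]+\.)?%s/(?P<id>(?:[^/]+/)*[^/?#&]+)' % HOST_RE
NEW_VALID_URL = r'https?://(?:[^/]+\.)?%s/(?!playlist/|gif/)(?P<id>(?:[^/]+/)*[^/?#&]+)' % HOST_RE


def claims(pattern, url):
    """Hypothetical helper: True if the paged-list pattern would claim the URL."""
    return re.match(pattern, url) is not None


GIF_URL = 'https://www.pornhub.com/gif/38435321'
LIST_URL = 'https://www.pornhub.com/video/search'  # illustrative list-style URL

results = {
    'old_gif': claims(OLD_VALID_URL, GIF_URL),    # old pattern grabs the gif page
    'new_gif': claims(NEW_VALID_URL, GIF_URL),    # lookahead rejects /gif/...
    'old_list': claims(OLD_VALID_URL, LIST_URL),
    'new_list': claims(NEW_VALID_URL, LIST_URL),  # other paths are still claimed
}

for name, value in sorted(results.items()):
    print(name, value)
```

With the lookahead in place the paged-list pattern no longer claims `/gif/` URLs, so they can fall through to another extractor, while other paths still match as before.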
 _TESTS = [{
Then the problem page is handled by the generic extractor, which finds the .webm, presumably from the second (ld+json script element) group:
$ python3.9 -m youtube_dl -v -F 'https://www.pornhub.com/gif/38435321'
[debug] System config: ['--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', '-F', 'https://www.pornhub.com/gif/38435321']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 46b8ae2f5
[debug] Python version 3.9.13 (CPython) - Linux-4.4.0-210-generic-i686-with-glibc2.23
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[generic] 38435321: Requesting header
WARNING: Falling back on generic information extractor.
[generic] 38435321: Downloading webpage
[generic] 38435321: Extracting information
[info] Available formats for 38435321:
format code  extension  resolution note
0            webm       unknown
$
This also finds a reasonable set of metadata:
Here the age_limit is wrong. PH claims to respect the RTA labelling scheme but adds the label via script, so the page yt-dl sees doesn't actually contain the text that it looks for according to the RTA scheme. Some options:
Taking the last option, the page contains a link with
This change catches both, but maybe the 2257 pattern will give too many false positives:
--- old/youtube_dl/extractor/generic.py
+++ new/youtube_dl/extractor/generic.py
@@ -2538,9 +2538,11 @@ class GenericIE(InfoExtractor):
age_limit = self._rta_search(webpage)
# And then there are the jokers who advertise that they use RTA,
# but actually don't.
- AGE_LIMIT_MARKERS = [
- r'Proudly Labeled <a href="http://www\.rtalabel\.org/" title="Restricted to Adults">RTA</a>',
- ]
+ AGE_LIMIT_MARKERS = (
+ r'<a\b[^>]+\bhref\s*=\s*"http://www\.rtalabel\.org/"[^>]+?(?:\btitle\s*=\s*"Restricted to Adults\b|>\s*RTA\b)',
+ r'''<img\b[^>]+\b(?:id\s*=["']RTAImage|alt\s*=\s*["']RTA)\b''',
+ r'(?:>\s*(?:(?:18\s+)?(?:U.S.C.|USC)\s+)?§?|/)2257\b',
+ )
if any(re.search(marker, webpage) for marker in AGE_LIMIT_MARKERS):
age_limit = 18
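To get a feel for how the proposed markers behave, including the false-positive worry about the 2257 pattern, they can be run against a few snippets outside youtube-dl. This is a rough sketch: the patterns are copied verbatim from the diff, while `guess_age_limit` and all of the sample HTML strings are made up for illustration and are not taken from the actual page.

```python
import re

# Patterns copied verbatim from the proposed AGE_LIMIT_MARKERS tuple.
AGE_LIMIT_MARKERS = (
    r'<a\b[^>]+\bhref\s*=\s*"http://www\.rtalabel\.org/"[^>]+?(?:\btitle\s*=\s*"Restricted to Adults\b|>\s*RTA\b)',
    r'''<img\b[^>]+\b(?:id\s*=["']RTAImage|alt\s*=\s*["']RTA)\b''',
    r'(?:>\s*(?:(?:18\s+)?(?:U.S.C.|USC)\s+)?§?|/)2257\b',
)


def guess_age_limit(webpage):
    """Hypothetical helper mirroring the check in the diff: 18 if any marker matches."""
    if any(re.search(marker, webpage) for marker in AGE_LIMIT_MARKERS):
        return 18
    return None


# Invented sample snippets (assumptions, not real page content).
RTA_LINK = ('Proudly Labeled <a href="http://www.rtalabel.org/" '
            'title="Restricted to Adults">RTA</a>')
RTA_IMG = '<img id="RTAImage" src="/rta.png">'
LINK_2257 = '<a href="/information#2257">2257</a>'
PLAIN = '<p>Nothing to see here</p>'
TABLE_CELL = '<td>2257</td>'  # unrelated number that still trips the 2257 pattern

for sample in (RTA_LINK, RTA_IMG, LINK_2257, PLAIN, TABLE_CELL):
    print(guess_age_limit(sample), sample)
```

The last sample shows the kind of false positive the 2257 pattern can produce: any element whose text starts with that number is treated as an age marker.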
Checklist
Description
youtube-dl treats the /gif/*** path URL as a playlist and tries to download the "playlist", but nothing is downloaded.