[bunkr] fixed extractor #4529
I was testing this fix and ran into an issue. It's possible I did something incorrectly, but as I read the code in this PR, it pulls the CDN from the download page using the string found between <source src=" and the next " for videos, and between <img src=" and the next " for images. That yields the entire source URL for that album/gallery item. However, it then truncates that URL to only the CDN root (i.e., 'https://media-files12.bunkr.la/') and appends the end of the {self} URL starting from 'v/' or 'i/'. That works for the test album because the image in the file is found at the link https://bunkrr.su/i/test-%E3%83%86%E3%82%B9%E3%83%88-%22&%3E-QjgneIQv.png, which is the same as the file name.

Opening the first file goes to the link: However, the src URL for the download is: Currently, this code grabs 'https://media-files12.bunkr.la/', appends the 'v/kLG2yrlpg7DSk' from the page link for that item, and tries to download using 'https://media-files12.bunkr.la/v/kLG2yrlpg7DSk', which 404s because it's an invalid link.

I tried commenting out lines 106 and 107 and adding 'url = cdn' to just use the https://media-files12.bunkr.la/Woods-KbuqDmbn-rftcXF0I2H1v.mp4 link that is pulled by @Yakabuff's if/else statement. That worked, but only for the first item in the gallery. It isn't iterating through all of the files in the gallery, because the headers variable is iterating through the 'v/kLG2yrlpg7DSk' URLs instead of the actual download links that are needed. It just attempts to download the https://media-files12.bunkr.la/Woods-KbuqDmbn-rftcXF0I2H1v.mp4 file over and over, 844 times I believe (once for each file in the gallery). I can look into this later, but I don't have the time to dive into the code right this second.

Figured I'd explain all this here in case @Yakabuff is able to correct this quickly, or in case there's something I'm missing in how I tested it and this PR does actually resolve the bunkrr.su issue. |
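
For anyone following along, here is a minimal standalone sketch of the behaviour described above. It is not the code from this PR: the helper names (`extract_between`, `cdn_root`, `rebuilt_url`) and the sample HTML are assumptions made up for illustration, and only the URLs are taken from the comment.

```python
from urllib.parse import urlsplit


def extract_between(page, begin, end):
    """Return the text between the first `begin` and the next `end`, or None."""
    start = page.find(begin)
    if start < 0:
        return None
    start += len(begin)
    stop = page.find(end, start)
    if stop < 0:
        return None
    return page[start:stop]


def cdn_root(src):
    """Truncate a full source URL to its scheme and host, e.g. 'https://host/'."""
    parts = urlsplit(src)
    return f"{parts.scheme}://{parts.netloc}/"


def rebuilt_url(src, item_path):
    """Rebuild a URL the way the comment describes: CDN root + page path."""
    return cdn_root(src) + item_path


# Hypothetical item page markup; only the src value comes from the thread.
page = (
    '<video><source src="https://media-files12.bunkr.la/'
    'Woods-KbuqDmbn-rftcXF0I2H1v.mp4" type="video/mp4"></video>'
)

src = extract_between(page, '<source src="', '"')  # the full, working link
bad = rebuilt_url(src, "v/kLG2yrlpg7DSk")          # the link that 404s
```

In this sketch `src` is already the usable download link, while `bad` only matches a real file when the page path happens to equal the file name, as in the test album.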
It seems that removing Line #96
|
@HeavenlyVice Thanks, I will try implementing that and do further testing |
Yeah, there seem to be two different formats: This means we will have to either:
|
@mikf @HeavenlyVice Seems to work now. I am able to download links from both |
Yep, looks good to me. It may run slower than it did previously, but this should at least address the issue and get it working again. Then maybe we can figure out a faster way of doing it later.

The only other note I'd have is to remove or update the comment on Line 96, since we're no longer just grabbing the CDN root but the entire download link. That would clarify things for anyone working on future development of the Bunkr extractor.

I'm not sure whether it's worthwhile to have it display a progress notification in the CLI so users know it is actually working while it's first grabbing all of the URLs for larger albums. When I was initially testing, I wasn't sure it was working because it just hung until I escaped it and ran it with the --verbose flag. It doesn't take a massive amount of time, but I made the mistake of testing it on a larger album first (844 items, I believe), so it did take a few minutes to run through them all. It could just print something to the CLI along the lines of "Fetching download URLs..." or "Processing album information...". Just a thought. |
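
A rough sketch of what such a progress message could look like, written as standalone Python rather than against gallery-dl's internals. The function `resolve_download_urls`, its `fetch_src` callback, the `every` parameter, and the logger name are all hypothetical; `fetch_src` stands in for whatever fetches one item page and returns its download link.

```python
import logging

log = logging.getLogger("bunkr-sketch")  # hypothetical logger name


def resolve_download_urls(item_pages, fetch_src, every=50):
    """Resolve one download URL per item page, logging progress as it goes.

    item_pages: list of per-item page URLs (the /v/ or /i/ links).
    fetch_src:  hypothetical callable that fetches one item page and
                returns the download URL found on it.
    every:      log a progress line after this many items.
    """
    total = len(item_pages)
    log.info("Fetching download URLs for %d items...", total)
    urls = []
    for index, page_url in enumerate(item_pages, 1):
        # Each item page is resolved separately, so every file gets
        # its own download link instead of reusing the first one.
        urls.append(fetch_src(page_url))
        if index % every == 0 or index == total:
            log.info("Resolved %d/%d download URLs", index, total)
    return urls
```

With logging configured at INFO level, an 844-item album would print a line every 50 items, which should be enough to show the run has not stalled.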
The reason it's slower now is that it has to fetch a separate page for each and every file it's going to download.
Example:
Resulting download URL: Unfortunately, this sort of parsing isn't well suited for |
I tried to install it from your branch, but I got
---edit--- |
@bhaskoro-muthohar that's not a problem with the code. You need to set your config to a browser User-Agent string or you'll get blocked. |
What is the User-Agent value for bunkr you recommend? |
#4514
/v/ or /i/, depending on whether it's a video or image