[bilibili] initial support (#2824) #6443

Merged
merged 2 commits into from
Nov 10, 2024


hdk5
Contributor

@hdk5 hdk5 commented Nov 8, 2024

No description provided.

@hdk5
Contributor Author

hdk5 commented Nov 8, 2024

To consider:

  • Only "opus articles" are supported here, as I am personally only interested in them
  • Anti-spam protection triggers if the User-Agent string is not that of a recent browser. For example, danbooru calculates it on the fly (danbooru/danbooru@566c467)
  • Rate limiting is not accounted for, and it triggers the exact same, indistinguishable error as above
  • Chores (tests, docs, readme, etc.)
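
As a rough illustration of the "calculate it on the fly" idea, here is a minimal sketch that estimates a recent Chrome major version from the current date and builds a User-Agent string from it. This is not taken from danbooru or gallery-dl. The anchor (Chrome 130 around 2024-10-15), the 4-week release cadence, and the function names are all assumptions made for this example, and the real release schedule drifts, so treat the result as an estimate.

```python
from datetime import date

# Assumptions for illustration only: Chrome 130 reached stable around
# 2024-10-15 and new major versions ship roughly every 4 weeks.
ANCHOR_VERSION = 130
ANCHOR_DATE = date(2024, 10, 15)
RELEASE_INTERVAL_DAYS = 28


def estimate_chrome_major(today=None):
    """Estimate the current Chrome major version from the date."""
    today = today or date.today()
    elapsed = (today - ANCHOR_DATE).days
    # Never go below the anchor version, even for dates before the anchor.
    return ANCHOR_VERSION + max(0, elapsed // RELEASE_INTERVAL_DAYS)


def build_user_agent(today=None):
    """Build a Chrome-on-Windows style User-Agent string."""
    major = estimate_chrome_major(today)
    return (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        f"Chrome/{major}.0.0.0 Safari/537.36"
    )
```

Calling `build_user_agent()` without arguments uses today's date. Passing a fixed date makes the result reproducible.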

@hdk5 hdk5 marked this pull request as ready for review November 8, 2024 22:01
@Hrxn
Contributor

Hrxn commented Nov 9, 2024

Isn't that a video-only site? Or at least used to be?

@hdk5
Contributor Author

hdk5 commented Nov 9, 2024

> Isn't that a video-only site? Or at least used to be?

I am not that familiar with the Chinese internet, to be honest. I think it was, but now it has an article section, similar to how YouTube has a "Community" tab. I think gallery-dl can integrate with yt-dlp for videos too, but, as I said, I am personally interested only in bilibili space articles, so I'll let Cunningham's Law handle other contribution types if needed.

@Hrxn
Contributor

Hrxn commented Nov 9, 2024

I'm not familiar either, but let me just dump this from yt-dlp's supportedsites.md here:

- BiliBili
- Bilibili category extractor
- BilibiliAudio
- BilibiliAudioAlbum
- BiliBiliBangumi
- BiliBiliBangumiMedia
- BiliBiliBangumiSeason
- BilibiliCheese
- BilibiliCheeseSeason
- BilibiliCollectionList
- BilibiliFavoritesList
- BiliBiliPlayer
- BilibiliPlaylist
- BiliBiliSearch: Bilibili video search; "bilisearch:" prefix
- BilibiliSeriesList
- BilibiliSpaceAudio
- BilibiliSpaceVideo
- BilibiliWatchlater
- BiliIntl: biliintl
- biliIntl:series: biliintl
- BiliLive

@hdk5
Contributor Author

hdk5 commented Nov 9, 2024

Thanks, I was not aware of those. In any case, yt-dlp doesn't support what I need.
On user profiles specifically, there are "Video", "Audio", and "Article" tabs, and also a "Dynamic" tab that combines all of them.
Modified the extractor to clarify that only "Article"-type posts are supported.

@mikf
Owner

mikf commented Nov 10, 2024

Do you want me to do a proper code review, or would it be OK if I made any changes and additions myself?

@hdk5
Contributor Author

hdk5 commented Nov 10, 2024

Please go ahead with your changes. The branch should be open for maintainers.

@mikf mikf merged commit fc59e0f into mikf:master Nov 10, 2024
0 of 10 checks passed
@hdk5
Contributor Author

hdk5 commented Nov 10, 2024

When rate-limited, the request for an article fails:

Traceback (most recent call last):
  File "C:\Users\hdk5\repo\gallery-dl\gallery_dl\job.py", line 151, in run
    for msg in extractor:
  File "C:\Users\hdk5\repo\gallery-dl\gallery_dl\extractor\bilibili.py", line 46, in items
    article = self.api.article(self.groups[0])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hdk5\repo\gallery-dl\gallery_dl\extractor\bilibili.py", line 106, in article
    return util.json_loads(text.extr(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

What I do for now is retry if the response enforces logging in.

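    def article(self, article_id):
        url = "https://www.bilibili.com/opus/" + article_id

        while True:
            response = self.extractor.request(url)

            # A response that enforces logging in contains this marker.
            if "window._riskdata_" in response.text:
                # Wait 5 minutes, then request the page again.
                self.extractor.wait(seconds=300)
                continue

            # text.extr() drops the end marker "};", so re-append the "}".
            return util.json_loads(text.extr(
                response.text, "window.__INITIAL_STATE__=", "};") + "}")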

Would be cool to properly figure this out.
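
One possible variation on the retry loop above is to cap the number of retries so a persistent block cannot loop forever. The following is a standalone sketch, not gallery-dl code: `fetch`, `fetch_article`, `parse_initial_state`, and `RateLimitError` are hypothetical names introduced for illustration. Only the two marker strings and the 300-second wait come from the snippet above.

```python
import json
import time

RISK_MARKER = "window._riskdata_"          # marker from the snippet above
STATE_BEGIN = "window.__INITIAL_STATE__="  # marker from the snippet above


class RateLimitError(Exception):
    """Raised when the blocked response keeps coming back."""


def parse_initial_state(html):
    """Extract the JSON object assigned to window.__INITIAL_STATE__."""
    start = html.find(STATE_BEGIN)
    if start < 0:
        raise ValueError("initial state not found")
    start += len(STATE_BEGIN)
    end = html.find("};", start)
    if end < 0:
        raise ValueError("initial state not terminated")
    # Keep the closing brace, drop the semicolon.
    return json.loads(html[start:end + 1])


def fetch_article(fetch, url, max_retries=3, delay=300.0, sleep=time.sleep):
    """Fetch and parse an article page, retrying while rate-limited.

    `fetch` is a hypothetical callable taking a URL and returning page text.
    """
    for attempt in range(max_retries + 1):
        html = fetch(url)
        if RISK_MARKER not in html:
            return parse_initial_state(html)
        # Only wait if another attempt will follow.
        if attempt < max_retries:
            sleep(delay)
    raise RateLimitError(f"still blocked after {max_retries} retries: {url}")
```

Raising a dedicated error when the marker is present also avoids the misleading `JSONDecodeError` from the traceback, since a blocked page is never handed to the JSON parser.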

mikf added a commit that referenced this pull request Nov 15, 2024
- set 3-6 second request_interval by default
- retry request after waiting 5 minutes
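
To illustrate what a 3-6 second request interval means in practice, here is a minimal standalone sketch of a throttle that enforces a randomized minimum gap between consecutive requests. It is not the code from the referenced commit. The `RequestThrottle` class and its parameters are hypothetical, and only the 3-6 second range comes from the commit message.

```python
import random
import time


class RequestThrottle:
    """Enforce a randomized minimum gap between consecutive requests."""

    def __init__(self, interval=(3.0, 6.0), clock=time.monotonic,
                 sleep=time.sleep, rng=random.uniform):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self.rng = rng
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep until the chosen gap since the last request has passed."""
        if self._last is not None:
            # Pick a fresh gap within the interval for every request.
            gap = self.rng(*self.interval)
            remaining = gap - (self.clock() - self._last)
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()
```

Calling `wait()` right before each request is enough. The first call returns immediately, and later calls sleep only for the part of the gap that has not already elapsed.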
3 participants