Add provision for other languages in Youtube Video Block #8630

Merged

Conversation

vishesh10
Contributor

Background

The YouTube Video Block could only transcribe a video when an "en" transcript was available. This PR extends the block to handle other languages as well.

Changes 🏗️

  • Changed the youtube.py script to return one of the available transcripts. An exception is thrown if no transcript is available.

Test

  • Happy path
Screenshot 2024-11-12 at 11 08 19 PM
  • Transcript is not available
Screenshot 2024-11-12 at 11 05 34 PM

@vishesh10 vishesh10 requested a review from a team as a code owner November 12, 2024 17:57
@vishesh10 vishesh10 requested review from Bentlybro and majdyz and removed request for a team November 12, 2024 17:57
Contributor

This PR targets the master branch but does not come from dev or a hotfix/* branch.

Automatically setting the base branch to dev.

@github-actions github-actions bot added platform/backend AutoGPT Platform - Back end platform/blocks labels Nov 12, 2024
@github-actions github-actions bot changed the base branch from master to dev November 12, 2024 17:58

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

8422 - Partially compliant

Fully compliant requirements:

  • Get list of available transcripts
  • Check if transcript list is empty
  • Get first available transcript in its language
  • Get default transcript using the first available language

Not compliant requirements:

  • No explicit fallback from 'en' implemented - code just takes first available language
⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Logic Issue
The for loop iterates through transcript_list but only processes the first iteration since it returns immediately. This makes the loop redundant.
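
The redundancy flagged here can be sketched as follows. This is a minimal illustration, not the PR's actual code: `first_or_raise` is a hypothetical helper, and a plain list stands in for the iterable of transcripts that the block obtains from the transcript library.

```python
from typing import Iterable, TypeVar

T = TypeVar("T")


def first_or_raise(items: Iterable[T]) -> T:
    """Return the first element of an iterable, or raise if it is empty.

    This does the same work as a for loop that returns on its first
    iteration, but makes the intent (take exactly one item) explicit.
    """
    try:
        return next(iter(items))
    except StopIteration:
        # The error message is an assumption; the PR may word it differently.
        raise ValueError("No transcripts found for the video") from None
```

With this shape, the empty-list check and the "take the first item" step collapse into a single call, for example `first_or_raise(transcript_list)`.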

Error Handling
Generic exception catch block masks specific errors that could help in debugging issues. Consider catching specific exceptions.
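
One way to act on this suggestion is sketched below. The two exception classes are local stand-ins defined only so the example runs on its own; the transcript library exports exceptions with similar names, but their constructors and the exact set the block should handle are assumptions here. `fetch_transcript_text` and its `fetch` parameter are hypothetical.

```python
from typing import Callable


class TranscriptsDisabled(Exception):
    """Stand-in: the uploader has disabled transcripts for the video."""


class NoTranscriptFound(Exception):
    """Stand-in: no transcript exists in the requested language."""


def fetch_transcript_text(fetch: Callable[[], str]) -> str:
    """Call `fetch` and translate known failures into clear messages.

    Only the two expected failure modes are caught. Any other exception
    propagates unchanged, so unexpected bugs are not masked.
    """
    try:
        return fetch()
    except TranscriptsDisabled as exc:
        raise RuntimeError("Transcripts are disabled for this video") from exc
    except NoTranscriptFound as exc:
        raise RuntimeError("No transcript exists in the requested language") from exc
```

Chaining with `from exc` keeps the original traceback available for debugging while still giving the block a readable error to surface.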


netlify bot commented Nov 12, 2024

Deploy Preview for auto-gpt-docs ready!

Name Link
🔨 Latest commit 3ee4f2f
🔍 Latest deploy log https://app.netlify.com/sites/auto-gpt-docs/deploys/6735c003f5fdf0000889bd58
😎 Deploy Preview https://deploy-preview-8630--auto-gpt-docs.netlify.app

@aarushik93
Contributor

Just a comment on the expected behaviour: @Torantulino, is this what we want here? Shouldn't it be possible to choose the language? It seems odd to me that it would just pick the first available language.

@Torantulino
Member

Thanks for taking this on @vishesh10!

@aarushik93 this resolves a known issue with the transcriber block, which currently fails to get a transcription even when a us-en transcript is available but none is named exactly en. As to whether we should pick the first available transcription by default, that's a good question. I don't know how the application orders them; I'd assume that the first available transcription would be in the native language of the video.

To default to English you can first attempt a call to: YouTubeTranscriptApi.get_transcript(video_id)

Note: By default, this will try to access the English transcript of the video.
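
The English-first fallback described above could look roughly like this. It is a sketch under stated assumptions: `LanguageEntry` is a hypothetical stand-in for the library's transcript objects (only a `language_code` attribute is assumed), `pick_language` is a hypothetical helper, and regional variants are assumed to be written in the form `en-US`.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class LanguageEntry:
    """Stand-in for a transcript object; only the language code matters here."""
    language_code: str


def pick_language(
    transcripts: Iterable[LanguageEntry],
    preferred: Sequence[str] = ("en",),
) -> LanguageEntry:
    """Return a transcript in a preferred language, else the first available."""
    available = list(transcripts)
    if not available:
        raise ValueError("No transcripts found for the video")
    for wanted in preferred:
        wanted = wanted.lower()
        for entry in available:
            code = entry.language_code.lower()
            # Accept an exact match ("en") or a regional variant ("en-US").
            if code == wanted or code.startswith(wanted + "-"):
                return entry
    # No preferred language exists, so fall back to whatever comes first.
    return available[0]
```

Matching on the prefix also covers the case mentioned earlier in this comment, where an English transcript exists but is not named exactly `en`.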

There are other improvements that could be made here, such as preferring manually created transcriptions over Google's auto-generated ones. That can be done via:
transcript.is_generated
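
As a rough illustration of that idea, the sketch below orders candidates so that manually created transcripts come first. `TranscriptInfo` is a hypothetical stand-in that mirrors only the `is_generated` flag mentioned above, and `prefer_manual` is a hypothetical helper, not part of the library or of this PR.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TranscriptInfo:
    """Stand-in for a transcript object exposing an `is_generated` flag."""
    language_code: str
    is_generated: bool


def prefer_manual(transcripts: Iterable[TranscriptInfo]) -> List[TranscriptInfo]:
    """Return transcripts with manually created ones ahead of generated ones.

    `sorted` is stable and False sorts before True, so the original order
    is preserved within the manual group and within the generated group.
    """
    return sorted(transcripts, key=lambda t: t.is_generated)
```

Taking the first element of the returned list then yields a manual transcript whenever one exists, and an auto-generated one otherwise.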

All in all though this is an undeniable improvement over the existing functionality and a step in the right direction.

@Torantulino Torantulino enabled auto-merge (squash) November 14, 2024 09:17
@Torantulino Torantulino merged commit 639242a into Significant-Gravitas:dev Nov 14, 2024
15 checks passed
@vishesh10
Contributor Author

Thanks for considering the PR @Torantulino @aarushik93.

Per the documentation, the find_transcript method returns manually created transcripts first. If none are found, generated transcripts are returned.

Development

Successfully merging this pull request may close these issues.

Transcribe YouTube Video Block should automatically fallback to other languages if en is not available
3 participants