Support .srt formats for transcripts #441

Closed
elynema opened this issue Feb 29, 2024 · 1 comment
Labels
transcripts Transcript component related

Comments

elynema commented Feb 29, 2024

Is your feature request related to a problem? Please describe.
Currently, Ramp only supports .vtt, plain text, and .docx formats for transcripts. Many caption files are produced as .srt and could also be appropriate for transcripts.

Describe the solution you'd like
Parse .srt format files provided to the transcript component and display them as interactive timed text.
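
A minimal sketch of the parsing step, in TypeScript. This is not Ramp's implementation: `Cue`, `srtTimeToSeconds`, and `parseSrt` are hypothetical names introduced here for illustration. It assumes the common SubRip layout, where each block is an optional numeric index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, and one or more text lines, with blocks separated by blank lines.

```typescript
// One parsed cue: times in seconds, text with its line breaks preserved.
interface Cue {
  index: number;
  start: number;
  end: number;
  text: string;
}

// Hypothetical helper: convert "HH:MM:SS,mmm" to seconds.
// Also accepts "." as the millisecond separator, which some tools emit.
function srtTimeToSeconds(stamp: string): number {
  const m = /^(\d{2,}):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (m === null) {
    throw new Error(`Invalid SRT timestamp: ${stamp}`);
  }
  const hours = Number(m[1]);
  const minutes = Number(m[2]);
  const seconds = Number(m[3]);
  const millis = Number(m[4]);
  return hours * 3600 + minutes * 60 + seconds + millis / 1000;
}

// Hypothetical parser: split the file into blank-line-separated blocks and
// turn each block that contains a timing line into a Cue.
function parseSrt(source: string): Cue[] {
  const cues: Cue[] = [];
  const blocks = source
    .replace(/^\uFEFF/, "") // drop a leading byte-order mark, if any
    .replace(/\r\n?/g, "\n") // normalize line endings
    .split(/\n\s*\n/);

  for (const block of blocks) {
    const lines = block.split("\n").filter((line) => line.trim() !== "");
    const timingAt = lines.findIndex((line) => line.includes("-->"));
    if (timingAt === -1) continue; // no timing line: not a cue block

    const [startRaw = "", endRaw = ""] = (lines[timingAt] ?? "").split("-->");
    // Ignore anything after the end timestamp (assumption: it is not needed).
    const endStamp = endRaw.trim().split(/\s+/)[0] ?? "";

    // Use the numeric line before the timing line as the index when present;
    // otherwise fall back to the cue's position in the file.
    const indexLine = timingAt > 0 ? (lines[timingAt - 1] ?? "").trim() : "";
    const index = /^\d+$/.test(indexLine) ? Number(indexLine) : cues.length + 1;

    cues.push({
      index,
      start: srtTimeToSeconds(startRaw),
      end: srtTimeToSeconds(endStamp),
      text: lines.slice(timingAt + 1).join("\n"),
    });
  }
  return cues;
}
```

The resulting `start` and `end` values could then drive the same click-to-seek and highlighting behavior that timed .vtt transcripts get. Malformed timestamps throw here; a real component might prefer to skip the bad cue and report it instead.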

Additional context
One motivating factor for considering this is that the .vtt files produced by YouTube's auto-generated captions seem to be invalid, so they don't work for transcripts in Ramp. The .srt files, however, seem to be valid and could be an alternative option. It's unclear whether this is just a YouTube issue or whether we might run into similar problems with .vtt files produced by other external systems.

An alternative is to manually fix the .vtt files downloaded from YouTube, which is easy to do: it requires deleting several lines at the start of the file.
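
The manual fix described above could also be scripted. The sketch below is an assumption-heavy illustration in TypeScript: the issue text does not say which lines need deleting, so the code assumes they are the non-blank lines that sit between the `WEBVTT` signature line and the first blank line (for example `Kind:` or `Language:` style metadata). `stripVttHeaderLines` is a hypothetical name, not an existing Ramp function.

```typescript
// Hypothetical helper: keep the "WEBVTT" signature line, drop every non-blank
// line that follows it up to the first blank line, and leave the cues untouched.
// ASSUMPTION: the lines to delete are exactly that header block.
function stripVttHeaderLines(source: string): string {
  const lines = source
    .replace(/^\uFEFF/, "") // drop a leading byte-order mark, if any
    .replace(/\r\n?/g, "\n") // normalize line endings
    .split("\n");

  const signature = lines[0] ?? "";
  if (!signature.startsWith("WEBVTT")) {
    throw new Error("Not a WebVTT file: missing WEBVTT signature");
  }

  // Advance past the header lines; stop at the first blank line.
  let headerEnd = 1;
  while (headerEnd < lines.length && (lines[headerEnd] ?? "").trim() !== "") {
    headerEnd++;
  }

  return [signature, ...lines.slice(headerEnd)].join("\n");
}
```

A file that already has a blank line directly after `WEBVTT` passes through unchanged, so running this on valid files should be harmless. Whether it is enough to make a given YouTube export load in Ramp would need to be checked against the attached example.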

Note that if we end up treating captions as transcripts in Avalon for search/display purposes, we'll need to reconcile the formats we allow for captions and transcripts; this could be another reason to support .srt for transcripts.

youtube-webvtt-example.vtt.txt

youtube-srt-example.srt.txt

elynema added the transcripts (Transcript component related) label Feb 29, 2024
Dananji self-assigned this Mar 8, 2024
joncameron (Contributor) commented

Works great. The example used for testing is at https://avalon-dev.dlib.indiana.edu/media_objects/8s45q877v.
