Extend subtitles parser to recognize the format of automatically generated subtitles #22

Closed
stanislaw opened this issue Feb 13, 2023 · 2 comments · Fixed by #31
Labels
enhancement New feature or request

Comments

@stanislaw
Contributor

stanislaw commented Feb 13, 2023

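Subtitles are parsed correctly when they are not auto-generated (and we already have a working unit test for this 🥳).

When they are auto-generated, the subtitle files contain more structure: cue settings after the timings (align:start position:0%), inline word timestamps such as <00:00:00.430>, and funny <c>...</c> tags around individual words. The text of one cue is also repeated in the following cues. The parser has to recognize all of this.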

Here is an example:

WEBVTT
Kind: captions
Language: de

00:00:00.000 --> 00:00:02.869 align:start position:0%
 
hallo <00:00:00.430><c>ich </c><00:00:00.860><c>bin </c><00:00:01.290><c>david </c><00:00:01.720><c>gründer </c><00:00:02.150><c>der </c><00:00:02.580><c>lingus</c>

00:00:02.869 --> 00:00:02.879 align:start position:0%
hallo ich bin david gründer der lingus
 

00:00:02.879 --> 00:00:05.390 align:start position:0%
hallo ich bin david gründer der lingus
organic <00:00:03.336><c>und </c><00:00:03.793><c>erfinder </c><00:00:04.250><c>der </c><00:00:04.707><c>bilingue</c>
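
To make the requirement concrete, here is a minimal sketch of how the inline markup in the example above could be stripped. It assumes Python and a hypothetical helper named `strip_inline_tags`; neither comes from this project, and the regular expression only covers the two kinds of inline markup visible in the sample (word timestamps and `<c>` tags).

```python
import re

# Matches inline word timestamps such as <00:00:00.430>
# and opening/closing <c> tags (including variants like <c.someclass>).
INLINE_TAG = re.compile(r"</?c[^>]*>|<\d{2}:\d{2}:\d{2}\.\d{3}>")


def strip_inline_tags(line: str) -> str:
    """Remove inline markup from one cue text line and collapse whitespace.

    Hypothetical helper for illustration only.
    """
    text = INLINE_TAG.sub("", line)
    # Tags often leave trailing or doubled spaces behind; normalize them.
    return " ".join(text.split())


raw = (
    "hallo <00:00:00.430><c>ich </c><00:00:00.860><c>bin </c>"
    "<00:00:01.290><c>david </c><00:00:01.720><c>gründer </c>"
    "<00:00:02.150><c>der </c><00:00:02.580><c>lingus</c>"
)
print(strip_inline_tags(raw))  # hallo ich bin david gründer der lingus
```

Lines without markup pass through unchanged apart from whitespace normalization, so the same function could be applied to every cue text line.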

Seems directly relevant:

@kamui-fin
Owner

So far, I've got the parser to skip that additional markup in PR #31. We still need to write code to avoid the duplicate lines, correct?

@kamui-fin
Owner

Just added the duplicate-removal code. Let me know if I can close this now.
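
For readers following along, duplicate removal for this format could look roughly like the sketch below. It is an illustration under assumptions, not the code that was added to the project: it is written in Python, the function name `dedupe_lines` is hypothetical, and it assumes the inline markup has already been stripped from each line. The idea is that auto-generated cues repeat the previously shown line, so a line is dropped when it is blank or identical to the last line that was kept.

```python
from typing import Iterable, List


def dedupe_lines(lines: Iterable[str]) -> List[str]:
    """Drop blank lines and consecutive repeats of the same text.

    Hypothetical helper for illustration only.
    """
    kept: List[str] = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank filler line inside a cue
        if kept and kept[-1] == text:
            continue  # same text as the previously kept line
        kept.append(text)
    return kept


# Cue text lines from the example in this issue, after markup removal.
cue_lines = [
    "hallo ich bin david gründer der lingus",
    "hallo ich bin david gründer der lingus",
    " ",
    "hallo ich bin david gründer der lingus",
    "organic und erfinder der bilingue",
]
print(dedupe_lines(cue_lines))
```

Comparing only against the last kept line keeps legitimate repetitions that occur later in the transcript, at the cost of missing duplicates that are separated by other text.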
