How do I remove blank lines from VTT subtitles? #229

Open
zdoek001 opened this issue Jul 3, 2024 · 9 comments

Comments

@zdoek001

zdoek001 commented Jul 3, 2024

WEBVTT

00:00:00.086 --> 00:00:00.961

xxxxx

00:00:01.166 --> 00:00:02.586

xxxxx

Why is there always a blank line between the cue timing line and the cue text?

@rany2
Owner

rany2 commented Jul 3, 2024

I don't understand. Is it doing something different from https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API#webvtt_files ?

@zdoek001
Author

zdoek001 commented Jul 3, 2024

> I don't understand. Is it doing something different from https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API#webvtt_files ?

Strange. In my VTT subtitles, the first line of every cue is always blank.

@zdoek001
Author

zdoek001 commented Jul 3, 2024

WEBVTT

00:00:00.086 --> 00:00:00.961

xxxxx

00:00:01.166 --> 00:00:02.586

xxxxx

Normally it should be like this:
WEBVTT

00:00:00.086 --> 00:00:00.961
xxxxx

00:00:01.166 --> 00:00:02.586
xxxxx

And mine is this:
WEBVTT

00:00:00.086 --> 00:00:00.961

xxxxx

00:00:01.166 --> 00:00:02.586

xxxxx

@rany2
Owner

rany2 commented Jul 3, 2024

I have an internal version of edge-tts with many subtitle fixes (especially noticeable for Chinese). It uses pysrt for subtitle generation, so this issue should be fixed there, but I never had this issue in the first place, so :/

@zdoek001
Author

zdoek001 commented Jul 3, 2024

> I have an internal version of edge-tts with many subtitle fixes (especially noticeable for Chinese). It uses pysrt for subtitle generation, so this issue should be fixed there, but I never had this issue in the first place, so :/

Is a new version coming soon? I'm looking forward to generating SRT directly.

@rany2
Owner

rany2 commented Jul 3, 2024

If you're keen, you could test it out. I pushed my WIP branch: https://github.com/rany2/edge-tts/tree/wip-subtitles

@rany2
Owner

rany2 commented Jul 3, 2024

It needs to be simplified a bit more before it's ready; right now it's more of a bodge and a proof of concept. There are some issues, so it's not ready to be in master yet: the TTS service can rewrite the input text and then return the rewritten text in the word boundary events.

For example, if you ask TTS to generate speech for "1k.m.", the service rewrites it internally as "1 kilometer" and the mapping back to the input text fails. I've attempted to fix such issues, but it's still a WIP.
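
To make the failure mode concrete, here is a minimal sketch of a naive mapping that looks up each reported word in the input text. It is not the code from the WIP branch: the function name `map_words_to_input` and the word list `["1", "kilometer"]` are hypothetical, chosen only to mirror the "1k.m." example.

```python
from typing import List, Tuple


def map_words_to_input(input_text: str, boundary_words: List[str]) -> List[Tuple[str, int]]:
    """Locate each reported word in the input text, scanning left to right.

    Returns (word, offset) pairs; the offset is -1 when the word is not found.
    """
    results = []
    cursor = 0
    for word in boundary_words:
        offset = input_text.find(word, cursor)
        if offset != -1:
            # Continue searching after this match so repeated words map in order.
            cursor = offset + len(word)
        results.append((word, offset))
    return results


if __name__ == "__main__":
    # Hypothetical word list: what a service might report after rewriting "1k.m.".
    print(map_words_to_input("1k.m.", ["1", "kilometer"]))
    # → [('1', 0), ('kilometer', -1)]
```

The rewritten word "kilometer" never appears in the input, so a plain text search cannot place it.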

@GerFr

GerFr commented Jul 19, 2024

Passing newline="\n" to open(...) in the `with open(...) as file:` statement fixed the issue on my Windows device. It seems to be a Linux/Windows line-ending problem.
https://stackoverflow.com/questions/9184107/how-can-i-force-pythons-file-write-to-use-the-same-newline-format-in-windows

Line 31 in the async subtitle example should be adjusted.
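
A minimal sketch of that change, for anyone who wants to try it. In text mode, Python on Windows translates every "\n" it writes into "\r\n" by default, so a string that already contains "\r\n" ends up as "\r\r\n" on disk, which can show up as an extra blank line. Passing newline="\n" turns that translation off. That the generated subtitles contain "\r\n" is an assumption here; `SAMPLE_VTT`, `write_subs`, and `read_raw` are hypothetical names standing in for the example's own code.

```python
import os
import tempfile

# Hypothetical sample standing in for the string returned by submaker.generate_subs().
SAMPLE_VTT = "WEBVTT\r\n\r\n00:00:00.086 --> 00:00:00.961\r\nxxxxx\r\n\r\n"


def write_subs(path: str, subs: str) -> None:
    """Write subtitle text exactly as given, with no newline translation."""
    with open(path, "w", encoding="utf-8", newline="\n") as file:
        file.write(subs)


def read_raw(path: str) -> bytes:
    """Read the file back as bytes so the real line endings are visible."""
    with open(path, "rb") as file:
        return file.read()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.vtt")
    write_subs(path, SAMPLE_VTT)
    # True would mean the line endings were doubled on the way to disk.
    print(b"\r\r\n" in read_raw(path))
    # → False
```

Reading the file back in binary mode is the quickest way to check which line endings really landed on disk.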

@FaintWhisper

> WEBVTT
>
> 00:00:00.086 --> 00:00:00.961
>
> xxxxx
>
> 00:00:01.166 --> 00:00:02.586
>
> xxxxx
>
> Normally it should be like this:
> WEBVTT
>
> 00:00:00.086 --> 00:00:00.961
> xxxxx
>
> 00:00:01.166 --> 00:00:02.586
> xxxxx
>
> And mine is this:
> WEBVTT
>
> 00:00:00.086 --> 00:00:00.961
>
> xxxxx
>
> 00:00:01.166 --> 00:00:02.586
>
> xxxxx

You can delete these blank lines after the subs have been written to the VTT file. In the example streaming_with_subtitles.py, add this code:

with open(WEBVTT_FILE, "w", encoding="utf-8") as file:
    file.write(submaker.generate_subs())

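# Remove the blank line below each cue timing line: the timing line is
# rewritten without its own newline, so the blank line that follows ends it.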
with open(WEBVTT_FILE, "r", encoding="utf-8") as file:
    lines = file.readlines()
with open(WEBVTT_FILE, "w", encoding="utf-8") as file:
    for line in lines:
        if "-->" in line:
            file.write(line.strip() + " ")
        else:
            file.write(line)

This makes it possible to play the audio together with the VTT file in players such as mpv and MPC-HC. Otherwise the subs are not displayed, because the players consider the file invalid due to its incorrect format.

Ideally, this should also be fixed in the CLI.
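
A possible variant of the same cleanup, sketched under a few assumptions: it works on the subtitle string before anything is written, so the file is only written once, and it leaves cues alone when no blank line follows the timing line. The function name `drop_blank_after_timing` is hypothetical and not part of edge-tts.

```python
def drop_blank_after_timing(vtt_text: str) -> str:
    """Remove a blank line that directly follows a cue timing line."""
    # Normalise every line-ending style to "\n"; a doubled "\r\r\n" becomes
    # two newlines, i.e. the stray blank line this function then removes.
    normalised = vtt_text.replace("\r\n", "\n").replace("\r", "\n")

    kept = []
    previous_was_timing = False
    for line in normalised.split("\n"):
        if previous_was_timing and line.strip() == "":
            # Skip only the first blank line after a timing line.
            previous_was_timing = False
            continue
        kept.append(line)
        previous_was_timing = "-->" in line
    return "\n".join(kept)


if __name__ == "__main__":
    broken = (
        "WEBVTT\n\n"
        "00:00:00.086 --> 00:00:00.961\n\nxxxxx\n\n"
        "00:00:01.166 --> 00:00:02.586\n\nxxxxx\n"
    )
    print(drop_blank_after_timing(broken))
```

The result can then be written with open(WEBVTT_FILE, "w", encoding="utf-8", newline="\n"), as suggested earlier in the thread, so the line endings are not translated again on Windows.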
