Model ASR output often loses fragments of text #1214

Open · RichardQin1 opened this issue Dec 24, 2024 · 3 comments

Comments


RichardQin1 commented Dec 24, 2024

After running the model for ASR, parts of the content are often missing from the output.
Audio link: https://share-github.tos-cn-beijing.volces.com/test.mp3

import whisperx
from faster_whisper import WhisperModel

# Load the audio as a 16 kHz float32 array and transcribe it with faster-whisper.
mp3_audio = whisperx.load_audio('test.mp3')
prompt = ' 新闻今日谈 林秀芹 李炜 时事评论员 '
language = 'zh'
add_time = 0.0  # timestamp offset; not defined in the original snippet, assumed 0 here

asr_model = WhisperModel("large-v2", device='cuda', compute_type='float16')
segments, info = asr_model.transcribe(mp3_audio,
                                      beam_size=5,
                                      vad_filter=True,
                                      language=language,
                                      initial_prompt=prompt,
                                      hotwords=prompt,
                                      word_timestamps=True,  # needed for segment.words to be populated
                                      )

# Collect the lazy segment generator into plain dicts.
tmp_segments = []
for segment in segments:
    simplified_text = segment.text
    if hasattr(segment, 'words') and segment.words:
        tmp_segments.append(
            {"start": add_time + segment.start, "end": add_time + segment.end,
             "text": simplified_text, "words": segment.words})
    else:
        tmp_segments.append(
            {"start": add_time + segment.start, "end": add_time + segment.end,
             "text": simplified_text})

asr_result = {'segments': tmp_segments, 'language': language}

current output:

{
'language': 'zh',
 'segments': [
    {'end': 21.89, 'start': 17.49, 'text': '我是林秀芹 首先联合话题关注的是中德关系的新的进展'},   ......    
    {'end': 755.53, 'start': 748.93, 'text': '当然 谢谢李伟先生带来的分析 我们先休息下来 但关注的是世界经济论坛非洲峰会的相关话题 稍后再见'}, 
    {'end': 787.29, 'start': 781.09, 'text': '谈非洲峰会呢 六号在南非闭幕 这一次的非洲峰会可以说是吸引全世界一个关注目光'},
 ... ... ]}

correct output:

{
'language': 'zh',
 'segments': [
    {'end': 17.49, 'start': 14.8, 'text': '大家好 欢迎收看今天的 新闻今日谈'},   # lost content
    {'end': 21.89, 'start': 17.49, 'text': '我是林秀芹 首先联合话题关注的是中德关系的新的进展'},   ......    
    {'end': 755.53, 'start': 748.93, 'text': '当然 谢谢李伟先生带来的分析 我们先休息下来 但关注的是世界经济论坛非洲峰会的相关话题 稍后再见'}, 
    {'end': 781, 'start': 778, 'text': '欢迎回来 世界经济论坛'}, # lost content
    {'end': 787.29, 'start': 781.09, 'text': '非洲峰会呢 六号在南非闭幕 这一次的非洲峰会可以说是吸引全世界一个关注目光'},
 ... ... ]}

Environment:

faster-whisper               1.1.0

How can I adjust the parameters or modify the code so that the full output is produced? Help, please.
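
For reference, a small helper like the one below (hypothetical, not part of faster-whisper or whisperx) can flag suspicious jumps between consecutive segments, which is where the lost sentences show up:

def find_gaps(segments, min_gap=1.0):
    # Return (end_of_previous, start_of_next) pairs where consecutive
    # segments are separated by more than `min_gap` seconds.
    gaps = []
    for prev, cur in zip(segments, segments[1:]):
        if cur["start"] - prev["end"] > min_gap:
            gaps.append((prev["end"], cur["start"]))
    return gaps

print(find_gaps(asr_result['segments']))
# With the output above this reports the jump from 755.53 s to 781.09 s,
# which contains the missing "欢迎回来 世界经济论坛" segment.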

@Purfview (Contributor)

Check if VAD didn't cut off those missing segments.
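
One way to do that check, as a minimal sketch (assuming faster-whisper's usual logging behaviour; recent versions report the VAD's decisions on the "faster_whisper" logger):

import logging

# Assumption: faster-whisper logs "VAD filter removed ... of audio" (INFO) and
# the list of kept speech segments (DEBUG) when vad_filter=True.
logging.basicConfig()
logging.getLogger("faster_whisper").setLevel(logging.DEBUG)

# Re-run the same transcribe(...) call from the issue and compare the kept
# segments printed in the log against the timestamps of the missing
# sentences (roughly 14.8-17.5 s and 778-781 s above).
segments, info = asr_model.transcribe(mp3_audio, beam_size=5, language='zh',
                                      vad_filter=True)

Running the same call once more with vad_filter=False and comparing the two outputs is another way to confirm whether the VAD is responsible.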

@RichardQin1 (Author)

Check if VAD didn't cut off those missing segments.

How do I check the VAD? Sorry, I'm a beginner.

@RichardQin1 (Author)

Check if VAD didn't cut off those missing segments.

(screenshot attached)
Does this show that the missing audio spans were discarded by the VAD? How should I tune it?
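
If the VAD is dropping those spans, one option is to make it more permissive via vad_parameters, or to disable it entirely. A sketch, assuming faster-whisper's VadOptions fields (threshold, min_silence_duration_ms, speech_pad_ms); the values are illustrative, not tuned, and defaults may differ between versions:

import whisperx
from faster_whisper import WhisperModel

mp3_audio = whisperx.load_audio('test.mp3')
asr_model = WhisperModel("large-v2", device='cuda', compute_type='float16')

segments, info = asr_model.transcribe(
    mp3_audio,
    beam_size=5,
    language='zh',
    vad_filter=True,
    vad_parameters=dict(
        threshold=0.3,               # lower than the 0.5 default: keep quieter speech
        min_silence_duration_ms=500, # split on shorter silences
        speech_pad_ms=800,           # pad kept speech so segment edges aren't clipped
    ),
)

# To rule the VAD out completely, run once without it and compare:
# segments, info = asr_model.transcribe(mp3_audio, beam_size=5, language='zh',
#                                       vad_filter=False)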
