Does cli.transcribe support large WAV files? #2676

Open
deevarvar opened this issue Jan 22, 2025 · 2 comments

Comments

@deevarvar

Describe the bug
File "/home/xcsong/workspace/wenet/wenet/transformer/embedding.py", line 100, in position_encoding
        # pytorch/pytorch#69434
        if isinstance(offset, int):
            assert offset + size <= self.max_len
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            pos_emb = self.pe[:, offset:offset + size]
        elif isinstance(offset, torch.Tensor) and offset.dim() == 0:  # scalar
RuntimeError: AssertionError:

To Reproduce
wenet --device cuda --language chinese ./20minutes.wav

Expected behavior
The command should produce a transcription result.

A 60-second audio file works, but a 20-minute file hits the assertion above.
Does cli.transcribe have a streaming mode?
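
For readers trying to see why 60 seconds passes while 20 minutes trips `assert offset + size <= self.max_len`, here is a rough back-of-the-envelope sketch. Every number in it is an assumption, not something stated in this thread: a 10 ms feature frame shift, 4x encoder subsampling, `offset = 0`, and a positional-encoding table of 5000 entries. The helper names are hypothetical. Check the actual model config before relying on it.

```python
# Rough estimate of how many encoder frames an utterance produces.
# All defaults below are ASSUMPTIONS for illustration, not values from the issue.

ASSUMED_MAX_LEN = 5000  # assumed size of the positional-encoding table


def encoder_frames(duration_s: float,
                   frame_shift_ms: int = 10,
                   subsampling: int = 4) -> int:
    """Approximate number of frames reaching the positional encoding."""
    feature_frames = int(duration_s * 1000 / frame_shift_ms)
    # Integer division approximates the effect of convolutional subsampling.
    return feature_frames // subsampling


def fits(duration_s: float, max_len: int = ASSUMED_MAX_LEN) -> bool:
    """True if a single non-streaming pass would stay within the table."""
    return encoder_frames(duration_s) <= max_len


if __name__ == "__main__":
    for seconds in (60, 20 * 60):
        print(seconds, encoder_frames(seconds), fits(seconds))
```

Under these assumptions the limit is a fixed table size rather than available memory, which would match the behaviour reported above.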

@wwfcnu

wwfcnu commented Feb 5, 2025


Isn't it already streaming recognition? As long as there is enough memory, it should be able to handle audio of any length.

@Mddct
Collaborator

Mddct commented Feb 8, 2025

You need to use a tool such as VAD (voice activity detection).
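
To make the VAD suggestion concrete, here is a minimal sketch of splitting a long recording into short segments before recognition. It is not WeNet code and not a real VAD: a simple per-frame energy threshold stands in for one, and the function names, the threshold, and the 30-second cap are all hypothetical choices for illustration. Each returned `(start, end)` sample range would then be written out and transcribed separately.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """Root-mean-square energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return energies


def split_on_silence(samples: Sequence[float],
                     sample_rate: int,
                     frame_ms: int = 30,
                     threshold: float = 0.01,      # assumed energy threshold
                     min_silence_ms: int = 300,    # pause length that ends a segment
                     max_segment_s: float = 30.0,  # assumed hard cap per segment
                     ) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of segments with energy above threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    min_silence_frames = max(1, min_silence_ms // frame_ms)
    max_len = int(max_segment_s * sample_rate)

    segments: List[Tuple[int, int]] = []
    seg_start = None  # start index of the currently open segment, if any
    silence_run = 0   # consecutive quiet frames inside the open segment

    for i, energy in enumerate(frame_rms(samples, frame_len)):
        pos = i * frame_len
        if energy >= threshold:
            if seg_start is None:
                seg_start = pos
            silence_run = 0
        elif seg_start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                # Close the segment where the quiet run began.
                segments.append((seg_start, pos - (silence_run - 1) * frame_len))
                seg_start, silence_run = None, 0
        # Force a cut if the open segment reached the cap
        # (it may overshoot the cap by up to one frame).
        if seg_start is not None and pos + frame_len - seg_start >= max_len:
            segments.append((seg_start, min(pos + frame_len, len(samples))))
            seg_start, silence_run = None, 0

    if seg_start is not None:
        # Flush the last segment; it may include a short trailing pause.
        segments.append((seg_start, len(samples)))
    return segments
```

A trained VAD model would be more robust on noisy audio. The point of the sketch is only that each piece handed to the recognizer stays short.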
