Why does my own audio always report a sample frequency error? #3

Open
errolyan opened this issue Nov 22, 2018 · 7 comments

Comments

@errolyan

Command:
$ ./bin/speech-aligner --acoustic-scale=0.01 --careful=true --sample-frequency=48000 --config=egs/cn_phn/conf/align.conf egs/cn_phn/data1/wav.scp egs/cn_phn/data1/text egs/cn_phn/data/out1.ali
Error:
./bin/speech-aligner --acoustic-scale=0.01 --careful=true --sample-frequency=48000 --config=egs/cn_phn/conf/align.conf egs/cn_phn/data1/wav.scp egs/cn_phn/data1/text egs/cn_phn/data/out1.ali
LOG (speech-aligner[5.4.2154-f2b7]:main():bin/speech-aligner.cc:351) zhuni
ERROR (speech-aligner[5.4.2154-f2b7]:main():bin/speech-aligner.cc:425) Sample frequency mismatch: you specified 16000 but data has 48000 (use --sample-frequency option). Utterance is zhuni

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
main
__libc_start_main
_start
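
To see which files trigger this mismatch before running the aligner, the sample rate can be read straight from each WAV header. The following is a minimal sketch using only Python's standard `wave` module. The helper names `wav_sample_rate` and `find_mismatched` are made up for illustration and are not part of speech-aligner, and the 16000 Hz default is taken from the error message above.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def find_mismatched(wav_paths, expected_hz=16000):
    """Return (path, rate) pairs for files whose rate differs from expected_hz.

    expected_hz defaults to 16000 because that is the rate reported in the
    error message; adjust it if your model expects something else.
    """
    mismatched = []
    for path in wav_paths:
        rate = wav_sample_rate(path)
        if rate != expected_hz:
            mismatched.append((path, rate))
    return mismatched
```

Calling `find_mismatched` on the paths listed in `wav.scp` gives the files that would need resampling. Note that the `wave` module only reads uncompressed PCM WAV files.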

@megazone87
Member

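Welcome to the project!

This happens because the sample rate of the input audio (48 kHz) does not match the rate the aligner accepts (16 kHz). Until I add resampling support, you need to resample the audio yourself outside the program, for example like this:

In wav.scp, change each line from:
wav_name wav_path.wav
to:
wav_name sox wav_path.wav -t wav - rate -I 16k |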
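
For a wav.scp with many entries, the rewrite described above can be scripted. This is only a sketch, assuming every line is a plain `wav_name wav_path.wav` pair. The function names `to_sox_entry` and `rewrite_scp` are hypothetical helpers, not part of speech-aligner, and quoting the path with `shlex.quote` assumes the piped command is run through a shell.

```python
import shlex


def to_sox_entry(line, target_rate="16k"):
    """Turn 'wav_name wav_path.wav' into a sox pipe entry for wav.scp.

    Blank lines and entries that already end in '|' are returned
    unchanged (stripped of surrounding whitespace).
    """
    stripped = line.strip()
    if not stripped or stripped.endswith("|"):
        return stripped
    # Split only on the first run of whitespace so the path stays whole.
    wav_name, wav_path = stripped.split(None, 1)
    # Assumption: the command goes through a shell, so quote the path.
    quoted = shlex.quote(wav_path)
    return f"{wav_name} sox {quoted} -t wav - rate -I {target_rate} |"


def rewrite_scp(src_path, dst_path, target_rate="16k"):
    """Write a copy of src_path in which each entry resamples via sox."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            entry = to_sox_entry(line, target_rate)
            if entry:
                dst.write(entry + "\n")
```

Writing to a separate output file keeps the original wav.scp intact, so the two can be compared before the aligner is pointed at the new one.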

@errolyan
Author

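> Welcome to the project!
>
> This happens because the sample rate of the input audio (48 kHz) does not match the rate the aligner accepts (16 kHz). Until I add resampling support, you need to resample the audio yourself outside the program, for example like this:
>
> In wav.scp, change each line from:
> wav_name wav_path.wav
> to:
> wav_name sox wav_path.wav -t wav - rate -I 16k |
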
/cn_phn/data2/out.ali
./bin/speech-aligner --config=egs/cn_phn/conf/align.conf egs/cn_phn/data2/wav.scp egs/cn_phn/data2/text egs/cn_phn/data2/out.ali
LOG (speech-aligner[5.4.2154-f2b7]:main():bin/speech-aligner.cc:351) nitech_jp_song070_f001_097
WARNING (speech-aligner[5.4.2154-f2b7]:AlignOneUtteranceWrapper():decoder/decoder-wrappers.cc:601) Did not successfully decode file nitech_jp_song070_f001_097, len = 5938
LOG (speech-aligner[5.4.2154-f2b7]:main():bin/speech-aligner.cc:351) nitech_jp_song070_f001_006
WARNING (speech-aligner[5.4.2154-f2b7]:AlignOneUtteranceWrapper():decoder/decoder-wrappers.cc:601) Did not successfully decode file nitech_jp_song070_f001_006, len = 12658
LOG (speech-aligner[5.4.215~4-f2b7]:main():bin/speech-aligner.cc:558) Done 0 out of 2 utterances.

@errolyan
Author

After changing each line to "wav_name sox wav_path.wav -t wav - rate -I 16k |", I get this error: "Did not successfully decode file nitech_jp_song070_f001_006, len = 12658".

@errolyan
Author

image

@megazone87
Member

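I believe this is because the input speech and the text cannot be aligned. The likely cause is that the input speech falls outside the domain the current model is suited to. The model is loaded through the configuration file, and the only configuration file available at the moment is suited to Chinese speech in low-noise conditions.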

@megazone87
Member

I am interested in Japanese alignment too, and I could provide a Japanese model. Would you be able to send me your audio?

@HaiYandada

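> I believe this is because the input speech and the text cannot be aligned. The likely cause is that the input speech falls outside the domain the current model is suited to. The model is loaded through the configuration file, and the only configuration file available at the moment is suited to Chinese speech in low-noise conditions.

I ran into the same problem, but the audio I provided is perfectly ordinary Chinese speech recorded in a quiet environment.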


3 participants