-
Notifications
You must be signed in to change notification settings - Fork 474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Configurable low_freq high_freq, dithering #664
Configurable low_freq high_freq, dithering #664
Conversation
Thanks! |
Would you like to add it? |
yes, PR is here: csukuangfj/kaldi-native-fbank#40 |
is there a way to install i have seen the cmake download of |
Yes.
You only need to change the URL to point to a specific commit. In your specific case, your repo is Note that the commit id is
with
By the way, you need to change the HASH
You can leave the hash empty by using
|
364ec69
to
4f04fb8
Compare
okay, i'll configure it with the updated |
Otherwise, the 80-3600Hz FBANK ASR appeared to be quite sensitive to the dithering constant:
There was a problem with "too many deletions", which dithering mitigated. But still, there seems to be another problem too...
But this should not block this PR from being merged. K. |
bf88d1b
to
678bb71
Compare
with the update of kaldi-native-fbank, there is a problem in the
But there are mutexes elsewhere in the sherpa-onnx, and the kaldi-native-fbank is compiled in C++14 standard. |
I see. kaldi-math.cc lacks of the header
You can change it locally in your build directory and see it if fixes the issue. |
I have fixed it in csukuangfj/kaldi-native-fbank#43 Please use kaldi-native-fbank v1.19.1 |
97429cd
to
5afda1e
Compare
…and python API - I am buliding ASR system for telephone speech, in this scenario it is good to limit the bandwidth to 80-3600Hz. - The default remains the 20-7600Hz.
- we use less features 80->56 - we pass `feature_dim` to `OnlineZipformer2TransducerModel::GetEncoderInitStates()` via new method `OnlineTransducerModel::SetFeatureDim(.)` - the value 19 corresponding to 80-dim features was previously hard-coded, in this PR it is imported from the `FeatureExtractorConfig`
- currently, dithering is not yet implemented in https://github.com/csukuangfj/kaldi-native-fbank - i can port it there from kaldi
5afda1e
to
a940e56
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looks good to me. Left some minor comments.
set(kaldi_native_fbank_URL "https://github.com/csukuangfj/kaldi-native-fbank/archive/refs/tags/v1.18.7.tar.gz") | ||
set(kaldi_native_fbank_URL2 "https://huggingface.co/csukuangfj/sherpa-onnx-cmake-deps/resolve/main/kaldi-native-fbank-1.18.7.tar.gz") | ||
set(kaldi_native_fbank_HASH "SHA256=e78fd9d481d83d7d6d1be0012752e6531cb614e030558a3491e3c033cb8e0e4e") | ||
set(kaldi_native_fbank_URL "https://github.com/csukuangfj/kaldi-native-fbank/archive/refs/tags/v1.19.1.tar.gz") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also change lines 15-19 in this file.
@@ -101,6 +105,7 @@ class OnlineZipformer2TransducerModel : public OnlineTransducerModel { | |||
|
|||
int32_t context_size_ = 0; | |||
int32_t vocab_size_ = 0; | |||
int32_t feature_dim_ = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you change the default value to 80?
Thank you! I am merging it. |
add support of non-default config of FBANK features
(i noticed that dithering is not yet in https://github.com/csukuangfj/kaldi-native-fbank)
(i think of porting it there from kaldi, just the constant will be 1/(2^15)-times smaller)