Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable low_freq high_freq, dithering #664

Merged
merged 4 commits into from
Mar 22, 2024

Conversation

KarelVesely84
Copy link
Contributor

add support of non-default config of FBANK features

  • i built telephone speech ASR system with 56-dim fbanks with low_freq=80, high_freq=3600
  • i have a signal with hard-zeros in it, so dithering would be valuable

(i noticed that dithering is not yet in https://github.com/csukuangfj/kaldi-native-fbank)
(i think of porting it there from kaldi, just the constant will be 1/(2^15)-times smaller)

@KarelVesely84 KarelVesely84 changed the title [WIP] Configurable low freq high freq [WIP] Configurable low_freq high_freq, dithering Mar 13, 2024
@csukuangfj
Copy link
Collaborator

Thanks!

@csukuangfj
Copy link
Collaborator

(i noticed that dithering is not yet in https://github.com/csukuangfj/kaldi-native-fbank

Would you like to add it?

@KarelVesely84
Copy link
Contributor Author

KarelVesely84 commented Mar 13, 2024

yes, PR is here: csukuangfj/kaldi-native-fbank#40

@KarelVesely84
Copy link
Contributor Author

KarelVesely84 commented Mar 13, 2024

is there a way to install kaldi-native-fbank for sherpa-onnx from a customized branch ?
(ideally existing installation in current conda env, or in a local build directory)

i have seen the cmake download of kaldi-native-fbank, .../tags/v1.18.7.tar.gz in cmake/kaldi-native-fbank.cmake,
but i dunno how it is wired inside the overall cmake build...

@csukuangfj
Copy link
Collaborator

is there a way to install kaldi-native-fbank for sherpa-onnx from a customized branch

Yes.

i have seen the cmake download of kaldi-native-fbank, .../tags/v1.18.7.tar.gz in cmake/kaldi-native-fbank.cmake,
but i dunno how it is wired inside the overall cmake build...

You only need to change the URL to point to a specific commit.

In your specific case, your repo is
https://github.com/KarelVesely84/kaldi-native-fbank
and the commit you want is
KarelVesely84/kaldi-native-fbank@926fa47

Note that the commit id is 926fa4787977bcd8157c7e7a0cfee5a4597a92e4, so you need to replace

set(kaldi_native_fbank_URL "https://github.com/csukuangfj/kaldi-native-fbank/archive/refs/tags/v1.18.7.tar.gz")

with

 set(kaldi_native_fbank_URL  "https://github.com/KarelVesely84/kaldi-native-fbank/archive/926fa4787977bcd8157c7e7a0cfee5a4597a92e4.zip") 

By the way, you need to change the HASH

set(kaldi_native_fbank_HASH "SHA256=e78fd9d481d83d7d6d1be0012752e6531cb614e030558a3491e3c033cb8e0e4e")

You can leave the hash empty by using

 set(kaldi_native_fbank_HASH) 

@KarelVesely84 KarelVesely84 force-pushed the configurable_low-freq_high-freq branch from 364ec69 to 4f04fb8 Compare March 14, 2024 15:13
@KarelVesely84
Copy link
Contributor Author

okay, i'll configure it with the updated kaldi-native-fbank, so the unit-tests are already with the newer version

@KarelVesely84
Copy link
Contributor Author

KarelVesely84 commented Mar 14, 2024

Otherwise, the 80-3600Hz FBANK ASR appeared to be quite sensitive to the dithering constant:

|   | System                          | Eval        | Comment                    |
|---|---------------------------------|-------------|----------------------------|
| D | small (24M), 80-3600Hz, no-aug  | 77.9        | feat-dim56, dither 0.0     |
|   |                                 | 76.9        | feat-dim56, dither 0.00003 |
|   |                                 | 74.0        | feat-dim56, dither 0.0001  |
|   |                                 | 67.2        | feat-dim56, dither 0.0003  |
|   |                                 | 63.3        | feat-dim56, dither 0.001   |
|   |                                 | 63.3        | feat-dim56, dither 0.003   |
|   |                                 | 70.0        | feat-dim56, dither 0.01    |

There was a problem with "too many deletions", which dithering mitigated.

But still, there seems to be another problem too...

  • Maybe missing empty segments in ASR training data.
  • Maybe audio channel mismatch.
  • Maybe other inconsistency of train/sherpa features...

But this should not block this PR from being merged.
The hight_freq, low_freq, non 80-dim features and dithering works...

K.

@KarelVesely84 KarelVesely84 force-pushed the configurable_low-freq_high-freq branch from bf88d1b to 678bb71 Compare March 18, 2024 09:57
@KarelVesely84
Copy link
Contributor Author

with the update of kaldi-native-fbank, there is a problem in the android build:

[  9%] Building CXX object _deps/kaldi_native_fbank-build/kaldi-native-fbank/csrc/CMakeFiles/kaldi-native-fbank-core.dir/online-feature.cc.o
/home/runner/work/sherpa-onnx/sherpa-onnx/build-android-arm64-v8a/_deps/kaldi_native_fbank-src/kaldi-native-fbank/csrc/kaldi-math.cc:19:26: error: no member named 'mutex' in namespace 'std'

But there are mutexes elsewhere in the sherpa-onnx, and the kaldi-native-fbank is compiled in C++14 standard.
Is this a big problem ? Do you have an idea how to fix it ?

@csukuangfj
Copy link
Collaborator

csukuangfj commented Mar 19, 2024

I see. kaldi-math.cc lacks of the header

#include <mutex>

You can change it locally in your build directory and see it if fixes the issue.

@csukuangfj
Copy link
Collaborator

with the update of kaldi-native-fbank, there is a problem in the android build:

[  9%] Building CXX object _deps/kaldi_native_fbank-build/kaldi-native-fbank/csrc/CMakeFiles/kaldi-native-fbank-core.dir/online-feature.cc.o
/home/runner/work/sherpa-onnx/sherpa-onnx/build-android-arm64-v8a/_deps/kaldi_native_fbank-src/kaldi-native-fbank/csrc/kaldi-math.cc:19:26: error: no member named 'mutex' in namespace 'std'

But there are mutexes elsewhere in the sherpa-onnx, and the kaldi-native-fbank is compiled in C++14 standard. Is this a big problem ? Do you have an idea how to fix it ?

I have fixed it in csukuangfj/kaldi-native-fbank#43

Please use kaldi-native-fbank v1.19.1

@KarelVesely84 KarelVesely84 force-pushed the configurable_low-freq_high-freq branch from 97429cd to 5afda1e Compare March 20, 2024 17:29
…and python API

- I am buliding ASR system for telephone speech, in this scenario it is
  good to limit the bandwidth to 80-3600Hz.
- The default remains the 20-7600Hz.
- we use less features 80->56
- we pass `feature_dim` to `OnlineZipformer2TransducerModel::GetEncoderInitStates()` via new method `OnlineTransducerModel::SetFeatureDim(.)`
- the value 19 corresponding to 80-dim features was previously hard-coded, in this PR it is imported from the `FeatureExtractorConfig`
- currently, dithering is not yet implemented in https://github.com/csukuangfj/kaldi-native-fbank
- i can port it there from kaldi
@KarelVesely84 KarelVesely84 force-pushed the configurable_low-freq_high-freq branch from 5afda1e to a940e56 Compare March 20, 2024 17:30
@KarelVesely84 KarelVesely84 changed the title [WIP] Configurable low_freq high_freq, dithering Configurable low_freq high_freq, dithering Mar 20, 2024
Copy link
Collaborator

@csukuangfj csukuangfj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good to me. Left some minor comments.

set(kaldi_native_fbank_URL "https://github.com/csukuangfj/kaldi-native-fbank/archive/refs/tags/v1.18.7.tar.gz")
set(kaldi_native_fbank_URL2 "https://huggingface.co/csukuangfj/sherpa-onnx-cmake-deps/resolve/main/kaldi-native-fbank-1.18.7.tar.gz")
set(kaldi_native_fbank_HASH "SHA256=e78fd9d481d83d7d6d1be0012752e6531cb614e030558a3491e3c033cb8e0e4e")
set(kaldi_native_fbank_URL "https://github.com/csukuangfj/kaldi-native-fbank/archive/refs/tags/v1.19.1.tar.gz")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also change lines 15-19 in this file.

@@ -101,6 +105,7 @@ class OnlineZipformer2TransducerModel : public OnlineTransducerModel {

int32_t context_size_ = 0;
int32_t vocab_size_ = 0;
int32_t feature_dim_ = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change the default value to 80?

@csukuangfj
Copy link
Collaborator

Thank you! I am merging it.

@csukuangfj csukuangfj merged commit eaec4c8 into k2-fsa:master Mar 22, 2024
218 of 227 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants