Modified conformer with multi datasets #312

Merged
23 commits
3373092
Copy files for editing.
csukuangfj Apr 12, 2022
bbf074a
Use librispeech + gigaspeech with modified conformer.
csukuangfj Apr 12, 2022
0cc13bc
Support specifying number of workers for on-the-fly feature extraction.
csukuangfj Apr 14, 2022
4e05213
Feature extraction code for GigaSpeech.
csukuangfj Apr 16, 2022
f0330f9
Combine XL splits lazily during training.
csukuangfj Apr 17, 2022
5c7c991
Fix warnings in decoding.
csukuangfj Apr 17, 2022
0c8310e
Merge remote-tracking branch 'dan/master' into modified-conformer-wit…
csukuangfj Apr 18, 2022
e32641d
Add decoding code for GigaSpeech.
csukuangfj Apr 18, 2022
a31207f
Fix decoding the gigaspeech dataset.
csukuangfj Apr 18, 2022
65fd981
Disable speed perturb for XL subset.
csukuangfj Apr 20, 2022
e9f0975
Merge remote-tracking branch 'origin/modified-conformer-with-multi-da…
csukuangfj Apr 20, 2022
b1c3705
Compute the Nbest oracle WER for RNN-T decoding.
csukuangfj Apr 24, 2022
b54d9a2
Minor fixes.
csukuangfj Apr 24, 2022
af20922
Minor fixes.
csukuangfj Apr 27, 2022
187534d
Merge branch 'modified-conformer-with-multi-datasets' of github.com:c…
csukuangfj Apr 27, 2022
fc7574f
Add results.
csukuangfj Apr 29, 2022
9721a42
Update results.
csukuangfj Apr 29, 2022
5bbce70
Merge remote-tracking branch 'dan/master' into modified-conformer-wit…
csukuangfj Apr 29, 2022
a227bd7
Update CI.
csukuangfj Apr 29, 2022
8d2797d
Update results.
csukuangfj Apr 29, 2022
fb61e31
Fix style issues.
csukuangfj Apr 29, 2022
c7000b9
Update results.
csukuangfj Apr 29, 2022
00fd664
Fix style issues.
csukuangfj Apr 29, 2022
2 changes: 2 additions & 0 deletions .flake8
@@ -9,6 +9,8 @@ per-file-ignores =
egs/tedlium3/ASR/*/conformer.py: E501,
egs/gigaspeech/ASR/*/conformer.py: E501,
egs/librispeech/ASR/pruned_transducer_stateless2/*.py: E501,
egs/librispeech/ASR/*/optim.py: E501,
egs/librispeech/ASR/*/scaling.py: E501,

# invalid escape sequence (caused by TeX formula), W605
icefall/utils.py: E501, W605
51 changes: 51 additions & 0 deletions .github/scripts/run-librispeech-pruned-transducer-stateless2-2022-04-29.sh
@@ -0,0 +1,51 @@
#!/usr/bin/env bash

log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}

cd egs/librispeech/ASR

repo_url=https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless2-2022-04-29

log "Downloading pre-trained model from $repo_url"
git lfs install
git clone $repo_url
repo=$(basename $repo_url)

log "Display test files"
tree $repo/
soxi $repo/test_wavs/*.wav
ls -lh $repo/test_wavs/*.wav

pushd $repo/exp
ln -s pretrained-epoch-38-avg-10.pt pretrained.pt
popd

for sym in 1 2 3; do
  log "Greedy search with --max-sym-per-frame $sym"

  ./pruned_transducer_stateless2/pretrained.py \
    --method greedy_search \
    --max-sym-per-frame $sym \
    --checkpoint $repo/exp/pretrained.pt \
    --bpe-model $repo/data/lang_bpe_500/bpe.model \
    $repo/test_wavs/1089-134686-0001.wav \
    $repo/test_wavs/1221-135766-0001.wav \
    $repo/test_wavs/1221-135766-0002.wav
done

for method in modified_beam_search beam_search fast_beam_search; do
  log "$method"

  ./pruned_transducer_stateless2/pretrained.py \
    --method $method \
    --beam-size 4 \
    --checkpoint $repo/exp/pretrained.pt \
    --bpe-model $repo/data/lang_bpe_500/bpe.model \
    $repo/test_wavs/1089-134686-0001.wav \
    $repo/test_wavs/1221-135766-0001.wav \
    $repo/test_wavs/1221-135766-0002.wav
done
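The `log` helper at the top of this script timestamps every message and records the calling file, line, and function via bash's `BASH_SOURCE`, `BASH_LINENO`, and `FUNCNAME` arrays. A minimal standalone sketch of the same pattern (the function body is copied from the script above; the `hello` call is just for illustration):

```shell
#!/usr/bin/env bash

# espnet-style logger: prefix each message with a timestamp and the
# caller's file name, line number, and function name.
log() {
  local fname=${BASH_SOURCE[1]##*/}
  echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}

log "hello"   # e.g. "2022-04-29 12:00:00 (demo.sh:10:main) hello"
```

Because the function indexes into the caller's frame (`[1]` rather than `[0]`), the location printed is the call site, not the body of `log` itself.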
51 changes: 51 additions & 0 deletions .github/scripts/run-librispeech-pruned-transducer-stateless3-2022-04-29.sh
@@ -0,0 +1,51 @@
#!/usr/bin/env bash

log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}

cd egs/librispeech/ASR

repo_url=https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-04-29

log "Downloading pre-trained model from $repo_url"
git lfs install
git clone $repo_url
repo=$(basename $repo_url)

log "Display test files"
tree $repo/
soxi $repo/test_wavs/*.wav
ls -lh $repo/test_wavs/*.wav

pushd $repo/exp
ln -s pretrained-epoch-25-avg-6.pt pretrained.pt
popd

for sym in 1 2 3; do
  log "Greedy search with --max-sym-per-frame $sym"

  ./pruned_transducer_stateless3/pretrained.py \
    --method greedy_search \
    --max-sym-per-frame $sym \
    --checkpoint $repo/exp/pretrained.pt \
    --bpe-model $repo/data/lang_bpe_500/bpe.model \
    $repo/test_wavs/1089-134686-0001.wav \
    $repo/test_wavs/1221-135766-0001.wav \
    $repo/test_wavs/1221-135766-0002.wav
done

for method in modified_beam_search beam_search fast_beam_search; do
  log "$method"

  ./pruned_transducer_stateless3/pretrained.py \
    --method $method \
    --beam-size 4 \
    --checkpoint $repo/exp/pretrained.pt \
    --bpe-model $repo/data/lang_bpe_500/bpe.model \
    $repo/test_wavs/1089-134686-0001.wav \
    $repo/test_wavs/1221-135766-0001.wav \
    $repo/test_wavs/1221-135766-0002.wav
done
85 changes: 85 additions & 0 deletions .github/workflows/run-librispeech-2022-04-29.yml
@@ -0,0 +1,85 @@
# Copyright 2021 Fangjun Kuang (csukuangfj@gmail.com)

# See ../../LICENSE for clarification regarding multiple authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: run-librispeech-2022-04-29
# stateless pruned transducer (reworked model) + GigaSpeech

on:
  push:
    branches:
      - master
  pull_request:
    types: [labeled]

jobs:
  run_librispeech_2022_04_29:
    if: github.event.label.name == 'ready' || github.event_name == 'push'
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-18.04]
        python-version: [3.7, 3.8, 3.9]
      fail-fast: false

    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0

      - name: Setup Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'
          cache-dependency-path: '**/requirements-ci.txt'

      - name: Install Python dependencies
        run: |
          grep -v '^#' ./requirements-ci.txt | xargs -n 1 -L 1 pip install

      - name: Cache kaldifeat
        id: my-cache
        uses: actions/cache@v2
        with:
          path: |
            ~/tmp/kaldifeat
          key: cache-tmp-${{ matrix.python-version }}

      - name: Install kaldifeat
        if: steps.my-cache.outputs.cache-hit != 'true'
        shell: bash
        run: |
          mkdir -p ~/tmp
          cd ~/tmp
          git clone https://github.com/csukuangfj/kaldifeat
          cd kaldifeat
          mkdir build
          cd build
          cmake -DCMAKE_BUILD_TYPE=Release ..
          make -j2 _kaldifeat

      - name: Inference with pre-trained model
        shell: bash
        run: |
          sudo apt-get -qq install git-lfs tree sox
          export PYTHONPATH=$PWD:$PYTHONPATH
          export PYTHONPATH=~/tmp/kaldifeat/kaldifeat/python:$PYTHONPATH
          export PYTHONPATH=~/tmp/kaldifeat/build/lib:$PYTHONPATH

          .github/scripts/run-librispeech-pruned-transducer-stateless2-2022-04-29.sh

          .github/scripts/run-librispeech-pruned-transducer-stateless3-2022-04-29.sh
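The final CI step can also be reproduced locally. A sketch, assuming an icefall checkout as the working directory and kaldifeat built under `~/tmp/kaldifeat` exactly as in the `Install kaldifeat` step above:

```shell
#!/usr/bin/env bash
# Mirror the PYTHONPATH setup from the "Inference with pre-trained model" step
# so that icefall and the locally built kaldifeat are importable.
export PYTHONPATH=$PWD:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/kaldifeat/python:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/build/lib:$PYTHONPATH
echo "PYTHONPATH=$PYTHONPATH"

# Then invoke the two scripts added in this PR (requires git-lfs, tree, sox):
# .github/scripts/run-librispeech-pruned-transducer-stateless2-2022-04-29.sh
# .github/scripts/run-librispeech-pruned-transducer-stateless3-2022-04-29.sh
```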
17 changes: 17 additions & 0 deletions README.md
@@ -35,6 +35,9 @@ We do provide a Colab notebook for this recipe.

### LibriSpeech

Please see <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/README.md>
for the **latest** results.

We provide 4 models for this recipe:

- [conformer CTC model][LibriSpeech_conformer_ctc]
@@ -92,6 +95,20 @@ in the decoding.

We provide a Colab notebook to run a pre-trained transducer conformer + stateless decoder model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1CO1bXJ-2khDckZIW8zjOPHGSKLHpTDlp?usp=sharing)


#### k2 pruned RNN-T

| | test-clean | test-other |
|-----|------------|------------|
| WER | 2.57 | 5.95 |

#### k2 pruned RNN-T + GigaSpeech

| | test-clean | test-other |
|-----|------------|------------|
| WER | 2.19 | 4.97 |
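Comparing the two tables above, training with GigaSpeech as extra data lowers WER from 2.57/5.95 to 2.19/4.97, roughly a 15-16% relative reduction. The arithmetic, as a quick check:

```shell
# Relative WER reduction: (baseline - new) / baseline * 100,
# using the numbers from the two k2 pruned RNN-T tables above.
awk 'BEGIN {
  printf "test-clean: %.1f%%  test-other: %.1f%%\n",
         (2.57 - 2.19) / 2.57 * 100,
         (5.95 - 4.97) / 5.95 * 100
}'
# prints: test-clean: 14.8%  test-other: 16.5%
```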


### Aishell

We provide two models for this recipe: [conformer CTC model][Aishell_conformer_ctc]
1 change: 1 addition & 0 deletions egs/librispeech/ASR/.gitignore
@@ -0,0 +1 @@
log-*
20 changes: 8 additions & 12 deletions egs/librispeech/ASR/README.md
@@ -1,23 +1,19 @@
# Introduction

Please refer to <https://icefall.readthedocs.io/en/latest/recipes/librispeech/index.html>
for how to run models in this recipe.

# Transducers

This folder contains several subdirectories whose names include `transducer`.
The following table lists the differences among them.

|                                        | Encoder              | Decoder            | Comment                                                               |
|----------------------------------------|----------------------|--------------------|-----------------------------------------------------------------------|
| `transducer`                           | Conformer            | LSTM               |                                                                       |
| `transducer_stateless`                 | Conformer            | Embedding + Conv1d | Using optimized_transducer for computing RNN-T loss                   |
| `transducer_stateless2`                | Conformer            | Embedding + Conv1d | Using torchaudio for computing RNN-T loss                             |
| `transducer_lstm`                      | LSTM                 | LSTM               |                                                                       |
| `transducer_stateless_multi_datasets`  | Conformer            | Embedding + Conv1d | Using data from GigaSpeech as extra training data                     |
| `pruned_transducer_stateless`          | Conformer            | Embedding + Conv1d | Using k2 pruned RNN-T loss                                            |
| `pruned_transducer_stateless2`         | Conformer (modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss                                            |
| `pruned_transducer_stateless3`         | Conformer (modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss + using GigaSpeech as extra training data  |


The decoder in `transducer_stateless` is modified from the paper