[WIP] Pruned_transducer_stateless for WenetSpeech #274
Conversation
device = torch.device("cpu")
if torch.cuda.is_available():
    device = torch.device("cuda", 5)
Please always use device 0. You can use CUDA_VISIBLE_DEVICES to control which devices are available.
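A minimal sketch of this pattern (the environment variable must be set before CUDA is initialized, so in practice it is usually passed on the command line rather than set in Python; the script path in the comment is illustrative):

```python
import os

# Expose only physical GPU 5 to this process, so that index 0 inside the
# script maps to it. This must happen before CUDA is initialized; the usual
# way is on the command line, e.g.:
#   CUDA_VISIBLE_DEVICES=5 ./pruned_transducer_stateless/train.py ...
os.environ["CUDA_VISIBLE_DEVICES"] = "5"

# The script itself then always uses index 0:
#   device = torch.device("cpu")
#   if torch.cuda.is_available():
#       device = torch.device("cuda", 0)
```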
OK. I will re-organize the code based on PR #288.
I suggest just copying and modifying into a different directory, pruned_transducer_stateless2. If you want, you can delete the original directory once this is tested and working.
OK, I will make a new directory for this.
If the vocab size is only 400+, I think you can use max-duration=300 or 350.
Decoding with greedy_search takes a long time when modeling with char. Here are some recordings:
When using BucketingSampler for the test_dataloader:
The loading data time refers to the time spent loading each batch. We can see that loading data takes a long time. How can I improve it? Can somebody give me some suggestions? @pzelasko
You can always try increasing the number of dataloader workers. If you're using on-the-fly features, consider precomputing them. If none of the above helps, is it possible that you have a very slow disk? You can make a copy of your test data and move to sequential I/O reads, which will be much faster at the cost of extra storage; there is a tutorial here: https://github.com/lhotse-speech/lhotse/blob/master/examples/02-webdataset-integration.ipynb (you will need to install a specific version, webdataset==0.1.103; I intend to support these things natively without external dependencies in the future).
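The first suggestion can be illustrated with a toy, pure-Python sketch (not the icefall/PyTorch code): when per-batch loading is I/O-bound, multiple workers overlap the waiting, which is what raising `num_workers` on a PyTorch DataLoader buys you.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load_batch(i):
    """Stand-in for one batch's disk reads / feature loading."""
    time.sleep(0.05)
    return i

# Serial loading: total wait is the sum of all per-batch waits.
start = time.time()
serial = [load_batch(i) for i in range(8)]
serial_t = time.time() - start

# Four "workers": the waits overlap, so wall-clock time drops sharply.
start = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(load_batch, range(8)))
parallel_t = time.time() - start
```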
Thanks. I will try your suggestions and update the progress in real time.
In my experiments, I set on_the_fly_feats to False. I changed num_workers to explore its influence on decoding speed.
When using num-workers=1 for greedy_search decoding:
When using num-workers=2 for greedy_search decoding:
When using num-workers=16 for greedy_search decoding:
When using num-workers=32 for greedy_search decoding:
I am trying to use webdataset for this, but I hit a bug: webdataset/webdataset#171. I hope the webdataset maintainers can help solve it. -_-
For some reason they removed that class 3 weeks ago. You can pip install webdataset==0.1.103.
At present, webdataset seems to play a useful role in reducing the data-loading time.
@luomingshuang where did you provide the num_workers arg? Can you show a code snippet?
    test,
    batch_size=None,
    sampler=sampler,
    num_workers=self.args.num_workers,
I pass args.num_workers to the test_dataloader here.
@@ -361,10 +363,15 @@ def test_dataloaders(self, cuts: CutSet) -> DataLoader:
        sampler = DynamicBucketingSampler(
            cuts, max_duration=self.args.max_duration, shuffle=False
You must set rank=0, world_size=1 for the sampler, otherwise you might be dropping some data. There should be a big warning about this in your run logs.
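A simplified pure-Python sketch (not lhotse's actual implementation) of why these arguments matter: a distributed sampler shards the cuts across ranks, so decoding in a single process with a leftover multi-GPU world_size silently drops most of the data.

```python
def shard(items, rank, world_size):
    """Simplified round-robin sharding, as a distributed sampler does."""
    return items[rank::world_size]

cuts = list(range(10))  # stand-in for a CutSet

# Decoding with rank=0, world_size=1: every cut is kept.
assert shard(cuts, rank=0, world_size=1) == cuts

# With a leftover training world_size, this process only sees its shard;
# the other cuts are silently skipped, i.e. utterances go missing.
assert shard(cuts, rank=0, world_size=4) == [0, 4, 8]
```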
Hmm, I checked my decoding logs and there is no such warning. BTW, I will add these settings anyway.
Hmm that's possible if you used a single-GPU process to decode.
FYI I was mentioning this warning:
> Hmm that's possible if you used a single-GPU process to decode.
Yes.
This PR is for pruned_transducer_stateless on WenetSpeech. In this PR, I set three token types for modeling: char, pinyin, and lazy_pinyin. I have also run the code normally with 100 hours of WenetSpeech data.
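For illustration, a minimal sketch of the char unit (the helper name is hypothetical; the two pinyin variants are only described in comments, assuming the pypinyin package's pinyin/lazy_pinyin functions, to keep the sketch dependency-free):

```python
def char_tokens(text: str) -> list:
    # "char": every Chinese character is its own modeling unit
    return list(text)

assert char_tokens("我们") == ["我", "们"]

# The other two units would come from the pypinyin package (an assumption):
#   "pinyin":      我们 -> ['wǒ', 'men']  (pypinyin.pinyin, tones kept)
#   "lazy_pinyin": 我们 -> ['wo', 'men']  (pypinyin.lazy_pinyin, tones stripped)
```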