Add Frame-VAD to ASR+VAD pipeline #6464
Signed-off-by: stevehuang52 <heh@nvidia.com>
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.
Some grammar issues and typos to be fixed.
@@ -40,13 +40,15 @@
To enable profiling, set `profiling=True`, but this will significantly slow down the program.
To use or disable feature masking, set `use_rttm` to `True` or `False`.
To use or disable feature masking/droping based on RTTM files, set `use_rttm` to `True` or `False`.
droping->dropping
fixed
window_length_in_sec (float): Window length in seconds.
shift_length_in_sec (float): Shift length in seconds.
is_regression_task (bool): if True, the labels are treated as regression task.
as regression task -> as a regression task
fixed
labels (Optional[list]): List of unique labels collected from all samples.
augmentor (Optional): feature augmentation
delimiter (str): delimiter to split the labels.
is_regression_task (bool): if True, the labels are treated as regression task.
as regression task -> as a regression task
fixed
# Output settings, no need to change
output_dir: Optional[str] = None # will be automatically set by the program
output_filename: Optional[str] = None # will be automatically set by the program
pred_name_postfix: Optional[str] = None # If you need to use another model name, rather than standard one.
LLM models are suggesting:
rather than standard one -> other than the standard one.
fixed
Hello, will the frame-level VAD work with diarization too? I tried using the vad_multilingual_frame_marblenet.nemo model instead of the normal vad_multilingual_marblenet.nemo model, and got the following error:
Thank you in advance!
Hi @gabitza-tech, yes, it works with diarization, but it needs some modifications in the inference pipeline. The error you're seeing is because Frame-VAD uses a different model class, you can try
Unlike Segment-VAD, which first has to split the input audio into many small 0.63s segments and then outputs one label per segment, Frame-VAD takes the whole audio as input without any segment splitting and outputs one label per 20ms frame. Please let us know if you need any help!
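For anyone wiring the per-frame output into a diarization pipeline, here is a minimal sketch of how one label per 20ms frame could be collapsed into speech segments. It is not NeMo code: the function name `frames_to_segments` and the 0/1 label encoding are assumptions made for illustration, and only the 20ms frame length comes from the comment above.

```python
from typing import List, Tuple


def frames_to_segments(labels: List[int], frame_ms: int = 20) -> List[Tuple[int, int]]:
    """Collapse per-frame speech labels (1 = speech, 0 = non-speech) into
    (start_ms, end_ms) segments. Hypothetical helper, not part of NeMo."""
    segments: List[Tuple[int, int]] = []
    start = None  # index of the first frame of the current speech run
    for i, label in enumerate(labels):
        if label and start is None:
            start = i  # a speech run begins
        elif not label and start is not None:
            segments.append((start * frame_ms, i * frame_ms))  # run ends
            start = None
    if start is not None:
        # the audio ended while still inside a speech run
        segments.append((start * frame_ms, len(labels) * frame_ms))
    return segments


if __name__ == "__main__":
    print(frames_to_segments([0, 1, 1, 0, 1]))
```

Times are kept as integer milliseconds to avoid floating-point rounding when multiplying frame indices by the frame length.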
Thank you a lot for your response @stevehuang52! I have a couple more questions:
Thank you in advance and big kudos for the work! It is very helpful!
had a quick look. will discuss offline regarding reducing redundant
Hi @gabitza-tech,
Yes, you'll need to tune those parameters. We've roughly tuned them on DIHARD3-dev; the following values generally work well in our cases, but you might need to tune them further:
We haven't run the model on very long audio. Given that Frame-VAD uses 1/8 of the memory of Segment-VAD during inference, it should work with much longer audio than Segment-VAD can handle.
Yes, please leave them at their default values.
LGTM! Thanks. Please remember to add a doc/tutorial on data preparation and training/fine-tuning later.
What does this PR do?
This is the third PR for Frame-VAD. Please merge the previous two before this: #6441, #6463
This PR adds Frame-VAD to the ASR+VAD pipeline, and also adds a drop-frame mode to ASR+VAD, which previously supported only masking mode.
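To make the difference between the two modes concrete, here is a rough sketch, assuming that masking means overwriting non-speech feature frames with a fill value (keeping the sequence length) and that dropping means removing those frames entirely. The helper names `mask_frames` and `drop_frames`, the fill value, and the list-of-lists feature layout are hypothetical and are not taken from the NeMo implementation.

```python
from typing import List

Frame = List[float]


def mask_frames(features: List[Frame], is_speech: List[bool], fill: float = 0.0) -> List[Frame]:
    """Masking mode (assumed semantics): keep every frame position,
    but overwrite non-speech frames with a constant fill value."""
    return [
        frame if keep else [fill] * len(frame)
        for frame, keep in zip(features, is_speech)
    ]


def drop_frames(features: List[Frame], is_speech: List[bool]) -> List[Frame]:
    """Drop-frame mode (assumed semantics): remove non-speech frames,
    so the output sequence is shorter than the input."""
    return [frame for frame, keep in zip(features, is_speech) if keep]


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    speech = [True, False, True]
    print(mask_frames(feats, speech))  # same length, middle frame filled
    print(drop_frames(feats, speech))  # middle frame removed
```

Under these assumptions, masking preserves the time alignment between features and the original audio, while dropping shortens the sequence the ASR model has to process.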
Collection: [ASR]
Before your PR is "Ready for review"
Pre checks:
PR Type: