Speaker diarization with ASR outputs #3708
Comments
Thanks for the suggestion. The purpose of Is there any reason you would suggest JSON rather than txt format for sentence-level assignments? If you feel the need, I would suggest sending a PR to add sentence-level transcriptions to same
Hi! Thanks for the reply. I am interested in this functionality because of a practical application. As I mentioned, I am developing an API for speaker diarization, and the API returns a JSON file. One of our use cases is changing the color of automatically generated subtitles depending on who is speaking. For this we need the number of speakers (provided in the JSON), the transcript (provided in the JSON), and sentence-level diarization (provided in the .txt file). So we need to prepare a new JSON on the API side that includes everything. Since our use case is not uncommon for speaker diarization, I was wondering whether it would make sense to do this on the NeMo side and save other NeMo users time when they need this as well.
Yes, please feel free to send a PR to add sentence level transcriptions to same
Hi! The PR is at #3791. Let me know if it needs any changes. Cheers, Jure
Looks like the relevant PR was merged: #3897
Is your feature request related to a problem? Please describe.
I am developing an API for a speaker diarization task with ASR (/examples/speaker_tasks/diarization/offline_diarization_with_asr.py). For my use case the script generates two useful outputs: a .json that looks something like this:
So we have the whole transcription, which is very useful, along with speaker labels for each of the spoken words. For API purposes this JSON is very handy; however, its contents are not very useful for practical applications. For diarization purposes and practical applications the script's .txt output is much more convenient:
Describe the solution you'd like
Would it be possible to add the information from the .txt output to the JSON? E.g., something like:
Describe alternatives you've considered
I took a look at the code and I believe I could code this myself. Would you be interested in a pull request that modifies the output JSON?
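To make the request more concrete, here is a rough sketch of the kind of merge being asked for: taking word-level speaker labels and adding per-speaker segments and a speaker count to the same JSON. Every field name here (`word`, `speaker`, `start_time`, `end_time`, `speaker_count`, `transcription`, `sentences`) and both helper functions are hypothetical and chosen for illustration only; this is not NeMo's actual output schema or code. Note also that it groups by consecutive speaker turns, which only approximates sentence boundaries.

```python
import json


def group_by_speaker(words):
    """Merge consecutive words from the same speaker into one segment.

    `words` is a list of dicts with the hypothetical keys
    "word", "speaker", "start_time" and "end_time".
    """
    segments = []
    for w in words:
        if segments and segments[-1]["speaker"] == w["speaker"]:
            # Same speaker keeps talking: extend the current segment.
            seg = segments[-1]
            seg["text"] += " " + w["word"]
            seg["end_time"] = w["end_time"]
        else:
            # Speaker changed (or first word): start a new segment.
            segments.append({
                "speaker": w["speaker"],
                "start_time": w["start_time"],
                "end_time": w["end_time"],
                "text": w["word"],
            })
    return segments


def build_output(words):
    """Combine word-level and segment-level information in one dict."""
    return {
        "speaker_count": len({w["speaker"] for w in words}),
        "transcription": " ".join(w["word"] for w in words),
        "words": words,
        "sentences": group_by_speaker(words),
    }


if __name__ == "__main__":
    # Made-up input, only to show the shape of the result.
    example = [
        {"word": "hello", "speaker": "speaker_0", "start_time": 0.0, "end_time": 0.4},
        {"word": "there", "speaker": "speaker_0", "start_time": 0.4, "end_time": 0.8},
        {"word": "hi", "speaker": "speaker_1", "start_time": 1.0, "end_time": 1.2},
    ]
    print(json.dumps(build_output(example), indent=2))
```

With a single JSON like this, an API consumer (for example, a subtitle colorizer) would not need to parse the .txt file separately.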