Can this model be used for Speaker Recognition ? #6

Open
nishanksinglasjsu opened this issue Nov 10, 2016 · 4 comments

Comments

@nishanksinglasjsu

Hi,
I am working on speaker recognition. Is it possible to use this model for speaker recognition?
If yes, can you please guide me a little? If not, can you refer me to some deep learning models that I can use for it?

@HulkSun

HulkSun commented Nov 10, 2016

Sure it can, but you must enlarge your training set to get more accurate results.

@nishanksinglasjsu
Author

Thanks, HulkSun, for the reply.
I am happy to know that this model can be used for speaker recognition, though I am not sure how to use it.
Can you please explain a little how I can use this model for speaker recognition? What would the dataset be?

@HulkSun

HulkSun commented Nov 11, 2016

Hi nishanksinglasjsu,
you can read the paper that explains how the model works and how to train it.

@nishanksinglasjsu
Author

Hi HulkSun,
Thank you for the paper. I will definitely go through it.
I am a beginner in deep learning, especially in speech recognition models. I know CNNs very well but not RNNs.
The major problem I am facing is in understanding the dataset. I understand that the input (X) is the spectrogram of an audio WAV file, but what is the output (y) data in speech recognition?
According to my reading of research papers, for text-dependent speaker recognition I can use a CNN model in which the input (X) is the spectrogram image of an audio file and the output (y) is a one-hot vector of 1s and 0s, where the index of the single 1 identifies a unique speaker or user, just like the labels in the MNIST dataset.

Can you please tell me if this approach to speaker recognition is right?
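
For reference, the labelling scheme described above can be sketched as follows. This is only a minimal illustration, assuming NumPy is available; the function name `one_hot_speaker_labels` and the speaker names are hypothetical and not part of this repository. It shows the output (y) side only, with one row per utterance and one column per known speaker.

```python
import numpy as np


def one_hot_speaker_labels(speaker_ids):
    """Turn a list of per-utterance speaker names into one-hot rows.

    Returns (y, speakers), where y[i, j] == 1.0 if utterance i was
    spoken by speakers[j], and 0.0 otherwise.
    """
    # Fix a stable column order so that the same speaker always maps
    # to the same index.
    speakers = sorted(set(speaker_ids))
    index = {name: i for i, name in enumerate(speakers)}

    y = np.zeros((len(speaker_ids), len(speakers)), dtype=np.float32)
    for row, name in enumerate(speaker_ids):
        y[row, index[name]] = 1.0
    return y, speakers


# Hypothetical example: four utterances from three speakers.
utterance_speakers = ["alice", "bob", "alice", "carol"]
y, speakers = one_hot_speaker_labels(utterance_speakers)
```

Each row of `y` would then be paired with the spectrogram of the corresponding utterance as the input (X), in the same way that MNIST pairs a digit image with a one-hot digit label.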
