Preprocessing-Techniques-for-Speaker-Recognition

The goal of this study is to construct a speaker verification model that recognises a speaker using multiple preprocessing steps, with an emphasis on a deep-learning-based model and MFCC features extracted from the speech waveforms. The pipeline applies voice activity detection, extracts Mel Frequency Cepstral Coefficients (MFCCs) as features, and saves the preprocessed data to disk so that the preprocessing does not have to be repeated on later runs. The preprocessed data is fed to a 3-layer CNN that also uses batch normalization and max-pooling; max-pooling downsamples the output of each convolutional layer by a factor of two. The activation function is the rectified linear unit (ReLU), and L2 regularization is used to counter overfitting. Tuning hyperparameters such as the learning rate and the number of epochs was challenging, as small changes caused large swings in performance.

The overall workflow first loads the dataset and splits it into train, validation, and test sets, then builds the CNN model, trains it, evaluates it on the test split, and finally saves the model for later use.
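The repository's own VAD code is not reproduced here; as an illustration of the voice activity detection step, the sketch below uses a simple frame-energy threshold in NumPy (the function name, frame sizes, and the -35 dB relative threshold are all illustrative assumptions, not the paper's exact method):

```python
import numpy as np

def energy_vad(signal, sr, frame_ms=25, hop_ms=10, threshold_db=-35.0):
    """Keep samples whose frame log-energy is within threshold_db of the
    loudest frame -- a minimal energy-based voice activity detector."""
    frame_len = int(sr * frame_ms / 1000)
    hop_len = int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop_len)
    energies = np.array([
        np.sum(signal[i * hop_len:i * hop_len + frame_len] ** 2)
        for i in range(n_frames)
    ])
    log_e = 10 * np.log10(energies + 1e-12)          # avoid log(0)
    mask = log_e > (log_e.max() + threshold_db)      # relative threshold
    keep = np.zeros(len(signal), dtype=bool)
    for i in np.flatnonzero(mask):                   # voiced frames only
        keep[i * hop_len:i * hop_len + frame_len] = True
    return signal[keep], mask

# Example: 1 s of silence followed by 1 s of a 440 Hz tone at 16 kHz
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
audio = np.concatenate([np.zeros(sr), 0.5 * np.sin(2 * np.pi * 440 * t)])
voiced, mask = energy_vad(audio, sr)
```

On this toy input the silent first half is dropped and roughly the tone half is kept.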
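For the MFCC feature extraction and caching steps, a minimal sketch using librosa is shown below (assuming librosa is installed; `n_mfcc=13` and the cache filename are illustrative choices, not taken from the repository):

```python
import numpy as np
import librosa

def extract_mfcc(y, sr, n_mfcc=13):
    """Extract MFCCs from a waveform; returns shape (n_mfcc, n_frames)."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

# Synthetic 1 s waveform standing in for a real utterance
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
y = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

mfcc = extract_mfcc(y, sr)
np.save("mfcc_sample.npy", mfcc)  # cache so preprocessing runs only once
```

Saving the features with `np.save` and reloading them with `np.load` on later runs is what lets the pipeline skip the preprocessing step after the first pass.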
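The 3-layer CNN described above (batch normalization, ReLU, 2x2 max-pooling, L2 regularization) could be sketched in Keras as follows; the filter counts, input shape, number of speakers, and learning rate are illustrative assumptions, not the repository's exact values:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_cnn(input_shape, n_speakers, l2=1e-3):
    """3 conv blocks of Conv2D + BatchNorm + ReLU + 2x2 max-pooling,
    with L2 weight regularization, followed by a softmax classifier."""
    model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
    for filters in (32, 64, 128):                     # 3 conv layers
        model.add(layers.Conv2D(filters, 3, padding="same",
                                kernel_regularizer=regularizers.l2(l2)))
        model.add(layers.BatchNormalization())
        model.add(layers.ReLU())
        model.add(layers.MaxPooling2D(pool_size=2))   # downsample by 2
    model.add(layers.Flatten())
    model.add(layers.Dense(n_speakers, activation="softmax",
                           kernel_regularizer=regularizers.l2(l2)))
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# e.g. 13 MFCCs x 128 frames x 1 channel, 10 enrolled speakers
model = build_cnn(input_shape=(13, 128, 1), n_speakers=10)
```

Training with `model.fit(...)` on the train/validation splits, evaluating with `model.evaluate(...)` on the test split, and calling `model.save(...)` would complete the workflow described above.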
