The goal of this study is to build a speaker verification model that recognises a speaker from speech, using several preprocessing steps, with an emphasis on a deep-learning model and MFCC features extracted from the speech waveforms. The pipeline applies voice activity detection, extracts Mel Frequency Cepstral Coefficients (MFCCs) as features, and saves the preprocessed data so that it does not have to be recomputed on every run. The saved data is then fed to a 3-layer CNN that also uses batch normalization and max-pooling. Max-pooling downsamples the output of each convolutional layer by a factor of two. The activation function is the rectified linear unit (ReLU), and L2 regularization is used to reduce overfitting. Tuning hyperparameters such as the learning rate and the number of epochs was challenging, because small changes moved the results noticeably up or down. The overall workflow is: load the dataset; split it into training, validation and test sets; build the CNN model; train it; evaluate it on the test set; and finally save the model so that it can be reused later.
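To make the ReLU and max-pooling steps concrete, here is a minimal, dependency-free sketch. It is not the repository's code: the function names `relu` and `max_pool_2x2` are hypothetical, the input values are made up, and it assumes a 2-D feature map pooled with a 2x2 window and stride 2 after the activation. A real model would rely on a deep-learning framework's layers instead.

```python
def relu(feature_map):
    """Apply ReLU element-wise: negative activations become 0."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Downsample by two along both axes, keeping the max of each 2x2 window.

    Assumption: stride 2, no padding, so a trailing odd row or column is dropped.
    """
    rows = len(feature_map) // 2 * 2
    cols = len(feature_map[0]) // 2 * 2
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Toy 4x4 "convolution output" (illustrative values, not real MFCC data).
conv_out = [
    [-1.0, 2.0, 0.5, -3.0],
    [4.0, -2.0, 1.5, 0.0],
    [0.0, -0.5, -1.0, -2.0],
    [3.0, 1.0, 2.5, -4.0],
]

pooled = max_pool_2x2(relu(conv_out))
print(pooled)  # [[4.0, 1.5], [3.0, 2.5]]
```

The 4x4 input becomes a 2x2 output, which is the factor-of-two downsampling described above: each pooled value summarises one 2x2 neighbourhood of the activated feature map.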
iec2018076/Preprocessing-Techniques-for-Speaker-Recognition