Deep Learning and Machine Learning Techniques Applied to Speaker Identification on Small Datasets

In this study, we explore the capabilities of speaker recognition technology for biometric authentication, with the aim of supporting the development of speaker recognition-based access control systems and serving as a resource for future research and improvements in secure and efficient speaker identification solutions. We focused on developing and evaluating machine learning and deep learning models for speaker identification. The models were trained and tested on private datasets with 32 speakers and public datasets with 1251 to 6112 speakers. The Gaussian Mixture Model performed well on our private datasets, correctly identifying the speakers with accuracies of 93.10% and 95%. The Multilayer Perceptron achieved a peak accuracy of 93.33% on the Framed Trim private dataset. The VGGM model, after initial training on larger datasets, achieved accuracies of 90.34% and 98.33% on our private datasets. Finally, the ResNet50 model slightly outperformed the other models on two versions of our private dataset, achieving accuracies of 97.93% and 100%.

Conference paper; book chapter