ARTIFICIAL NEURAL NETWORK-BASED VOICEPRINT GENERATION MODELS FOR SPEAKER RECOGNITION

Authors

  • B. Sombo, Department of Computer Engineering, Faculty of Engineering, University of Benin, P.M.B. 1154, Ugbowo, Benin City, Edo State, Nigeria.
  • S. T. Apeh, Department of Computer Engineering, Faculty of Engineering, University of Benin, P.M.B. 1154, Ugbowo, Benin City, Edo State, Nigeria.
  • I. A. Edeoghon, Department of Computer Engineering, Faculty of Engineering, University of Benin, P.M.B. 1154, Ugbowo, Benin City, Edo State, Nigeria.

DOI:

https://doi.org/10.54554/jet.2025.16.2.013

Keywords:

Artificial Neural Networks, Cosine Similarity, Data Features Extraction, Machine Learning, Speaker Recognition

Abstract


Speaker recognition systems often do not prioritize generating high-quality voiceprints with minimal processing time, even though doing so can reduce new-user enrollment time while maintaining accuracy. This study therefore addressed the need for a model that efficiently generates high-quality voiceprints and thus has the potential to improve system performance and enrollment speed when deployed in speaker recognition systems. Four types of voice features were extracted from clean voice datasets collected from volunteers and from the Mozilla Common Voice (MCV) database: Mel Frequency Cepstral Coefficients (MFCC), Gammatone Frequency Cepstral Coefficients (GFCC), Linear Predictive Coding (LPC) coefficients, and Perceptual Linear Prediction (PLP) coefficients. Multi-Layer Perceptron (MLP) and Long Short-Term Memory (LSTM) networks were then trained on these features for voiceprint generation. Evaluation using the cosine similarity of voiceprints revealed that the MLP model trained with MFCC achieved the highest separation score (0.850553), outperforming the other models. This high value demonstrates the model's strong potential to improve accuracy and shorten new-user enrollment time when deployed in speaker recognition systems.
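
As a rough illustration of the evaluation metric named in the abstract, the sketch below computes the cosine similarity between pairs of voiceprint vectors. It is not the authors' implementation: the function name, the 4-dimensional vectors, and their values are hypothetical, and the abstract does not define how the reported separation score is derived from the similarities, so that step is not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("voiceprints must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical voiceprints with made-up values, for illustration only.
enrolled = [0.9, 0.1, 0.4, 0.2]       # stored at enrollment
same_speaker = [0.8, 0.2, 0.5, 0.1]   # new utterance, same speaker
other_speaker = [0.1, 0.9, 0.1, 0.7]  # utterance from a different speaker

same_score = cosine_similarity(enrolled, same_speaker)
other_score = cosine_similarity(enrolled, other_speaker)

print(f"same speaker:  {same_score:.4f}")
print(f"other speaker: {other_score:.4f}")
```

With well-separated voiceprints, the same-speaker similarity should be clearly higher than the different-speaker similarity, which is the property a cosine-based comparison relies on.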

Published

2025-12-30

How to Cite

Sombo, B., Apeh, S. T., & Edeoghon, I. A. (2025). ARTIFICIAL NEURAL NETWORK-BASED VOICEPRINT GENERATION MODELS FOR SPEAKER RECOGNITION. Journal of Engineering and Technology (JET), 16(2). https://doi.org/10.54554/jet.2025.16.2.013