Voice feature extraction using pitch for automatic speaker verification systems
Authors
Armas Toledo, Ariel de
Publisher
Universidad Central “Marta Abreu” de Las Villas
Abstract
Access control systems, client authentication, and other biometric systems are applications of speaker verification. Most of these applications require high reliability, which has driven the introduction of techniques that allow such systems to perform well. Nevertheless, considerable effort is still devoted to making them more reliable, and this remains a broad field of research. One of the open problems is obtaining voice features that discriminate better among all the speakers enrolled in a system. This work implements the feature extraction methods PHCC (Pitch Harmonics Cepstral Coefficients) and PSMFCC (Pitch Synchronous Mel-Frequency Cepstral Coefficients), and proposes the PSPHCC (Pitch Synchronous Harmonics Cepstral Coefficients) method. The three methods were evaluated with a system based on Hidden Markov Models. The implemented methods achieve a system accuracy of 96% and an equal error rate (EER) of 1.43%, compared with 93% and an EER of 2.18% for the traditional MFCC (Mel Frequency Cepstral Coefficients) algorithm. The final result is the PHaSe-SAEC software, which extracts speaker features using the methods described in this work and writes them to files compatible with the HTK and Weka systems.