A novel word boundary detector based on the Teager Energy Operator for automatic speech recognition

AUTHOR(S)
PUBLICATION DATE

2010

ABSTRACT

This work is part of a larger research project and contributes to the development of a speaker-independent speech recognition system for isolated words from a limited vocabulary. It proposes a novel spoken word boundary detection method named the TEO-based method for Spoken Word Segmentation (TSWS). The TSWS, which is based on the Teager Energy Operator (TEO), is presented and compared with two widely used speech segmentation methods: the Classical method, which uses energy and zero-crossing rate computations, and the Bottom-up method, which is based on the concepts of adaptive level equalization, energy pulse detection and endpoint ordering. The TSWS detects spoken word boundaries considerably more precisely than the Classical method (67.8% error reduction) and the Bottom-up method (61.2% error reduction). A complete isolated spoken word recognition system (ISWRS) is also presented. This ISWRS uses Mel-frequency Cepstral Coefficients (MFCC) as the parametric representation of the speech signal, and a standard multilayer feed-forward network (MLP) as the recognizer. Two sets of tests were conducted: one with a database of 50 different words and a total of 10,350 utterances, and another with a smaller vocabulary of 17 words and a total of 3,519 utterances. Two thirds of those utterances constituted the training set for the ISWRS, and the remaining third constituted the testing set. The tests were repeated with each of the TSWS, Classical and Bottom-up methods used in the ISWRS speech segmentation stage. With the TSWS, the ISWRS achieved a 99.0% success rate on generalization tests, against 98.6% with the Classical and Bottom-up methods. Afterwards, white Gaussian noise was artificially added to the ISWRS inputs to reach a signal-to-noise ratio of 15 dB. With this noise, ISWRS performance on generalization tests dropped to 96.5%, 93.6%, and 91.4% when using the TSWS, Classical and Bottom-up methods, respectively.
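As a rough illustration of two ingredients mentioned in the abstract, the sketch below computes the standard discrete Teager Energy Operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], and adds white Gaussian noise scaled to a target signal-to-noise ratio. It is not the TSWS algorithm itself: the abstract does not describe how TSWS turns TEO values into word boundaries, so no thresholding or endpoint logic is shown. The function names, the test tone and its parameters are assumptions made for this example only.

```python
import math
import random


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample (the two endpoints are dropped).
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def mean_power(x):
    """Average of the squared samples."""
    return sum(v * v for v in x) / len(x)


def add_white_noise(x, snr_db, rng):
    """Add zero-mean Gaussian noise, rescaled so the result has the requested SNR in dB."""
    noise = [rng.gauss(0.0, 1.0) for _ in x]
    target_noise_power = mean_power(x) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / mean_power(noise))
    return [s + scale * w for s, w in zip(x, noise)]


# Hypothetical test signal (not from the thesis): a pure tone with
# amplitude A and digital frequency omega in radians per sample.
A, omega = 0.5, 0.2
tone = [A * math.cos(omega * n) for n in range(2000)]

# For a pure tone the operator is constant and equal to A^2 * sin(omega)^2.
psi = teager_energy(tone)

# Corrupt the tone at 15 dB SNR, the level quoted in the abstract,
# then measure the SNR that was actually obtained.
noisy = add_white_noise(tone, 15.0, random.Random(0))
residual = [y - s for y, s in zip(noisy, tone)]
snr_db = 10.0 * math.log10(mean_power(tone) / mean_power(residual))
```

Because the operator depends on both amplitude and frequency, it responds differently to speech and to low-level background, which is the general motivation for using it in boundary detection. Rescaling the noise after drawing it makes the achieved SNR match the target up to rounding, rather than only on average.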

SUBJECT(S)

speech segmentation; electrical engineering; spoken word boundary detection; TEO; speaker-independent; isolated words; speech recognition system; MFCC; MLP; automatic speech recognition; artificial neural networks; mel-frequency cepstral coefficients
