Text-Mining Research Group

Neural Network Acoustic Model with Decision Tree Clustered Triphones

This article compares the performance of neural network and Gaussian mixture model (GMM) acoustic models. We argue that using a multilayer perceptron as an emission probability estimator in hidden Markov model based automatic speech recognition can lead to better results than the more traditional Gaussian mixtures. We present a method for modeling triphone phonetic units with neural networks and show that this also leads to better performance than GMMs. The superior performance of the neural networks comes at the cost of extremely long training times.

Keywords: acoustic model, Gaussian mixture model, language model, neural networks, speech recognition

Year: 2008

Authors of this publication:

Tomáš Pavelka

Pavel Král

Phone: +420 377 632 454
E-mail: pkral@kiv.zcu.cz
WWW: http://home.zcu.cz/~pkral/

Pavel is a lecturer/researcher at the Department of Computer Science and Engineering at the University of West Bohemia in Pilsen (Czech Republic). His research is focused on automatic speech processing, dialog act recognition, syntactic parsing, punctuation annotation and document classification.