
Neural Network Acoustic Model with Decision Tree Clustered Triphones
This article compares the performance of neural network acoustic models with that of Gaussian mixture models (GMMs). We argue that using a multilayer perceptron as an emission probability estimator in hidden Markov model based automatic speech recognition can lead to better results than the more traditional Gaussian mixtures. We present a method for modeling triphone phonetic units with neural networks and show that it also outperforms GMMs. The superior performance of the neural networks comes at the cost of extremely long training times.
Keywords: acoustic model, Gaussian mixture model, language model, neural networks, speech recognition
Year: 2008

Authors of this publication:

Pavel Král
Phone: +420 377 632 454
E-mail: pkral@kiv.zcu.cz
WWW: http://home.zcu.cz/~pkral/