Abstract
Recent developments in deep learning methods have greatly influenced the performance of speech recognition systems. In a hidden Markov model-deep neural network (HMM-DNN) based speech recognition system, DNNs are employed to model senones (context-dependent states of the HMM), while the HMMs capture the temporal relations among senones. The use of deeper networks has yielded significant performance improvements, and developing deep learning methods to train deeper architectures has attracted considerable scientific interest. Optimizing a deeper network is a harder task than optimizing a shallower one, but residual networks have recently demonstrated the capability to train very deep architectures without being prone to vanishing/exploding gradient problems. In this work, the effectiveness of residual networks is explored for speech recognition. Along with the depth of the residual network, the criticality of its width is also studied. It is observed that at greater depths, the width of the network is also a crucial parameter for attaining significant improvements. A 14-hour subset of the WSJ corpus is used for training the speech recognition systems, and the residual networks are observed to converge with much greater ease, even at depths much higher than those of the deep neural network. In this work, residual networks attain an absolute reduction of 0.4 in WER (an 8% relative reduction) compared to the best-performing deep neural network.
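
As a concrete illustration of the skip connections underlying the residual networks studied here, the following is a minimal sketch of a fully connected residual block and a deep stack mapping acoustic features to senone posteriors. It assumes PyTorch; the layer width, feature dimension, block count, and senone count are illustrative placeholders, not the configuration used in this work.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One fully connected residual block: output = ReLU(F(x) + x).

    The identity shortcut lets gradients flow directly through the
    addition, which is what makes very deep stacks easier to train.
    """
    def __init__(self, width):
        super().__init__()
        self.fc1 = nn.Linear(width, width)
        self.fc2 = nn.Linear(width, width)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.fc1(x))
        out = self.fc2(out)
        return self.relu(out + x)  # skip connection

def build_resnet(num_blocks=20, width=1024, feat_dim=440, num_senones=3000):
    # feat_dim and num_senones are hypothetical (e.g., spliced acoustic
    # features in, senone posteriors out); adjust to the actual setup.
    layers = [nn.Linear(feat_dim, width), nn.ReLU()]
    layers += [ResidualBlock(width) for _ in range(num_blocks)]
    layers += [nn.Linear(width, num_senones)]
    return nn.Sequential(*layers)

if __name__ == "__main__":
    net = build_resnet()
    x = torch.randn(8, 440)   # a batch of 8 spliced feature vectors
    print(net(x).shape)       # -> torch.Size([8, 3000])

Because each block adds its input back to its output, the stack only has to learn a residual correction at each step, which in practice eases convergence as depth grows.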