CDNN: a context dependent neural network for continuous speech recognition
A series of theoretical and experimental results have suggested that multilayer perceptrons (MLPs) are an effective family of algorithms for the smooth estimate of highly dimensioned probability density functions that are useful in continuous speech recognition. All of these systems have exclusively used context-independent phonetic models, in the sense that the probabilities or costs are estimated for simple speech units such as phonemes or words, rather than biphones or triphones. Numerous conventional systems based on hidden Markov models (HMMs) have been reported that use triphone or triphone like context-dependent models. In one case the outputs of many context-dependent MLPs (one per context class) were used to help choose the best sentence from the N best sentences as determined by a context-dependent HMM system. It is shown how, without any simplifying assumptions, one can estimate likelihoods for context-dependent phonetic models with nets that are not substantially larger than context-independent MLPs.