Model building in neural networks with hidden Markov models
This thesis concerns the automatic generation of architectures for neural networks and other pattern recognition models comprising many elements of the same type. The requirement for such models, with automatically determined topology and connectivity, arises from two needs. The first is the need to develop commercial applications of the technology without resorting to laborious trial and error with different network sizes; the second is the need, in large and complex pattern processing applications such as speech recognition, to optimise the allocation of computing resources for problem solving. The state of the art in adaptive architectures is reviewed, and a mechanism is proposed for adding new processing elements to models. The scheme is developed in the context of multi-layer perceptron networks, and is linked to the best available network-pruning mechanism through a single numerical criterion, with construction indicated at one extreme of its range and pruning at the other. The construction mechanism does not work in the multi-layer perceptron for which it was developed, owing to the long-range effects occurring in many applications of these networks. It does, however, work demonstrably well in density estimation models based on Gaussian mixtures, which are of the same family as the increasingly popular radial basis function networks. The construction mechanism is applied to the initialisation of the density estimators embedded in the states of a hidden Markov model for speaker-independent speech recognition, where it offers a considerable increase in recogniser performance, provided cross-validation is used to prevent over-training.