Detection of Phonological Features in Continuous Speech using Neural Networks
We report work on the first component of a two-stage speech recognition architecture based on phonological features rather than phones. The paper reports experiments on three phonological feature systems: 1) the Sound Pattern of English (SPE) system, which uses binary features; 2) a multi-valued (MV) feature system, which uses traditional phonetic categories such as manner, place, etc.; and 3) Government Phonology (GP), which uses a set of structured primes. All experiments used recurrent neural networks to perform feature detection. In these networks the input layer is a standard framewise cepstral representation, and the output layer represents the values of the features. The system thus produces, for each input frame, a representation of the most likely phonological features. All experiments were carried out on the TIMIT speaker-independent database. The networks performed well in all cases, with the average accuracy for a single feature ranging from 86% to 93%. We describe these experiments in detail, and discuss the justification for, and potential advantages of, using phonological features rather than phones as the basis of speech recognition.
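The framewise detection scheme described above can be illustrated with a minimal sketch. It is not the networks used in the experiments: it assumes a simple Elman-style recurrence with untrained random weights, binary (SPE-like) output features, and a 0.5 decision threshold. The layer sizes (13 cepstral coefficients, 8 hidden units, 3 features) and the names `detect_features` and `framewise_accuracy` are hypothetical, chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def detect_features(frames, W_in, W_rec, W_out):
    """Run a simple recurrent layer over a sequence of cepstral frames.

    Returns, for each frame, a list of feature activations in (0, 1).
    Assumption: Elman-style recurrence, not the authors' architecture.
    """
    hidden = [0.0] * len(W_rec)
    outputs = []
    for frame in frames:
        pre = [a + b for a, b in zip(matvec(W_in, frame), matvec(W_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
        outputs.append([sigmoid(o) for o in matvec(W_out, hidden)])
    return outputs


def framewise_accuracy(predicted, reference):
    """Per-feature fraction of frames whose thresholded output matches the reference."""
    n_frames = len(reference)
    n_feats = len(reference[0])
    return [
        sum((predicted[t][k] >= 0.5) == bool(reference[t][k]) for t in range(n_frames))
        / n_frames
        for k in range(n_feats)
    ]


# Hypothetical sizes and random (untrained) weights, for illustration only.
rng = random.Random(0)
n_cep, n_hidden, n_feats, n_frames = 13, 8, 3, 5


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W_in = rand_matrix(n_hidden, n_cep)
W_rec = rand_matrix(n_hidden, n_hidden)
W_out = rand_matrix(n_feats, n_hidden)
frames = [[rng.gauss(0.0, 1.0) for _ in range(n_cep)] for _ in range(n_frames)]

outputs = detect_features(frames, W_in, W_rec, W_out)
```

Under these assumptions, averaging the values returned by `framewise_accuracy` over all features gives a single-number summary of the same kind as the per-feature averages quoted above, although the sketch does not reproduce how those figures were computed.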