Suprasegmental Duration Modelling with Elastic Constraints in Automatic Speech Recognition
This paper presents a method for integrating a model of suprasegmental duration with an HMM-based recogniser at the post-processing level. The N-best utterance output is rescored using a suitable linear combination of acoustic log-likelihood (provided by a set of tied-state triphone HMMs) and duration log-likelihood (provided by a set of durational models). The durational model used in the post-processing imposes syllable-level elastic constraints on the durational behaviour of speech segments. Results are presented for word accuracy on the Resource Management database after rescoring, comparing two different syllable-like constraint units, a fixed-size N-phone window, and simple (unconstrained) phone duration probability scoring.
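The rescoring step described above can be sketched as a simple re-ranking of the N-best list by a weighted sum of the two log-likelihoods. This is a minimal illustrative sketch: the function names, data layout, and interpolation weight are assumptions for exposition, not details taken from the paper.

```python
# Hypothetical sketch of N-best rescoring: each hypothesis carries an
# acoustic log-likelihood and a duration log-likelihood, and the list is
# re-ranked by their linear combination. The weight value is illustrative.

def rescore_nbest(hypotheses, duration_weight=0.2):
    """Re-rank N-best hypotheses by acoustic_ll + w * duration_ll."""
    return sorted(
        hypotheses,
        key=lambda h: h["acoustic_ll"] + duration_weight * h["duration_ll"],
        reverse=True,  # higher combined log-likelihood ranks first
    )

# Toy two-hypothesis N-best list (scores are made up for illustration):
nbest = [
    {"words": "ship sailed", "acoustic_ll": -120.0, "duration_ll": -30.0},
    {"words": "sheep sailed", "acoustic_ll": -118.0, "duration_ll": -45.0},
]

# With duration_weight = 0.2 the combined scores are -126.0 and -127.0,
# so the durationally better hypothesis wins despite a lower acoustic score.
best = rescore_nbest(nbest)[0]
```

In practice the interpolation weight would be tuned on held-out data, since the acoustic and duration log-likelihoods are on very different scales.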