Edinburgh Research Archive


Intonation and dialogue context as constraints for speech recognition

View/Open
Taylor_1998_b.pdf (154.8Kb)
Date
1998
Author
Taylor, Paul
King, Simon
Isard, Stephen
Wright, Helen
Abstract
This paper describes a way of using intonation and dialogue context to improve the performance of an automatic speech recognition (ASR) system. Our experiments were run on the DCIEM Maptask corpus, a corpus of spontaneous task-oriented dialogue speech. This corpus has been tagged according to a dialogue analysis scheme that assigns each utterance to one of 12 “move types”, such as “acknowledge”, “query-yes/no” or “instruct”. Most ASR systems use a bigram language model to constrain the possible sequences of words that might be recognised. Here we use a separate bigram language model for each move type. We show that when the “correct” move-specific language model is used for each utterance in the test set, the word error rate of the recogniser drops. Of course, when the recogniser is run on previously unseen data, it cannot know in advance what move type the speaker has just produced. To determine the move type, we combine three sources of information: an intonation model, a dialogue model that puts constraints on possible sequences of move types, and the speech recogniser likelihoods for the different move-specific language models. In the full recognition system, the combination of automatic move type recognition with the move-specific language models reduces the overall word error rate by a small but significant amount when compared with a baseline system that does not take intonation or dialogue acts into account. Interestingly, the word error improvement is restricted to “initiating” move types, where word recognition is important. In “response” move types, where the important information is conveyed by the move type itself (e.g., positive vs. negative response), there is no word error improvement, but recognition of the response types themselves is good. The paper discusses the intonation model, the language models and the dialogue model in detail and describes the architecture in which they are combined.
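
As an illustration of the move type decision described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the three knowledge sources are combined as a weighted sum of log scores and that the dialogue model is a bigram over move types; the abstract does not state the combination rule. The function name `pick_move_type`, the weights, and all numbers are hypothetical toy values.

```python
import math

def pick_move_type(prev_move, intonation_ll, recogniser_ll, dialogue_bigram,
                   weights=(1.0, 1.0, 1.0)):
    """Return (move, score) for the move type with the highest combined log score.

    intonation_ll[m]   : log-likelihood of the intonation given move type m
    recogniser_ll[m]   : recogniser log-likelihood using the language model for m
    dialogue_bigram[p] : dict of P(m | previous move p), assumed to be a bigram
    weights            : hypothetical scaling factors for the three sources
    """
    w_int, w_dlg, w_rec = weights
    best_move, best_score = None, -math.inf
    for move in intonation_ll:
        dialogue_lp = math.log(dialogue_bigram[prev_move][move])
        score = (w_int * intonation_ll[move]
                 + w_dlg * dialogue_lp
                 + w_rec * recogniser_ll[move])
        if score > best_score:
            best_move, best_score = move, score
    return best_move, best_score

# Toy values, invented for illustration only (three of the 12 move types).
intonation_ll = {"acknowledge": -2.0, "query-yes/no": -5.0, "instruct": -4.0}
recogniser_ll = {"acknowledge": -10.0, "query-yes/no": -12.0, "instruct": -11.0}
dialogue_bigram = {
    "instruct": {"acknowledge": 0.6, "query-yes/no": 0.1, "instruct": 0.3},
}

best, score = pick_move_type("instruct", intonation_ll, recogniser_ll,
                             dialogue_bigram)
print(best)
```

In a system of this shape, the word string output for the utterance would be the one produced with the language model of the selected move type. The weights give a simple way to balance the three sources against each other.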
URI
http://hdl.handle.net/1842/1050
Collections
  • CSTR publications
  • Linguistics and English Language publications

Library & University Collections Home | University of Edinburgh Information Services Home
Privacy & Cookies | Takedown Policy | Accessibility | Contact