Automatic detection of discourse structure for speech recognition and understanding.
Date
1997
Author
Jurafsky, Daniel
Bates, Rebecca
Coccaro, Noah
Martin, Rachel
Meteer, Marie
Ries, Klaus
Shriberg, Elizabeth
Stolcke, Andreas
Taylor, Paul A
Van Ess-Dykema, Carol
Abstract
We describe a new approach for statistical modeling and detection of discourse structure
for natural conversational speech. Our model is based on 42 ‘Dialog Acts’ (DAs)
such as question, answer, backchannel, agreement, disagreement, and apology. We labeled
1155 conversations from the Switchboard (SWBD) database (Godfrey et al. 1992) of
human-to-human telephone conversations with these 42 types and trained a Dialog Act
detector based on three distinct knowledge sources: sequences of words which characterize
a dialog act, prosodic features which characterize a dialog act, and a statistical
Discourse Grammar. Our combined detector, although still in preliminary stages, already
achieves a 65% Dialog Act detection rate based on acoustic waveforms, and 72%
accuracy based on word transcripts. Using this detector to switch among the 42
Dialog-Act-Specific trigram LMs also gave us an encouraging but not statistically significant
reduction in SWBD word error.
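
The abstract does not say how the three knowledge sources are combined, so the following is only a minimal sketch under stated assumptions. It assumes that the word and prosody models have already been folded into one per-utterance likelihood for each dialog act, that the Discourse Grammar is a bigram over dialog acts, and that the best label sequence is found by Viterbi decoding. The function name viterbi_dialog_acts, the three-label inventory, and all probabilities are hypothetical toy values, not figures from the paper.

```python
import math

def viterbi_dialog_acts(utt_loglik, trans_logp, init_logp):
    """Return the most probable dialog act sequence for a conversation.

    utt_loglik: list of dicts, one per utterance, mapping a dialog act
                label to log P(evidence | label). Assumed to come from
                the word and prosody models (hypothetical here).
    trans_logp: dict mapping (previous, current) to log P(current | previous),
                standing in for a bigram discourse grammar.
    init_logp:  dict mapping a label to its log prior for the first utterance.
    """
    labels = list(init_logp)
    # best[i][d]: best log score of any label sequence ending in d at utterance i
    best = [{d: init_logp[d] + utt_loglik[0][d] for d in labels}]
    back = []
    for obs in utt_loglik[1:]:
        scores, pointers = {}, {}
        for cur in labels:
            prev = max(labels, key=lambda p: best[-1][p] + trans_logp[(p, cur)])
            scores[cur] = best[-1][prev] + trans_logp[(prev, cur)] + obs[cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda d: best[-1][d])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Toy inputs (invented for illustration only).
LABELS = ["question", "answer", "backchannel"]
INIT = {d: math.log(1.0 / 3.0) for d in LABELS}
TRANS_P = {
    "question":    {"question": 0.1, "answer": 0.8, "backchannel": 0.1},
    "answer":      {"question": 0.3, "answer": 0.3, "backchannel": 0.4},
    "backchannel": {"question": 0.4, "answer": 0.4, "backchannel": 0.2},
}
TRANS = {(p, c): math.log(TRANS_P[p][c]) for p in LABELS for c in LABELS}
LIKELIHOODS = [
    {"question": 0.70, "answer": 0.20, "backchannel": 0.10},
    {"question": 0.40, "answer": 0.35, "backchannel": 0.25},
    {"question": 0.10, "answer": 0.20, "backchannel": 0.70},
]
UTT_LOGLIK = [{d: math.log(p) for d, p in u.items()} for u in LIKELIHOODS]

decoded = viterbi_dialog_acts(UTT_LOGLIK, TRANS, INIT)
local_best = [max(u, key=u.get) for u in LIKELIHOODS]
```

In this toy case the second utterance looks slightly more like a question when scored in isolation, but the bigram grammar makes an answer far more likely after a question, so the decoded sequence differs from the per-utterance best guess.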