Edinburgh Research Archive

Multidimensional scaling of listener responses to synthetic speech

dc.contributor.author
Mayo, Catherine
en
dc.contributor.author
Clark, Robert A J
en
dc.contributor.author
King, Simon
en
dc.coverage.spatial
4
en
dc.date.accessioned
2006-05-09T11:48:25Z
dc.date.available
2006-05-09T11:48:25Z
dc.date.issued
2005
dc.description.abstract
The move to unit-selection in speech synthesis has resulted in system improvements being made at subtle sub- and suprasegmental levels. Human perceptual evaluation of such subtle improvements requires a highly sophisticated level of perceptual attention to specific acoustic characteristics or cues. However, it is not well understood what acoustic cues listeners attend to by default when asked to evaluate synthetic speech. It may therefore be quite difficult to design an evaluation method that allows listeners to concentrate on only one dimension of the signal, while ignoring others that are perceptually more important to them. This paper describes a pilot study which aims to evaluate multidimensional scaling (MDS) as a possible method of determining what acoustic characteristics of synthetic speech influence listeners’ judgements of the naturalness of the speech. Using distance measures (either real or perceived distances), MDS techniques represent stimuli as points in n-dimensional space. The space is configured so that similar stimuli are close together, while different stimuli are farther apart. Additionally, the dimensions of the space correspond to characteristics of the stimuli which influenced the perceived distances. Our results indicate that MDS techniques should be a useful tool in understanding the complex psychoacoustic processes that listeners undergo when evaluating synthetic speech. This method has allowed us to identify a number of cues that appear to be particularly perceptually salient to listeners evaluating synthetic speech naturalness, namely prosodic cues (in terms of duration and/or intonation) and segmental or unit level cues (in terms of appropriateness of units, or number of units).
en
dc.format.extent
42959 bytes
en
dc.format.mimetype
application/pdf
en
dc.identifier.citation
In Proceedings, Interspeech'2005 - Eurospeech, 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, September 4-8, 2005
dc.identifier.uri
http://www.isca-speech.org/archive/interspeech_2005
dc.identifier.uri
http://hdl.handle.net/1842/937
dc.language.iso
en
dc.publisher
International Speech Communication Association
en
dc.subject
speech synthesis
en
dc.subject
multidimensional scaling
en
dc.title
Multidimensional scaling of listener responses to synthetic speech
en
dc.type
Conference Paper
en
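
The abstract describes MDS as turning pairwise distances between stimuli into points in an n-dimensional space, with similar stimuli placed close together. A minimal sketch of one variant, classical (Torgerson) MDS, is given here for illustration. This record does not say which MDS variant the authors used, so this should not be read as their method. The function name `classical_mds` and the dissimilarity values are hypothetical and are not data from the paper.

```python
import numpy as np


def classical_mds(distances, n_dims=2):
    """Embed items in n_dims dimensions from a symmetric distance matrix.

    Classical (Torgerson) MDS: double-centre the squared distances and
    use the leading eigenvectors as coordinates.
    """
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Centring matrix J = I - (1/n) * ones
    j = np.eye(n) - np.ones((n, n)) / n
    # Gram (inner-product) matrix implied by the distances
    b = -0.5 * j @ (d ** 2) @ j
    eigvals, eigvecs = np.linalg.eigh(b)
    # Keep the n_dims largest eigenvalues; clip tiny negatives to zero
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Hypothetical dissimilarities among four stimuli (illustrative only).
dissimilarities = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 4.0],
    [4.0, 3.0, 0.0, 1.0],
    [5.0, 4.0, 1.0, 0.0],
])

points = classical_mds(dissimilarities, n_dims=2)
print(points)
```

In this toy input, stimuli 1 and 2 end up near each other and far from stimuli 3 and 4, mirroring the "similar stimuli are close together" property the abstract mentions. Perceived dissimilarity ratings from listeners are often ordinal, in which case a non-metric MDS variant may be more appropriate than the classical form sketched here.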

Files

Original bundle

Name:
mayo-speech-2005.pdf
Size:
41.95 KB
Format:
Adobe Portable Document Format
