Subjective Evaluation of Join Cost and Smoothing Methods
In our previous papers, we proposed join cost functions derived from spectral distances, which correlate well with perceptual scores obtained for a range of concatenation discontinuities. To further validate their ability to predict concatenation discontinuities, we chose the three best spectral distances and evaluated them subjectively in a listening test. The units for the synthesis stimuli were obtained from a state-of-the-art unit selection text-to-speech system: rVoice from Rhetorical Systems Ltd. We also compared three different smoothing methods in the same listening test. In this paper, we report listeners’ preferences for each join cost in combination with each smoothing method.