Concatenative Text-to-Speech Synthesis Based on Prototype Waveform Interpolation (A Time-Frequency Approach)
This paper presents some preliminary methods for applying the Time-Frequency Interpolation (TFI) technique to concatenative text-to-speech synthesis. The TFI technique described here is a pitch-synchronous, time-frequency approach to the well-known Prototype Waveform Interpolation (PWI) technique. The basic concepts of representing the speech signal in the time-frequency domain are described, as are techniques for performing time-scale and pitch-scale modifications. Exploiting the flexibility of the TFI technique to perform spectral smoothing, a method was developed to minimize the spectral mismatch at the boundaries of the synthesis units (SUs). The proposed system was evaluated using SUs (diphones) and prosodic modifications generated by the Festival system. An informal subjective test comparing the proposed TFI system with the standard TD-PSOLA system indicated that the proposed system produces higher-quality speech than TD-PSOLA.
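As a rough illustration of the general idea behind prototype-waveform interpolation and boundary smoothing, the sketch below linearly interpolates between two pitch-cycle prototypes over a few pitch periods. It is not the TFI method of this paper. It assumes, purely for illustration, that each prototype is a vector of harmonic amplitudes with zero phase, and the function names `interpolate_prototypes` and `synthesize_cycle` are hypothetical.

```python
import math
from typing import List


def interpolate_prototypes(left: List[float], right: List[float], n_steps: int) -> List[List[float]]:
    """Hypothetical helper: return n_steps harmonic-amplitude vectors that move
    linearly from the `left` prototype (end of one unit) to the `right`
    prototype (start of the next unit), including both endpoints."""
    if len(left) != len(right):
        raise ValueError("prototypes must have the same number of harmonics")
    if n_steps < 2:
        raise ValueError("need at least two steps to include both endpoints")
    frames = []
    for i in range(n_steps):
        alpha = i / (n_steps - 1)  # 0 at the left prototype, 1 at the right prototype
        frames.append([(1 - alpha) * a + alpha * b for a, b in zip(left, right)])
    return frames


def synthesize_cycle(amps: List[float], period: int) -> List[float]:
    """Hypothetical helper: render one pitch cycle of `period` samples as a sum
    of zero-phase harmonics, where amps[k] is the amplitude of harmonic k + 1."""
    return [
        sum(a * math.cos(2 * math.pi * (k + 1) * n / period) for k, a in enumerate(amps))
        for n in range(period)
    ]


if __name__ == "__main__":
    # Smooth the transition between two toy two-harmonic prototypes over 5 cycles.
    frames = interpolate_prototypes([1.0, 0.5], [0.2, 0.9], 5)
    signal = [s for amps in frames for s in synthesize_cycle(amps, period=80)]
    print(len(signal))
```

With these inputs the middle frame is approximately `[0.6, 0.7]`, halfway between the two prototypes. Linear amplitude interpolation is only the simplest choice; a real system would also need to handle phase alignment and pitch-period changes between cycles.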