Two unit selection singing synthesisers
Two speech synthesisers were adapted for singing synthesis using the unit selection techniques provided by the Festival speech synthesis system. A limited-domain approach was taken, focusing on the pitch, duration and word of each note. The first synthesiser applied the cluster unit technique to a database spanning one octave, in which each note had a specific word assigned to it. Some of the automatic techniques used (e.g. for segmentation) were designed for speech and should ideally be adapted to account for the differences between singing and speaking. Better quality was achieved with the multisyn engine and an improved database design. This database used a smaller pitch range and only three syllables, 'la', 'ti' and 'so', but each syllable could be synthesised on any available note, and in any combination of notes and syllables. This was achieved by weighting the target cost of selecting units from the database in favour of units with the correct pitch and duration. Finally, prosodic modification was applied to units in the multisyn engine, but the way the units were modified degraded quality. Although the quality of synthesis was appropriate for the intended applications, the database was small and the linguistic structure simple. To build a larger-scale singing synthesiser, either some aspect of the database, such as the vocabulary, should be kept simple, or prosodic modification of units should be improved through further analysis of the characteristics of singing.
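The abstract does not give the cost function that was used, so the following is only a minimal sketch of the general idea of weighting the target cost towards pitch and duration. It assumes a target cost that is a weighted sum of pitch (in semitones), duration (in seconds) and syllable mismatches, plus a simple join cost that is free for units recorded consecutively. All names (`Unit`, `Note`, `target_cost`, `join_cost`, `select_units`), the weights and the join cost are hypothetical illustrations. They are not taken from Festival or its multisyn engine. The search is a standard Viterbi pass over candidate units, minimising total target plus join cost.

```python
from dataclasses import dataclass

# Hypothetical weights (assumptions, not Festival values): pitch and duration
# errors are weighted heavily relative to the join cost, biasing selection
# towards units with the correct pitch and duration.
W_SYLLABLE = 1000.0  # a wrong syllable is effectively forbidden
W_PITCH = 10.0       # cost per semitone of pitch error
W_DURATION = 5.0     # cost per second of duration error
JOIN_PENALTY = 1.0   # flat cost of concatenating non-adjacent units


@dataclass(frozen=True)
class Unit:
    index: int       # position in the (hypothetical) recorded database
    syllable: str    # e.g. "la", "ti" or "so"
    pitch: int       # MIDI note number
    duration: float  # seconds


@dataclass(frozen=True)
class Note:
    syllable: str
    pitch: int
    duration: float


def target_cost(note: Note, unit: Unit) -> float:
    """Weighted mismatch between the requested note and a candidate unit."""
    cost = W_PITCH * abs(note.pitch - unit.pitch)
    cost += W_DURATION * abs(note.duration - unit.duration)
    if note.syllable != unit.syllable:
        cost += W_SYLLABLE
    return cost


def join_cost(prev: Unit, cur: Unit) -> float:
    """Units recorded back to back join for free; others pay a flat penalty."""
    return 0.0 if cur.index == prev.index + 1 else JOIN_PENALTY


def select_units(score: list[Note], database: list[Unit]) -> list[Unit]:
    """Viterbi search for the unit sequence with minimal target + join cost."""
    n = len(database)
    # best[i][j]: cheapest path cost ending with database[j] at note i
    best = [[target_cost(score[0], u) for u in database]]
    back: list[list[int]] = [[-1] * n]
    for note in score[1:]:
        row, ptr = [], []
        for cur in database:
            costs = [best[-1][k] + join_cost(prev, cur)
                     for k, prev in enumerate(database)]
            k_best = min(range(n), key=costs.__getitem__)
            row.append(costs[k_best] + target_cost(note, cur))
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest unit at the final note.
    j = min(range(n), key=best[-1].__getitem__)
    path = [j]
    for i in range(len(score) - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return [database[j] for j in reversed(path)]


# Usage with a tiny made-up database of 'la', 'ti' and 'so' units.
database = [
    Unit(0, "la", 60, 0.5),
    Unit(1, "ti", 62, 0.5),
    Unit(2, "so", 64, 0.5),
    Unit(3, "la", 64, 0.25),
    Unit(4, "so", 60, 1.0),
]
score = [Note("la", 64, 0.25), Note("so", 60, 1.0)]
chosen = select_units(score, database)
print([u.index for u in chosen])
```

With these illustrative weights, a unit with the wrong syllable or a large pitch error is almost never chosen, while the join penalty only breaks ties between otherwise similar candidates. A real system would tune these weights and use richer join costs (e.g. spectral continuity at the concatenation point).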