dc.description.abstract | Two speech synthesisers were adapted for singing synthesis using unit selection
techniques provided by the Festival speech synthesis system. A limited-domain
approach was taken, focussing on the pitch, duration and word of each note.
The first synthesiser used the cluster unit technique on a database of an octave
range, where each note had a specific word assigned to it. Some of the automatic
techniques used (e.g. for segmentation) were designed for speech and should
ideally be adapted to take account of the differences between singing and speaking.
Better quality was achieved with the multisyn engine and an improved database design.
This database used a smaller pitch range and only three syllables, 'la', 'ti'
and 'so', but each syllable could be synthesised on any available note, and in any
combination of notes and syllables. This was achieved by weighting the target
cost of selecting units from the database in favour of choosing units with the correct
pitch and duration. Finally, prosodic modification was applied to units in
the multisyn engine, but this degraded quality as a result of how the units were
modified.
Although the quality of synthesis was appropriate for the intended applications,
the database was small and its linguistic structure simple. To build a larger-scale
singing synthesiser, either some aspect of the database should be kept simple,
such as vocabulary, or prosodic modification of units should be improved through
further analysis of the characteristics of singing. | en |