HMM-based speech synthesis using an acoustic glottal source model

Cabral, Joao P

dc.contributor.advisor

Renals, Steve

en

dc.contributor.advisor

Richmond, Korin

en

dc.contributor.advisor

Yamagishi, Junichi

en

dc.contributor.author

Cabral, Joao P

en

dc.contributor.sponsor

Marie Curie Early Stage Training Site EdSST (MEST-CT-2005-020568)

en

dc.date.accessioned

2011-05-24T13:07:20Z

dc.date.available

2011-05-24T13:07:20Z

dc.date.issued

2011

dc.description.abstract

Parametric speech synthesis has received increased attention in recent years following the development of statistical HMM-based speech synthesis. However, the speech produced using this method still does not sound as natural as human speech, and there is limited parametric flexibility to replicate voice quality aspects such as breathiness. The hypothesis of this thesis is that speech naturalness and voice quality can be more accurately replicated by an HMM-based speech synthesiser that uses an acoustic glottal source model, the Liljencrants-Fant (LF) model, to represent the source component of speech instead of the traditional impulse train. Two analysis-synthesis methods were developed in this thesis to integrate the LF-model into a baseline HMM-based speech synthesiser, which is based on the popular HTS system and uses the STRAIGHT vocoder. The first method, called Glottal Post-Filtering (GPF), passes a chosen LF-model signal through a glottal post-filter to obtain the source signal and then generates speech by passing this source signal through the spectral envelope filter. The system that uses the GPF method (HTS-GPF) is similar to the baseline system, but it uses this source signal in place of the impulse train used by STRAIGHT. The second method, called Glottal Spectral Separation (GSS), generates speech by passing the LF-model signal through the vocal tract filter. The major advantage of the synthesiser that incorporates the GSS method, named HTS-LF, is that the acoustic properties of the LF-model parameters are automatically learnt by the HMMs.
An initial perceptual experiment compared the LF-model to the impulse train. The results showed that the LF-model was significantly better, both in terms of speech naturalness and in replicating two basic voice qualities (breathy and tense). In a second perceptual evaluation, the HTS-LF system was better than the baseline system, although the difference between the two was less significant than expected. A third experiment evaluated the HTS-GPF system and an improved HTS-LF system in terms of speech naturalness, voice similarity and intelligibility. The results showed that the HTS-GPF system performed similarly to the baseline, whereas the HTS-LF system was significantly outperformed by the baseline. Finally, acoustic measurements were performed on the synthetic speech to investigate the speech distortion in the HTS-LF system. The results indicated that the most significant cause of speech distortion is a problem in replicating the rapid variations of the vocal tract filter parameters at transitions between voiced and unvoiced sounds. This finding motivates future work to improve the system.

en

dc.identifier.uri

http://hdl.handle.net/1842/4877

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.relation.hasversion

Cabral, J. P. and Oliveira, L. C. (2005). Pitch-synchronous time-scaling for prosodic and voice quality transformations. In Proc. of INTERSPEECH, pages 1137–1140, Lisbon, Portugal.

en

dc.subject

HMM-based speech synthesis

en

dc.subject

glottal source modelling

en

dc.subject

LF-model

en

dc.title

HMM-based speech synthesis using an acoustic glottal source model

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Name: Cabral2011.pdf
Size: 3.12 MB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

Informatics thesis and dissertation collection