Centre for Speech Technology Research: Recent submissions
-
Evaluation of the Vulnerability of Speaker Verification to Synthetic Speech
(2010)In this paper, we evaluate the vulnerability of a speaker verification (SV) system to synthetic speech. Although this problem was first examined over a decade ago, dramatic improvements in both SV and speech synthesis ... -
Ageing voices: The effect of changes in voice parameters on ASR performance
(2010)With ageing, human voices undergo several changes which are typically characterized by increased hoarseness and changes in articulation patterns. In this study, we have examined the effect on Automatic Speech Recognition ... -
Unsupervised Cross-lingual Speaker Adaptation for HMM-based Speech Synthesis
(2010)In the EMIME project, we are developing a mobile device that performs personalized speech-to-speech translation such that a user's spoken input in one language is used to produce spoken output in another language, while ... -
Prediction and Realisation of Conversational Characteristics by Utilising Spontaneous Speech for Unit Selection
(2010)Unit selection speech synthesis has reached high levels of naturalness and intelligibility for neutral read aloud speech. However, synthetic speech generated using neutral read aloud data lacks all the attitude, intention ... -
Further exploration of the possibilities and pitfalls of multidimensional scaling as a tool for the evaluation of the quality of synthesized speech
(2010)Multidimensional scaling (MDS) has been suggested as a useful tool for the evaluation of the quality of synthesized speech. However, it has not yet been extensively tested for its application in this specific area of ... -
Transforming Voice Source Parameters in a HMM-based Speech Synthesiser with Glottal Post-Filtering
(2010)Control over voice quality, e.g. breathy and tense voice, is important for speech synthesis applications. For example, transformations can be used to modify aspects of the voice related to speaker's identity and to improve ... -
Augmentation of adaptation data
(2010)Linear regression based speaker adaptation approaches can improve Automatic Speech Recognition (ASR) accuracy significantly for a target speaker. However, when the available adaptation data is limited to a few seconds, the ... -
Evaluating speech synthesis intelligibility using Amazon Mechanical Turk
(2010)Microtask platforms such as Amazon Mechanical Turk (AMT) are increasingly used to create speech and language resources. AMT in particular allows researchers to quickly recruit a large number of fairly demographically diverse ... -
A Digital Microphone Array for Distant Speech Recognition
(2010)In this paper, the design, implementation and testing of a digital microphone array is presented. The array uses digital MEMS microphones which integrate the microphone, amplifier and analogue to digital converter on a ... -
Designing Usable and Acceptable Reminders for the Home
(2010)Electronic reminders can play a key role in enabling people to manage their care and remain independent in their own homes for longer. The MultiMemoHome project aims to develop reminder designs that are accessible and ... -
Native and Non-Native Speaker Judgements on the Quality of Synthesized Speech
(2010)The difference between native speakers' and non-native speakers' naturalness judgements of synthetic speech is investigated. Similar/difference judgements are analysed via a multidimensional scaling analysis and compared ... -
A classifier-based target cost for unit selection speech synthesis trained on perceptual data
(2010)Our goal is to automatically learn a perceptually-optimal target cost function for a unit selection speech synthesiser. The approach we take here is to train a classifier on human perceptual judgements of synthetic speech. ... -
Recognition and Understanding of Meetings
(2010)This paper is about interpreting human communication in meetings using audio, video and other signals. Automatic meeting recognition and understanding is extremely challenging, since communication in a meeting is spontaneous ... -
Learning Dialogue Strategies from Older and Younger Simulated Users
(2010)Older adults are a challenging user group because their behaviour can be highly variable. To the best of our knowledge, this is the first study where dialogue strategies are learned and evaluated with both simulated younger ... -
The role of higher-level linguistic features in HMM-based speech synthesis
(2010)We analyse the contribution of higher-level elements of the linguistic specification of a data-driven speech synthesiser to the naturalness of the synthetic speech which it generates. The system is trained using various ... -
Personalising speech-to-speech translation in the EMIME project
(2010)In the EMIME project we have studied unsupervised cross-lingual speaker adaptation. We have employed an HMM statistical framework for both speech recognition and synthesis which provides transformation mechanisms to adapt ... -
HMM-based Text-to-Articulatory-Movement Prediction and Analysis of Critical Articulators
(2010)In this paper we present a method to predict the movement of a speaker's mouth from text input using hidden Markov models (HMM). We have used a corpus of human articulatory movements, recorded by electromagnetic articulography ... -
Roles of the Average Voice in Speaker-adaptive HMM-based Speech Synthesis
(2010)In speaker-adaptive HMM-based speech synthesis, there are typically a few speakers for which the output synthetic speech sounds worse than that of other speakers, despite having the same amount of adaptation data from ... -
Comparison of HMM and TMDN Methods for Lip Synchronisation
(2010)This paper presents a comparison between a hidden Markov model (HMM) based method and a novel artificial neural network (ANN) based method for lip synchronisation. Both model types were trained on motion tracking data, and ... -
Power Law Discounting for N-Gram Language Models
(2010)We present an approximation to the Bayesian hierarchical Pitman-Yor process language model which maintains the power law distribution over word tokens, while not requiring a computationally expensive approximate inference ...