Evaluation of the Vulnerability of Speaker Verification to Synthetic Speech
Proc. Odyssey (The speaker and language recognition workshop) 2010
View/ Open
Date
2010Author
De Leon, P.L.
Pucher, M.
Yamagishi, Junichi
Metadata
Abstract
In this paper, we evaluate the vulnerability of a speaker verification
(SV) system to synthetic speech. Although this problem
was first examined over a decade ago, dramatic improvements
in both SV and speech synthesis have renewed interest in
this problem. We use a HMM-based speech synthesizer, which
creates synthetic speech for a targeted speaker through adaptation
of a background model and a GMM-UBM-based SV system.
Using 283 speakers from the Wall-Street Journal (WSJ)
corpus, our SV system has a 0.4% EER. When the system
is tested with synthetic speech generated from speaker models
derived from the WSJ journal corpus, 90% of the matched
claims are accepted. This result suggests a possible vulnerability
in SV systems to synthetic speech. In order to detect
synthetic speech prior to recognition, we investigate the
use of an automatic speech recognizer (ASR), dynamic-timewarping
(DTW) distance of mel-frequency cepstral coefficients
(MFCC), and previously-proposed average inter-frame difference
of log-likelihood (IFDLL). Overall, while SV systems
have impressive accuracy, even with the proposed detector,
high-quality synthetic speech can lead to an unacceptably high
acceptance rate of synthetic speakers.