Evaluation of the Vulnerability of Speaker Verification to Synthetic Speech

De Leon, P.L.; Pucher, M.; Yamagishi, Junichi

dc.contributor.author

De Leon, P.L.

en

dc.contributor.author

Pucher, M.

en

dc.contributor.author

Yamagishi, Junichi

en

dc.date.accessioned

2011-01-19T11:11:50Z

dc.date.available

2011-01-19T11:11:50Z

dc.date.issued

2010

dc.date.updated

2011-01-19T11:11:50Z

dc.description.abstract

In this paper, we evaluate the vulnerability of a speaker verification (SV) system to synthetic speech. Although this problem was first examined over a decade ago, dramatic improvements in both SV and speech synthesis have renewed interest in it. We use a GMM-UBM-based SV system and an HMM-based speech synthesizer, which creates synthetic speech for a targeted speaker through adaptation of a background model. Using 283 speakers from the Wall Street Journal (WSJ) corpus, our SV system has a 0.4% equal error rate (EER). When the system is tested with synthetic speech generated from speaker models derived from the WSJ corpus, 90% of the matched claims are accepted. This result suggests a possible vulnerability in SV systems to synthetic speech. In order to detect synthetic speech prior to recognition, we investigate the use of an automatic speech recognizer (ASR), the dynamic time warping (DTW) distance of mel-frequency cepstral coefficients (MFCC), and the previously proposed average inter-frame difference of log-likelihood (IFDLL). Overall, while SV systems have impressive accuracy, even with the proposed detector, high-quality synthetic speech can lead to an unacceptably high acceptance rate of synthetic speakers.

en
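
The abstract reports two figures, an equal error rate (EER) for the SV system and an acceptance rate for synthetic-speech claims. The following is a minimal sketch of how such figures are commonly computed from verification scores; it is not the authors' code. The function names `equal_error_rate` and `acceptance_rate`, the convention that a claim is accepted when its score is at or above the threshold, and the toy scores in the usage note are all assumptions made for illustration.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER from two lists of verification scores.

    Assumption: a claim is accepted when its score is >= the threshold.
    The threshold is swept over every observed score, and the operating
    point where false acceptance and false rejection are closest is used.
    """
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        # False acceptance rate: impostor claims that pass the threshold.
        far = sum(s >= t for s in impostor) / len(impostor)
        # False rejection rate: genuine claims that fail the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def acceptance_rate(scores, threshold):
    """Fraction of claims (e.g. synthetic-speech claims) that are accepted."""
    return sum(s >= threshold for s in scores) / len(scores)
```

With hypothetical scores, `equal_error_rate([2.0, 3.0, 4.0, 5.0], [-1.0, 0.0, 1.0, 2.5])` evaluates to 0.25, and `acceptance_rate([0.5, 1.5, 2.5, 3.5], 1.0)` evaluates to 0.75. These numbers are invented to exercise the functions and have no relation to the results in the paper.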

dc.identifier.uri

http://hdl.handle.net/1842/4659

dc.title

Evaluation of the Vulnerability of Speaker Verification to Synthetic Speech

en

dc.type

Conference Paper

en

rps.title

Proc. Odyssey (The speaker and language recognition workshop) 2010

en