Edinburgh Research Archive

Articulatory feature-based methods for acoustic and audio-visual speech recognition: Summary from the 2006 JHU Summer Workshop.

dc.contributor.author: Livescu, Karen (en)
dc.contributor.author: Çetin, Ozgur (en)
dc.contributor.author: Hasegawa-Johnson, Mark (en)
dc.contributor.author: King, Simon (en)
dc.contributor.author: Bartels, Chris (en)
dc.contributor.author: Borges, Nash (en)
dc.contributor.author: Kantor, Arthur (en)
dc.contributor.author: Lal, Partha (en)
dc.contributor.author: Yung, Lisa (en)
dc.contributor.author: Bezman, Ari (en)
dc.contributor.author: Dawson-Haggerty, Stephen (en)
dc.contributor.author: Woods, Bronwyn (en)
dc.contributor.author: Frankel, Joe (en)
dc.contributor.author: Magimai-Doss, Mathew (en)
dc.date.accessioned: 2007-09-18T10:01:47Z
dc.date.available: 2007-09-18T10:01:47Z
dc.date.issued: 2007
dc.description.abstract: We report on investigations, conducted at the 2006 Johns Hopkins Workshop, into the use of articulatory features (AFs) for observation and pronunciation models in speech recognition. In the area of observation modeling, we use the outputs of AF classifiers both directly, in an extension of hybrid HMM/neural network models, and as part of the observation vector, an extension of the tandem approach. In the area of pronunciation modeling, we investigate a model having multiple streams of AF states with soft synchrony constraints, for both audio-only and audio-visual recognition. The models are implemented as dynamic Bayesian networks, and tested on tasks from the Small-Vocabulary Switchboard (SVitchboard) corpus and the CUAVE audio-visual digits corpus. Finally, we analyze AF classification and forced alignment using a newly collected set of feature-level manual transcriptions. (en)
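For readers unfamiliar with the tandem approach mentioned in the abstract, the sketch below illustrates the general idea of appending articulatory-feature classifier posteriors to the per-frame acoustic observation vector. It is a minimal, hypothetical illustration: the stream names, dimensionalities, and the use of log-domain posteriors are assumptions for the example, not the workshop's actual configuration.

import numpy as np

def tandem_features(mfcc, af_posteriors, eps=1e-8):
    """Append log-posteriors from per-stream AF classifiers to acoustic features."""
    streams = []
    for name in sorted(af_posteriors):          # fixed stream order for reproducibility
        post = af_posteriors[name]              # (T, K) per-frame posteriors for one AF stream
        streams.append(np.log(post + eps))      # log domain, as is common in tandem systems
    return np.concatenate([mfcc] + streams, axis=1)

# Toy usage with random data standing in for real classifier outputs.
T = 100
mfcc = np.random.randn(T, 39)                   # e.g. MFCCs plus deltas (illustrative size)
af_posteriors = {
    "place": np.random.dirichlet(np.ones(10), size=T),
    "manner": np.random.dirichlet(np.ones(6), size=T),
    "voicing": np.random.dirichlet(np.ones(3), size=T),
}
obs = tandem_features(mfcc, af_posteriors)
print(obs.shape)                                # (100, 58)

In practice, tandem systems usually also decorrelate and reduce the appended features (for example with PCA/KLT) before training the observation model; that step is omitted here for brevity.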
dc.format.extent: 143533 bytes (en)
dc.format.mimetype: application/pdf (en)
dc.identifier.citation: K. Livescu, O. Çetin, M. Hasegawa-Johnson, S. King, C. Bartels, N. Borges, A. Kantor, P. Lal, L. Yung, A. Bezman, S. Dawson-Haggerty, B. Woods, J. Frankel, M. Magimai-Doss, and K. Saenko. Articulatory feature-based methods for acoustic and audio-visual speech recognition: Summary from the 2006 JHU Summer Workshop. In Proc. ICASSP, Honolulu, April 2007.
dc.identifier.uri: http://hdl.handle.net/1842/1998
dc.language.iso: en
dc.subject: speech technology (en)
dc.title: Articulatory feature-based methods for acoustic and audio-visual speech recognition: Summary from the 2006 JHU Summer Workshop. (en)
dc.type: Conference Paper (en)

Files

Original bundle

Name: livescu_icassp07_sum.pdf
Size: 140.17 KB
Format: Adobe Portable Document Format
