dc.contributor.advisor | Renals, Stephen | en |
dc.contributor.advisor | Tate, Austin | en |
dc.contributor.author | Zwyssig, Erich Paul | en |
dc.date.accessioned | 2013-12-17T15:27:08Z | |
dc.date.available | 2013-12-17T15:27:08Z | |
dc.date.issued | 2013-11-28 | |
dc.identifier.uri | http://hdl.handle.net/1842/8287 | |
dc.description.abstract | The last few years have seen the start of a unique change in microphones for consumer
devices such as smartphones or tablets. Almost all analogue capacitive microphones
are being replaced by digital silicon microphones or MEMS microphones.
MEMS microphones perform differently to conventional analogue microphones. Their
greatest disadvantage is significantly increased self-noise or decreased SNR, while
their most significant benefits are ease of design and manufacturing and improved sensitivity
matching.
This thesis presents research on speech processing, comparing conventional analogue
microphones with the newly available digital MEMS microphones. Specifically, voice
activity detection, speaker diarisation (who spoke when), speech separation and speech
recognition are examined in detail.
In order to carry out this research, different microphone arrays were built using digital
MEMS microphones and corpora were recorded to test existing algorithms and devise
new ones. Some corpora that were created for the purpose of this research will be
released to the public in 2013.
It was found that the most commonly used VAD algorithm in current state-of-the-art
diarisation systems is not the best-performing one, i.e. MLP-based voice activity
detection consistently outperforms the more frequently used GMM-HMM-based VAD
schemes. In addition, an algorithm was derived that can determine the number of active
speakers in a meeting recording given audio data from a microphone array of known
geometry, leading to improved diarisation results.
Finally, speech separation experiments were carried out using different post-filtering
algorithms, matching or exceeding current state-of-the-art results.
The performance of the algorithms and methods presented in this thesis was verified
by comparing their output using speech recognition tools and simple MLLR adaptation,
and the results are presented as word error rates, an easily comprehensible scale.
To summarise, using speech recognition and speech separation experiments, this thesis
demonstrates that the significantly reduced SNR of the MEMS microphone can be
compensated for with well-established adaptation techniques such as MLLR. MEMS
microphones do not affect voice activity detection and speaker diarisation performance. | en |
dc.language.iso | en | |
dc.publisher | The University of Edinburgh | en |
dc.relation.hasversion | C. Fox, Y. Liu, E. Zwyssig, and T. Hain. The Sheffield Wargames Corpus. In Proceedings of Interspeech. Citeseer, 2013. | en |
dc.relation.hasversion | E. Zwyssig. Signal processing method and apparatus, 2012. Patent Application GB1203810.5. | en |
dc.relation.hasversion | E. Zwyssig, M. Lincoln, and S. Renals. A digital microphone array for distant speech recognition. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2010. | en |
dc.relation.hasversion | E. Zwyssig, S. Renals, and M. Lincoln. Determining the number of speakers in a meeting using microphone array features. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2012a. | en |
dc.relation.hasversion | E. Zwyssig, S. Renals, and M. Lincoln. On the effect of SNR and superdirective beamforming in speaker diarisation in meetings. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2012b. | en |
dc.relation.hasversion | E. Zwyssig, F. Faubel, S. Renals, and M. Lincoln. Recognition of overlapping speech using digital MEMS microphone arrays. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013. | en |
dc.subject | speech processing | en |
dc.subject | MEMS | en |
dc.subject | microphone array | en |
dc.subject | VAD | en |
dc.subject | diarisation | en |
dc.title | Speech processing using digital MEMS microphones | en |
dc.type | Thesis or Dissertation | en |
dc.type.qualificationlevel | Doctoral | en |
dc.type.qualificationname | PhD Doctor of Philosophy | en |