Edinburgh Research Archive

Improving the intelligibility of speech playback in everyday scenarios

dc.contributor.advisor
King, Simon
dc.contributor.advisor
Botinhao, Cassia Valentini
dc.contributor.author
Chermaz, Carol
dc.contributor.sponsor
European Commission
dc.date.accessioned
2023-11-30T15:38:15Z
dc.date.available
2023-11-30T15:38:15Z
dc.date.issued
2023-11-30
dc.description.abstract
Speech playback is widely used across the industrialized world: radio, TV and public address announcements are just some examples. The presence of noise and reverberation may affect intelligibility, in particular for those individuals who experience hearing difficulties. As new technologies that involve speech playback (e.g., smartphones, smart speakers) become more popular, society is paying more attention to accessibility, making speech intelligibility an increasingly relevant topic. We investigate NELE (Near End Listening Enhancement) algorithms, which can be used to boost the intelligibility of speech signals before they are played back. We simulate realistic acoustic environments to test the effectiveness of existing algorithms for different hearing profiles, obtaining more insights than traditional tests run with synthetic noise. We show that such algorithms can improve intelligibility across hearing profiles in everyday listening scenarios. We then propose two novel NELE methods: one covert and the other overt. The covert strategy entails watermarking speech signals with data that can be read by a receiving device (e.g., a hearing aid), which in turn can use it to perform better speech enhancement. The overt method audibly modifies the signal by reallocating its energy across time and frequency in a perceptually motivated way. We consider audio quality to be as important as intelligibility, and show that it is possible to increase both at the same time. The beta version of our Automatic Sound Engineer set a benchmark in the field of NELE by winning the Hurricane Challenge 2.0, while its fully automated version has also proven suitable for music production.
dc.identifier.uri
https://hdl.handle.net/1842/41247
dc.identifier.uri
http://dx.doi.org/10.7488/era/3983
dc.language.iso
en
dc.publisher
The University of Edinburgh
dc.relation.hasversion
Chermaz, C., Valentini-Botinhao, C., Schepker, H., and King, S. (2019). Near end listening enhancement in realistic environments. In 23rd International Congress on Acoustics, pages 5731–5735
dc.relation.hasversion
Chermaz, C., Valentini-Botinhao, C., Schepker, H., and King, S. (2019). Evaluating Near End Listening Enhancement Algorithms in Realistic Environments. In Proc. Interspeech 2019, pages 1373–1377. Best Student Paper
dc.relation.hasversion
Chermaz, C. and King, S. (2020). A sound engineering approach to near end listening enhancement. In Proc. Interspeech 2020, pages 1356–1360. Presents ASE beta, winner of the Hurricane Challenge 2.0
dc.relation.hasversion
Chermaz, C., Leuchtmann, D., Tanner, S., and Wattenhofer, R. (2021). Compressed representation of cepstral coefficients via recurrent neural networks for informed speech enhancement. In Proc. ICASSP 2021, pages 731–735
dc.relation.hasversion
P. V., M. S., Chermaz, C., Chimona, T., Tsiaras, V., and Stylianou, Y. (2019). Benefits of the wavenet-based speech intelligibility enhancement for normal and hearing impaired listeners. In 23rd International Congress on Acoustics, pages 5721–5725
dc.relation.hasversion
Muzzi, E., Chermaz, C., Castro, V., Zaninoni, M., Saksida, A., and Orzan, E. (2021). Short report on the effects of SARS-CoV-2 face protective equipment on verbal communication. European Archives of Oto-Rhino-Laryngology, 278(9):3565–3570
dc.relation.hasversion
Chermaz, C., P. V., M. S., Raman, S., Govender, A., Paul, D., and Simantiraki, O. (2020). Enriched speech for effortless listening. Presented at ICASSP 2020 Show and Tell
dc.subject
Near End Listening Enhancement
dc.subject
Speech Technology
dc.subject
Hearing Technology
dc.subject
Automatic Sound Engineer
dc.subject
Realistic listening scenarios
dc.subject
Realistic noise
dc.subject
Speech pre-enhancement
dc.subject
Automatic vocal mastering
dc.title
Improving the intelligibility of speech playback in everyday scenarios
dc.type
Thesis or Dissertation
dc.type.qualificationlevel
Doctoral
dc.type.qualificationname
PhD Doctor of Philosophy

Files

Original bundle

Name:
ChermazC_2023.pdf
Size:
37.48 MB
Format:
Adobe Portable Document Format