Multi-dialect Arabic broadcast speech recognition

Ali, Ahmed Mohamed Abdel Maksoud

dc.contributor.advisor

Renals, Stephen

en

dc.contributor.advisor

Bell, Peter

en

dc.contributor.author

Ali, Ahmed Mohamed Abdel Maksoud

en

dc.date.accessioned

2018-06-22T09:41:35Z

dc.date.available

2018-06-22T09:41:35Z

dc.date.issued

2018-07-02

dc.description.abstract

Dialectal Arabic speech research suffers from a lack of labelled resources and of a standardised orthography. There are three main challenges in dialectal Arabic speech recognition: (i) finding labelled dialectal Arabic speech data, (ii) training robust dialectal speech recognition models from limited labelled data, and (iii) evaluating speech recognition for dialects with no orthographic rules. This thesis makes the following three contributions. Arabic Dialect Identification: We mainly deal with Arabic speech without prior knowledge of the spoken dialect. Arabic dialects can be so diverse that one could argue they are different languages rather than dialects of the same language. We make two contributions here. First, we use crowdsourcing to annotate a multi-dialectal speech corpus collected from the Al Jazeera TV channel: from almost 1,000 hours of speech, we obtained utterance-level dialect labels for 57 hours of high-quality speech covering four major varieties of dialectal Arabic (DA), namely Egyptian, Levantine, Gulf (Arabian Peninsula) and North African (Moroccan). Second, we build an Arabic dialect identification (ADI) system. We explore two main groups of features, namely acoustic features and linguistic features. For the linguistic features, we look at a wide range of features addressing words, characters and phonemes. For the acoustic features, we look at raw features such as mel-frequency cepstral coefficients combined with shifted delta cepstra (MFCC-SDC), bottleneck features, and the i-vector as a latent variable. We study both generative and discriminative classifiers, in addition to deep learning approaches, namely deep neural networks (DNN) and convolutional neural networks (CNN). Finally, we propose Arabic dialect identification as a five-class challenge comprising the four dialects above together with Modern Standard Arabic (MSA).
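The MFCC-SDC features mentioned above stack, for every frame t, the static cepstra with k delta blocks taken at shifts of P frames, where block i is c(t + iP + d) - c(t + iP - d) over the first N coefficients. The following is a minimal Python sketch assuming the N-d-P-k = 7-1-3-7 configuration that is common in language identification; the abstract does not state the thesis's exact configuration, and the function names and edge clamping are illustrative assumptions rather than the thesis's implementation.

```python
from typing import List


def shifted_delta_cepstra(cepstra: List[List[float]], n: int = 7, d: int = 1,
                          p: int = 3, k: int = 7) -> List[List[float]]:
    """Return k*n shifted delta coefficients per frame (N-d-P-k SDC).

    Block i of frame t is c[t + i*p + d] - c[t + i*p - d], restricted to the
    first n coefficients. Out-of-range frame indices are clamped to the first
    or last frame; this edge handling is an assumption, since toolkits differ.
    """
    last = len(cepstra) - 1

    def frame(idx: int) -> List[float]:
        # Clamp the index into [0, last] and keep the first n coefficients.
        return cepstra[min(max(idx, 0), last)][:n]

    features: List[List[float]] = []
    for t in range(len(cepstra)):
        vec: List[float] = []
        for i in range(k):
            ahead = frame(t + i * p + d)
            behind = frame(t + i * p - d)
            vec.extend(a - b for a, b in zip(ahead, behind))
        features.append(vec)
    return features


def mfcc_sdc(cepstra: List[List[float]], n: int = 7, d: int = 1,
             p: int = 3, k: int = 7) -> List[List[float]]:
    """Concatenate the first n static MFCCs with their SDC: n + n*k dims per frame."""
    sdc = shifted_delta_cepstra(cepstra, n=n, d=d, p=p, k=k)
    return [list(c[:n]) + s for c, s in zip(cepstra, sdc)]


if __name__ == "__main__":
    # Toy "cepstra": 20 frames of 13 coefficients that grow linearly over time.
    frames = [[float(t * (j + 1)) for j in range(13)] for t in range(20)]
    feats = mfcc_sdc(frames)
    print(len(feats), len(feats[0]))  # 20 frames, 7 + 7*7 = 56 dimensions
```

With 7-1-3-7, each frame yields 7 static plus 49 SDC values, so the features capture temporal context spanning roughly 20 frames, which is why SDC has been popular for language and dialect recognition.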
Arabic Speech Recognition: We describe our effort to build Arabic automatic speech recognition (ASR) and to create an open research community to advance it. This part has two main goals. First, creating a publicly available framework for Arabic ASR research: we describe our effort in organising two multi-genre broadcast (MGB) challenges. MGB-2 focuses on broadcast news, using more than 1,200 hours of speech and 130M words of text collected from the broadcast domain. MGB-3, in contrast, focuses on dialectal multi-genre data with a limited amount of non-orthographic speech collected from YouTube, with special attention paid to transfer learning. Second, building a robust Arabic ASR system and reporting a competitive word error rate (WER) that can serve as a benchmark to advance the state of the art in Arabic ASR. Our overall system combines five acoustic models (AMs): unidirectional long short-term memory (LSTM), bidirectional LSTM (BLSTM), time delay neural network (TDNN), TDNN layers combined with LSTM layers (TDNN-LSTM), and TDNN layers followed by BLSTM layers (TDNN-BLSTM). The AMs are purely sequence-trained neural networks, trained with lattice-free maximum mutual information (LF-MMI). The generated lattices are rescored using a four-gram language model (LM) and a recurrent neural network with maximum entropy (RNNME) LM. Our official WER is 13%, the lowest WER reported on this task.

Evaluation: The third part of the thesis addresses evaluating dialectal speech with no orthographic rules. Our methods learn from multiple transcribers, aligning the recognition hypothesis against their transcriptions to overcome the lack of a standard orthography. Our multi-reference WER (MR-WER) approach is similar in spirit to the BLEU score used in machine translation (MT), which also scores a hypothesis against multiple references. We have also automated this process by learning different spelling variants from Twitter data.
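MR-WER scores one hypothesis against transcriptions from several annotators. As a rough illustration only, the Python sketch below computes standard word-level WER by Levenshtein alignment and then reports the average and the best (minimum) WER across references. This simple aggregation is an assumption and a stand-in, not the alignment-based MR-WER defined in the thesis; the function names and the transliterated example references are hypothetical.

```python
from typing import Dict, List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # delete reference word r
                         cur[j - 1] + 1,           # insert hypothesis word h
                         prev[j - 1] + (r != h))   # substitute, or match at no cost
        prev = cur
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word errors divided by the number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


def multi_reference_scores(refs: List[str], hyp: str) -> Dict[str, float]:
    """Score one hypothesis against several transcribers' references.

    'average' weights disagreement with every reference equally; 'best'
    counts errors only against the closest single reference.
    """
    scores = [wer(r, hyp) for r in refs]
    return {"average": sum(scores) / len(scores), "best": min(scores)}


if __name__ == "__main__":
    # Hypothetical transliterated transcriptions of the same utterance.
    refs = ["ana mish 3aref", "ana msh 3aref"]
    print(multi_reference_scores(refs, "ana msh 3aref"))
```

Here the hypothesis agrees exactly with one transcriber but differs in spelling from the other, so a single-reference WER would depend on which transcription happened to be chosen; scoring against several references reduces that arbitrariness.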
We automatically mine a large collection of tweets in an unsupervised fashion to build more than 11M n-to-m lexical pairs, and we propose a new evaluation metric: dialectal WER (WERd). Finally, we estimate the word error rate without a reference transcription (e-WER), using decoding and language features. We show that our WER estimation is robust in many scenarios, both with and without the decoding features.
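WERd relaxes WER by accepting spelling variants mined from Twitter. A minimal Python sketch, assuming the mined pairs are reduced to a one-to-one map from each variant to a shared canonical spelling, applied to both reference and hypothesis before alignment. The thesis mines n-to-m pairs, which this simplification does not handle, and the variant table and function names below are hypothetical.

```python
from typing import Dict, List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h))
        prev = cur
    return prev[-1]


def canonicalise(words: List[str], variants: Dict[str, str]) -> List[str]:
    """Replace each known spelling variant with its canonical form."""
    return [variants.get(w, w) for w in words]


def werd(ref: str, hyp: str, variants: Dict[str, str]) -> float:
    """WER computed after mapping both sides onto canonical spellings."""
    ref_words = canonicalise(ref.split(), variants)
    hyp_words = canonicalise(hyp.split(), variants)
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)


if __name__ == "__main__":
    # Hypothetical variant table: several spellings of the same dialectal words.
    variants = {"mish": "msh", "mosh": "msh", "ezzay": "izzay"}
    ref, hyp = "izzay ana mish 3aref", "ezzay ana msh 3aref"
    print(werd(ref, hyp, {}))        # plain WER, 2 substitutions / 4 words: 0.5
    print(werd(ref, hyp, variants))  # WERd, variants count as correct: 0.0
```

Canonicalising both sides keeps the metric symmetric: a hypothesis is never penalised for choosing a different, equally valid spelling than the reference transcriber.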

en

dc.identifier.uri

http://hdl.handle.net/1842/31224

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.relation.hasversion

A Ali, S Renals, "Word Error Rate Estimation for Speech Recognition: e-WER", ACL 2018.

en

dc.relation.hasversion

A Ali, S Vogel, S Renals, "Speech Recognition Challenge in the Wild: Arabic MGB-3", ASRU 2017.

en

dc.relation.hasversion

A Ali, P Nakov, P Bell, S Renals, "WERd: Using Social Text Spelling Variants for Evaluating Dialectal Speech Recognition", ASRU 2017.

en

dc.relation.hasversion

S Shon, A Ali, J Glass, "MIT-QCRI Arabic Dialect Identification System for the 2017 Multi-Genre Broadcast Challenge", ASRU 2017.

en

dc.relation.hasversion

S Khurana, M Najafian, A Ali, T Al Hanai, Y Belinkov, J Glass, "QMDIS: QCRI-MIT Advanced Dialect Identification System", Interspeech 2017.

en

dc.relation.hasversion

F Dalvi, Y Zhang, S Khurana, N Durrani, H Sajjad, A Abdelali, H Mubarak, A Ali, S Vogel, "QCRI Live Speech Translation System", EACL 2017 (demo paper).

en

dc.relation.hasversion

M Zampieri, S Malmasi, N Ljubešić, P Nakov, A Ali, J Tiedemann, "Findings of the VarDial Evaluation Campaign 2017", EACL 2017.

en

dc.relation.hasversion

A Ali, P Bell, J Glass, Y Messaoui, H Mubarak, S Renals, Y Zhang, "The MGB-2 Challenge: Arabic Multi-Dialect Broadcast Media Recognition", SLT 2016.

en

dc.relation.hasversion

S Khurana, A Ali, "QCRI Advanced Transcription System (QATS) for the Arabic Multi-Dialect Broadcast Media Recognition: MGB-2 Challenge", SLT 2016.

en

dc.relation.hasversion

A Ali, N Dehak, P Cardinal, S Khurana, SH Yella, J Glass, P Bell, S Renals, "Automatic Dialect Detection in Arabic Broadcast Speech", Interspeech 2016.

en

dc.relation.hasversion

A Ali, W Magdy, P Bell, S Renals, "Multi-reference WER for evaluating ASR for Languages with no Orthographic Rules", ASRU 2015.

en

dc.relation.hasversion

S Wray, H Mubarak, A Ali, "Best Practices for Crowdsourcing Dialectal Arabic Speech Transcription", ANLP workshop, ACL 2015.

en

dc.relation.hasversion

S Wray, A Ali, "Crowdsource a Little to Label a Lot: Labeling a Speech Corpus of Dialectal Arabic", Interspeech 2015.

en

dc.relation.hasversion

MH Bahari, N Dehak, L Burget, AM Ali, J Glass, "Non-negative Factor Analysis of Gaussian Mixture Model Weight Adaptation for Language and Dialect Recognition", IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014.

en

dc.relation.hasversion

A Ali, Y Zhang, S Vogel, "QCRI Advanced Transcription System (QATS)", SLT 2014 (demo paper).

en

dc.relation.hasversion

A Ali, H Mubarak, S Vogel, "Advances in Dialectal Arabic Speech Recognition: A Study Using Twitter to Improve Egyptian ASR", IWSLT 2014.

en

dc.relation.hasversion

A Ali, Y Zhang, P Cardinal, N Dehak, S Vogel, J Glass, "A Complete Kaldi Recipe for Building Arabic Speech Recognition Systems", SLT 2014.

en

dc.relation.hasversion

P Cardinal, A Ali, N Dehak, Y Zhang, T Al Hanai, Y Zhang, S Vogel, J Glass, "Recent Advances in ASR Applied to an Arabic Transcription System for Al-Jazeera", Interspeech 2014.

en

dc.subject

Arabic speech research

en

dc.subject

standardised orthography

en

dc.subject

Arabic dialects

en

dc.subject

crowdsourcing

en

dc.subject

MFCC-SDC

en

dc.subject

convolutional neural network

en

dc.subject

deep neural network

en

dc.title

Multi-dialect Arabic broadcast speech recognition

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Name: Ali2018.pdf
Size: 4.46 MB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

Informatics thesis and dissertation collection