Multi-dialect Arabic broadcast speech recognition
dc.contributor.advisor
Renals, Stephen
en
dc.contributor.advisor
Bell, Peter
en
dc.contributor.author
Ali, Ahmed Mohamed Abdel Maksoud
en
dc.date.accessioned
2018-06-22T09:41:35Z
dc.date.available
2018-06-22T09:41:35Z
dc.date.issued
2018-07-02
dc.description.abstract
Dialectal Arabic speech research suffers from the lack of labelled resources and
standardised orthography. There are three main challenges in dialectal Arabic
speech recognition: (i) finding labelled dialectal Arabic speech data, (ii) training
robust dialectal speech recognition models from limited labelled data and (iii)
evaluating speech recognition for dialects with no orthographic rules. This thesis
is concerned with the following three contributions:
Arabic Dialect Identification: We deal mainly with Arabic speech for which the
spoken dialect is not known in advance. Arabic dialects are sufficiently diverse
that one can argue that they are different languages rather than dialects of the
same language. We make two contributions:
First, we use crowdsourcing to annotate a multi-dialectal speech corpus collected
from the Al Jazeera TV channel. From almost 1,000 hours, we obtained utterance-level
dialect labels for 57 hours of high-quality speech covering four major varieties
of dialectal Arabic (DA): Egyptian, Levantine, Gulf (or Arabian Peninsula) and
North African (or Moroccan). Second, we build an Arabic dialect identification
(ADI) system. We explored two main groups of features, namely acoustic
features and linguistic features. For the linguistic features, we look at a wide
range of features, addressing words, characters and phonemes. With respect to
acoustic features, we look at raw features such as mel-frequency cepstral coefficients
combined with shifted delta cepstra (MFCC-SDC), bottleneck features and
the i-vector as a latent variable. We studied both generative and discriminative
classifiers, in addition to deep learning approaches, namely deep neural network
(DNN) and convolutional neural network (CNN). In our work, we frame Arabic
dialect identification as a five-class challenge comprising the previously mentioned
four dialects as well as Modern Standard Arabic.
Arabic Speech Recognition: We introduce our effort in building Arabic automatic
speech recognition (ASR) and we create an open research community
to advance it. This part of the thesis has two main goals. First, creating a framework for
Arabic ASR that is publicly available for research: we describe our effort in building
two multi-genre broadcast (MGB) challenges. MGB-2 focuses on broadcast
news using more than 1,200 hours of speech and 130M words of text collected
from the broadcast domain. MGB-3, however, focuses on dialectal multi-genre
data with limited non-orthographic speech collected from YouTube, with special
attention paid to transfer learning. Second, building a robust Arabic ASR system
and reporting a competitive word error rate (WER) to use it as a potential
benchmark to advance the state of the art in Arabic ASR. Our overall system is
a combination of five acoustic models (AMs): unidirectional long short-term memory
(LSTM), bidirectional LSTM (BLSTM), time delay neural network (TDNN),
TDNN layers along with LSTM layers (TDNN-LSTM) and finally TDNN layers
followed by BLSTM layers (TDNN-BLSTM). The AMs are purely sequence-trained
neural networks, trained with lattice-free maximum mutual information (LF-MMI).
The generated lattices are rescored using a four-gram language model
(LM) and a recurrent neural network with maximum entropy (RNNME) LM.
Our official WER is 13%, the lowest WER reported on this task.
Evaluation: The third part of the thesis addresses our effort in evaluating dialectal
speech with no orthographic rules. Our methods learn from multiple
transcribers and align the speech hypothesis to overcome the non-orthographic
aspects. Our multi-reference WER (MR-WER) approach is similar to the BLEU
score used in machine translation (MT). We have also automated this process
by learning different spelling variants from Twitter data. We automatically mine
a large collection of tweets in an unsupervised fashion to build more than
11M n-to-m lexical pairs, and we propose a new evaluation metric: dialectal
WER (WERd). Finally, we estimate the word error rate (e-WER) without a
reference transcription, using decoding and language features. We show that
our word error rate estimation is robust in many scenarios, with and without the
decoding features.
en
dc.identifier.uri
http://hdl.handle.net/1842/31224
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
A Ali, S Renals, "Word Error Rate Estimation for Speech Recognition: e-WER", in ACL 2018.
en
dc.relation.hasversion
A Ali, S Vogel, S Renals, "Speech Recognition Challenge in the Wild: Arabic MGB-3", in ASRU 2017.
en
dc.relation.hasversion
A Ali, P Nakov, P Bell, S Renals, "WERd: Using Social Text Spelling Variants for Evaluating Dialectal Speech Recognition", in ASRU 2017.
en
dc.relation.hasversion
S Shon, A Ali, J Glass, "MIT-QCRI Arabic Dialect Identification System for the 2017 Multi-Genre Broadcast Challenge", in ASRU 2017.
en
dc.relation.hasversion
S Khurana, M Najafian, A Ali, T Al Hanai, Y Belinkov, J Glass, "QMDIS: QCRI-MIT Advanced Dialect Identification System", in Interspeech 2017.
en
dc.relation.hasversion
F Dalvi, Y Zhang, S Khurana, N Durrani, H Sajjad, A Abdelali, H Mubarak, A Ali, S Vogel, "QCRI Live Speech Translation System", demo paper in EACL 2017.
en
dc.relation.hasversion
M Zampieri, S Malmasi, N Ljubešić, P Nakov, A Ali, J Tiedemann, "Findings of the VarDial Evaluation Campaign 2017", EACL 2017.
en
dc.relation.hasversion
A Ali, P Bell, J Glass, Y Messaoui, H Mubarak, S Renals, Y Zhang, "The MGB-2 Challenge: Arabic Multi-Dialect Broadcast Media Recognition", SLT 2016.
en
dc.relation.hasversion
S Khurana, A Ali, "QCRI Advanced Transcription System (QATS) for the Arabic Multi-Dialect Broadcast Media Recognition: MGB-2 Challenge", SLT 2016.
en
dc.relation.hasversion
A Ali, N Dehak, P Cardinal, S Khurana, SH Yella, J Glass, P Bell, S Renals, "Automatic Dialect Detection in Arabic Broadcast Speech", InterSpeech 2016.
en
dc.relation.hasversion
A Ali, W Magdy, P Bell, S Renals, "Multi-reference WER for evaluating ASR for Languages with no Orthographic Rules", ASRU 2015.
en
dc.relation.hasversion
S Wray, H Mubarak, A Ali, "Best Practices for Crowdsourcing Dialectal Arabic Speech Transcription", ANLP workshop, ACL 2015.
en
dc.relation.hasversion
S Wray, A Ali, "Crowdsource a Little to Label a Lot: Labeling a Speech Corpus of Dialectal Arabic", InterSpeech 2015.
en
dc.relation.hasversion
MH Bahari, N Dehak, L Burget, AM Ali, J Glass, "Non-negative Factor Analysis of Gaussian Mixture Model Weight Adaptation for Language and Dialect Recognition", IEEE/ACM transactions on audio, speech, and language processing, 2014.
en
dc.relation.hasversion
A Ali, Y Zhang, S Vogel, "QCRI Advanced Transcription System (QATS)", demo paper SLT, 2014.
en
dc.relation.hasversion
A Ali, H Mubarak, S Vogel, "Advances in Dialectal Arabic Speech Recognition: A Study Using Twitter to Improve Egyptian ASR", IWSLT 2014.
en
dc.relation.hasversion
A Ali, Y Zhang, P Cardinal, N Dehak, S Vogel, J Glass, "A Complete Kaldi Recipe for Building Arabic Speech Recognition Systems", SLT, 2014.
en
dc.relation.hasversion
P Cardinal, A Ali, N Dehak, Y Zhang, T Al Hanai, Y Zhang, S Vogel, J Glass, "Recent Advances in ASR Applied to an Arabic Transcription System for Al-Jazeera", InterSpeech 2014.
en
dc.subject
Arabic speech research
en
dc.subject
standardised orthography
en
dc.subject
Arabic dialects
en
dc.subject
crowdsourcing
en
dc.subject
MFCC-SDC
en
dc.subject
convolutional neural network
en
dc.subject
deep neural network
en
dc.title
Multi-dialect Arabic broadcast speech recognition
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name:
- Ali2018.pdf
- Size:
- 4.46 MB
- Format:
- Adobe Portable Document Format