Arabic sarcasm detection
dc.contributor.advisor
Magdy, Walid
dc.contributor.advisor
Webber, Bonnie
dc.contributor.author
Abu Farha, Ibrahim
dc.contributor.sponsor
Dstl
en
dc.contributor.sponsor
Alan Turing Institute
en
dc.date.accessioned
2023-09-19T09:42:41Z
dc.date.available
2023-09-19T09:42:41Z
dc.date.issued
2023-09-19
dc.description.abstract
Sarcasm is a form of verbal irony that is often used to express ridicule or contempt. When using sarcasm, a speaker expresses their opinion indirectly, so that the literal meaning differs from the intended one. Sarcasm is also a sociolinguistic tool that people use to express themselves, and it reflects their cultural and social background. Sarcasm detection refers to the process of automatically identifying whether a piece of text is sarcastic. This has been well studied in the context of English, but research on Arabic lags behind. In this thesis, we try to fill in the gaps in the research on Arabic sarcasm detection.
First, we explore approaches to creating an Arabic sarcasm dataset. We create the ArSarcasm dataset through the re-annotation of existing sentiment analysis datasets. The resulting labels represent perceived sarcasm, as they reflect the annotators' perception. The analysis shows that sarcasm is prominent in the sentiment datasets used, with 16% of the sentences being sarcastic. Our experiments show that sarcasm is disruptive for sentiment analysers. The analysis also shows that annotating subjective content can be challenging and prone to biases.
Second, to mitigate the issues and drawbacks of sarcasm data collection approaches, we propose collecting sarcasm datasets by asking people to label their own words, which is referred to as intended sarcasm. The resulting dataset, which is first-party annotated, would have more reliable and trustworthy labels and would not have the issues of third-party annotated data.
Next, we test state-of-the-art machine learning models on the newly created datasets. These experiments provide a benchmark for the datasets. They show that intended sarcasm detection is more challenging than perceived sarcasm detection, and that monolingual Arabic language models, which include dialects in their pre-training data, perform better on the sarcasm detection task. Additionally, we provide the details of shared tasks that utilise the new datasets.
Finally, we provide an in-depth error analysis comparing humans' performance in sarcasm detection against the performance of state-of-the-art models. Our analysis confirms that sarcasm is challenging for both humans and machines. We also highlight the features and patterns used to express sarcasm, such as idioms and proverbs. When extending the analysis to focus on Arabic dialects, we found that dialect familiarity affects how Arabic speakers understand and interpret sarcasm. Arabic speakers were better able to detect sarcasm expressed in their dialect or one they were familiar with.
en
dc.identifier.uri
https://hdl.handle.net/1842/40923
dc.identifier.uri
http://dx.doi.org/10.7488/era/3675
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Ibrahim Abu Farha, Steven Wilson, Silviu Oprea, and Walid Magdy. 2022. Sarcasm Detection is Way Too Easy! An Empirical Comparison of Human and Machine Sarcasm Detection. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 5284–5295, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
en
dc.relation.hasversion
Ibrahim Abu Farha and Walid Magdy. 2022. The Effect of Arabic Dialect Familiarity on Data Annotation. In Proceedings of the The Seventh Arabic Natural Language Processing Workshop (WANLP), pages 399–408, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
en
dc.relation.hasversion
Ibrahim Abu Farha, Silviu Vlad Oprea, Steven Wilson, and Walid Magdy. 2022. SemEval-2022 Task 6: iSarcasmEval, Intended Sarcasm Detection in English and Arabic. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 802–814, Seattle, United States. Association for Computational Linguistics.
en
dc.relation.hasversion
Ibrahim Abu Farha, Wajdi Zaghouani, and Walid Magdy. 2021. Overview of the WANLP 2021 Shared Task on Sarcasm and Sentiment Detection in Arabic. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 296–305, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.
en
dc.relation.hasversion
Ibrahim Abu Farha and Walid Magdy. 2021. Benchmarking Transformer based Language Models for Arabic Sentiment and Sarcasm Detection. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 21–31, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.
en
dc.relation.hasversion
Ibrahim Abu Farha and Walid Magdy. 2021. A Comparative Study of Effective Approaches for Arabic Sentiment Analysis. Information Processing & Management, 58(2):102438.
en
dc.relation.hasversion
Ibrahim Abu Farha and Walid Magdy. 2020. From Arabic Sentiment Analysis to Sarcasm Detection: The ArSarcasm Dataset. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, pages 32–39, Marseille, France. European Language Resource Association.
en
dc.relation.hasversion
Ibrahim Abu Farha and Walid Magdy. 2020. Multitask Learning for Arabic Offensive Language and Hate-Speech Detection. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, pages 86–90, Marseille, France. European Language Resource Association.
en
dc.relation.hasversion
Ibrahim Abu Farha and Walid Magdy. 2019. Mazajak: An online Arabic sentiment analyser. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 192–198, Florence, Italy. Association for Computational Linguistics.
en
dc.subject
Arabic
en
dc.subject
sarcasm
en
dc.subject
sarcasm detection
en
dc.subject
Arabic dialects
en
dc.subject
irony
en
dc.subject
Arabic sarcasm
en
dc.subject
Arabic irony
en
dc.subject
sarcasm dataset
en
dc.title
Arabic sarcasm detection
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: Abu Farha2023.pdf
- Size: 3.13 MB
- Format: Adobe Portable Document Format

