Arabic sarcasm detection

Abu Farha, Ibrahim

dc.contributor.advisor

Magdy, Walid

dc.contributor.advisor

Webber, Bonnie

dc.contributor.author

Abu Farha, Ibrahim

dc.contributor.sponsor

Dstl

en

dc.contributor.sponsor

Alan Turing Institute

en

dc.date.accessioned

2023-09-19T09:42:41Z

dc.date.available

2023-09-19T09:42:41Z

dc.date.issued

2023-09-19

dc.description.abstract

Sarcasm is a form of verbal irony that is often used to express ridicule or contempt. When using sarcasm, a speaker expresses their opinion indirectly, so that the literal meaning differs from the intended one. Sarcasm is also a sociolinguistic tool that people use to express themselves, and it reflects their cultural and social background. Sarcasm detection is the task of automatically identifying whether a piece of text is sarcastic. It has been well studied for English, but work on Arabic lags behind. In this thesis, we try to fill the gaps in the research on Arabic sarcasm detection. First, we explore approaches to creating an Arabic sarcasm dataset. We create the ArSarcasm dataset by re-annotating existing sentiment analysis datasets. The resulting labels represent perceived sarcasm, since they reflect the annotators' perception. The analysis shows that sarcasm is prominent in the sentiment datasets used, with 16% of the sentences being sarcastic. Our experiments show that sarcasm is disruptive for sentiment analysers. The analysis also shows that annotating subjective content can be challenging and prone to biases. Second, to mitigate the issues and drawbacks of sarcasm data collection approaches, we propose collecting sarcasm datasets by asking people to label their own words, which is referred to as intended sarcasm. The resulting dataset, being first-party annotated, would have more reliable and trustworthy labels and would not have the issues of third-party annotated data. Next, we test state-of-the-art machine learning models on the newly created datasets. These experiments provide a benchmark for the datasets. They show that intended sarcasm detection is more challenging than perceived sarcasm detection. They also show that monolingual Arabic language models, which include dialects in their pre-training data, perform better on the sarcasm detection task.
Additionally, we provide the details of shared tasks that utilise the new datasets. Finally, we provide an in-depth error analysis comparing humans' performance in sarcasm detection against the performance of state-of-the-art models. Our analysis confirms that sarcasm is challenging for both humans and machines. We also highlight the features and patterns used to express sarcasm, such as idioms and proverbs. When extending the analysis to focus on Arabic dialects, we found that dialect familiarity affects how Arabic speakers understand and interpret sarcasm. Arabic speakers were better able to detect sarcasm expressed in their dialect or one they were familiar with.

en

dc.identifier.uri

https://hdl.handle.net/1842/40923

dc.identifier.uri

http://dx.doi.org/10.7488/era/3675

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.relation.hasversion

Ibrahim Abu Farha, Steven Wilson, Silviu Oprea, and Walid Magdy. 2022. Sarcasm Detection is Way Too Easy! An Empirical Comparison of Human and Machine Sarcasm Detection. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 5284–5295, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.

en

dc.relation.hasversion

Ibrahim Abu Farha and Walid Magdy. 2022. The Effect of Arabic Dialect Familiarity on Data Annotation. In Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP), pages 399–408, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.

en

dc.relation.hasversion

Ibrahim Abu Farha, Silviu Vlad Oprea, Steven Wilson, and Walid Magdy. 2022. SemEval-2022 Task 6: iSarcasmEval, Intended Sarcasm Detection in English and Arabic. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 802–814, Seattle, United States. Association for Computational Linguistics.

en

dc.relation.hasversion

Ibrahim Abu Farha, Wajdi Zaghouani, and Walid Magdy. 2021. Overview of the WANLP 2021 Shared Task on Sarcasm and Sentiment Detection in Arabic. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 296–305, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.

en

dc.relation.hasversion

Ibrahim Abu Farha and Walid Magdy. 2021. Benchmarking Transformer based Language Models for Arabic Sentiment and Sarcasm Detection. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 21–31, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.

en

dc.relation.hasversion

Ibrahim Abu Farha and Walid Magdy. 2021. A Comparative Study of Effective Approaches for Arabic Sentiment Analysis. Information Processing & Management, 58(2):102438.

en

dc.relation.hasversion

Ibrahim Abu Farha and Walid Magdy. 2020. From Arabic Sentiment Analysis to Sarcasm Detection: The ArSarcasm Dataset. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, pages 32–39, Marseille, France. European Language Resources Association.

en

dc.relation.hasversion

Ibrahim Abu Farha and Walid Magdy. 2020. Multitask Learning for Arabic Offensive Language and Hate-Speech Detection. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, pages 86–90, Marseille, France. European Language Resources Association.

en

dc.relation.hasversion

Ibrahim Abu Farha and Walid Magdy. 2019. Mazajak: An online Arabic sentiment analyser. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 192–198, Florence, Italy. Association for Computational Linguistics.

en

dc.subject

Arabic

en

dc.subject

sarcasm

en

dc.subject

sarcasm detection

en

dc.subject

Arabic dialects

en

dc.subject

irony

en

dc.subject

Arabic sarcasm

en

dc.subject

Arabic irony

en

dc.subject

sarcasm dataset

en

dc.title

Arabic sarcasm detection

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Name: Abu Farha2023.pdf
Size: 3.13 MB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

Informatics thesis and dissertation collection