Edinburgh Research Archive

University of Edinburgh
Arabic sarcasm detection

File
Abu Farha2023.pdf (3.128 MB)
Date
19/09/2023
Author
Abu Farha, Ibrahim
Abstract
Sarcasm is a form of verbal irony that is often used to express ridicule or contempt. When using sarcasm, a speaker expresses their opinion indirectly, so that the literal meaning differs from the intended one. Sarcasm is also a sociolinguistic tool that people use to express themselves, and it reflects their cultural and social background. Sarcasm detection is the task of automatically identifying whether a piece of text is sarcastic. It has been well studied for English, but research on Arabic lags behind. In this thesis, we aim to fill the gaps in the research on Arabic sarcasm detection. First, we explore approaches to creating an Arabic sarcasm dataset. We create the ArSarcasm dataset by re-annotating existing sentiment analysis datasets. Its labels represent perceived sarcasm, since they reflect the annotators' perception. The analysis shows that sarcasm is prevalent in the source sentiment datasets, with 16% of the sentences labelled as sarcastic. Our experiments show that sarcasm disrupts sentiment analysers, and our analysis shows that annotating subjective content can be challenging and prone to bias. Second, to mitigate the issues and drawbacks of existing sarcasm data collection approaches, we propose collecting sarcasm datasets by asking people to label their own texts, which captures what is referred to as intended sarcasm. Because the resulting dataset is first-party annotated, its labels should be more reliable and trustworthy, and it avoids the issues of third-party annotated data. Next, we evaluate state-of-the-art machine learning models on the newly created datasets, establishing benchmarks for them. These experiments show that intended sarcasm is more challenging to detect than perceived sarcasm. They also show that monolingual Arabic language models, which include dialectal Arabic in their pre-training data, perform better on the sarcasm detection task.
We also describe the shared tasks that use the new datasets. Finally, we present an in-depth error analysis comparing human performance on sarcasm detection with that of state-of-the-art models. Our analysis confirms that sarcasm is challenging for both humans and machines. We also highlight the features and patterns used to express sarcasm, such as idioms and proverbs. Extending the analysis to Arabic dialects, we find that dialect familiarity affects how Arabic speakers understand and interpret sarcasm: speakers were better at detecting sarcasm expressed in their own dialect or in one they were familiar with.
URI
https://hdl.handle.net/1842/40923

http://dx.doi.org/10.7488/era/3675
Collections
  • Informatics thesis and dissertation collection

Library & University Collections Home | University of Edinburgh Information Services Home
Privacy & Cookies | Takedown Policy | Accessibility | Contact