Multi-view representation learning for natural language processing applications
dc.contributor.advisor
Cohen, Shay
en
dc.contributor.advisor
Renals, Stephen
en
dc.contributor.author
Papasarantopoulos, Nikolaos
en
dc.date.accessioned
2020-04-08T11:43:10Z
dc.date.available
2020-04-08T11:43:10Z
dc.date.issued
2020-06-25
dc.description.abstract
The pervasion of machine learning in a vast number of applications has given rise to an increasing demand for the effective processing of complex, diverse and variable datasets.
One representative case of data diversity can be found in multi-view datasets, which contain input originating from more than one source or having multiple aspects or facets.
Examples include, but are not restricted to, multimodal datasets, where data may consist of audio, image and/or text.
The nature of multi-view datasets calls for special treatment in terms of representation.
A subsequent fundamental problem is that of combining information from potentially incoherent sources; a problem commonly referred to as view fusion.
Quite often, the heuristic solution of early fusion is applied to this problem: aggregating representations from different views using a simple function (concatenation, summation or mean pooling).
However, early fusion can cause overfitting when training samples are small, and it may also result in view-specific statistical properties being lost during learning.
Representation learning, the set of ideas and algorithms devised to learn meaningful representations for machine learning problems, has recently grown into a vibrant research field that encompasses multi-view setups.
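As an illustration of the early-fusion functions mentioned above, the sketch below fuses two hypothetical fixed-size view representations; the view names and dimensions are illustrative and not taken from the thesis:

```python
import numpy as np

# Hypothetical per-example representations for two views of a batch
# of 4 examples; the 5-dimensional size is an arbitrary illustration.
text_view = np.random.randn(4, 5)
audio_view = np.random.randn(4, 5)

# Early fusion aggregates the views with a simple function:
fused_concat = np.concatenate([text_view, audio_view], axis=1)   # shape (4, 10)
fused_sum = text_view + audio_view                               # shape (4, 5)
fused_mean = np.mean(np.stack([text_view, audio_view]), axis=0)  # shape (4, 5)
```

Note that concatenation grows the representation with the number of views, whereas summation and mean pooling require the views to share a dimensionality.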
A plethora of multi-view representation learning methods has been proposed in the literature, many of them based on the idea of maximising the correlation between the available views.
Commonly, such techniques are evaluated on synthetic datasets or strictly defined benchmark setups, a role that, within Natural Language Processing, is often played by the multimodal sentiment analysis problem.
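The correlation-maximisation idea behind many of these methods can be sketched with classical Canonical Correlation Analysis; the two synthetic views below, driven by a shared latent signal, are purely illustrative and not data from the thesis:

```python
import numpy as np

def cca_correlations(X, Y, eps=1e-8):
    """Canonical correlations between two views X (n x p) and Y (n x q)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularised within-view and cross-view covariance estimates.
    Cxx = X.T @ X / n + eps * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + eps * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(M):
        w, V = np.linalg.eigh(M)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # The singular values of the whitened cross-covariance matrix are
    # the canonical correlations, each lying in [0, 1].
    return np.linalg.svd(inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy), compute_uv=False)

# Two synthetic views sharing a 2-dimensional latent signal plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
view_a = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(200, 6))
view_b = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))

corrs = cca_correlations(view_a, view_b)  # leading correlations close to 1
```

Because the views share a latent signal, the leading canonical correlations are close to 1 while the remaining ones reflect only noise.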
This thesis argues that more complex downstream applications could also benefit from such representations and, setting out to explore the limits of their apparent applicability, describes a multi-view treatment of a range of tasks, from static, two-view, unimodal to dynamic, three-view, trimodal applications.
More specifically, we experiment with document summarisation, framing it as a multi-view problem where documents and summaries are considered two separate, textual views.
Moreover, we present a multi-view inference algorithm for the bimodal problem of image captioning.
Delving more into multimodal setups, we develop a set of multi-view models for applications pertaining to videos, including tagging and text generation tasks.
Finally, we introduce narration generation, a new text generation task over movie videos that requires inference at the storyline level and temporal, context-based reasoning.
The main argument of the thesis is that, due to their performance, multi-view representation learning tools warrant serious consideration by the researchers and practitioners of the Natural Language Processing community.
Exploring the limits of multi-view representations, we investigate their fitness for Natural Language Processing tasks and show that they are able to hold information required for complex problems, while being a good alternative to the early fusion paradigm.
en
dc.identifier.uri
https://hdl.handle.net/1842/36966
dc.identifier.uri
http://dx.doi.org/10.7488/era/267
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Nikos Papasarantopoulos, Lea Frermann, Mirella Lapata, and Shay B. Cohen. 2019. Partners in crime: Multi-view sequential inference for movie understanding. In 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP).
en
dc.relation.hasversion
Nikos Papasarantopoulos, Helen Jiang, and Shay B. Cohen. 2018. Canonical correlation inference for mapping abstract scenes to text. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), the 30th Innovative Applications of Artificial Intelligence (IAAI), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI), New Orleans, Louisiana, USA, February 2-7, 2018.
en
dc.relation.hasversion
Renars Liepins, Ulrich Germann, Guntis Barzdins, Alexandra Birch, Steve Renals, Susanne Weber, Peggy van der Kreeft, Herve Bourlard, João Prieto, Ondrej Klejch, Peter Bell, Alexandros Lazaridis, Alfonso Mendes, Sebastian Riedel, Mariana S. C. Almeida, Pedro Balage, Shay B. Cohen, Tomasz Dwojak, Philip N. Garner, Andreas Giefer, Marcin Junczys-Dowmunt, Hina Imran, David Nogueira, Ahmed Ali, Sebastião Miranda, Andrei Popescu-Belis, Lesly Miculicich Werlen, Nikos Papasarantopoulos, Abiola Obamuyide, Clive Jones, Fahim Dalvi, Andreas Vlachos, Yang Wang, Sibo Tong, Rico Sennrich, Nikolaos Pappas, Shashi Narayan, Marco Damonte, Nadir Durrani, Sameer Khurana, Ahmed Abdelali, Hassan Sajjad, Stephan Vogel, David Sheppey, Chris Hernon, and Jeff Mitchell. 2017. The SUMMA platform prototype. In Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 116–119, Valencia, Spain. Association for Computational Linguistics.
en
dc.relation.hasversion
Shashi Narayan, Ronald Cardenas, Nikos Papasarantopoulos, Shay B. Cohen, Mirella Lapata, Jiangsheng Yu, and Yi Chang. 2018. Document modeling with external attention for sentence extraction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), Volume 1: Long Papers, pages 2020–2030.
en
dc.subject
natural language processing
en
dc.subject
multi-view learning
en
dc.subject
representation learning
en
dc.subject
multimodal
en
dc.title
Multi-view representation learning for natural language processing applications
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en