Towards human-like compositional generalization with neural models
dc.contributor.advisor
Lapata, Mirella
dc.contributor.advisor
Titov, Ivan
dc.contributor.author
Zheng, Hao
dc.date.accessioned
2023-10-03T15:15:49Z
dc.date.available
2023-10-03T15:15:49Z
dc.date.issued
2023-10-03
dc.description.abstract
The human language system exhibits systematic compositionality: the ability to
produce and understand a potentially infinite number of novel linguistic expressions
by systematically combining known atomic components. This type of systematic
compositionality is central to the human ability to learn from limited data and make
compositional generalizations. There has been a long-standing debate over whether this systematicity
can be captured by connectionist architectures. Recent years have witnessed
a resurgence of interest in this problem with the revival of neural networks. In particular,
neural sequence-to-sequence models, as a powerful workhorse of natural language processing
(NLP), have been successfully applied to various NLP tasks. However, despite
widespread adoption, there is mounting evidence that neural sequence-to-sequence
models are deficient in compositional generalization.
In this thesis, we investigate the problem of how to improve compositional generalization
of neural sequence-to-sequence models in pursuit of building systems with
human-like systematic compositionality. First, assuming that connectionist architectures
are fundamentally incapable of acquiring this systematic compositionality, which is, in
contrast, an inherent part of symbolic (e.g., grammar-based) systems, we attempt to
marry symbolic structure with neural models to combine the best of both worlds. We
present a two-stage decoding strategy to augment neural sequence-to-sequence models
(connectionist architecture) with semantic tagging (symbolic structure), in which an
input utterance is tagged with semantic symbols representing the meaning of individual
words. Experimental results demonstrate that our framework improves compositional generalization in semantic parsing across datasets and model architectures.
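The two-stage idea above can be illustrated with a minimal sketch: stage one tags each input word with a semantic symbol, and stage two decodes the meaning representation from the utterance augmented with those tags. The function names, the toy lexicon, and the output format here are hypothetical illustrations, not the thesis implementation.

```python
# Hypothetical sketch of two-stage decoding with semantic tagging.
# The lexicon and parse format are toy examples for illustration only.

TAG_LEXICON = {            # word -> semantic symbol (assumed toy mapping)
    "book": "flight.book",
    "flights": "flight",
    "boston": "city.boston",
}

def tag_utterance(words):
    """Stage 1: predict a semantic tag ("O" = no symbol) per input word."""
    return [TAG_LEXICON.get(w, "O") for w in words]

def decode(words, tags):
    """Stage 2: a real seq2seq decoder would condition on words + tags;
    here we simply assemble the non-null tags into a toy parse."""
    symbols = [t for t in tags if t != "O"]
    return "(" + " ".join(symbols) + ")"

words = "book flights to boston".split()
tags = tag_utterance(words)
parse = decode(words, tags)
```

In this sketch the tags expose word-level meaning explicitly, so the second stage can generalize to new combinations of known symbols rather than memorizing whole input-output pairs.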
Secondly, although symbolic models exhibit superior compositional generalization, it has not yet been empirically established that they can handle the noise and complexity of natural language, as evidenced by their sub-par performance in practical applications. Therefore, tackling compositional generalization via purely architectural
modification has the potential to maintain the robustness and flexibility of neural models
required to process real language. We thus attempt to devise a more competent neural
model than standard sequence-to-sequence models for compositional generalization.
To approach this problem, we design Dangle, a new neural network architecture for
sequence-to-sequence modeling to learn more disentangled representations for better
compositional generalization compared to the Transformer model. Empirical results
on both semantic parsing and machine translation verify that our proposal leads to
more disentangled representations and better generalization, outperforming competitive
baselines and more specialized techniques.
So far, we have assessed the proposed model on synthetic benchmarks that isolate compositional generalization. However, real-world settings involve both complex natural
language and compositional generalization. We thus move on to apply disentangled
sequence-to-sequence models to real-world compositional generalization challenges.
Before doing so, we first propose a methodology for identifying compositional patterns
in real-world data and create a new machine translation benchmark that better represents
practical generalization requirements than existing artificial challenges.
Then we introduce two key modifications to Dangle that encourage it to learn disentangled representations more efficiently. We evaluate the proposed model on
existing real-world benchmarks and the benchmark created in this thesis. Experimental
results demonstrate that our new architecture achieves better generalization performance
across tasks and datasets and is adept at handling real-world challenges.
en
dc.identifier.uri
https://hdl.handle.net/1842/41023
dc.identifier.uri
http://dx.doi.org/10.7488/era/3762
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Zheng, H. and Lapata, M. (2021). Compositional generalization via semantic tagging. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 1022– 1032, Punta Cana, Dominican Republic. Association for Computational Linguistics.
en
dc.relation.hasversion
Zheng, H. and Lapata, M. (2022). Disentangled sequence to sequence learning for compositional generalization. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4256– 4268, Dublin, Ireland. Association for Computational Linguistics.
en
dc.relation.hasversion
Zheng, H. and Lapata, M. (2023). Real-world compositional generalization with disentangled sequence-to-sequence learning. In Findings of the Association for Computational Linguistics: ACL 2023, pages 1711–1725, Toronto, Canada. Association for Computational Linguistics.
en
dc.subject
human-like compositional generalization
en
dc.subject
neural models
en
dc.subject
neural sequence-to-sequence models
en
dc.subject
natural language processing (NLP)
en
dc.subject
human-like systematic compositionality
en
dc.subject
neural network architecture for sequence-to-sequence modeling
en
dc.title
Towards human-like compositional generalization with neural models
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name:
- ZhengH_2023.pdf
- Size:
- 924.27 KB
- Format:
- Adobe Portable Document Format