Towards human-like compositional generalization with neural models
dc.contributor.advisor
Lapata, Mirella
dc.contributor.advisor
Titov, Ivan
dc.contributor.author
Zheng, Hao
dc.date.accessioned
2023-10-03T15:15:49Z
dc.date.available
2023-10-03T15:15:49Z
dc.date.issued
2023-10-03
dc.description.abstract
The human language system exhibits systematic compositionality: the ability to
produce and understand a potentially infinite number of novel linguistic expressions
by systematically combining known atomic components. This type of systematic
compositionality is central to the human ability to learn from limited data and make
compositional generalizations. There has been a long-standing debate over whether this systematicity
can be captured by connectionist architectures. Recent years have witnessed
a resurgence of interest in this problem with the revival of neural networks. In particular,
neural sequence-to-sequence models, as a powerful workhorse of natural language processing
(NLP), have been successfully applied to various NLP tasks. However, despite
widespread adoption, there is mounting evidence that neural sequence-to-sequence
models are deficient in compositional generalization.
In this thesis, we investigate the problem of how to improve compositional generalization
of neural sequence-to-sequence models in pursuit of building systems with
human-like systematic compositionality. First, assuming that connectionist architectures
are fundamentally incapable of acquiring this systematic compositionality, which is, in
contrast, an inherent part of symbolic (e.g., grammar-based) systems, we attempt to
marry symbolic structure with neural models to combine the best of both worlds. We
present a two-stage decoding strategy to augment neural sequence-to-sequence models
(connectionist architecture) with semantic tagging (symbolic structure), in which an
input utterance is tagged with semantic symbols representing the meaning of individual
words. Experimental results demonstrate that our framework improves compositional generalization in semantic parsing across datasets and model architectures.
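The two-stage idea above can be illustrated with a minimal sketch: stage one tags each input word with a semantic symbol, and stage two decodes the meaning representation from the utterance augmented with those tags. The function names, the toy lexicon, and the output format here are hypothetical illustrations, not the thesis implementation.

```python
# Hypothetical sketch of two-stage decoding with semantic tagging.
# The lexicon and parse format are toy examples for illustration only.

TAG_LEXICON = {            # word -> semantic symbol (assumed toy mapping)
    "book": "flight.book",
    "flights": "flight",
    "boston": "city.boston",
}

def tag_utterance(words):
    """Stage 1: predict a semantic tag ("O" = no symbol) per input word."""
    return [TAG_LEXICON.get(w, "O") for w in words]

def decode(words, tags):
    """Stage 2: a real seq2seq decoder would condition on words + tags;
    here we simply assemble the non-null tags into a toy parse."""
    symbols = [t for t in tags if t != "O"]
    return "(" + " ".join(symbols) + ")"

words = "book flights to boston".split()
tags = tag_utterance(words)
parse = decode(words, tags)
```

In this sketch the tags expose word-level meaning explicitly, so the second stage can generalize to new combinations of known symbols rather than memorizing whole input-output pairs.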
Secondly, although symbolic models exhibit superior compositional generalization, it has not yet been empirically established that they can handle the noise and complexity of natural language, as evidenced by their sub-par performance in practical applications. Therefore, tackling compositional generalization via purely architectural
modification has the potential to maintain the robustness and flexibility of neural models
required to process real language. We thus attempt to devise a more competent neural
model than standard sequence-to-sequence models for compositional generalization.
To approach this problem, we design Dangle, a new neural network architecture for
sequence-to-sequence modeling to learn more disentangled representations for better
compositional generalization compared to the Transformer model. Empirical results
on both semantic parsing and machine translation verify that our proposal leads to
more disentangled representations and better generalization, outperforming competitive
baselines and more specialized techniques.
So far, we have assessed the proposed model on synthetic benchmarks that isolate compositional generalization. However, real-world settings involve both complex natural
language and compositional generalization. We thus move on to apply disentangled
sequence-to-sequence models to real-world compositional generalization challenges.
Before doing so, we first propose a methodology for identifying compositional patterns
in real-world data and create a new machine translation benchmark that better represents
practical generalization requirements than existing artificial challenges.
Then we introduce two key modifications to Dangle that encourage it to learn disentangled representations more efficiently. We evaluate the proposed model on
existing real-world benchmarks and the benchmark created in this thesis. Experimental
results demonstrate that our new architecture achieves better generalization performance
across tasks and datasets and is adept at handling real-world challenges.
en
dc.identifier.uri
https://hdl.handle.net/1842/41023
dc.identifier.uri
http://dx.doi.org/10.7488/era/3762
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Zheng, H. and Lapata, M. (2021). Compositional generalization via semantic tagging. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 1022– 1032, Punta Cana, Dominican Republic. Association for Computational Linguistics.
en
dc.relation.hasversion
Zheng, H. and Lapata, M. (2022). Disentangled sequence to sequence learning for compositional generalization. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4256– 4268, Dublin, Ireland. Association for Computational Linguistics.
en
dc.relation.hasversion
Zheng, H. and Lapata, M. (2023). Real-world compositional generalization with disentangled sequence-to-sequence learning. In Findings of the Association for Computational Linguistics: ACL 2023, pages 1711–1725, Toronto, Canada. Association for Computational Linguistics.
en
dc.subject
human-like compositional generalization
en
dc.subject
neural models
en
dc.subject
neural sequence-to-sequence models
en
dc.subject
natural language processing (NLP)
en
dc.subject
human-like systematic compositionality
en
dc.subject
neural network architecture for sequence-to-sequence modeling
en
dc.title
Towards human-like compositional generalization with neural models
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name:
- ZhengH_2023.pdf
- Size:
- 924.27 KB
- Format:
- Adobe Portable Document Format