Unified framework for decomposing neural representations and analyzing specialization in language models
dc.contributor.advisor
Cohen, Shay
dc.contributor.advisor
Webber, Bonnie
dc.contributor.author
Zhao, Zheng
dc.contributor.sponsor
UKRI CDT in Natural Language Processing
dc.date.accessioned
2026-05-21T15:10:42Z
dc.date.issued
2026-05-21
dc.description.abstract
The rise of large, pre-trained Transformer models has transformed Natural Language Processing (NLP), yet the internal mechanisms by which these models handle diverse and heterogeneous data remain insufficiently understood. This thesis addresses this gap by developing and applying a unified analytical framework to examine how such models represent, differentiate, and specialize for distinct subpopulations of data. The central contribution is the Model-Oriented Sub-population and Spectral Analysis (MOSSA) framework, which systematically contrasts a generalist model, trained on multiple domains, languages, or tasks, with a suite of specialist control models, each trained on a single subpopulation. Using matrix analysis techniques (SVCCA, joint matrix factorization, and CKA), MOSSA quantifies the representational similarity between generalist and specialist models layer by layer, revealing where and how knowledge encoding and adaptation occur within the model architecture.
The framework is applied across three major studies of increasing complexity. The first investigates domain learning using Singular Vector Canonical Correlation Analysis (SVCCA) to assess how model capacity and data scale affect the encoding of domain-specific information. The findings show that larger models not only generalize across domains but also embed domain-specialist behavior within their internal representations, particularly for domain-specific vocabulary.
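As an illustration of the kind of comparison SVCCA performs, the following is a minimal NumPy sketch, not the thesis's implementation. It assumes two activation matrices of shape (examples, features) collected from the same inputs, with more examples than features. The function name `svcca`, the `keep` variance threshold, and the choice of the mean canonical correlation as the summary score are assumptions made for this example.

```python
import numpy as np


def svcca(x, y, keep=0.99):
    """Mean canonical correlation between SVD-reduced activations.

    x, y: arrays of shape (n_examples, n_features), same n_examples.
    keep: fraction of variance retained per matrix (assumed default).
    """

    def reduce(a):
        # Center each feature, then keep the top singular directions
        # that together explain at least `keep` of the variance.
        a = a - a.mean(axis=0, keepdims=True)
        u, s, _ = np.linalg.svd(a, full_matrices=False)
        energy = np.cumsum(s**2) / np.sum(s**2)
        k = int(np.searchsorted(energy, keep) + 1)
        return u[:, :k] * s[:k]

    # Orthonormal bases for the two reduced subspaces.
    qx, _ = np.linalg.qr(reduce(x))
    qy, _ = np.linalg.qr(reduce(y))

    # Canonical correlations are the singular values of qx^T qy.
    rho = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(np.clip(rho, 0.0, 1.0).mean())
```

In a generalist-versus-specialist setting, `x` and `y` would hold activations from the same layer of the two models on the same inputs; a score near 1 indicates that the two layers span nearly the same subspace.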
The second study extends this approach to multilingual modeling. A joint matrix factorization method is introduced to analyze representational structures across 33 languages. The analysis uncovers systematic variation in the encoding of morphosyntactic information across layers, shaped by linguistic properties such as script and morphological complexity. Moreover, the learned representations align with cross-lingual task performance and yield linguistically meaningful phylogenetic structures.
The third study explores the dynamics of massively multi-task instruction tuning in Large Language Models (LLMs). Using Centered Kernel Alignment (CKA) within MOSSA, the study examines how an LLM represents over 60 NLP tasks. The results reveal a distinct architectural segmentation into three groups of layers: early shared layers encode general-purpose features, intermediate transition layers rapidly acquire task-specific information, and later refinement layers optimize representations for precise task execution.
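For readers unfamiliar with CKA, the following is a minimal NumPy sketch of the linear variant, offered as an illustration rather than as the thesis's code. It assumes two activation matrices of shape (examples, features) computed on the same inputs; the function name `linear_cka` is an assumption of this example, and the thesis may use a different kernel or estimator.

```python
import numpy as np


def linear_cka(x, y):
    """Linear Centered Kernel Alignment between two activation matrices.

    x: array of shape (n_examples, d1)
    y: array of shape (n_examples, d2)
    Returns a similarity score in [0, 1].
    """
    # Center every feature so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # Squared Frobenius norm of the cross-covariance, normalized by
    # the Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of either representation, which makes it suitable for comparing layers whose feature axes are not aligned. Computing the score for each layer of a generalist model against the matching layer of a specialist gives a depth profile of similarity.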
Together, these studies establish a principled methodology for probing and interpreting the internal organization of large neural models. The thesis demonstrates that generalist language models systematically partition their representational space, forming specialized subspaces tailored to different data regimes. This work identifies where such specialization arises within model depth and clarifies the mechanisms underlying adaptation, multilinguality, and multi-task learning in contemporary NLP systems.
dc.identifier.uri
https://era.ed.ac.uk/handle/1842/44735
dc.identifier.uri
https://doi.org/10.7488/era/7250
dc.language.iso
en
dc.publisher
The University of Edinburgh
dc.relation.hasversion
Spectral editing of activations for large language model alignment Qiu, Y., Zhao, Z., Ziser, Y., Korhonen, A., Ponti, E. & Cohen, S. B., 16 Dec 2024, Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track. Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J. & Zhang, C. (eds.). Curran Associates Inc, p. 56958-56987 30 p. (Advances in Neural Information Processing Systems; vol. 37)
dc.relation.hasversion
Understanding Domain Learning in Language Models Through Subpopulation Analysis Zhao, Z., Ziser, Y. & Cohen, S., 8 Dec 2022, Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP. Abu Dhabi, United Arab Emirates (Hybrid): Association for Computational Linguistics, p. 192-209 18 p
dc.relation.hasversion
Layer by layer: Uncovering where multi-task learning happens in instruction-tuned large language models Zhao, Z., Ziser, Y. & Cohen, S. B., 1 Nov 2024, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Al-Onaizan, Y., Bansal, M. & Chen, Y.-N. (eds.). Kerrville, TX, USA: Association for Computational Linguistics, p. 15195-15214 20 p
dc.relation.hasversion
A joint matrix factorization analysis of multilingual representations Zhao, Z., Ziser, Y., Webber, B. & Cohen, S., 2023, Findings of the Association for Computational Linguistics: EMNLP 2023. Bouamor, H., Pino, J. & Bali, K. (eds.). Singapore: Association for Computational Linguistics, p. 12764-12783
dc.subject
Interpretability
dc.subject
Language Models
dc.subject
Representation Learning
dc.title
Unified framework for decomposing neural representations and analyzing specialization in language models
dc.type
Thesis
dc.type.qualificationlevel
Doctoral
dc.type.qualificationname
PhD Doctor of Philosophy
Files
Original bundle
Name: ZhaoZ_2026.pdf
Size: 36.5 MB
Format: Adobe Portable Document Format