Unified framework for decomposing neural representations and analyzing specialization in language models

Zhao, Zheng

dc.contributor.advisor

Cohen, Shay

dc.contributor.advisor

Webber, Bonnie

dc.contributor.author

Zhao, Zheng

dc.contributor.sponsor

UKRI CDT in Natural Language Processing

dc.date.accessioned

2026-05-21T15:10:42Z

dc.date.issued

2026-05-21

dc.description.abstract

The rise of large, pre-trained Transformer models has transformed Natural Language Processing (NLP), yet the internal mechanisms by which these models handle diverse and heterogeneous data remain insufficiently understood. This thesis addresses this gap by developing and applying a unified analytical framework to examine how such models represent, differentiate, and specialize for distinct subpopulations of data. The central contribution is the Model-Oriented Sub-population and Spectral Analysis (MOSSA) framework, which systematically contrasts a generalist model, trained on multiple domains, languages, or tasks, with a suite of specialist control models trained on individual subpopulations. Through a set of advanced matrix analysis techniques, MOSSA quantifies representational similarities layer by layer, revealing where and how knowledge encoding and adaptation occur within the model architecture. The framework is applied across three major studies of increasing complexity. The first investigates domain learning using Singular Vector Canonical Correlation Analysis (SVCCA) to assess how model capacity and data scale affect the encoding of domain-specific information. The findings show that larger models not only generalize across domains but also embed domain-specialist behavior within their internal representations, particularly for domain-specific vocabulary. The second study extends this approach to multilingual modeling. A joint matrix factorization method is introduced to analyze representational structures across 33 languages. The analysis uncovers systematic variation in the encoding of morphosyntactic information across layers, shaped by linguistic properties such as script and morphological complexity. Moreover, the learned representations align with cross-lingual task performance and yield linguistically meaningful phylogenetic structures. The third study explores the dynamics of massively multi-task instruction tuning in Large Language Models (LLMs). Using Centered Kernel Alignment (CKA) within MOSSA, we examine how an LLM represents over 60 NLP tasks. The results reveal a distinct architectural segmentation: early shared layers encode general-purpose features, intermediate transition layers rapidly acquire task-specific information, and later refinement layers optimize representations for precise task execution. Together, these studies establish a principled methodology for probing and interpreting the internal organization of large neural models. The thesis demonstrates that generalist language models systematically partition their representational space, forming specialized subspaces tailored to different data regimes. This work identifies where such specialization arises within model depth and clarifies the mechanisms underlying adaptation, multilinguality, and multi-task learning in contemporary NLP systems.
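
The layer-by-layer comparison of a generalist model with a specialist control model described in the abstract can be illustrated with a minimal sketch of linear CKA. This is the standard linear formulation of Centered Kernel Alignment and is not necessarily the exact variant used in the thesis. The function names `linear_cka` and `layerwise_cka`, the activation shapes, and the random toy data are all assumptions made for illustration.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between activation matrices of shape (n_examples, n_features).

    The two matrices must describe the same examples in the same order,
    but may have different feature dimensions.
    """
    # Center each feature column so mean offsets do not affect the score.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-product, normalized by the
    # Frobenius norms of each matrix's own Gram-like product.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def layerwise_cka(generalist_layers, specialist_layers):
    """Hypothetical helper: one CKA score per pair of corresponding layers."""
    return [linear_cka(g, s) for g, s in zip(generalist_layers, specialist_layers)]


# Toy data standing in for real model activations (an assumption).
rng = np.random.default_rng(0)
acts = rng.normal(size=(50, 8))

# A random orthogonal matrix: linear CKA is invariant to such rotations.
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))

same = linear_cka(acts, acts)
rotated = linear_cka(acts, acts @ q)
unrelated = linear_cka(acts, rng.normal(size=(50, 8)))
scores = layerwise_cka([acts, acts], [acts @ q, rng.normal(size=(50, 8))])
```

In this sketch, a score near 1 for a given layer would suggest that the generalist and the specialist encode the same examples in a similar way up to rotation and scaling, while lower scores would point to layers where the representations diverge.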

dc.identifier.uri

https://era.ed.ac.uk/handle/1842/44735

dc.identifier.uri

https://doi.org/10.7488/era/7250

dc.language.iso

en

dc.publisher

The University of Edinburgh

dc.relation.hasversion

Spectral editing of activations for large language model alignment Qiu, Y., Zhao, Z., Ziser, Y., Korhonen, A., Ponti, E. & Cohen, S. B., 16 Dec 2024, Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track. Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J. & Zhang, C. (eds.). Curran Associates Inc, p. 56958-56987 30 p. (Advances in Neural Information Processing Systems; vol. 37)

dc.relation.hasversion

Understanding Domain Learning in Language Models Through Subpopulation Analysis Zhao, Z., Ziser, Y. & Cohen, S., 8 Dec 2022, Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP. Abu Dhabi, United Arab Emirates (Hybrid): Association for Computational Linguistics, p. 192-209 18 p

dc.relation.hasversion

Layer by layer: Uncovering where multi-task learning happens in instruction-tuned large language models Zhao, Z., Ziser, Y. & Cohen, S. B., 1 Nov 2024, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Al-Onaizan, Y., Bansal, M. & Chen, Y.-N. (eds.). Kerrville, TX, USA: Association for Computational Linguistics, p. 15195-15214 20 p

dc.relation.hasversion

A joint matrix factorization analysis of multilingual representations Zhao, Z., Ziser, Y., Webber, B. & Cohen, S., 2023, Findings of the Association for Computational Linguistics: EMNLP 2023. Bouamor, H., Pino, J. & Bali, K. (eds.). Singapore: Association for Computational Linguistics, p. 12764-12783

dc.subject

interpretability

dc.subject

Language Models

dc.subject

Representation Learning

dc.title

Unified framework for decomposing neural representations and analyzing specialization in language models

dc.type

Thesis

dc.type.qualificationlevel

Doctoral

dc.type.qualificationname

PhD Doctor of Philosophy

Files

Original bundle

Name: ZhaoZ_2026.pdf
Size: 36.5 MB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

Informatics thesis and dissertation collection