Edinburgh Research Archive

Unified framework for decomposing neural representations and analyzing specialization in language models

Abstract

The rise of large, pre-trained Transformer models has transformed Natural Language Processing (NLP), yet the internal mechanisms by which these models handle diverse and heterogeneous data remain insufficiently understood. This thesis addresses this gap by developing and applying a unified analytical framework to examine how such models represent, differentiate, and specialize for distinct subpopulations of data. The central contribution is the Model-Oriented Sub-population and Spectral Analysis (MOSSA) framework, which systematically contrasts a generalist model, trained on multiple domains, languages, or tasks, with a suite of specialist control models trained on individual subpopulations. Through a set of advanced matrix analysis techniques, MOSSA quantifies representational similarities layer by layer, revealing where and how knowledge encoding and adaptation occur within the model architecture.

The framework is applied across three major studies of increasing complexity. The first investigates domain learning using Singular Vector Canonical Correlation Analysis (SVCCA) to assess how model capacity and data scale affect the encoding of domain-specific information. The findings show that larger models not only generalize across domains but also embed domain-specialist behavior within their internal representations, particularly for domain-specific vocabulary.

The second study extends this approach to multilingual modeling. A joint matrix factorization method is introduced to analyze representational structures across 33 languages. The analysis uncovers systematic variation in the encoding of morphosyntactic information across layers, shaped by linguistic properties such as script and morphological complexity. Moreover, the learned representations align with cross-lingual task performance and yield linguistically meaningful phylogenetic structures.

The third study explores the dynamics of massively multi-task instruction tuning in Large Language Models (LLMs). Using Centered Kernel Alignment (CKA) within MOSSA, we examine how an LLM represents over 60 NLP tasks. The results reveal a distinct architectural segmentation: early shared layers encode general-purpose features, intermediate transition layers rapidly acquire task-specific information, and later refinement layers optimize representations for precise task execution.

Together, these studies establish a principled methodology for probing and interpreting the internal organization of large neural models. The thesis demonstrates that generalist language models systematically partition their representational space, forming specialized subspaces tailored to different data regimes. This work identifies where such specialization arises within model depth and clarifies the mechanisms underlying adaptation, multilinguality, and multi-task learning in contemporary NLP systems.
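
As a rough illustration of the layer-by-layer comparison between a generalist and a specialist model described above, the following Python sketch computes linear CKA between two activation matrices. It is not taken from the thesis: the abstract does not say which CKA variant is used, so the linear form is an assumption, and the function names `linear_cka` and `layerwise_cka` and the inputs `generalist_layers` and `specialist_layers` are hypothetical.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices.

    x has shape (n_examples, dim_x) and y has shape (n_examples, dim_y);
    rows must correspond to the same inputs in both models.
    Assumption: the linear kernel variant, chosen here for simplicity.
    """
    # Center each feature so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def layerwise_cka(generalist_layers, specialist_layers):
    """Compare matching layers of two models (hypothetical helper).

    Each argument is a list of per-layer activation matrices computed
    on the same set of inputs.
    """
    return [linear_cka(g, s) for g, s in zip(generalist_layers, specialist_layers)]
```

Under these assumptions, a score near 1 at a given layer would suggest that the generalist and the specialist encode the inputs similarly there, while lower scores would point to layers where their representations diverge. Linear CKA is unchanged by rotating or uniformly rescaling either representation, which is one reason it is a common choice for comparing layers of different models.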
