Detecting and preventing unargmaxable outputs in bottlenecked neural networks
Authors
Grivas, Andreas
Abstract
Deep Neural Networks (DNNs) with a large number of outputs are ubiquitous in Artificial Intelligence (AI).
For example, Large Language Models (LLMs) generate sentences from a vocabulary of hundreds of thousands of output tokens. Crucially, the output layer of these models typically receives as input a dense feature representation with far fewer dimensions than the number of outputs. We call such an output layer a bottlenecked classifier. It is known that bottlenecked classifiers reduce the expressivity of DNNs (Yang et al., 2018) and that, in theory, some outputs may be impossible to predict (Demeter et al., 2020), but there have been no concrete examples of this situation in the literature due to the lack of precise tools and terminology. This thesis fills this gap. We demonstrate examples where bottlenecked classifiers cause DNNs to have outputs that are impossible to predict irrespective of the input. We name such outputs unargmaxable and introduce tools to detect them in LLMs and multi-label classifiers. But detection can only get us so far; the impact of this thesis lies in showing that unargmaxable outputs can be prevented when domain knowledge is available. By imposing structure on bottlenecked classifiers, we guarantee that all outputs consistent with our domain knowledge are argmaxable.
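To make the notion of an unargmaxable output concrete, the following is a minimal illustrative sketch, not the exact detection algorithm developed in the thesis. It assumes a linear output layer with logits W @ x + b, where W has shape (num_classes, dim) and dim < num_classes, and uses a linear program to ask whether any feature vector x inside a bounded box lets class k strictly beat every other class. The function name is_argmaxable and the box parameter are illustrative choices, not part of the thesis' tooling.

```python
# Minimal sketch: test whether class k can ever be the argmax of a
# bottlenecked linear classifier (logits = W @ x + b), restricted to
# feature vectors x in the box [-box, box]^dim. If the largest achievable
# winning margin is <= 0, class k is unargmaxable within that box.
import numpy as np
from scipy.optimize import linprog


def is_argmaxable(W: np.ndarray, b: np.ndarray, k: int,
                  box: float = 100.0, tol: float = 1e-9) -> bool:
    num_classes, dim = W.shape
    others = [j for j in range(num_classes) if j != k]
    # Decision variables z = [x_1, ..., x_dim, t]; maximise the margin t
    # (linprog minimises, so use objective -t).
    c = np.zeros(dim + 1)
    c[-1] = -1.0
    # One constraint per competitor j:
    #   (W[k] - W[j]) @ x + (b[k] - b[j]) >= t
    # rewritten in A_ub @ z <= b_ub form as
    #   -(W[k] - W[j]) @ x + t <= b[k] - b[j].
    A_ub = np.hstack([-(W[k] - W[others]), np.ones((len(others), 1))])
    b_ub = b[k] - b[others]
    bounds = [(-box, box)] * dim + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.success and -res.fun > tol


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 2-dimensional features, 10 output classes: some classes may be
    # unargmaxable because their weight vector lies inside the convex
    # hull of the other weight vectors.
    W = rng.normal(size=(10, 2))
    b = np.zeros(10)
    for k in range(10):
        print(k, "argmaxable" if is_argmaxable(W, b, k) else "unargmaxable")
```

With only 2 feature dimensions and 10 classes, the example typically reports several unargmaxable classes: whenever a class's weight vector lies in the convex hull of the others (with equal biases), no input can ever make that class the argmax.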