Shift happens: how can machine learning systems be best prepared?
Authors
Eastwood, Cian
Abstract
Machine learning systems have made headlines in recent years, defeating world champions in Go, enhancing medical diagnoses, and redefining how we work with tools like ChatGPT. However, despite these impressive feats, machine learning systems remain fragile when faced with test data that differs from their training data. This fragility stems from a fundamental mismatch between textbook machine learning methods and their real-world application. While textbook methods assume that the conditions under which a system is developed are similar to those in which it is deployed, in reality, systems tend to be developed under one set of conditions (e.g., in a lab) and deployed to another (e.g., a clinic). As a result, many machine learning systems are not prepared for the differences in conditions, or distribution shifts, that they face upon deployment, leading to some high-profile and costly failures. For safety-critical settings like healthcare and autonomous driving, such failures represent a major barrier to real-world deployment.
In this thesis, I argue that we must first accept that shift happens, and subsequently focus on how we can best prepare. To do so, I present four of my works that illustrate how machine learning systems can be prepared for (and adapted to) real-world distribution shifts. Together, these contributions take us closer to reliable machine learning systems that can be deployed in safety-critical settings.
In the first work, the setting is source-free domain adaptation, i.e., adapting a model to unlabelled test data without access to the original training data. Here, we prepare for a change in measurement device (e.g., X-rays from a different scanner) by storing lightweight statistics of the training data. By restoring these statistics on the test data, we see improved accuracy, calibration, and data efficiency over prior methods.
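To make the store-then-restore idea concrete, here is a minimal sketch in PyTorch. It uses per-dimension feature means and variances as a simple stand-in for the richer lightweight statistics stored in the actual work, and all names (`feature_extractor`, the loaders, the hyperparameters) are illustrative assumptions rather than the thesis's implementation.

```python
# Minimal sketch: source-free adaptation by storing lightweight feature
# statistics at training time and restoring them on unlabelled test data.
# Per-dimension moments are an illustrative stand-in for richer statistics.
import torch


@torch.no_grad()
def save_feature_stats(feature_extractor, train_loader):
    """Store per-dimension mean/variance of training features (the 'lightweight stats')."""
    feats = torch.cat([feature_extractor(x) for x, _ in train_loader])
    return feats.mean(0), feats.var(0)


def adapt_to_test(feature_extractor, test_loader, saved_stats, steps=100, lr=1e-4):
    """Restore the saved statistics on unlabelled test data (no source data needed)."""
    mu_s, var_s = saved_stats
    opt = torch.optim.Adam(feature_extractor.parameters(), lr=lr)
    for _ in range(steps):
        for x in test_loader:  # unlabelled test batches
            z = feature_extractor(x)
            # Align test-time feature statistics with the stored training statistics.
            loss = (z.mean(0) - mu_s).pow(2).mean() + (z.var(0) - var_s).pow(2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return feature_extractor
```

Note that adaptation only requires the feature extractor and the stored statistics; the original training data is never needed at test time.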
In the second work, the setting is domain generalisation, i.e., performing well on test data from new environments or domains by leveraging data from multiple related domains at training time. Here, we prepare for more flexible and unknown changes by exploiting invariances across the training domains that hold with high probability in unseen test domains. In particular, by minimising a particular quantile of a model's performance distribution over domains, we learn models that perform well with the corresponding probability.
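A minimal sketch of this quantile-based objective, assuming a PyTorch classifier and one labelled batch per training domain. The empirical `torch.quantile` used here is a simplified stand-in for the smoother quantile estimator a full method might use, and the function name is hypothetical.

```python
# Minimal sketch: train by minimising the alpha-quantile of the per-domain
# risk distribution, rather than its average or its maximum.
import torch
import torch.nn.functional as F


def quantile_risk_step(model, domain_batches, optimiser, alpha=0.9):
    """One training step: compute each domain's risk, then minimise the alpha-quantile."""
    risks = torch.stack([
        F.cross_entropy(model(x), y) for x, y in domain_batches  # one (x, y) batch per domain
    ])
    # A model trained this way aims to perform at least this well with
    # probability ~alpha on unseen domains drawn like the training ones.
    loss = torch.quantile(risks, alpha)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()
```

Setting alpha to 1 recovers worst-case (min-max) training over domains, while lower values trade worst-case robustness for better typical-case performance.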
In the third work, the setting is again domain generalisation, but this time we focus on ways to harness so-called "spurious" features without test-domain labels. In particular, we show that predictions based on invariant/stable features can be used to adapt our usage of spurious/unstable features to new test domains, so long as the stable and unstable features are complementary (i.e., conditionally independent given the label). By safely harnessing complementary spurious features, we boost performance without sacrificing robustness.
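A minimal sketch of how stable-feature predictions might guide test-time adaptation of the unstable features, again in PyTorch. The pseudo-labelling and KL objective here are illustrative assumptions rather than the thesis's exact procedure; `stable_logits_fn`, `unstable_head`, and `unstable_feats_fn` are hypothetical handles to components learned during training.

```python
# Minimal sketch: adapt the use of "unstable" (spurious) features in a new
# test domain using pseudo-labels from the "stable" (invariant) predictor.
import torch
import torch.nn.functional as F


def adapt_unstable_head(stable_logits_fn, unstable_head, unstable_feats_fn,
                        test_loader, steps=50, lr=1e-3):
    """Re-fit the unstable head on unlabelled test data using stable pseudo-labels."""
    opt = torch.optim.Adam(unstable_head.parameters(), lr=lr)
    for _ in range(steps):
        for x in test_loader:  # unlabelled test batches
            with torch.no_grad():
                pseudo = stable_logits_fn(x).softmax(-1)  # soft pseudo-labels
            logits_u = unstable_head(unstable_feats_fn(x))
            # Under complementarity (stable and unstable features conditionally
            # independent given the label), these pseudo-labels are a sound
            # supervision signal for the unstable features.
            loss = F.kl_div(logits_u.log_softmax(-1), pseudo, reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
    return unstable_head
```

At prediction time, the adapted unstable head's output would then be combined with the stable predictor's; one simple option is to sum the two sets of logits.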
Finally, in the fourth work, the setting is disentangled representation learning, which, in the context of this thesis, can be viewed as preparing for a change in the task itself by recovering and separating the underlying factors of variation. To this end, we extend an existing evaluation framework by first introducing a measure of representation explicitness, or ease of use, and then connecting the framework to identifiability.
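As an illustration of what an explicitness (ease-of-use) measure might look like, here is a simplified scikit-learn proxy that compares how well a low-capacity probe recovers a factor of variation relative to a higher-capacity one. The work itself defines explicitness via loss curves for probes of increasing capacity; this capacity-ratio proxy and all names are assumptions for illustration only.

```python
# Minimal sketch: an illustrative "explicitness" proxy — a factor is explicit
# in a representation if a simple probe reads it out nearly as well as a
# powerful one.
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def explicitness_proxy(z_train, f_train, z_test, f_test):
    """Compare low- vs. high-capacity probes for one discrete factor f."""
    scores = {}
    for name, probe in [("linear", LogisticRegression(max_iter=1000)),
                        ("nonlinear", RandomForestClassifier())]:
        probe.fit(z_train, f_train)
        scores[name] = accuracy_score(f_test, probe.predict(z_test))
    # Close to 1 when the factor is easy to read out (explicit); well below 1
    # when only a high-capacity probe can extract it.
    return scores["linear"] / max(scores["nonlinear"], 1e-8)
```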