Learning from alternative sources of supervision
Date: 16/08/2022
Author: Edwards, Harrison
Abstract
With the rise of the internet, data of many varieties, including images, audio, text and video, is abundant. Unfortunately, for any one specific task the relevant data is typically scarce: one might have only a small amount of labelled data, or only noisy labels, or labels for a different task, or perhaps a simulator and reward function but no demonstrations, or even a simulator with no reward function at all. However, arguably no task is truly novel, so it is often possible for neural networks to benefit from abundant data related to the task at hand. This thesis documents three methods for learning from such alternative sources of supervision, in place of the preferable but rarely available case of unlimited direct examples of the task. Firstly, we show how data from many related tasks can be described by a simple graphical model and fit using a Variational Autoencoder, directly modelling and representing the relations amongst tasks. Secondly, we investigate various forms of prediction-based intrinsic reward for agents in a simulator with no extrinsic rewards. Thirdly, we introduce a novel intrinsic reward and investigate how best to combine it with an extrinsic reward for maximum performance.
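The second and third contributions both revolve around prediction-based intrinsic rewards. A minimal sketch of the general idea, not the thesis's actual method: the linear forward model, the dimensions, and the mixing coefficient `beta` below are all illustrative assumptions. A forward model predicts the next state from the current state and action, and its prediction error is used as a reward, so poorly predicted (novel) transitions are sought out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and a toy linear forward model f(s, a) -> s'.
# A real agent would learn this model; here the weights are fixed.
STATE_DIM, ACTION_DIM = 4, 2
W = rng.normal(scale=0.1, size=(STATE_DIM + ACTION_DIM, STATE_DIM))

def predict_next_state(state, action):
    """Forward-model prediction of the next state."""
    return np.concatenate([state, action]) @ W

def intrinsic_reward(state, action, next_state):
    """Squared prediction error of the forward model.

    Transitions the model predicts poorly yield a large reward,
    encouraging the agent to explore them.
    """
    error = next_state - predict_next_state(state, action)
    return float(np.dot(error, error))

# One hypothetical transition.
s = rng.normal(size=STATE_DIM)
a = rng.normal(size=ACTION_DIM)
s_next = rng.normal(size=STATE_DIM)

r_int = intrinsic_reward(s, a, s_next)

# One simple way to combine with an extrinsic reward: a weighted sum.
# beta is an assumed hyperparameter trading off exploration vs. exploitation.
r_ext = 1.0
beta = 0.1
r_total = r_ext + beta * r_int
```

A perfectly predicted transition yields zero intrinsic reward, so a fully learned model stops driving exploration; how to schedule or combine such a signal with extrinsic reward is exactly the kind of question the third contribution addresses.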