Learning from alternative sources of supervision
With the rise of the internet, data of many varieties, including images, audio, text, and video, has become abundant. Unfortunately, for any specific task one might care about, data is rarely abundant. Typically one has only a small amount of labelled data, or only noisy labels, or labels for a different task, or perhaps a simulator and a reward function but no demonstrations, or even a simulator with no reward function at all. However, arguably no task is truly novel, and so neural networks can often benefit from abundant data that is merely related to the task at hand. This thesis documents three methods for learning from such alternative sources of supervision, in contrast to the preferable but rare case of having unlimited direct examples of the task. Firstly, we show how data from many related tasks can be described with a simple graphical model and fit with a Variational Autoencoder, directly modelling and representing the relations amongst tasks. Secondly, we investigate various forms of prediction-based intrinsic reward for agents in a simulator with no extrinsic rewards. Thirdly, we introduce a novel intrinsic reward and investigate how best to combine it with an extrinsic reward for maximum performance.
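To make the idea of a prediction-based intrinsic reward concrete, here is a minimal sketch (a hypothetical illustration, not the implementation used in this thesis): a simple learned forward model predicts the next observation from the current observation and action, and the agent receives the model's prediction error as its reward, so transitions the model has not yet learned are rewarded more than familiar ones.

```python
import numpy as np

rng = np.random.default_rng(0)

class ForwardModel:
    """Linear forward model predicting next_obs from (obs, action).

    Hypothetical example: real work in this area typically uses a
    neural network, but a linear model shows the mechanism.
    """
    def __init__(self, obs_dim, act_dim, lr=0.01):
        self.W = np.zeros((obs_dim + act_dim, obs_dim))
        self.lr = lr

    def predict(self, obs, action):
        x = np.concatenate([obs, action])
        return x @ self.W

    def intrinsic_reward(self, obs, action, next_obs):
        # Intrinsic reward = squared prediction error on this transition.
        err = next_obs - self.predict(obs, action)
        return float(err @ err)

    def update(self, obs, action, next_obs):
        # One gradient step on the squared prediction error.
        x = np.concatenate([obs, action])
        err = next_obs - x @ self.W
        self.W += self.lr * np.outer(x, err)

model = ForwardModel(obs_dim=4, act_dim=2)
obs = rng.normal(size=4)
action = rng.normal(size=2)
next_obs = obs + 0.1 * rng.normal(size=4)

r_before = model.intrinsic_reward(obs, action, next_obs)
for _ in range(200):
    model.update(obs, action, next_obs)
r_after = model.intrinsic_reward(obs, action, next_obs)

# As the transition becomes familiar, the intrinsic reward shrinks,
# pushing the agent towards transitions it cannot yet predict.
assert r_after < r_before
```

The key design point this sketch illustrates is that the reward is non-stationary: it decays as the model improves, which is what turns prediction error into a drive towards novelty rather than a fixed objective.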