Unsupervised Learning of Multiple Objects in Images
Developing computer vision algorithms that can learn from unsegmented images containing multiple objects is important, since this is how humans constantly learn from visual experience. In this thesis we consider images containing views of multiple objects, and our task is to learn about each of the objects present in the images. This task can be approached as a factorial learning problem, where each image is explained by instantiating a model for each of the objects present with the correct instantiation parameters. A major problem with learning a factorial model is that, as the number of objects increases, there is a combinatorial explosion in the number of configurations that need to be considered. We develop a greedy algorithm that extracts object models sequentially from the data by making use of a robust statistical method, thus avoiding the combinatorial explosion. When we have video data, we greatly speed up the greedy algorithm by carrying out approximate tracking of the multiple objects in the scene. This method is applied to raw image sequence data and extracts the objects one at a time. First, the (possibly moving) background is learned, and moving objects are found at later stages. The algorithm recursively updates an appearance model so that occlusion is taken into account, and matches this model to the frames through the sequence. We apply this method to learn multiple objects in image sequences as well as articulated parts of the human body. Additionally, we learn a distribution over parts undergoing full affine transformations that expresses the relative movements of the parts.
The idea of fitting a model to data sequentially using robust statistics is quite general and can be applied to other models. We describe a method for training mixture models by learning one component at a time, thus building the mixture model in a sequential manner. We do this by incorporating an outlier component into the mixture model, which allows us to fit just one data cluster by "ignoring" the rest of the clusters. Once a model is fitted, we remove from consideration all the data explained by this model and then repeat the operation. This algorithm can be used to provide a sensible initialization of the mixture components when we train a mixture model.
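The sequential fitting of mixture components can be illustrated with a minimal sketch. It is not the thesis implementation; it assumes 1-D data, a single Gaussian component, a uniform outlier component over the data range, a histogram-mode initialization, and a responsibility threshold of 0.5 for deciding which points count as "explained". The function names `fit_one_component` and `greedy_mixture_init` are hypothetical.

```python
import numpy as np


def fit_one_component(x, n_iter=100):
    """EM for one 1-D Gaussian plus a fixed uniform outlier component.

    Assumption: the outlier density is uniform over the observed data range.
    Returns the Gaussian mean, variance, and per-point responsibilities.
    """
    x = np.asarray(x, dtype=float)
    outlier_density = 1.0 / max(x.max() - x.min(), 1e-12)

    # Assumed initialization: start at the centre of the fullest histogram bin.
    counts, edges = np.histogram(x, bins=20)
    k = int(np.argmax(counts))
    mu = 0.5 * (edges[k] + edges[k + 1])
    var = np.var(x) / 4.0 + 1e-6
    pi = 0.5  # mixing weight of the Gaussian component

    def responsibilities(mu, var, pi):
        gauss = np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
        num = pi * gauss
        return num / (num + (1.0 - pi) * outlier_density)

    for _ in range(n_iter):
        resp = responsibilities(mu, var, pi)  # E-step
        total = resp.sum()
        if total < 1e-9:
            break
        # M-step: weighted mean, variance, and mixing weight.
        mu = (resp * x).sum() / total
        var = (resp * (x - mu) ** 2).sum() / total + 1e-6
        pi = float(np.clip(total / x.size, 1e-3, 1.0 - 1e-3))

    return mu, var, responsibilities(mu, var, pi)


def greedy_mixture_init(x, n_components, threshold=0.5):
    """Fit components one at a time, removing the data each one explains."""
    remaining = np.asarray(x, dtype=float)
    components = []
    for _ in range(n_components):
        if remaining.size < 2:
            break
        mu, var, resp = fit_one_component(remaining)
        components.append((mu, var))
        # Keep only the points the outlier component explains better.
        remaining = remaining[resp < threshold]
    return components


# Usage on synthetic data: two well-separated clusters (illustrative only).
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 0.5, 200), rng.normal(10.0, 0.5, 200)])
components = greedy_mixture_init(data, n_components=2)
print(components)
```

The returned means and variances could then serve as starting values for ordinary EM on the full mixture, which is the initialization use described above.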