Field | Value | Language
dc.contributor.advisor | Ferrari, Vittorio | en
dc.contributor.advisor | Hospedales, Timothy | en
dc.contributor.author | Henderson, Paul Matthew | en
dc.date.accessioned | 2019-03-26T11:32:17Z |
dc.date.available | 2019-03-26T11:32:17Z |
dc.date.issued | 2019-07-01 |
dc.identifier.uri | http://hdl.handle.net/1842/35600 |
dc.description.abstract | The goal of scene understanding is to capture the full content of an image in a human-interpretable representation. This representation must describe the different objects present, including attributes such as class, shape, and pose, as well as the relations between objects. Moreover, it should be globally consistent across the entire image. In this thesis, we consider four sub-tasks within scene understanding and make contributions to each. When describing the content of an image, it is natural to start by detecting all the objects that are present, that is, localising and classifying them. Our first contribution is to show how to train a neural-network-based object class detector end-to-end in a principled fashion, using the evaluation metric (mean average precision) as the training loss, and using the same pipeline at both training and test time. This is simpler and more elegant than the traditional approach of using a surrogate loss, yet we show it achieves comparable performance. Once the location and class of an object are known, we can estimate its shape and pose in 3D space. Our second contribution is a new approach to these tasks that supports training purely from 2D images, without 3D supervision, multiple views, or annotations such as pose or keypoints. Moreover, the model is generative, and so allows new object shapes to be sampled a priori. To produce a globally consistent description of a scene, it is important to reason over all objects simultaneously rather than considering each individually. Our third contribution is a probabilistic generative model over complete indoor scene layouts. It models complex arrangements in 3D space, including high-order spatial relations among furniture and other objects. One common approach to generating predictions that are consistent over all objects in a scene, or all pixels in an image, is to formulate and solve a discrete energy minimisation problem. The energy is defined as a sum over factors, and the factor structure greatly affects which minimisation algorithms work well. Our fourth contribution is a method that automatically selects a suitable algorithm to solve a given energy minimisation problem. To do so, it learns to predict the best algorithm from characteristics of the problem instance. | en
dc.contributor.sponsor | Engineering and Physical Sciences Research Council (EPSRC) | en
dc.language.iso | en |
dc.publisher | The University of Edinburgh | en
dc.relation.hasversion | Henderson, P. and Ferrari, V. (2016a). Automatically selecting inference algorithms for discrete energy minimisation. In Proceedings of the European Conference on Computer Vision, pages 235–252. | en
dc.relation.hasversion | Henderson, P. and Ferrari, V. (2016b). End-to-end training of object class detectors for mean average precision. In Proceedings of the Asian Conference on Computer Vision, pages 198–213. | en
dc.relation.hasversion | Henderson, P. and Ferrari, V. (2018). Learning to generate and reconstruct 3D meshes with only 2D supervision. In Proceedings of the British Machine Vision Conference. | en
dc.subject | scene understanding | en
dc.subject | globally-consistent | en
dc.subject | image description | en
dc.subject | neural-network | en
dc.subject | object class detection | en
dc.subject | probabilistic generative models | en
dc.subject | 3D space | en
dc.subject | minimisation algorithms | en
dc.title | Advances in scene understanding: object detection, reconstruction, layouts, and inference | en
dc.type | Thesis or Dissertation | en
dc.type.qualificationlevel | Doctoral | en
dc.type.qualificationname | PhD Doctor of Philosophy | en

