Image context for object detection, object context for part detection
Abstract
Objects and parts are crucial elements for achieving automatic image understanding.
The goal of the object detection task is to recognize and localize all the objects in an
image. Similarly, semantic part detection attempts to recognize and localize the object
parts. This thesis makes four contributions. The first two make object detection
more efficient by using active search strategies guided by image context. The last two
involve parts. One of them explores the emergence of parts in neural networks trained
for object detection, whereas the other improves on part detection by adding object
context.
First, we present an active search strategy for efficient object class detection. Modern
object detectors evaluate a large set of windows using a window classifier. Instead,
our search sequentially chooses which window to evaluate next based on all the information
gathered so far. This significantly reduces the number of window evaluations
needed to detect the objects in the image. We guide our search strategy
using image context and the score of the classifier.
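
As a rough illustration of the idea of sequential window selection, the toy sketch below evaluates only a fixed budget of candidate windows, always choosing the unevaluated window with the highest current belief. It is not the method of the thesis: the function names, the `prior` list standing in for image context, the Gaussian spatial update and the `bandwidth` parameter are all assumptions made for this example.

```python
import math


def center(window):
    """Return the (x, y) center of a window given as (x, y, width, height)."""
    x, y, width, height = window
    return (x + width / 2.0, y + height / 2.0)


def active_search(windows, score_fn, budget, prior=None, bandwidth=50.0):
    """Toy active search (hypothetical, not the thesis algorithm).

    windows   -- list of (x, y, width, height) candidates
    score_fn  -- stand-in for a window classifier: window -> float
    budget    -- maximum number of classifier evaluations
    prior     -- optional per-window belief, standing in for image context
    bandwidth -- assumed spatial scale (pixels) of the belief update
    """
    belief = list(prior) if prior is not None else [0.0] * len(windows)
    observed = {}  # window index -> classifier score

    for _ in range(min(budget, len(windows))):
        # Pick the unevaluated window with the highest current belief.
        candidates = [i for i in range(len(windows)) if i not in observed]
        chosen = max(candidates, key=lambda i: belief[i])
        score = score_fn(windows[chosen])
        observed[chosen] = score

        # Pull the belief of nearby unevaluated windows toward the new score.
        cx, cy = center(windows[chosen])
        for i in candidates:
            if i == chosen:
                continue
            px, py = center(windows[i])
            dist2 = (px - cx) ** 2 + (py - cy) ** 2
            weight = math.exp(-dist2 / (2.0 * bandwidth ** 2))
            belief[i] += weight * (score - belief[i])

    best = max(observed, key=observed.get)
    return best, observed
```

In this sketch the prior decides where the search starts and each classifier score reshapes the beliefs of neighbouring windows, so most candidates are never evaluated.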
In our second contribution, we extend this active search to jointly detect pairs of
object classes that appear close in the image, exploiting the valuable information that
one class can provide about the location of the other. This further reduces
the number of evaluations needed for the smaller, more challenging
classes.
In the third contribution of this thesis, we study whether semantic parts emerge
in Convolutional Neural Networks trained for different visual recognition tasks, especially
object detection. We perform two quantitative analyses that provide a deeper
understanding of their internal representation by investigating the responses of the network
filters. Moreover, we explore several connections between discriminative power
and semantics, which provide further insights into the role of semantic parts in the
network.
Finally, the last contribution is a part detection approach that exploits object context.
We complement part appearance with the object appearance, its class, and the expected
relative location of the parts inside it. We significantly outperform approaches
that use part appearance alone in this challenging task.