Generative factorization for object-centric representation learning

Li, Nanbo

dc.contributor.advisor

Fisher, Bob

dc.contributor.advisor

Williams, Chris

dc.contributor.author

Li, Nanbo

dc.contributor.sponsor

European Union Horizon 2020

en

dc.date.accessioned

2022-12-12T14:57:33Z

dc.date.available

2022-12-12T14:57:33Z

dc.date.issued

2022-12-12

dc.description.abstract

Empowering machines to understand compositionality is considered by many (Lake et al., 2017; Lake and Baroni, 2018; Schölkopf et al., 2021) a promising path towards improved representational interpretability and out-of-distribution generalization. Yet discovering the compositional structures of raw sensory data requires solving a factorization problem, i.e. decomposing unstructured observations into modular components. Handling the factorization problem presents numerous technical challenges, especially in unsupervised settings, which we explore to avoid the heavy burden of human annotation. In this thesis, we approach the factorization problem from a generative perspective. Specifically, we develop unsupervised machine learning models to recover the compositional data-generation mechanisms around objects from visual scene observations. First, we present MulMON as the first feasible unsupervised solution to the multi-view object-centric representation learning problem. MulMON resolves the spatial ambiguities arising from single-image observations of static scenes, e.g. optical illusions and occlusion, with a multi-view inference design. We demonstrate that not only can MulMON perform better scene object factorization with less uncertainty than single-view methods, but it can also predict a scene's appearances and object segmentations for novel viewpoints. Next, we present a technique for latent duplicate suppression (LDS) and demonstrate its effectiveness in fixing a common scene object factorization issue that exists in various unsupervised object-centric learning models, i.e. inferring duplicate representations for the same objects. Finally, we present DyMON as the first unsupervised learner that can recover object-centric compositional generative mechanisms from moving-view-dynamic-scene observational data.
We demonstrate that not only can DyMON factorize dynamic scenes in terms of objects, but it can also factorize the entangled effects of observer motions and object dynamics that function independently. Furthermore, we demonstrate that DyMON can predict a scene's appearances and segmentations at arbitrary times (querying across time) and from arbitrary viewpoints (querying across space), i.e. answer counterfactual questions. The scene modeling explored in this thesis is a proof of concept, which we hope will inspire: 1) a broader range of downstream applications (e.g. "world modelling" and environment interactions) and 2) generative factorization research that targets more complex compositional structures (e.g. complex textures, multi-granularity compositions).

en

dc.identifier.uri

https://hdl.handle.net/1842/39597

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.relation.hasversion

Li Nanbo, Cian Eastwood, and Robert Fisher. “Learning Object-Centric Representations of Multi-Object Scenes From Multiple Views.” Advances in Neural Information Processing Systems, 2020.

en

dc.relation.hasversion

Li Nanbo and Robert Fisher. “Duplicate Latent Representation Suppression For Multi-Object Variational Autoencoders.” The British Machine Vision Conference, 2021.

en

dc.relation.hasversion

Li Nanbo, Muhammad Ahmed Raza, Hu Wenbin, Zhaole Sun, and Robert Fisher. “Object-Centric Representation Learning with Generative Spatial-Temporal Factorization.” Advances in Neural Information Processing Systems, 2021.

en

dc.subject

PhD Thesis

en

dc.subject

Machine Learning

en

dc.subject

Computer Vision

en

dc.subject

Generative Models

en

dc.subject

Representation Learning

en

dc.subject

Object-Centric Models

en

dc.title

Generative factorization for object-centric representation learning

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Name: LiN_2022.pdf
Size: 8.04 MB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

Informatics thesis and dissertation collection