Generative factorization for object-centric representation learning
dc.contributor.advisor
Fisher, Bob
dc.contributor.advisor
Williams, Chris
dc.contributor.author
Li, Nanbo
dc.contributor.sponsor
European Union Horizon 2020
en
dc.date.accessioned
2022-12-12T14:57:33Z
dc.date.available
2022-12-12T14:57:33Z
dc.date.issued
2022-12-12
dc.description.abstract
Empowering machines to understand compositionality is considered by many (Lake et al., 2017; Lake and Baroni, 2018; Schölkopf et al., 2021) a promising path towards improved representational interpretability and out-of-distribution generalization. Yet discovering the compositional structures of raw sensory data requires solving a factorization problem, i.e. decomposing the unstructured observations into modular components. Handling the factorization problem presents numerous technical challenges, especially in unsupervised settings, which we explore to avoid the heavy burden of human annotation. In this thesis, we approach the factorization problem from a generative perspective. Specifically, we develop unsupervised machine learning models to recover the compositional data-generation mechanisms around objects from visual scene observations.
First, we present MulMON as the first feasible unsupervised solution to the multi-view object-centric representation learning problem. MulMON resolves the spatial ambiguities arising from single-image observations of static scenes, e.g. optical illusions and occlusion, with a multi-view inference design. We demonstrate that not only can MulMON perform better scene object factorization with less uncertainty than single-view methods, but it can also predict a scene's appearances and object segmentations for novel viewpoints. Next, we present a technique for latent duplicate suppression (LDS) and demonstrate its effectiveness in fixing a common scene object factorization issue that exists in various unsupervised object-centric learning models, i.e. inferring duplicate representations for the same objects. Finally, we present DyMON as the first unsupervised learner that can recover object-centric compositional generative mechanisms from moving-view-dynamic-scene observational data. We demonstrate that not only can DyMON factorize dynamic scenes in terms of objects, but it can also factorize the entangled effects of observer motions and object dynamics that function independently. Furthermore, we demonstrate that DyMON can predict a scene's appearances and segmentations at arbitrary times (querying across time) and from arbitrary viewpoints (querying across space), i.e. answer counterfactual questions.
The scene modeling explored in this thesis is a proof of concept, which we hope will inspire: 1) a broader range of downstream applications (e.g. "world modelling" and environment interactions) and 2) generative factorization research that targets more complex compositional structures (e.g. complex textures, multi-granularity compositions).
en
dc.identifier.uri
https://hdl.handle.net/1842/39597
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Li Nanbo, Cian Eastwood, and Robert Fisher. “Learning Object-Centric Representations of Multi-Object Scenes From Multiple Views.” Advances in Neural Information Processing Systems, 2020.
en
dc.relation.hasversion
Li Nanbo and Robert Fisher. “Duplicate Latent Representation Suppression For Multi-Object Variational Autoencoders.” The British Machine Vision Conference, 2021.
en
dc.relation.hasversion
Li Nanbo, Muhammad Ahmed Raza, Hu Wenbin, Zhaole Sun, and Robert Fisher. “Object-Centric Representation Learning with Generative Spatial-Temporal Factorization.” Advances in Neural Information Processing Systems, 2021.
en
dc.subject
PhD Thesis
en
dc.subject
Machine Learning
en
dc.subject
Computer Vision
en
dc.subject
Generative Models
en
dc.subject
Representation Learning
en
dc.subject
Object-Centric Models
en
dc.title
Generative factorization for object-centric representation learning
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name:
- LiN_2022.pdf
- Size:
- 8.04 MB
- Format:
- Adobe Portable Document Format

