Learning to represent, model and generate the world

Anciukevičius, Titas

dc.contributor.advisor

Bilen, Hakan

dc.contributor.advisor

Williams, Chris

dc.contributor.author

Anciukevičius, Titas

dc.date.accessioned

2025-06-30T12:14:02Z

dc.date.available

2025-06-30T12:14:02Z

dc.date.issued

2025-06-30

dc.description.abstract

Every agent acting in the world receives sensory signals from which it must extract information necessary for intelligent actions. While neural networks can be trained to interpret these signals using labelled supervision, this thesis presents unsupervised algorithms for learning to represent, infer, and generate the physical world around us. Our core idea is to develop methods for learning latent variable generative models of images and videos, which support the generation and inference of useful latent representations, even from observations that are substantially different from those encountered during learning. In the first part of this thesis, we introduce a novel generative model based on denoising diffusion probabilistic models, which can learn a prior over underlying 3D representations despite being trained on 2D images. The core insight is that the model can be trained to denoise images by rendering a 3D scene. We demonstrate that the model learns a prior, which allows us to sample 3D representations from the true underlying distribution. In the second part of this thesis, we introduce a new neural representation for unbounded scenes and extend the denoising-by-rendering framework to support reconstruction of 3D representations given one or a few input images. We demonstrate that our model can sample a diverse set of 3D representations that explain a sparse visual conditioning signal. In the final part of the thesis, we explore how latent generative models can infer useful representations of the world from unfamiliar observations, i.e. generalise outside their training distribution. The core insight is to learn the underlying causal, compositional and generative mechanisms, including how images are formed, and then exploit them in out-of-distribution scenarios, such as understanding images taken from unfamiliar viewpoints. We demonstrate that, while classical machine learning models fail, our framework can infer useful representations even when the input image is generated with a different data generation mechanism than the one used during training. Overall, our contributions advance the field by providing approaches to learning generative models that can reason about the complex physical world using raw sensory inputs, even in unfamiliar situations.

en

dc.identifier.uri

https://hdl.handle.net/1842/43632

dc.identifier.uri

http://dx.doi.org/10.7488/era/6165

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.relation.hasversion

Titas Anciukevičius, Patrick Fox-Roberts, Edward Rosten, and Paul Henderson. Unsupervised causal generative understanding of images. Advances in Neural Information Processing Systems, 2022

en

dc.relation.hasversion

Titas Anciukevičius, Paul Henderson, and Hakan Bilen. Learning to Predict Keypoints and Structure of Articulated Objects without Supervision. In 2022 26th International Conference on Pattern Recognition (ICPR). IEEE, 2022.

en

dc.relation.hasversion

Titas Anciukevičius, Fabian Manhardt, Federico Tombari, and Paul Henderson. Denoising Diffusion via Image-Based Rendering. In The Twelfth International Conference on Learning Representations, 2024

en

dc.relation.hasversion

Titas Anciukevičius, Zexiang Xu, Matthew Fisher, Paul Henderson, Hakan Bilen, Niloy J Mitra, and Paul Guerrero. RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

en

dc.relation.hasversion

Paul Henderson, Melonie de Almeida, Daniela Ivanova, and Titas Anciukevičius. Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models. arXiv preprint arXiv:2406.13099, 2024

en

dc.subject

raw sensory input

en

dc.subject

models

en

dc.subject

computer generated 3D versions

en

dc.subject

unsupervised algorithms

en

dc.subject

learning latent variable generative models

en

dc.subject

inference

en

dc.subject

denoising diffusion probabilistic models

en

dc.subject

denoising-by-rendering

en

dc.title

Learning to represent, model and generate the world

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Name: Anciukevicius2025.pdf
Size: 37.92 MB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

Informatics thesis and dissertation collection