Learning to represent, model and generate the world
dc.contributor.advisor
Bilen, Hakan
dc.contributor.advisor
Williams, Chris
dc.contributor.author
Anciukevičius, Titas
dc.date.accessioned
2025-06-30T12:14:02Z
dc.date.available
2025-06-30T12:14:02Z
dc.date.issued
2025-06-30
dc.description.abstract
Every agent acting in the world receives sensory signals from which it must extract information necessary for intelligent actions. While neural networks can be trained to interpret these signals using labelled supervision, this thesis presents unsupervised algorithms for learning to represent, infer, and generate the physical world around us. Our core idea is to develop methods for learning latent variable generative models of images and videos, which support the generation and inference of useful latent representations, even from observations that are substantially different from those encountered during learning. In the first part of this thesis, we introduce a novel generative model based on denoising diffusion probabilistic models, which can learn a prior over underlying 3D representations despite being trained on 2D images. The core insight is that the model can be trained to denoise images by rendering a 3D scene. We demonstrate that the model learns a prior, which allows us to sample 3D representations from the true underlying distribution. In the second part of this thesis, we introduce a new neural representation for unbounded scenes and extend the denoising-by-rendering framework to support reconstruction of 3D representations given one or a few input images. We demonstrate that our model can sample a diverse set of 3D representations that explain a sparse visual conditioning signal. In the final part of the thesis, we explore how latent generative models can infer useful representations of the world from unfamiliar observations, i.e., how they can generalise outside their training distribution. The core insight is to learn the underlying causal, compositional and generative mechanisms, including how images are formed, and then exploit them in out-of-distribution scenarios, such as understanding images taken from unfamiliar viewpoints.
We demonstrate that, while classical machine learning models fail, our framework can infer useful representations even when the input image is generated with a different data generation mechanism than the one used during training. Overall, our contributions advance the field by providing approaches to learning generative models that can reason about the complex physical world using raw sensory inputs, even in unfamiliar situations.
en
dc.identifier.uri
https://hdl.handle.net/1842/43632
dc.identifier.uri
http://dx.doi.org/10.7488/era/6165
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Titas Anciukevičius, Patrick Fox-Roberts, Edward Rosten, and Paul Henderson. Unsupervised Causal Generative Understanding of Images. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
en
dc.relation.hasversion
Titas Anciukevičius, Paul Henderson, and Hakan Bilen. Learning to Predict Keypoints and Structure of Articulated Objects without Supervision. In 2022 26th International Conference on Pattern Recognition (ICPR). IEEE, 2022.
en
dc.relation.hasversion
Titas Anciukevičius, Fabian Manhardt, Federico Tombari, and Paul Henderson. Denoising Diffusion via Image-Based Rendering. In The Twelfth International Conference on Learning Representations (ICLR), 2024.
en
dc.relation.hasversion
Titas Anciukevičius, Zexiang Xu, Matthew Fisher, Paul Henderson, Hakan Bilen, Niloy J. Mitra, and Paul Guerrero. RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
en
dc.relation.hasversion
Paul Henderson, Melonie de Almeida, Daniela Ivanova, and Titas Anciukevičius. Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models. arXiv preprint arXiv:2406.13099, 2024.
en
dc.subject
raw sensory input
en
dc.subject
models
en
dc.subject
computer generated 3D versions
en
dc.subject
unsupervised algorithms
en
dc.subject
learning latent variable generative models
en
dc.subject
inference
en
dc.subject
denoising diffusion probabilistic models
en
dc.subject
denoising-by-rendering
en
dc.title
Learning to represent, model and generate the world
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: Anciukevicius2025.pdf
- Size: 37.92 MB
- Format: Adobe Portable Document Format