Learning shape, structure, and semantics: self-supervised learning with 3D priors
dc.contributor.advisor
Mac Aodha, Oisin
dc.contributor.advisor
Bilen, Hakan
dc.contributor.author
Aygün, Mehmet
dc.date.accessioned
2025-10-24T10:12:31Z
dc.date.available
2025-10-24T10:12:31Z
dc.date.issued
2025-10-24
dc.description.abstract
The world exists in three dimensions, yet when 3D objects are projected onto a 2D image plane, vital spatial information is inevitably lost. Despite this limitation, humans possess a remarkable ability to infer 3D structure from 2D images, enabling us to navigate and interact seamlessly with our surroundings. In contrast, modern computer vision algorithms primarily interpret the world as a collection of 2D patterns (e.g. bag of 2D visual words), leading to several shortcomings: poor generalization to novel environments, difficulty in learning object categories from limited training samples, and vulnerability to adversarial attacks, where minor texture modifications can drastically degrade performance.
This thesis aims to reduce the gap between human and machine perception by improving the extraction of 3D object shape information from 2D images and leveraging 3D understanding to enhance high-level vision tasks such as semantic correspondence estimation. To do so, we take inspiration from developmental psychology, which suggests that human vision is strongly driven by shape cues, particularly in early cognitive development. However, with the rise of deep learning, classical approaches that explicitly encode shape, such as pictorial structure models and deformable part-based models, have largely been abandoned in favor of end-to-end learning paradigms.
In this thesis, we first assess the capabilities of unsupervised computer vision models on semantic correspondence tasks using a novel evaluation protocol that jointly captures semantic and geometric understanding. Our findings reveal that current models fall short on this task, and we propose a new method that improved on the state of the art at the time of publication.
Next, we introduce a method for extracting the 3D shape of articulated objects, such as animals, from single-view images without requiring manual supervision. Finally, we present a novel approach to integrate 3D priors into self-supervised learning frameworks, improving robustness for semantic tasks such as image recognition while maintaining accuracy. By emphasizing the role of 3D shape in visual learning, this work introduces new methods that enhance the robustness of machine perception, advancing it toward human-level competence.
en
dc.identifier.uri
https://hdl.handle.net/1842/44103
dc.identifier.uri
http://dx.doi.org/10.7488/era/6629
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
Mehmet Aygün and Oisin Mac Aodha. “Demystifying Unsupervised Semantic Correspondence Estimation.” European Conference on Computer Vision, (ECCV). 202
en
dc.relation.hasversion
Mehmet Aygün and Oisin Mac Aodha. “SAOR: Single-view Articulated Object Reconstruction.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (CVPR). 2024.
en
dc.relation.hasversion
Mehmet Aygün, Prithviraj Dhar, Zhicheng Yan, Oisin Mac Aodha, and Rakesh Ranjan. “Enhancing 2D Representation Learning with a 3D Prior.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, (CVPR Workshops). 2024.
en
dc.relation.hasversion
Duolikun Danier, Mehmet Aygün, Changjian Li, Hakan Bilen, and Oisin Mac Aodha. “DepthCues: Evaluating Monocular Depth Perception in Large Vision Models.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (CVPR). 2025.
en
dc.subject
computer vision systems
en
dc.subject
3D shape knowledge
en
dc.subject
computer learning systems
en
dc.subject
artificial intelligence
en
dc.subject
deep learning
en
dc.subject
unsupervised computer vision models
en
dc.subject
self-supervised learning framework
en
dc.subject
machine perception
en
dc.title
Learning shape, structure, and semantics: self-supervised learning with 3D priors
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: Aygun2025.pdf
- Size: 49.98 MB
- Format: Adobe Portable Document Format

