Learning shape, structure, and semantics: self-supervised learning with 3D priors

Aygün, Mehmet

dc.contributor.advisor

Mac Aodha, Oisin

dc.contributor.advisor

Bilen, Hakan

dc.contributor.author

Aygün, Mehmet

dc.date.accessioned

2025-10-24T10:12:31Z

dc.date.available

2025-10-24T10:12:31Z

dc.date.issued

2025-10-24

dc.description.abstract

The world exists in three dimensions, yet when 3D objects are projected onto a 2D image plane, vital spatial information is inevitably lost. Despite this limitation, humans possess a remarkable ability to infer 3D structure from 2D images, enabling us to navigate and interact seamlessly with our surroundings. In contrast, modern computer vision algorithms primarily interpret the world as a collection of 2D patterns (e.g. bag of 2D visual words), leading to several shortcomings: poor generalization to novel environments, difficulty in learning object categories from limited training samples, and vulnerability to adversarial attacks, where minor texture modifications can drastically degrade performance. This thesis aims to reduce the gap between human and machine perception by improving the extraction of 3D object shape information from 2D images and leveraging 3D understanding to enhance high-level vision tasks such as semantic correspondence estimation. To do so, we take inspiration from developmental psychology, which suggests that human vision is strongly driven by shape cues, particularly in early cognitive development. However, with the rise of deep learning, classical approaches that explicitly encode shape, such as pictorial structure models and deformable part-based models, have largely been abandoned in favor of end-to-end learning paradigms. In this thesis, we first assess the capabilities of unsupervised computer vision models on semantic correspondence tasks using a novel evaluation protocol that jointly captures semantic and geometric understanding. Our findings reveal that current models fall short on this task, and we propose a new method that improved on the state of the art at the time. Next, we introduce a method for extracting the 3D shape of articulated objects, such as animals, from single-view images without requiring manual supervision.
Finally, we present a novel approach for integrating 3D priors into self-supervised learning frameworks, improving robustness for semantic tasks such as image recognition while maintaining accuracy. By emphasizing the role of 3D shape in visual learning, this work introduces new methods that enhance the robustness of machine perception, advancing it toward human-level competence.

en

dc.identifier.uri

https://hdl.handle.net/1842/44103

dc.identifier.uri

http://dx.doi.org/10.7488/era/6629

dc.language.iso

en

dc.publisher

The University of Edinburgh

en

dc.relation.hasversion

Mehmet Aygün and Oisin Mac Aodha. “Demystifying Unsupervised Semantic Correspondence Estimation.” European Conference on Computer Vision, (ECCV). 202

en

dc.relation.hasversion

Mehmet Aygün and Oisin Mac Aodha. “SAOR: Single-view Articulated Object Reconstruction.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (CVPR). 2024.

en

dc.relation.hasversion

Mehmet Aygün, Prithviraj Dhar, Zhicheng Yan, Oisin Mac Aodha, and Rakesh Ranjan. “Enhancing 2D Representation Learning with a 3D Prior.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, (CVPR Workshops). 2024.

en

dc.relation.hasversion

Danier, Duolikun, Mehmet Aygün, Changjian Li, Hakan Bilen, and Oisin Mac Aodha. “DepthCues: Evaluating Monocular Depth Perception in Large Vision Models.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (CVPR). 2025.

en

dc.subject

computer vision systems

en

dc.subject

3D shape knowledge

en

dc.subject

computer learning systems

en

dc.subject

artificial intelligence

en

dc.subject

deep learning

en

dc.subject

unsupervised computer vision models

en

dc.subject

self-supervised learning framework

en

dc.subject

machine perception

en

dc.title

Learning shape, structure, and semantics: self-supervised learning with 3D priors

en

dc.type

Thesis or Dissertation

en

dc.type.qualificationlevel

Doctoral

en

dc.type.qualificationname

PhD Doctor of Philosophy

en

Files

Original bundle

Name: Aygun2025.pdf
Size: 49.98 MB
Format: Adobe Portable Document Format

This item appears in the following Collection(s)

Informatics thesis and dissertation collection