|dc.description.abstract||This thesis focuses on modelling visual attention in tasks in which vision interacts
with language and other sources of contextual information. The work is based on
insights provided by experimental studies in visual cognition and psycholinguistics,
particularly cross-modal processing.
We present a series of models of eye-movements in situated language comprehension
capable of generating human-like scan-paths. Moreover, we investigate the existence
of high-level structure in scan-paths and the applicability of tools used in Natural
Language Processing to the analysis of this structure.
We show that scan-paths carry interesting information that is currently neglected
in both experimental and modelling studies. This information, studied at a level beyond
simple statistical measures such as proportion of looks, can be used to extract
knowledge of more complicated patterns of behaviour, and to build models capable of
simulating human behaviour in the presence of linguistic material.
We also revisit the classical model of saliency and its extensions, in particular the Contextual
Guidance Model of Torralba et al. (2006), and extend it with memory of target
positions in visual search. We show that models of contextual guidance should contain
components responsible for short-term learning and memorisation. We also investigate
the applicability of this type of model to the prediction of human behaviour in tasks with
incremental stimuli, as in situated language comprehension.
Finally, we investigate the issue of objectness and object saliency, including their
effects on eye-movements and human responses to experimental tasks. In a simple
experiment, we show that an object-based notion of saliency predicts fixation
locations better than pixel-based saliency as formulated by Itti
et al. (1998). In addition, we show that object-based saliency fits into current theories
such as cognitive relevance and can be used to build unified models of cross-referential
visual and linguistic processing.
This thesis lays a foundation for a more detailed study of scan-paths within
an object-based framework such as the Cognitive Relevance Framework (Henderson et al.,
2007, 2009) by providing models capable of explaining human behaviour, and by
delivering tools and methodologies to predict which objects will be attended to
during synchronous visual and linguistic processing.||en_US
|dc.contributor.sponsor||European Research Council||en_US
|dc.publisher||The University of Edinburgh||en_US
|dc.relation.hasversion||Dziemianko, M., Clarke, A., and Keller, F. (2011). Towards object-based saliency. In Proceedings of the 24th International Conference on Intelligent Robots and Systems (IROS).||en_US
|dc.relation.hasversion||Dziemianko, M., Clarke, A., and Keller, F. (2013). Object-based saliency as a predictor of attention in visual tasks. In Proceedings of the 34th Annual Conference of the Cognitive Science Society.||en_US
|dc.relation.hasversion||Dziemianko, M., Coco, M., and Keller, F. (2011). Incremental learning of target positions in visual search. In Proceedings of the 32nd Annual Conference of the Cognitive Science Society.||en_US
|dc.relation.hasversion||Dziemianko, M. and Keller, F. (2013). Memory modulated saliency: A computational model of the incremental learning of target locations in visual search. Visual Cognition, 21(3):277–305.||en_US
|dc.title||Modelling eye movements and visual attention in synchronous visual and linguistic processing||en_US
|dc.type||Thesis or Dissertation||en_US
|dc.type.qualificationname||PhD Doctor of Philosophy||en_US