Processing embodied conversation for interactive task learning

Rubavičius, Rimvydas

Files

RubavičiusR_2025.pdf (32.77 MB)

Date

2025-06-26

Authors

Rubavičius, Rimvydas

Abstract

Lifelong learning is a long-standing goal of human-robot interaction. One approach to achieving this is through Interactive Task Learning (ITL) scenarios, in which the learner uses natural interaction with a user, acting as a teacher, to learn new tasks. One type of natural interaction is embodied conversation, initiated by an instruction such as “move the one red cube in front of a blue cylinder”. A key challenge for ITL is that the learner’s domain conceptualization may entirely lack the concepts necessary for solving the task. In other words, the teacher’s natural language expressions (“red” or “cube”, say, in our example) are neologisms to the learner and may denote concepts that are not a part of the learner’s conceptualization of the domain at all. To handle such unforeseen possibilities, the learner must perform interactive symbol grounding: that is, it must expand its hypothesis space of possible states to include newly discovered concepts and learn in real time how to recognize objects denoted by these unforeseen concepts in order to solve the task successfully, using the teacher’s messages as evidence. This thesis studies three ways in which the formal semantic analysis of embodied conversation can aid ITL. Firstly, we study how processing referential expressions like “the one red cube” together with their logical consequences (e.g. that there is exactly one red cube in the environment) aids interactive symbol grounding. Secondly, we look into designing dialogue strategies under unawareness by quantifying the value of asking questions, which require the teacher’s effort, against the risk of solving tasks with current beliefs, which is costly if those beliefs are wrong. Our unique contribution is that our learning models cope with the ever-expanding hypothesis space of possible states and actions that arises in ITL.
Finally, we consider corrections of the agent’s executed actions, which arise when the ITL agent attempts to solve the task but performs a sub-optimal action in the environment (e.g., picking a red cylinder rather than a red cube), exposing its false beliefs and so prompting the teacher to express the source of the error (e.g., “No, this is a cylinder”). Such corrective feedback triggers belief revision, which, by exploiting the semantic consequences of the fact that the feedback is a correction (in this case, that the picked object is not a cube), complements and further improves the learning process. We study these three facets by developing and evaluating neuro-symbolic methods for interactive symbol grounding and policy learning. Through our experiments, we conclude that agents that use formal semantic analysis when processing embodied conversation outperform learners lacking these capabilities.

URI

https://hdl.handle.net/1842/43620
http://dx.doi.org/10.7488/era/6153

This item appears in the following Collection(s)

Informatics thesis and dissertation collection