Recognising activities by jointly modelling actions and their effects
With the rapid adoption of consumer technologies, including inexpensive but powerful hardware, robotics appears poised for widespread deployment in human environments. A key remaining barrier is machine understanding and interpretation of human activity through a perceptual medium such as computer vision or RGB-D sensing, for example with the Microsoft Kinect sensor. This thesis contributes novel video-based methods for activity recognition. Specifically, the focus is on activities that involve interactions between the human user and objects in the environment. Based on streams of poses and object tracking, machine learning models are provided to recognise a variety of these interactions. The main contributions of the thesis are (1) a new model for interactions that explicitly learns the human-object relationships through a latent distributed representation, (2) a practical framework for labelling chains of manipulation actions in temporally extended activities, and (3) an unsupervised sequence segmentation technique that relies on slow feature analysis and spectral clustering. These techniques are validated by experiments on publicly available data sets, such as the Cornell CAD-120 activity corpus, one of the most extensive publicly available data sets of this kind that is also annotated with ground truth information. The experiments demonstrate the advantages of the proposed methods over state-of-the-art alternatives from the recent literature on sequence classifiers.