Resource efficient action recognition in videos
dc.contributor.advisor
Keller, Frank
dc.contributor.advisor
Sevilla-Lara, Laura
dc.contributor.author
Gowda, Shreyank Narayana
dc.date.accessioned
2023-10-18T15:01:42Z
dc.date.available
2023-10-18T15:01:42Z
dc.date.issued
2023-10-18
dc.description.abstract
This thesis traces an innovative journey in the domain of real-world action recognition, focusing in particular on memory- and data-efficient systems. It begins by introducing a novel approach for smart frame selection, which significantly reduces the computational cost of video classification. It then further optimizes the action recognition pipeline by addressing the training-time and memory-consumption challenges of video transformers, laying a strong foundation for memory-efficient action recognition.
The thesis then delves into zero-shot learning, highlighting the flaws of the existing evaluation protocol and establishing a new split for true zero-shot action recognition that ensures zero overlap between unseen test classes and training or pre-training classes. Building on this, a unique cluster-based representation, optimized using reinforcement learning, is proposed for zero-shot action recognition. Crucially, we show that joint visual-semantic representation learning is essential for improved performance. We also experiment with feature-generation approaches for zero-shot action recognition, introducing a synthetic sample selection methodology that extends the utility of zero-shot learning to both images and videos and selects high-quality samples for synthetic data augmentation. This form of data valuation is then incorporated into our novel video data augmentation approach, in which we generate video composites by mixing the foregrounds and backgrounds of videos; the data valuation helps us choose good composites at a reduced overall cost. Finally, we propose the creation of a meaningful semantic space for action labels: we create a textual description dataset for each action class and propose a novel feature-generating approach to maximise the benefits of this semantic space. The research contributes significantly to the field, paving the way for more efficient, resource-friendly, and robust video processing and understanding techniques.
en
dc.identifier.uri
https://hdl.handle.net/1842/41077
dc.identifier.uri
http://dx.doi.org/10.7488/era/3816
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
S. N. Gowda. Human activity recognition using combinatorial deep belief networks. In IEEE Conf. Comput. Vis. Pattern Recog., pages 1–6, 2017.
en
dc.relation.hasversion
S. N. Gowda. Synthetic sample selection for generalized zero-shot learning. In IEEE Conf. Comput. Vis. Pattern Recog., pages 58–67, 2023.
en
dc.relation.hasversion
S. N. Gowda, A. Arnab, and J. Huang. Optimizing vivit training: Time and memory reduction for action recognition. arXiv preprint arXiv:2306.04822, 2023.
en
dc.relation.hasversion
S. N. Gowda, P. Eustratiadis, T. Hospedales, and L. Sevilla-Lara. Alba: Reinforcement learning for video object segmentation. arXiv preprint arXiv:2005.13039, 2020.
en
dc.relation.hasversion
S. N. Gowda, M. Rohrbach, F. Keller, and L. Sevilla-Lara. Learn2augment: Learning to composite videos for data augmentation in action recognition. In Eur. Conf. Comput. Vis., pages 242–259. Springer, 2022.
en
dc.relation.hasversion
S. N. Gowda, M. Rohrbach, and L. Sevilla-Lara. Smart frame selection for action recognition. In AAAI, volume 35, pages 1451–1459, 2021.
en
dc.relation.hasversion
S. N. Gowda and L. Sevilla-Lara. Telling stories for common sense zero-shot action recognition. arXiv preprint arXiv:2309.17327, 2023.
en
dc.relation.hasversion
S. N. Gowda, L. Sevilla-Lara, F. Keller, and M. Rohrbach. Claster: clustering with reinforcement learning for zero-shot action recognition. In Eur. Conf. Comput. Vis., pages 187–203. Springer, 2022.
en
dc.relation.hasversion
S. N. Gowda, L. Sevilla-Lara, K. Kim, F. Keller, and M. Rohrbach. A new split for evaluating true zero-shot action recognition. arXiv preprint arXiv:2107.13029, 2021.
en
dc.subject
video
en
dc.subject
Resource efficient action recognition
en
dc.subject
real-world action recognition
en
dc.subject
memory
en
dc.subject
data efficient systems
en
dc.subject
action recognition process
en
dc.subject
video transformers
en
dc.subject
zero-shot learning
en
dc.title
Resource efficient action recognition in videos
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: GowdaSN_2023.pdf
- Size: 23.57 MB
- Format: Adobe Portable Document Format