Resource efficient action recognition in videos
dc.contributor.advisor
Keller, Frank
dc.contributor.advisor
Sevilla-Lara, Laura
dc.contributor.author
Gowda, Shreyank Narayana
dc.date.accessioned
2023-10-18T15:01:42Z
dc.date.available
2023-10-18T15:01:42Z
dc.date.issued
2023-10-18
dc.description.abstract
This thesis traces an innovative journey in the domain of real-world action recognition, focusing in particular on memory- and data-efficient systems. It begins by introducing a novel approach for smart frame selection, which significantly reduces the computational cost of video classification. It then further optimizes the action recognition pipeline by addressing the training-time and memory-consumption challenges of video transformers, laying a strong foundation for memory-efficient action recognition.
The thesis then delves into zero-shot learning, highlighting the flaws of the existing evaluation protocol and establishing a new split for true zero-shot action recognition that ensures zero overlap between unseen test classes and training or pre-training classes. Building on this, a unique cluster-based representation, optimized using reinforcement learning, is proposed for zero-shot action recognition. Crucially, we show that joint visual-semantic representation learning is essential for improved performance. We also experiment with feature-generation approaches for zero-shot action recognition, introducing a synthetic sample selection methodology that extends the utility of zero-shot learning to both images and videos and selects high-quality samples for synthetic data augmentation. This form of data valuation is then incorporated into our novel video data augmentation approach, in which we generate video composites by mixing the foregrounds and backgrounds of videos; the data valuation helps us choose good composites at a reduced overall cost. Finally, we propose the creation of a meaningful semantic space for action labels: we create a textual description dataset for each action class and propose a novel feature-generating approach to maximise the benefits of this semantic space. The research contributes significantly to the field, paving the way for more efficient, resource-friendly, and robust video processing and understanding techniques.
en
dc.identifier.uri
https://hdl.handle.net/1842/41077
dc.identifier.uri
http://dx.doi.org/10.7488/era/3816
dc.language.iso
en
en
dc.publisher
The University of Edinburgh
en
dc.relation.hasversion
S. N. Gowda. Human activity recognition using combinatorial deep belief networks. In IEEE Conf. Comput. Vis. Pattern Recog., pages 1–6, 2017.
en
dc.relation.hasversion
S. N. Gowda. Synthetic sample selection for generalized zero-shot learning. In IEEE Conf. Comput. Vis. Pattern Recog., pages 58–67, 2023.
en
dc.relation.hasversion
S. N. Gowda, A. Arnab, and J. Huang. Optimizing vivit training: Time and memory reduction for action recognition. arXiv preprint arXiv:2306.04822, 2023.
en
dc.relation.hasversion
S. N. Gowda, P. Eustratiadis, T. Hospedales, and L. Sevilla-Lara. Alba: Reinforcement learning for video object segmentation. arXiv preprint arXiv:2005.13039, 2020.
en
dc.relation.hasversion
S. N. Gowda, M. Rohrbach, F. Keller, and L. Sevilla-Lara. Learn2augment: Learning to composite videos for data augmentation in action recognition. In Eur. Conf. Comput. Vis., pages 242–259. Springer, 2022.
en
dc.relation.hasversion
S. N. Gowda, M. Rohrbach, and L. Sevilla-Lara. Smart frame selection for action recognition. In AAAI, volume 35, pages 1451–1459, 2021.
en
dc.relation.hasversion
S. N. Gowda and L. Sevilla-Lara. Telling stories for common sense zero-shot action recognition. arXiv preprint arXiv:2309.17327, 2023.
en
dc.relation.hasversion
S. N. Gowda, L. Sevilla-Lara, F. Keller, and M. Rohrbach. Claster: clustering with reinforcement learning for zero-shot action recognition. In Eur. Conf. Comput. Vis., pages 187–203. Springer, 2022.
en
dc.relation.hasversion
S. N. Gowda, L. Sevilla-Lara, K. Kim, F. Keller, and M. Rohrbach. A new split for evaluating true zero-shot action recognition. arXiv preprint arXiv:2107.13029, 2021.
en
dc.subject
video
en
dc.subject
Resource efficient action recognition
en
dc.subject
real-world action recognition
en
dc.subject
memory
en
dc.subject
data efficient systems
en
dc.subject
action recognition process
en
dc.subject
video transformers
en
dc.subject
zero-shot learning
en
dc.title
Resource efficient action recognition in videos
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en
Files
Original bundle
- Name: GowdaSN_2023.pdf
- Size: 23.57 MB
- Format: Adobe Portable Document Format