Video Summarization Using Global Attention With Memory Network and LSTM

Published in The Fifth IEEE International Conference on Multimedia Big Data (BigMM), 2019

Recommended citation: Shashwat Uttam*, Yaman Kumar*, Dhruva Sahrawat*, Mansi Agarwal, Rajiv Ratn Shah, Debanjan Mahata. "Video Summarization Using Global Attention With Memory Network and LSTM." The Fifth IEEE International Conference on Multimedia Big Data (BigMM), 2019.


Abstract

Videos are one of the most engaging media for delivering information effectively and constitute the majority of the content generated online today. As human attention spans shrink, it is imperative to shorten videos while preserving most of their information. The central challenge is that summaries that feel intuitive to a human are difficult for machines to produce in a generalizable way. We present a simple approach to video summarization that uses Kernel Temporal Segmentation (KTS) for shot segmentation and a global-attention-based modified memory network combined with an LSTM for shot score learning. The modified memory network, termed the Global Attention Memory Module (GAMM), increases the learning capacity of the model, and the addition of the LSTM allows it to learn richer contextual features. Experiments on the benchmark datasets TVSum and SumMe show that our method outperforms the current state of the art by about 15%.
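
To make the pipeline described above more concrete, the sketch below shows one plausible reading of the scoring model: shot features attend over a learned global memory bank (a GAMM-style module), an LSTM then adds temporal context, and a small head emits a per-shot importance score. This is a minimal illustration, not the paper's implementation; the feature dimension, memory-bank size, residual fusion, bidirectionality, and class names other than GAMM are all assumptions.

```python
import torch
import torch.nn as nn

class GAMM(nn.Module):
    """Sketch of a global-attention memory module: each shot feature
    attends over a learned memory bank, and the attended read-out is
    fused back into the input. Dimensions are illustrative only."""
    def __init__(self, feat_dim=1024, mem_slots=64):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(mem_slots, feat_dim))
        self.query = nn.Linear(feat_dim, feat_dim)

    def forward(self, x):                     # x: (num_shots, feat_dim)
        q = self.query(x)                     # project shots to queries
        attn = torch.softmax(q @ self.memory.t(), dim=-1)  # global attention over memory
        read = attn @ self.memory             # weighted memory read-out
        return x + read                       # residual fusion (an assumption)

class ShotScorer(nn.Module):
    """GAMM followed by an LSTM that adds temporal context across shots,
    then a per-shot importance score in [0, 1]."""
    def __init__(self, feat_dim=1024, hidden=256):
        super().__init__()
        self.gamm = GAMM(feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Sequential(nn.Linear(2 * hidden, 1), nn.Sigmoid())

    def forward(self, shots):                 # shots: (1, num_shots, feat_dim)
        h = self.gamm(shots.squeeze(0)).unsqueeze(0)
        ctx, _ = self.lstm(h)                 # contextual features across shots
        return self.head(ctx).squeeze(-1)     # (1, num_shots) importance scores

# Usage sketch: score 20 KTS-segmented shots, each a 1024-d feature vector.
scores = ShotScorer()(torch.randn(1, 20, 1024))
```

In this reading, a summary would be assembled by selecting the highest-scoring KTS shots under a length budget; the shot segmentation itself (KTS) is a separate preprocessing step not shown here.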