Learning Motion-guided Multi-scale Memory Features for Video Shadow Detection

Junhao LIN, Jiaxing SHEN, Xin YANG, Huazhu FU, Qing ZHANG, Ping LI, Bin SHENG, Liansheng WANG, Lei ZHU

Research output: Journal Publications › Journal Article (refereed) › peer-review

Abstract

Natural images often contain multiple shadow regions, and existing video shadow detection methods tend to miss some of them, since they mainly learn temporal features at a single scale and from a single memory. In this work, we develop a novel convolutional neural network (CNN) that learns motion-guided multi-scale memory features, obtaining multi-scale temporal information from multiple network memories to boost video shadow detection. To do so, our network first constructs three memories (i.e., a global memory, a local memory, and a motion memory) to combine spatial context and object motion for detecting shadows. Based on these three memories, we then devise a multi-scale motion-guided long-short transformer (MMLT) module to learn multi-scale temporal and motion memory features for predicting a shadow detection map of the input video frame. Our MMLT module includes a dense-scale long transformer (DLT), a dense-scale short transformer (DST), and a dense-scale motion transformer (DMT), which read the three memories to learn multi-scale transformer features. The DLT, DST, and DMT each consist of a set of memory-read pooling attention (MPA) blocks; because the output features of these blocks vary in scale, we densely connect them to learn multi-scale transformer features. By doing so, we can more accurately identify multiple shadow regions of different sizes in the input video. Moreover, we devise a self-supervised pretext task to pre-train the feature encoder, enhancing downstream video shadow detection. Experimental results on three benchmark datasets show that our video shadow detection network quantitatively and qualitatively outperforms 26 state-of-the-art methods.
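
As a rough illustration of the idea in the abstract (reading several memories with attention at several scales and combining the outputs), the following NumPy sketch may help. It is not the authors' implementation: the abstract does not specify the internals of the MPA blocks, so the average pooling of memory tokens, the strides, the residual update, the averaging used to mimic dense connections, the summation across memories, and all function names (`avg_pool_tokens`, `memory_read`, `dense_multiscale_read`) are assumptions made purely for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def avg_pool_tokens(tokens, stride):
    """Average-pool an (N, C) token sequence along N (assumed pooling scheme)."""
    n, c = tokens.shape
    n_keep = (n // stride) * stride  # drop a ragged tail, if any
    return tokens[:n_keep].reshape(n // stride, stride, c).mean(axis=1)


def memory_read(query, memory, stride):
    """Cross-attention from query tokens (Nq, C) to pooled memory tokens."""
    pooled = avg_pool_tokens(memory, stride)             # (Nm // stride, C)
    scores = query @ pooled.T / np.sqrt(query.shape[1])  # (Nq, Nm // stride)
    return softmax(scores, axis=-1) @ pooled             # (Nq, C)


def dense_multiscale_read(query, memory, strides=(1, 2, 4)):
    """Stack memory-read blocks at several pooling scales.

    Each block takes the mean of all earlier outputs as its input, which is
    a simple stand-in for dense connections (an assumption, not the paper's
    design). The per-scale outputs are concatenated along the channel axis.
    """
    feats = [query]
    for s in strides:
        x = np.mean(feats, axis=0)                   # fuse all earlier features
        feats.append(x + memory_read(x, memory, s))  # residual memory read
    return np.concatenate(feats[1:], axis=1)         # (Nq, C * len(strides))


# Toy data: 16 query tokens and three hypothetical memories of 32 tokens each.
rng = np.random.default_rng(0)
query = rng.standard_normal((16, 8))
memories = {
    name: rng.standard_normal((32, 8))
    for name in ("global", "local", "motion")
}

# Read each memory separately, then sum the multi-scale features.
fused = sum(dense_multiscale_read(query, m) for m in memories.values())
print(fused.shape)  # (16, 24)
```

In this sketch, larger strides make the memory coarser, so each block attends to context at a different granularity; concatenating the block outputs is one simple way to keep features from all scales available to a later prediction head.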
Original language: English
Pages (from-to): 1
Number of pages: 1
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Early online date: 26 Jul 2024
DOIs
Publication status: E-pub ahead of print - 26 Jul 2024

Bibliographical note

Publisher Copyright:
IEEE

Funding

This work is supported by the Guangzhou-HKUST(GZ) Joint Funding Program (No. 2023A03J0671), the InnoHK funding launched by the Innovation and Technology Commission, Hong Kong SAR, the Guangzhou Industrial Information and Intelligent Key Laboratory Project (No. 2024A03J0628), the Nansha Key Area Science and Technology Project (No. 2023ZD003), and the Guangzhou-HKUST(GZ) Joint Funding Program (No. 2024A03J0618).

Keywords

  • neural networks
  • video shadow detection
