Learning to Explore Saliency for Stereoscopic Videos Via Component-Based Interaction

Qiudan ZHANG, Xu WANG*, Shiqi WANG, Zhenhao SUN, Sam KWONG, Jianmin JIANG

*Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review

8 Citations (Scopus)

Abstract

In this paper, we devise a saliency prediction model for stereoscopic videos that learns to explore saliency through component-based interactions among spatial, temporal, and depth cues. The model first takes advantage of the specific structure of a 3D residual network (3D-ResNet) to model the saliency driven by spatio-temporal coherence across consecutive frames. Subsequently, the saliency inferred from implicit depth is automatically derived from the displacement correlation between the left and right views by leveraging a deep convolutional network (ConvNet). Finally, a component-wise refinement network is devised to produce the final saliency maps over time by aggregating the saliency distributions obtained from the multiple components. To further facilitate research on stereoscopic video saliency, we create a new dataset of 175 stereoscopic video sequences with diverse content, together with their dense eye fixation annotations. Extensive experiments indicate that the proposed model can achieve superior performance compared with state-of-the-art methods on all publicly available eye fixation datasets.
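As a rough illustration of two ideas named in the abstract, the sketch below computes a displacement correlation between left-view and right-view feature maps and then fuses several component saliency maps. It is not the authors' network: the function names, the fixed horizontal search range `max_disp`, the channel-averaged product used as the correlation, and the plain weighted average standing in for the learned refinement network are all assumptions made for this example.

```python
import numpy as np


def displacement_correlation(left, right, max_disp):
    """Correlate left-view features with horizontally shifted right-view features.

    left, right: arrays of shape (C, H, W).
    Returns an array of shape (max_disp + 1, H, W) whose entry [d, y, x] is the
    channel-averaged product of left[:, y, x] and right[:, y, x - d].
    (Hypothetical helper; the paper learns this relation with a ConvNet.)
    """
    _, h, w = left.shape
    volume = np.zeros((max_disp + 1, h, w), dtype=np.float64)
    for d in range(max_disp + 1):
        # Columns x < d have no counterpart in the right view; they stay at 0.
        volume[d, :, d:] = (left[:, :, d:] * right[:, :, : w - d]).mean(axis=0)
    return volume


def fuse_components(maps, weights):
    """Weighted average of component saliency maps, rescaled to sum to 1.

    Stand-in for a learned refinement step (assumption, not the paper's design).
    """
    maps = np.asarray(maps, dtype=np.float64)        # shape (K, H, W)
    weights = np.asarray(weights, dtype=np.float64)  # shape (K,)
    fused = np.clip(np.tensordot(weights, maps, axes=1), 0, None)
    total = fused.sum()
    return fused / total if total > 0 else fused
```

In this toy setting, a strong response at displacement `d` suggests that the content at a pixel is shifted by `d` columns between the views, which is the kind of implicit depth cue the abstract refers to.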

Original language: English
Article number: 9062560
Pages (from-to): 5722-5736
Number of pages: 15
Journal: IEEE Transactions on Image Processing
Volume: 29
Early online date: 9 Apr 2020
DOIs
Publication status: Published - 2020
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 31670553, Grant 61871270, Grant 61672443, Grant 61620106008, and Grant 61702335, in part by the Natural Science Foundation of SZU under Grant 827000144, and in part by the National Engineering Laboratory for Big Data System Computing Technology of China. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Chun-Shien Lu.

Keywords

  • deep learning
  • stereoscopic video
  • Visual saliency
