Learning to Explore Intrinsic Saliency for Stereoscopic Video

Qiudan ZHANG, Xu WANG, Shiqi WANG, Shikai LI, Sam KWONG, Jianmin JIANG

Research output: Book Chapters | Papers in Conference Proceedings · Conference paper (refereed) · Research · peer-review

6 Citations (Scopus)


The human visual system excels at biasing stereoscopic visual signals through attention mechanisms. Traditional methods for stereoscopic video saliency prediction, which rely on low-level features and depth-related information, have fundamental limitations. For example, it is cumbersome to model the interactions among multiple visual cues, including spatial, temporal, and depth information, because of their complexity. In this paper, we argue that high-level features are crucial, and we resort to a deep learning framework to learn the saliency maps of stereoscopic videos. Driven by the spatio-temporal coherence of consecutive frames, the model first imitates the saliency mechanism using a 3D convolutional neural network. Subsequently, saliency originating from intrinsic depth is derived in a data-driven manner from the correlations between the left and right views. Finally, a Convolutional Long Short-Term Memory (Conv-LSTM) based fusion network is developed to model the instantaneous interactions between spatio-temporal and depth attributes, producing the final stereoscopic saliency maps over time. Moreover, we establish a new large-scale stereoscopic video saliency dataset (SVS) comprising 175 stereoscopic video sequences and their fixation density annotations, aiming to comprehensively study the intrinsic attributes relevant to stereoscopic video saliency detection. Extensive experiments show that the proposed model outperforms state-of-the-art methods on the newly built stereoscopic video dataset.
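The Conv-LSTM fusion stage described in the abstract can be illustrated with a minimal NumPy sketch of a generic ConvLSTM cell (input, forget, and output gates computed by convolutions, without peephole terms) that consumes per-frame spatio-temporal and depth feature maps and emits one saliency map per frame. This is not the authors' network: the function names (`conv2d_same`, `convlstm_step`, `fuse_saliency`), the channel-concatenation of the two streams, the hidden size, the random untrained weights, and the 1x1 sigmoid readout are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d_same(x, w, b):
    """Naive zero-padded 2D cross-correlation keeping H and W (odd kernel size).

    x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,) -> (C_out, H, W)
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    height, width = x.shape[1:]
    out = np.empty((c_out, height, width))
    for i in range(height):
        for j in range(width):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out + b[:, None, None]


def convlstm_step(x, h, c, w, b):
    """One ConvLSTM update: all four gates come from a convolution over [x; h]."""
    z = conv2d_same(np.concatenate([x, h], axis=0), w, b)
    i, f, o, g = np.split(z, 4, axis=0)  # input, forget, output gates, candidate
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g  # cell state carries memory across frames
    h = o * np.tanh(c)  # hidden state is the fused feature map
    return h, c


def fuse_saliency(st_feats, depth_feats, hidden=8, k=3, seed=0):
    """Fuse (T, C, H, W) spatio-temporal and depth feature sequences into
    (T, H, W) maps in (0, 1). Weights are random, i.e. untrained (hypothetical)."""
    rng = np.random.default_rng(seed)
    steps, _, height, width = st_feats.shape
    c_in = st_feats.shape[1] + depth_feats.shape[1] + hidden
    w = rng.normal(0.0, 0.1, size=(4 * hidden, c_in, k, k))
    b = np.zeros(4 * hidden)
    w_out = rng.normal(0.0, 0.1, size=(1, hidden, 1, 1))  # hypothetical 1x1 readout
    h = np.zeros((hidden, height, width))
    c = np.zeros_like(h)
    maps = []
    for t in range(steps):
        # Assumed fusion input: both streams stacked along the channel axis.
        x = np.concatenate([st_feats[t], depth_feats[t]], axis=0)
        h, c = convlstm_step(x, h, c, w, b)
        maps.append(sigmoid(conv2d_same(h, w_out, np.zeros(1)))[0])
    return np.stack(maps)


# Toy usage: 5 frames, 4 spatio-temporal and 2 depth channels, 16x16 maps.
data_rng = np.random.default_rng(1)
st = data_rng.normal(size=(5, 4, 16, 16))
dp = data_rng.normal(size=(5, 2, 16, 16))
saliency = fuse_saliency(st, dp)
print(saliency.shape)  # (5, 16, 16)
```

Because the recurrent state `h, c` is carried from frame to frame, each output map depends on the current frame and on all earlier ones, which is the temporal behaviour a recurrent fusion stage is meant to provide; a trained model would learn `w` and `w_out` rather than sampling them.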
Original language: English
Title of host publication: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Number of pages: 10
ISBN (Electronic): 9781728132938
ISBN (Print): 9781728132945
Publication status: Published - Jun 2019
Externally published: Yes
Event: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019) - Long Beach, United States
Duration: 16 Jun 2019 to 20 Jun 2019


Conference: 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019)
Country/Territory: United States
City: Long Beach

Bibliographical note

This work was supported in part by the National Natural Science Foundation of China under Grants 61871270, 61672443, and 61620106008, in part by the Hong Kong RGC Early Career Scheme under Grant 9048122 (CityU 21211018), in part by the Guangdong Nature Science Foundation of China under Grant 2016A030310058, in part by the Natural Science Foundation of SZU (grant no. 827000144), and in part by the National Engineering Laboratory for Big Data System Computing Technology of China.


  • 3D from Multiview and Sensors
  • Datasets and Evaluation
  • Deep Learning
  • RGBD sensors and analytics
  • Video Analytics

