Abstract
Video data has become the largest source of data consumed globally. Due to the rapid growth of video applications and rising demand for higher-quality video services, video data volume has been increasing explosively worldwide, posing the most severe challenge for multimedia computing, transmission and storage. Video coding, which compresses videos into a much smaller size, is one of the key solutions; however, although the compression ratio has grown continuously over the last three decades, its development has become saturated to some extent. Machine learning algorithms, especially those employing deep learning, are capable of discovering knowledge from massive unstructured data and providing data-driven predictions, and thus offer new opportunities for further upgrading video coding technologies. In this article, we present a review of machine learning based video encoding optimization, aiming to provide researchers with a strong foundation and to inspire future developments in data-driven video coding. Firstly, we analyze the representations and redundancies of video data. Secondly, we review the development of video coding standards and key requirements. Subsequently, we present a systematic survey of the recent advances and challenges associated with machine learning based video coding optimization from three key aspects: high efficiency, low complexity and high visual quality. Their workflows, representative schemes, performance, advantages and disadvantages are analyzed in detail. Finally, the challenges and opportunities are identified, which may provide the academic and industrial communities with groundwork and potential directions for future research.
Original language | English |
---|---|
Pages (from-to) | 395-423 |
Journal | Information Sciences |
Volume | 506 |
Early online date | 29 Jul 2019 |
Publication status | Published - Jan 2020 |
Externally published | Yes |
Bibliographical note
This work was supported in part by the National Natural Science Foundation of China under Grants 61672443, 61772344 and 61871372, in part by Guangdong Natural Science Foundation for Distinguished Young Scholar under Grant 2016A030306022, in part by the Key Project for Guangdong Provincial Science and Technology Development under Grant 2017B010110014, in part by RGC General Research Fund (GRF) 9042322, 9042489 (CityU 11200116, 11206317), in part by the Shenzhen International Collaborative Research Project under Grant GJHZ20170314155404913, in part by the Shenzhen Science and Technology Program under Grants JCYJ20170811160212033 and JCYJ20180507183823045, in part by Guangdong International Science and Technology Cooperative Research Project under Grant 2018A050506063, and in part by Membership of Youth Innovation Promotion Association, Chinese Academy of Sciences under Grant 2018392.
Keywords
- Convolutional neural network
- Deep learning
- High efficiency video coding
- Machine learning
- Mode decision
- Versatile video coding
- Video coding
- Visual quality assessment