Rate control (RC) plays a critical role in transmitting high-quality video under bandwidth constraints in High Efficiency Video Coding (HEVC). Most current HEVC RC algorithms estimate rate-distortion (R-D) model parameters from spatio-temporal information, and they cannot effectively handle dynamic video sequences that contain fast-moving objects, significant object occlusion, or scene changes. In this paper, we propose an RC method based on deep reinforcement learning (DRL) for dynamic video sequences in HEVC to improve the coding efficiency. First, the rate control problem is formulated as a Markov decision process (MDP). Second, using the MDP model, we develop a DRL-based algorithm that finds the optimal quantization parameters (QPs) by training a deep neural network. By observing the current state of the encoder, the resulting intelligent agent selects the optimal RC strategy to reduce distortion, buffer fluctuations, and quality fluctuations. The asynchronous advantage actor-critic (A3C) method is used to solve the MDP. Finally, the proposed DRL-based RC method is implemented in the newest video coding standard. Experimental results show that the proposed method offers substantially enhanced RC accuracy and consistently outperforms the HEVC reference software and other state-of-the-art algorithms.
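To make the MDP formulation concrete, the following is a minimal toy sketch of one frame-level transition: the state summarizes the encoder, the action is a QP, and the reward penalizes distortion, buffer deviation, and QP (quality) fluctuation. Everything in it is an assumption for illustration, not the paper's design: the `State` fields, the `toy_bits` and `toy_distortion` models, the half-full buffer target, and the reward weights are hypothetical, and no A3C training is shown.

```python
from dataclasses import dataclass

QP_MIN, QP_MAX = 0, 51  # QP range for 8-bit HEVC


@dataclass
class State:
    buffer_fullness: float  # fraction of the buffer in use, 0..1 (hypothetical)
    prev_qp: int            # QP chosen for the previous frame
    complexity: float       # hypothetical frame complexity, in bits at QP 22


def toy_bits(qp: int, complexity: float) -> float:
    # Assumed toy rate model: bits halve for every 6-step increase in QP.
    return complexity * 2.0 ** (-(qp - 22) / 6.0)


def toy_distortion(qp: int) -> float:
    # Assumed toy distortion model: quantizer step doubles every 6 QP,
    # and MSE of a uniform quantizer is roughly step^2 / 12.
    qstep = 2.0 ** ((qp - 4) / 6.0)
    return qstep * qstep / 12.0


def step(state, action_qp, target_bits, buffer_size, next_complexity,
         weights=(1.0, 1.0, 1.0)):
    """Apply one QP decision and return (next_state, reward)."""
    qp = max(QP_MIN, min(QP_MAX, action_qp))  # clamp the action to valid QPs
    bits = toy_bits(qp, state.complexity)

    # Buffer fills when the frame costs more than its target, drains otherwise.
    fullness = state.buffer_fullness + (bits - target_bits) / buffer_size
    fullness = min(1.0, max(0.0, fullness))

    # Hypothetical reward: negative weighted sum of the three penalties.
    w_d, w_b, w_q = weights
    reward = -(w_d * toy_distortion(qp)
               + w_b * abs(fullness - 0.5)
               + w_q * abs(qp - state.prev_qp))
    return State(fullness, qp, next_complexity), reward


if __name__ == "__main__":
    s = State(buffer_fullness=0.5, prev_qp=22, complexity=1000.0)
    s_next, r = step(s, action_qp=28, target_bits=1000.0,
                     buffer_size=10000.0, next_complexity=1000.0)
    print(s_next, r)
```

In a DRL setting, an agent would replace the fixed `action_qp` with the output of a policy network conditioned on the state; the weights trade off the three penalties and would need tuning for any real encoder.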
Bibliographical note
This work was supported in part by the Key Project of Science and Technology Innovation 2030 supported by the Ministry of Science and Technology of China under Grant 2018AAA0101301, in part by the Natural Science Foundation of China under Grants 61871342, 61772344, and 61672443, in part by the Hong Kong RGC General Research Funds under Grants 9042820 (CityU 11219019), 9042489 (CityU 11206317), 9042322 (CityU 11200116), and 9048123 (CityU 21211518), in part by the Chongqing University under Grant 02160011044118, in part by the Natural Science Foundation of China under Grant 61876026, in part by the Research on Key Technologies of Pedestrian Recognition for Different Resolution under Grant qnsy2018006, in part by the Research on Key Technologies of pedestrian recognition in complex scenes under Grant CST_2019SN02, and in part by the Research on pedestrian recognition for monitoring Qiannan Kehe discipline construction under Grant Zi(2018)No.7.
- dynamically changing video
- rate control
- rate-distortion optimization
- reinforcement learning