TY - JOUR
T1 - Rate Control Method Based on Deep Reinforcement Learning for Dynamic Video Sequences in HEVC
AU - ZHOU, Mingliang
AU - WEI, Xuekai
AU - KWONG, Sam
AU - JIA, Weijia
AU - FANG, Bin
PY - 2021
Y1 - 2021
N2 - Rate control (RC) plays a critical role in the transmission of high-quality video data under given bandwidth restrictions in High Efficiency Video Coding (HEVC). Most current HEVC RC algorithms, which rely on spatio-temporal information for their rate-distortion (R-D) model parameters, cannot effectively handle dynamic video sequences that contain fast-moving objects, significant object occlusion, or scene changes. In this paper, we propose an RC method based on deep reinforcement learning (DRL) for dynamic video sequences in HEVC to improve coding efficiency. First, the RC problem is formulated as a Markov decision process (MDP). Second, with the MDP model, we develop a DRL-based algorithm that finds the optimal quantization parameters (QPs) by training a deep neural network. By observing the current state of the encoder, the resulting intelligent agent selects the optimal RC strategy to reduce distortion as well as buffer and quality fluctuations. The asynchronous advantage actor-critic (A3C) method is used to solve the MDP. Finally, the proposed DRL-based RC method is implemented in the newest video coding standard. Experimental results show that the proposed method offers substantially enhanced RC accuracy and consistently outperforms the HEVC reference software and other state-of-the-art algorithms.
KW - dynamically changing video
KW - rate control
KW - rate-distortion optimization
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85102021527&partnerID=8YFLogxK
U2 - 10.1109/TMM.2020.2992968
DO - 10.1109/TMM.2020.2992968
M3 - Journal Article (refereed)
SN - 1520-9210
VL - 23
SP - 1106
EP - 1121
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -