TY - JOUR
T1 - Reinforcement learning-based QoE-oriented dynamic adaptive streaming framework
AU - WEI, Xuekai
AU - ZHOU, Mingliang
AU - KWONG, Sam
AU - YUAN, Hui
AU - WANG, Shiqi
AU - ZHU, Guopu
AU - CAO, Jingchao
PY - 2021/8
Y1 - 2021/8
N2 - The dynamic adaptive streaming over HTTP (DASH) standard has been widely adopted by many content providers for online video transmission and has greatly improved streaming performance. Designing an efficient DASH system is challenging because of the inherent large fluctuations characterizing both encoded video sequences and network traces. In this paper, a reinforcement learning (RL)-based DASH technique that addresses user quality of experience (QoE) is constructed. The DASH adaptive bitrate (ABR) selection problem is formulated as a Markov decision process (MDP). Accordingly, an RL-based solution is proposed, in which the DASH client acts as the RL agent and the network variation constitutes the environment. The proposed user QoE metric, which jointly considers video quality and buffer status, is used as the reward. The goal of the RL algorithm is to select a suitable video quality level for each video segment to maximize the total reward. The proposed RL-based ABR algorithm is then embedded in the QoE-oriented DASH framework. Experimental results show that the proposed RL-based ABR algorithm outperforms state-of-the-art schemes in terms of both temporal and visual QoE factors by a noticeable margin while guaranteeing application-level fairness when multiple clients share a bottlenecked network.
AB - The dynamic adaptive streaming over HTTP (DASH) standard has been widely adopted by many content providers for online video transmission and has greatly improved streaming performance. Designing an efficient DASH system is challenging because of the inherent large fluctuations characterizing both encoded video sequences and network traces. In this paper, a reinforcement learning (RL)-based DASH technique that addresses user quality of experience (QoE) is constructed. The DASH adaptive bitrate (ABR) selection problem is formulated as a Markov decision process (MDP). Accordingly, an RL-based solution is proposed, in which the DASH client acts as the RL agent and the network variation constitutes the environment. The proposed user QoE metric, which jointly considers video quality and buffer status, is used as the reward. The goal of the RL algorithm is to select a suitable video quality level for each video segment to maximize the total reward. The proposed RL-based ABR algorithm is then embedded in the QoE-oriented DASH framework. Experimental results show that the proposed RL-based ABR algorithm outperforms state-of-the-art schemes in terms of both temporal and visual QoE factors by a noticeable margin while guaranteeing application-level fairness when multiple clients share a bottlenecked network.
KW - Machine learning
KW - MPEG-DASH
KW - Quality of experience
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85107705501&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2021.05.012
DO - 10.1016/j.ins.2021.05.012
M3 - Journal Article (refereed)
SN - 0020-0255
VL - 569
SP - 786
EP - 803
JO - Information Sciences
JF - Information Sciences
ER -