Log-normality and skewness of estimated state/action values in reinforcement learning

Liangpeng ZHANG, Ke TANG, Xin YAO

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › Research › peer-review

4 Citations (Scopus)

Abstract

Under- and overestimation of state/action values is harmful for reinforcement learning agents. In this paper, we show that a state/action value estimated using the Bellman equation can be decomposed into a weighted sum of path-wise values that follow log-normal distributions. Since log-normal distributions are skewed, the distribution of estimated state/action values can also be skewed, leading to an imbalanced likelihood of under/overestimation. The degree of such imbalance can vary greatly among actions and policies within a single problem instance, making the agent prone to select actions/policies that have inferior expected return and higher likelihood of overestimation. We present a comprehensive analysis of such skewness, examine its factors and impacts through both theoretical and empirical results, and discuss possible ways to reduce its undesirable effects. © 2017 Neural Information Processing Systems Foundation. All rights reserved.
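The skewness the abstract refers to can be illustrated with a generic property of log-normal variables: most draws fall below the mean, while a few fall far above it. The sketch below is not the paper's experiment. The function name `lognormal_summary` and the parameters `mu`, `sigma`, `n` and `seed` are hypothetical choices made for illustration only.

```python
import math
import random


def lognormal_summary(mu=0.0, sigma=0.5, n=100_000, seed=0):
    """Sample a log-normal variable and summarise its asymmetry.

    Hypothetical helper for illustration; not taken from the paper.
    Returns (sample_skewness, fraction_of_samples_below_true_mean).
    """
    rng = random.Random(seed)
    xs = [rng.lognormvariate(mu, sigma) for _ in range(n)]

    # Sample mean, variance and standardised third moment (skewness).
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    skew = sum((x - mean) ** 3 for x in xs) / n / var ** 1.5

    # The true mean of LogNormal(mu, sigma^2) is exp(mu + sigma^2 / 2),
    # which lies above the median exp(mu) whenever sigma > 0.
    true_mean = math.exp(mu + sigma ** 2 / 2)
    below = sum(x < true_mean for x in xs) / n
    return skew, below


if __name__ == "__main__":
    skew, below = lognormal_summary()
    print(f"sample skewness: {skew:.2f}")
    print(f"fraction of samples below the true mean: {below:.3f}")
```

A positive skewness together with a below-mean fraction greater than one half is the kind of imbalance between under- and overestimation that the abstract describes, shown here for a single log-normal variable rather than for a weighted sum of path-wise values.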
Original language: English
Title of host publication: Advances in Neural Information Processing Systems, 30 : 31st Annual Conference on Neural Information Processing Systems (NIPS 2017)
Editors: Ulrike VON LUXBURG, Isabelle GUYON, Samy BENGIO, Hanna WALLACH, Rob FERGUS, S.V.N. VISHWANATHAN, Roman GARNETT
Publisher: Neural Information Processing Systems Foundation
Pages: 1805-1815
Number of pages: 11
ISBN (Print): 9781510860964
Publication status: Published - Dec 2017
Externally published: Yes
Event: 31st Conference on Neural Information Processing Systems - Long Beach, United States
Duration: 4 Dec 2017 → 9 Dec 2017

Publication series

Name: Advances in Neural Information Processing Systems
ISSN (Print): 1049-5258

Conference

Conference: 31st Conference on Neural Information Processing Systems
Abbreviated title: NIPS 2017
Country/Territory: United States
City: Long Beach
Period: 4/12/17 → 9/12/17

Bibliographical note

This paper was supported by the Ministry of Science and Technology of China (Grant No. 2017YFB1003102), the National Natural Science Foundation of China (Grant Nos. 61672478 and 61329302), the Science and Technology Innovation Committee Foundation of Shenzhen (Grant No. ZDSYS201703031748284), EPSRC (Grant No. J017515/1), and in part by the Royal Society Newton Advanced Fellowship (Reference No. NA150123).
