Abstract
Efficient exploration is crucial to achieving good performance in reinforcement learning. Existing systematic exploration strategies (R-MAX, MBIE, UCRL, etc.), despite being theoretically promising, are essentially greedy strategies that follow predefined heuristics. When these heuristics do not match the dynamics of the Markov decision process (MDP) well, an excessive amount of time can be wasted travelling through already-explored states, lowering overall efficiency. We argue that explicit planning for exploration can help alleviate this problem, and propose a Value Iteration for Exploration Cost (VIEC) algorithm, which computes the optimal exploration scheme by solving an augmented MDP. We then present a detailed analysis of the exploration behaviour of some popular strategies, showing how these strategies can fail and spend $O(n^2 m d)$ or $O(n^2 m + n m d)$ steps to collect sufficient data in some tower-shaped MDPs, while the optimal exploration scheme, which can be obtained by VIEC, needs only $O(n m d)$, where $n$ and $m$ are the numbers of states and actions and $d$ is the data demand. The analysis not only points out the weaknesses of existing heuristic-based strategies, but also suggests the remarkable potential of explicit planning for exploration. © 2019 Neural Information Processing Systems Foundation. All rights reserved.
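The abstract describes VIEC only at a high level. The following is a minimal sketch of the general idea under our own assumptions, not the paper's exact formulation: the transition probabilities are known, the "data demand" $d$ is read as requiring $d$ samples of every state-action pair, the augmented state is (current MDP state, remaining demand per pair), and every step costs 1. Value iteration on this augmented MDP then gives the minimum expected number of steps needed to satisfy the demand. The function name `viec_sketch` and its parameters are hypothetical. Because the augmented state space has $n(d+1)^{nm}$ states, the sketch only works for tiny MDPs.

```python
import itertools


def viec_sketch(T, d, start, tol=1e-9, max_iter=10_000):
    """Hypothetical sketch: minimum expected number of steps needed to take
    every (state, action) pair at least d times, starting from `start`.

    T[s][a] is a list of (probability, next_state) pairs (assumed known).
    Assumes every pair can eventually be reached, so all costs are finite.
    """
    n, m = len(T), len(T[0])
    pairs = [(s, a) for s in range(n) for a in range(m)]
    idx = {p: i for i, p in enumerate(pairs)}

    # Augmented state: (current MDP state, remaining demand for each pair).
    demands = list(itertools.product(range(d + 1), repeat=len(pairs)))
    V = {(s, rem): 0.0 for s in range(n) for rem in demands}

    def q_value(s, a, rem):
        # Taking (s, a) costs one step and lowers that pair's remaining demand.
        new_rem = list(rem)
        k = idx[(s, a)]
        new_rem[k] = max(0, new_rem[k] - 1)
        new_rem = tuple(new_rem)
        return 1.0 + sum(p * V[(s2, new_rem)] for p, s2 in T[s][a])

    # In-place (Gauss-Seidel) value iteration for a stochastic shortest path.
    for _ in range(max_iter):
        delta = 0.0
        for key in V:
            s, rem = key
            if not any(rem):
                continue  # demand already met: remaining cost is zero
            best = min(q_value(s, a, rem) for a in range(m))
            delta = max(delta, abs(best - V[key]))
            V[key] = best
        if delta < tol:
            break

    return V[(start, tuple([d] * len(pairs)))]
```

For example, in a two-state MDP where action 0 always leads to state 0 and action 1 always leads to state 1, `viec_sketch(T, 1, 0)` returns 4.0: every step collects a new sample, so no travel is wasted. A heuristic strategy, by contrast, may revisit already-explored states.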
Original language | English |
---|---|
Title of host publication | Advances in Neural Information Processing Systems, 32 : 33rd Conference on Neural Information Processing Systems (NeurIPS 2019) |
Editors | Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Edward A. Fox, Roman Garnett |
Publisher | Neural Information Processing Systems Foundation |
Pages | 7488–7497 |
Number of pages | 10 |
ISBN (Print) | 9781713807933 |
Publication status | Published - 2019 |
Externally published | Yes |
Event | 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019, Vancouver Convention Center, Vancouver, Canada |
Duration | 8 Dec 2019 → 14 Dec 2019 |
Internet address | https://nips.cc/Conferences/2019 |
Publication series
Name | Advances in Neural Information Processing Systems |
---|---|
ISSN (Print) | 1049-5258 |
Conference
Conference | 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019 |
---|---|
Abbreviated title | NeurIPS 2019 |
Country/Territory | Canada |
City | Vancouver |
Period | 8/12/19 → 14/12/19 |
Internet address | https://nips.cc/Conferences/2019 |
Funding
This work was supported by EPSRC (Grant Nos. EP/J017515/1 and EP/P005578/1), the Royal Society (through a Newton Advanced Fellowship to Ke Tang, hosted by Xin Yao), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386), the Shenzhen Peacock Plan (Grant No. KQTD2016112514355531) and the Program for University Key Laboratory of Guangdong Province (Grant No. 2017KSYS008).