Explicit planning for efficient exploration in reinforcement learning

Liangpeng ZHANG, Ke TANG, Xin YAO*

*Corresponding author for this work

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › Research › peer-review

5 Citations (Scopus)

Abstract

Efficient exploration is crucial to achieving good performance in reinforcement learning. Existing systematic exploration strategies (R-MAX, MBIE, UCRL, etc.), despite being promising theoretically, are essentially greedy strategies that follow some predefined heuristics. When the heuristics do not match the dynamics of Markov decision processes (MDPs) well, an excessive amount of time can be wasted in travelling through already-explored states, lowering the overall efficiency. We argue that explicit planning for exploration can help alleviate such a problem, and propose a Value Iteration for Exploration Cost (VIEC) algorithm which computes the optimal exploration scheme by solving an augmented MDP. We then present a detailed analysis of the exploration behaviour of some popular strategies, showing how these strategies can fail and spend O(n²md) or O(n²m + nmd) steps to collect sufficient data in some tower-shaped MDPs, while the optimal exploration scheme, which can be obtained by VIEC, only needs O(nmd), where n, m are the numbers of states and actions and d is the data demand. The analysis not only points out the weakness of existing heuristic-based strategies, but also suggests a remarkable potential in explicit planning for exploration. © 2019 Neural information processing systems foundation. All rights reserved.
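The core computational tool behind VIEC, as described in the abstract, is value iteration run on an augmented MDP so that the minimised quantity is exploration cost (steps spent collecting data) rather than discounted reward. The sketch below shows only the generic cost-minimising value iteration step; it does not reproduce VIEC itself, whose augmented states additionally encode the remaining data demand of each state-action pair. All names and the toy setup are hypothetical.

```python
import numpy as np

def value_iteration(P, cost, gamma=1.0, tol=1e-8, max_iter=10_000):
    """Generic value iteration minimising expected cumulative cost.

    This is the standard Bellman backup that an algorithm like VIEC
    would apply over an augmented state space; here we run it on a
    plain MDP for illustration.

    P:    transition tensor, shape (S, A, S); P[s, a, s'] = Pr(s' | s, a)
    cost: per-step cost, shape (S, A); e.g. 1 per step of exploration
    Returns the optimal cost-to-go V and a greedy (cost-minimising) policy.
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # Q[s, a] = c(s, a) + gamma * E_{s'}[ V(s') ]
        Q = cost + gamma * (P @ V)
        V_new = Q.min(axis=1)          # minimise cost, not maximise reward
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = Q.argmin(axis=1)
    return V, policy
```

With unit step costs and a zero-cost absorbing "demands satisfied" state, V(s) is the minimum expected number of steps to finish exploring from s, which is the kind of quantity the paper's O(nmd) optimal scheme minimises.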
Original language: English
Title of host publication: Advances in Neural Information Processing Systems 32: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)
Editors: Hanna M. WALLACH, Hugo LAROCHELLE, Alina BEYGELZIMER, Florence D'ALCHÉ-BUC, Edward A. FOX, Roman GARNETT
Publisher: Neural Information Processing Systems Foundation
Pages: 7488–7497
Number of pages: 10
ISBN (Print): 9781713807933
Publication status: Published - 2019
Externally published: Yes
Event: 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019 - Vancouver Convention Center, Vancouver, Canada
Duration: 8 Dec 2019 - 14 Dec 2019
https://nips.cc/Conferences/2019

Publication series

Name: Advances in Neural Information Processing Systems
ISSN (Print): 1049-5258

Conference

Conference: 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019
Abbreviated title: NeurIPS 2019
Country/Territory: Canada
City: Vancouver
Period: 8/12/19 - 14/12/19
Internet address: https://nips.cc/Conferences/2019

Bibliographical note

This work was supported by EPSRC (Grant Nos. EP/J017515/1 and EP/P005578/1), the Royal Society (through a Newton Advanced Fellowship to Ke Tang, hosted by Xin Yao), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386), the Shenzhen Peacock Plan (Grant No. KQTD2016112514355531), and the Program for University Key Laboratory of Guangdong Province (Grant No. 2017KSYS008).
