Increasingly cautious optimism for practical PAC-MDP exploration

Liangpeng ZHANG, Ke TANG, Xin YAO

Research output: Papers in Conference Proceedings › Conference paper (refereed) › peer-reviewed

3 Citations (Scopus)

Abstract

Exploration strategy is an essential component of learning agents in model-based Reinforcement Learning. R-MAX and V-MAX are PAC-MDP strategies proven to have polynomial sample complexity; yet their exploration behavior tends to be overly cautious in practice. We propose the principle of Increasingly Cautious Optimism (ICO) to automatically cut off unnecessarily cautious exploration, and apply it to R-MAX and V-MAX, yielding two new strategies: Increasingly Cautious R-MAX (ICR) and Increasingly Cautious V-MAX (ICV). We prove that both ICR and ICV are PAC-MDP, and show that their improvement is guaranteed by a tighter sample-complexity upper bound. Empirical results then demonstrate their significantly improved performance.
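To make the idea concrete, below is a minimal, hypothetical Python sketch of R-MAX-style optimism with a knownness threshold that grows over time, in the spirit of ICO: early on, fewer samples are needed before a state-action pair stops receiving the optimistic value, so needlessly cautious exploration is cut off. The class name, the linear threshold schedule, and the one-step backup are illustrative assumptions for this sketch, not the paper's actual ICR/ICV construction.

```python
import numpy as np

class IncreasinglyCautiousRMax:
    """Sketch of R-MAX optimism with an ICO-style growing threshold.

    Assumptions (not from the paper): a tabular MDP, a linear schedule
    for the knownness threshold, and a simple averaged backup in place
    of re-solving the empirical model.
    """

    def __init__(self, n_states, n_actions, r_max, gamma=0.95,
                 m_final=20, growth=0.01):
        self.gamma = gamma
        self.v_max = r_max / (1.0 - gamma)   # optimistic value for unknown pairs
        self.m_final = m_final               # R-MAX's fixed, conservative threshold
        self.growth = growth                 # how quickly caution increases
        self.t = 0
        self.counts = np.zeros((n_states, n_actions), dtype=int)
        self.q = np.full((n_states, n_actions), self.v_max)

    def m(self):
        # ICO idea: start with a small knownness threshold and let it grow
        # toward the conservative R-MAX value. The linear schedule is an
        # illustrative assumption.
        return min(self.m_final, 1 + int(self.growth * self.t))

    def act(self, state):
        self.t += 1
        return int(np.argmax(self.q[state]))

    def update(self, s, a, reward, s_next):
        self.counts[s, a] += 1
        if self.counts[s, a] >= self.m():
            # (s, a) counts as "known": back up from observed data instead
            # of keeping v_max. A real agent would re-solve the empirical
            # MDP; a decaying-step-size backup keeps the sketch short.
            target = reward + self.gamma * np.max(self.q[s_next])
            alpha = 1.0 / self.counts[s, a]
            self.q[s, a] += alpha * (target - self.q[s, a])
```

Under this sketch, plain R-MAX corresponds to holding the threshold fixed at m_final from the start; letting it grow instead means early visits escape the optimistic v_max sooner, which is the "increasingly cautious" behavior the abstract describes.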
Original language: English
Title of host publication: IJCAI International Joint Conference on Artificial Intelligence
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 4033-4040
Number of pages: 8
Volume: 2015-January
ISBN (Print): 9781577357384
Publication status: Published - 2015
Externally published: Yes
