Increasingly cautious optimism for practical PAC-MDP exploration

Liangpeng ZHANG, Ke TANG, Xin YAO

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › Research › peer-review

3 Citations (Scopus)

Abstract

The exploration strategy is an essential component of a learning agent in model-based Reinforcement Learning. R-MAX and V-MAX are PAC-MDP strategies proven to have polynomial sample complexity; in practice, however, their exploration behavior tends to be overly cautious. We propose the principle of Increasingly Cautious Optimism (ICO) to automatically cut off unnecessarily cautious exploration, and we apply ICO to R-MAX and V-MAX, yielding two new strategies: Increasingly Cautious R-MAX (ICR) and Increasingly Cautious V-MAX (ICV). We prove that both ICR and ICV are PAC-MDP, and show that their improvement is guaranteed by a tighter sample complexity upper bound. We then demonstrate their significantly improved performance through empirical results.
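Since the abstract centers on R-MAX-style optimistic exploration, a minimal sketch of the standard R-MAX strategy may help situate the contribution. This illustrates generic R-MAX (the base algorithm the paper improves on), not the paper's ICR or ICV; the class name RMaxAgent, the parameter defaults, and the value-iteration planner are assumptions of this sketch.

```python
# Minimal sketch of R-MAX-style optimistic exploration on a tabular MDP.
import numpy as np

class RMaxAgent:
    """R-MAX: treat under-visited (s, a) pairs as maximally rewarding."""

    def __init__(self, n_states, n_actions, r_max, m, gamma=0.95):
        self.nS, self.nA = n_states, n_actions
        self.r_max, self.m, self.gamma = r_max, m, gamma
        self.counts = np.zeros((n_states, n_actions), dtype=int)
        self.trans_counts = np.zeros((n_states, n_actions, n_states))
        self.reward_sums = np.zeros((n_states, n_actions))

    def update(self, s, a, r, s_next):
        # Record one observed transition for the empirical model.
        self.counts[s, a] += 1
        self.trans_counts[s, a, s_next] += 1
        self.reward_sums[s, a] += r

    def plan(self, n_iters=200):
        # Optimistic model: pairs visited fewer than m times self-loop
        # with reward r_max ("optimism in the face of uncertainty");
        # pairs visited at least m times ("known") use empirical estimates.
        P = np.zeros((self.nS, self.nA, self.nS))
        R = np.full((self.nS, self.nA), self.r_max)
        for s in range(self.nS):
            for a in range(self.nA):
                n = self.counts[s, a]
                if n >= self.m:
                    P[s, a] = self.trans_counts[s, a] / n
                    R[s, a] = self.reward_sums[s, a] / n
                else:
                    P[s, a, s] = 1.0  # optimistic self-loop
        # Value iteration on the optimistic model.
        Q = np.zeros((self.nS, self.nA))
        for _ in range(n_iters):
            Q = R + self.gamma * P @ Q.max(axis=1)
        return Q

    def act(self, s):
        # Greedy action under optimistic Q-values (replanned on each call
        # for clarity; a practical implementation would replan lazily).
        return int(self.plan()[s].argmax())
```

The point the abstract makes is that the fixed knownness threshold m, chosen from worst-case bounds, is what makes R-MAX (and V-MAX) overly cautious in practice; ICR and ICV automatically cut off this unnecessary caution. The exact ICO construction and the tighter sample-complexity bound are given in the paper itself, not in this sketch.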
Original language: English
Title of host publication: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015)
Editors: Qiang YANG, Michael WOOLDRIDGE
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 4033-4040
Number of pages: 8
Volume: 2015-January
ISBN (Print): 9781577357384
Publication status: Published - 2015
Externally published: Yes
