Abstract
An exploration strategy is an essential component of a learning agent in model-based Reinforcement Learning. R-MAX and V-MAX are PAC-MDP strategies proven to have polynomial sample complexity; yet, their exploration behavior tends to be overly cautious in practice. We propose the principle of Increasingly Cautious Optimism (ICO) to automatically cut off unnecessarily cautious exploration, and apply ICO to R-MAX and V-MAX, yielding two new strategies, namely Increasingly Cautious R-MAX (ICR) and Increasingly Cautious V-MAX (ICV). We prove that both ICR and ICV are PAC-MDP, and show that their improvement is guaranteed by a tighter sample-complexity upper bound. We then demonstrate their significantly improved performance through empirical results.
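For context, the abstract builds on R-MAX-style optimism, in which any state-action pair visited fewer than a threshold m times is treated as maximally rewarding, which drives the agent to explore it. The sketch below is a minimal, hypothetical illustration of that baseline mechanism only, not the paper's ICR or ICV algorithms; the function name `rmax_q_values` and all parameters are assumptions made for illustration.

```python
import numpy as np

def rmax_q_values(counts, rewards, transitions, m, r_max, gamma=0.95, iters=200):
    """Value iteration on an optimistic empirical model (R-MAX-style sketch).

    counts[s, a]      -- visit count of each state-action pair
    rewards[s, a]     -- empirical mean reward
    transitions       -- empirical next-state distribution, shape (S, A, S)
    m                 -- "known" threshold: pairs with counts < m keep the
                         optimistic value V_max = r_max / (1 - gamma)
    """
    v_max = r_max / (1.0 - gamma)
    q = np.zeros_like(rewards, dtype=float)
    for _ in range(iters):
        v = q.max(axis=1)                             # greedy state values
        backup = rewards + gamma * transitions @ v    # Bellman backup on the model
        q = np.where(counts >= m, backup, v_max)      # unknown pairs stay optimistic
    return q
```

The ICO principle modifies how long this optimistic treatment persists so that unnecessarily cautious exploration is cut off as experience accumulates; the precise schedules used by ICR and ICV, and the resulting sample-complexity bounds, are given in the paper and are not reproduced in this sketch.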
| Original language | English |
| --- | --- |
| Title of host publication | Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015) |
| Editors | Qiang Yang, Michael Wooldridge |
| Publisher | International Joint Conferences on Artificial Intelligence |
| Pages | 4033-4040 |
| Number of pages | 8 |
| Volume | 2015-January |
| ISBN (Print) | 9781577357384 |
| Publication status | Published - 2015 |
| Externally published | Yes |