TY - JOUR
T1 - Strategic choices in optimization
AU - CHOU, Cheng Wei
AU - CHOU, Ping Chiang
AU - CHRISTOPHE, Jean Joseph
AU - COUËTOUX, Adrien
AU - DE FREMINVILLE, Pierre
AU - GALICHET, Nicolas
AU - LEE, Chang Shing
AU - LIU, Jialin
AU - SAINT-PIERRE, David L.
AU - SEBAG, Michèle
AU - TEYTAUD, Olivier
AU - WANG, Mei Hui
AU - WU, Li Wen
AU - YEN, Shi Jim
PY - 2014/5
Y1 - 2014/5
N2 - Many decision problems have two levels: one for strategic decisions, and another for tactical management. This paper focuses on the strategic level, more specifically the sequential exploration of the possible options and the final selection (recommendation) of the best option. Several sequential exploration and recommendation criteria are considered and empirically compared on real world problems (board games, card games and energy management problems) in the uniform (1-player) and adversarial (2-player) settings. W.r.t. the sequential exploration part, the classical upper confidence bound algorithm, the exponential exploration-exploitation algorithm, the successive reject algorithm (designed specifically for simple regret), and the Bernstein races, are considered. W.r.t. the recommendation part, the selection is based on the empirically best arm, most played arm, lower confidence bounds, based on the reward distribution or variants thereof designed for risk control. This paper presents a systematic study, comparing the coupling of the sequential exploration and recommendation variants on the considered problems in terms of their simple regret. A secondary contribution is that, to the best of our knowledge, this is the first win ever of a computer-kill-all Go player against professional human players [16].
AB - Many decision problems have two levels: one for strategic decisions, and another for tactical management. This paper focuses on the strategic level, more specifically the sequential exploration of the possible options and the final selection (recommendation) of the best option. Several sequential exploration and recommendation criteria are considered and empirically compared on real world problems (board games, card games and energy management problems) in the uniform (1-player) and adversarial (2-player) settings. W.r.t. the sequential exploration part, the classical upper confidence bound algorithm, the exponential exploration-exploitation algorithm, the successive reject algorithm (designed specifically for simple regret), and the Bernstein races, are considered. W.r.t. the recommendation part, the selection is based on the empirically best arm, most played arm, lower confidence bounds, based on the reward distribution or variants thereof designed for risk control. This paper presents a systematic study, comparing the coupling of the sequential exploration and recommendation variants on the considered problems in terms of their simple regret. A secondary contribution is that, to the best of our knowledge, this is the first win ever of a computer-kill-all Go player against professional human players [16].
KW - Games
KW - Investment optimization
KW - Metagaming
KW - Multi-armed bandit problems
KW - Simple regret
KW - Strategic choices
KW - Upper confidence bounds
UR - http://www.scopus.com/inward/record.url?scp=84902346329&partnerID=8YFLogxK
M3 - Journal Article (refereed)
AN - SCOPUS:84902346329
SN - 1016-2364
VL - 30
SP - 727
EP - 747
JO - Journal of Information Science and Engineering
JF - Journal of Information Science and Engineering
IS - 3
ER -