Data-Efficient Hierarchical Reinforcement Learning for Robotic Assembly Control Applications

  • Zhimin HOU
  • Jiajun FEI
  • Yuelin DENG
  • Jing XU*

*Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review

75 Citations (Scopus)

Abstract

Hierarchical reinforcement learning (HRL) can learn decomposed subpolicies corresponding to local state spaces; it is therefore a promising solution for complex robotic assembly control tasks, requiring fewer interactions with the environment. Most existing HRL algorithms require on-policy learning, where resampling is necessary at every training step. In this article, we propose a data-efficient HRL algorithm based on off-policy learning, with three main contributions. First, two augmented MDPs (Markov decision processes) are reformulated so that the higher-level policy and lower-level policies can be learned from the same samples. Second, to learn a higher-level policy that leads to efficient exploration, a softmax gating policy is derived to determine which lower-level policy interacts with the environment. Third, to learn the lower-level policies from off-policy samples drawn from a single lower-level replay buffer, the higher-level policy derived from the option-value network is adopted to select the appropriate option for training the corresponding lower-level policy. The data efficiency of our algorithm is validated on simulated and real-world robotic dual peg-in-hole assembly tasks.
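The softmax gating idea described in the abstract can be illustrated with a minimal sketch: the higher-level policy scores each option (lower-level policy) with an option value Q(s, o) and samples one from a softmax distribution over those scores. All names here (`select_option`, the temperature parameter, the example values) are illustrative assumptions, not the paper's actual implementation; in the paper the option values would come from a learned option-value network rather than a fixed array.

```python
import numpy as np

def softmax(x, temperature=1.0):
    """Numerically stable softmax over a vector of scores."""
    z = np.asarray(x, dtype=float) / temperature
    z -= z.max()  # shift for numerical stability; probabilities are unchanged
    p = np.exp(z)
    return p / p.sum()

def select_option(option_values, rng, temperature=1.0):
    """Sample an option index (i.e., which lower-level policy acts)
    from a softmax gating distribution over option values Q(s, o)."""
    probs = softmax(option_values, temperature)
    option = rng.choice(len(probs), p=probs)
    return option, probs

# Hypothetical option values for the current state: three lower-level policies.
rng = np.random.default_rng(0)
q_values = [1.2, 0.4, 2.0]
option, probs = select_option(q_values, rng)
```

A lower temperature concentrates the gating distribution on the highest-value option (approaching greedy selection), while a higher temperature spreads probability mass more evenly, which trades exploitation for exploration.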

Original language: English
Article number: 9264727
Pages (from-to): 11565-11575
Number of pages: 11
Journal: IEEE Transactions on Industrial Electronics
Volume: 68
Issue number: 11
Early online date: 19 Nov 2020
DOIs
Publication status: Published - Nov 2021
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 1982-2012 IEEE.

Funding

This work was supported in part by the National Key R&D Program of China under Grant 2017YFC0822204; in part by the National Natural Science Foundation of China under Grant U1613205, Grant 51675291, and Grant 51935010; in part by the Beijing Municipal Natural Science Foundation under Grant L192001; in part by the Funding for Basic Scientific Research Program under Grant JCKY2018205B029; and in part by the State Key Laboratory of China under Grant SKL2020C15.

Keywords

  • Data-efficiency
  • hierarchical reinforcement learning
  • robotic assembly control
