Feedback Deep Deterministic Policy Gradient with Fuzzy Reward for Robotic Multiple Peg-in-Hole Assembly Tasks

  • Jing XU*
  • Zhimin HOU
  • Wei WANG
  • Bohao XU
  • Kuangen ZHANG
  • Ken CHEN

*Corresponding author for this work

Research output: Journal Publications; Journal Article (refereed); peer-review

170 Citations (Scopus)

Abstract

The automatic completion of multiple peg-in-hole assembly tasks by robots remains a formidable challenge because traditional control strategies require a complex analysis of the contact model. In this paper, the assembly task is formulated as a Markov decision process, and a model-driven deep deterministic policy gradient algorithm is proposed to accomplish the assembly task through the learned policy without analyzing the contact states. In our algorithm, the learning process is driven by a simple traditional force controller. In addition, a feedback exploration strategy is proposed so that the algorithm can efficiently explore the optimal assembly policy while avoiding risky actions, which can improve data efficiency and guarantee stability in realistic assembly scenarios. To improve the learning efficiency, we use a fuzzy reward system for the complex assembly process. Simulations and realistic experiments of a dual peg-in-hole assembly demonstrate the effectiveness of the proposed algorithm. The advantages of the fuzzy reward system and the feedback exploration strategy are validated by comparing performance across different cases in simulations and experiments.

Original language: English
Article number: 8454796
Pages (from-to): 1658-1667
Number of pages: 10
Journal: IEEE Transactions on Industrial Informatics
Volume: 15
Issue number: 3
Early online date: 5 Sept 2018
Publication status: Published - Mar 2019
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2005-2012 IEEE.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 51675291 and Grant U1613205, and in part by the State Key Laboratory of China (SKLT2018C04). Paper no. TII-18-0499.

Keywords

  • Continuous actions control
  • feedback exploration
  • fuzzy reward
  • intelligent assembly
  • multiple peg-in-hole
  • reinforcement learning
