Abstract
The automatic completion of multiple peg-in-hole assembly tasks by robots remains a formidable challenge because traditional control strategies require a complex analysis of the contact model. In this paper, the assembly task is formulated as a Markov decision process, and a model-driven deep deterministic policy gradient algorithm is proposed to accomplish the assembly task through the learned policy without analyzing the contact states. In our algorithm, the learning process is driven by a simple traditional force controller. In addition, a feedback exploration strategy is proposed so that the algorithm can efficiently explore for the optimal assembly policy while avoiding risky actions, which improves data efficiency and helps guarantee stability in realistic assembly scenarios. To improve the learning efficiency, we utilize a fuzzy reward system for the complex assembly process. Simulations and real-world experiments of a dual peg-in-hole assembly demonstrate the effectiveness of the proposed algorithm. The advantages of the fuzzy reward system and the feedback exploration strategy are validated by comparing the performance of different cases in simulations and experiments.
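As a rough illustration of how a learned policy can be combined with a simple force controller and a feedback-scaled exploration term, the sketch below may help. It is not the paper's implementation: the function names, the proportional control law, the linear noise-scaling rule, and all numeric values (`gain`, `safe_force`, `max_sigma`, `limit`) are assumptions chosen only for illustration.

```python
import numpy as np


def force_controller(force_error, gain=0.002):
    """Hypothetical proportional force controller (assumption, not from the paper).

    Maps a force/torque error vector to a small pose correction.
    """
    return gain * np.asarray(force_error, dtype=float)


def feedback_noise_scale(contact_force, safe_force=20.0, max_sigma=0.001):
    """Hypothetical feedback rule (assumption): shrink the exploration noise
    linearly as the measured contact force approaches a safety limit."""
    ratio = min(abs(contact_force) / safe_force, 1.0)
    return max_sigma * (1.0 - ratio)


def select_action(policy_output, force_error, contact_force, rng, limit=0.005):
    """Combine the controller output, the learned correction, and scaled noise.

    The result is clipped to a bounded step so no single action is large.
    """
    base = force_controller(force_error)
    sigma = feedback_noise_scale(contact_force)
    noise = rng.standard_normal(base.shape) * sigma
    action = base + np.asarray(policy_output, dtype=float) + noise
    return np.clip(action, -limit, limit)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Low contact force: exploration noise is close to its maximum scale.
    print(select_action(np.zeros(3), [1.0, -1.0, 0.5], 2.0, rng))
    # Contact force at the safety limit: exploration noise is switched off.
    print(select_action(np.zeros(3), [1.0, -1.0, 0.5], 20.0, rng))
```

In this sketch the controller supplies a safe baseline motion, the policy output acts as a learned correction on top of it, and exploration shrinks as contact forces grow, which is one plausible reading of "avoiding risky actions".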
| Original language | English |
|---|---|
| Article number | 8454796 |
| Pages (from-to) | 1658-1667 |
| Number of pages | 10 |
| Journal | IEEE Transactions on Industrial Informatics |
| Volume | 15 |
| Issue number | 3 |
| Early online date | 5 Sept 2018 |
| Publication status | Published - Mar 2019 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2005-2012 IEEE.
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant 51675291 and Grant U1613205, and in part by the State Key Laboratory of China (SKLT2018C04). Paper no. TII-18-0499.
Keywords
- Continuous actions control
- feedback exploration
- fuzzy reward
- intelligent assembly
- multiple peg-in-hole
- reinforcement learning