A Novel Task-Driven Diffusion-Based Policy With Affordance Learning for Generalizable Manipulation of Articulated Objects

  • Hao ZHANG
  • Zhen KAN*
  • Weiwei SHANG
  • Yongduan SONG

*Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review

Abstract

Despite recent advances in dexterous manipulation, the manipulation of articulated objects and generalization across different categories remain significant challenges. To address these issues, we introduce DART, a novel framework that enhances a diffusion-based policy with affordance learning and linear temporal logic (LTL) representations to improve the learning efficiency and generalizability of articulated dexterous manipulation. Specifically, DART leverages LTL to understand task semantics and affordance learning to identify optimal interaction points. The diffusion-based policy then generalizes these interactions across various categories. In addition, we exploit an optimization method based on interaction data to refine actions, overcoming the limitations of traditional diffusion policies that typically rely on offline reinforcement learning or learning from demonstrations. Experimental results demonstrate that DART outperforms most existing methods in manipulation ability, generalization performance, transfer reasoning, and robustness.
Original language: English
Number of pages: 13
Journal: IEEE/ASME Transactions on Mechatronics
Early online date: 10 Sept 2025
Publication status: E-pub ahead of print - 10 Sept 2025
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 1996-2012 IEEE.

Funding

This work was supported in part by the National Key R&D Program of China under Grant 2022YFB4701400/4701403 and in part by the National Natural Science Foundation of China under Grant U201360.

Keywords

  • Affordance learning
  • dexterous manipulation
  • diffusion policy
  • linear temporal logic
