LI Chao, ZHANG Zhi, XIA Guihua, et al. Learning variable impedance control based on reinforcement learning[J]. Journal of Harbin Engineering University, 2019, 40(02):304-311. [doi:10.11990/jheu.201803020]

Learning Variable Impedance Control Based on Reinforcement Learning

Journal of Harbin Engineering University [ISSN:1006-7043/CN:23-1390/U]

Volume: 40
Issue: 2019(02)
Pages: 304-311
Publication date: 2019-02-05

Article Info

Title:
Learning variable impedance control based on reinforcement learning
Author(s):
LI Chao1 ZHANG Zhi1 XIA Guihua1 XIE Xinru1 ZHU Qidan1 LIU Qi2
1. College of Automation, Harbin Engineering University, Harbin 150001, China;
2. Institute of Chemical Materials, China Academy of Engineering Physics, Mianyang 621000, China
Keywords:
robot; impedance control; force control; control policy; reinforcement learning; high efficiency; Gaussian process; cost function
CLC number:
TP242
DOI:
10.11990/jheu.201803020
Document code:
A
Abstract:
To improve force-control performance and enable a robot to learn to perform force-control tasks efficiently and autonomously, this paper proposes a learning variable impedance control method. The method uses a model-based reinforcement learning algorithm to learn the optimal impedance regulation policy, adopts a Gaussian process model as the system's transition dynamics model to permit probabilistic inference and planning, and adds an energy-loss term to the cost function to trade off tracking error against energy. Simulation results show that the method is highly data-efficient: it successfully learns the force-control task within only a few interactions, greatly reducing the required number of interactions and the interaction time. Moreover, the learned impedance control policy exhibits biomimetic characteristics, making the approach suitable for learning force-sensitive tasks.
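The error-energy trade-off described in the abstract can be made concrete with a toy rollout. The following Python sketch is only an illustration under assumed names (policy, theta, w_energy, and a 1-DOF point-mass plant are all hypothetical); it is not the paper's implementation, which learns a Gaussian process dynamics model and optimizes the impedance schedule via model-based policy search in the style of PILCO [16-17].

```python
import numpy as np

# Toy 1-DOF setting: a unit point mass driven toward x_ref by an
# impedance law F = K*(x_ref - x) - B*xd, where the gains (K, B) are
# scheduled by a parameterized policy. All names are assumptions for
# illustration, not the authors' code.
dt, mass, x_ref = 0.01, 1.0, 1.0

def policy(x, xd, theta):
    # Hypothetical gain schedule: stiffness/damping grow with the error
    # and velocity magnitudes, clipped to a feasible range. A model-based
    # learner would tune theta from rollouts of a learned GP model.
    K = np.clip(theta[0] + theta[1] * abs(x_ref - x), 0.0, 500.0)
    B = np.clip(theta[2] + theta[3] * abs(xd), 0.0, 50.0)
    return K, B

def rollout_cost(theta, T=200, w_energy=1e-3):
    # Cost = squared tracking error + an energy-loss term |F * xd| * dt,
    # mirroring the trade-off the abstract adds to the cost function:
    # accurate tracking is rewarded, stiff energy-hungry gains penalized.
    x, xd, cost = 0.0, 0.0, 0.0
    for _ in range(T):
        K, B = policy(x, xd, theta)
        F = K * (x_ref - x) - B * xd      # impedance control law
        xd += (F / mass) * dt             # semi-implicit Euler step
        x += xd * dt
        cost += (x - x_ref) ** 2 + w_energy * abs(F * xd) * dt
    return cost

# Example: evaluate one candidate impedance schedule.
print(rollout_cost(np.array([200.0, 50.0, 10.0, 2.0])))
```

Raising w_energy pushes the optimizer toward compliant, energy-conserving gains of the kind the abstract calls biomimetic; lowering it favors stiff, highly accurate tracking.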

References:

[1] TSUJI T, TANAKA Y. On-line learning of robot arm impedance using neural networks[J]. Robotics and autonomous systems, 2005, 52(4):257-271.
[2] ARIMOTO S, HAN H Y, CHEAH C C, et al. Extension of impedance matching to nonlinear dynamics of robotic tasks[J]. Systems & control letters, 1999, 36(2):109-119.
[3] VAN DEN KIEBOOM J, IJSPEERT A J. Exploiting natural dynamics in biped locomotion using variable impedance control[C]//Proceedings of the 2013 13th IEEE-RAS International Conference on Humanoid Robots. Atlanta, GA, USA, 2013:348-353.
[4] BUCHLI J, THEODOROU E, STULP F, et al. Variable impedance control:a reinforcement learning approach[C]//Proceedings of Robotics:Science and Systems. Zaragoza, Spain, 2010:153.
[5] FU Chunjiang, WANG Rubin. On the human arm impedance control[J]. Chinese quarterly of mechanics, 2010, 31(1):37-45.
[6] STULP F, BUCHLI J, ELLMER A, et al. Model-free reinforcement learning of impedance control in stochastic environments[J]. IEEE transactions on autonomous mental development, 2012, 4(4):330-341.
[7] HUANG Rui, CHENG Hong, GUO Hongliang. Learning virtual impedance for control of a human-coupled lower exoskeleton[J]. Journal of University of Electronic Science and Technology of China, 2018, 47(3):321-329.
[8] QIU Jing, CHEN Qiming, LU Jun, et al. Learning-based adaptive impedance control for a human-powered augmentation lower exoskeleton[J]. Journal of University of Electronic Science and Technology of China, 2016, 45(4):689-695.
[9] DU Zhijiang, WANG Wei, YAN Zhiyuan, et al. Variable admittance control based on fuzzy reinforcement learning for minimally invasive surgery manipulator[J]. Sensors, 2017, 17(4):844.
[10] IZAWA J, KONDO T, ITO K. Biological arm motion through reinforcement learning[J]. Biological cybernetics, 2004, 91(1):10-22.
[11] MITROVIC D, KLANKE S, HOWARD M, et al. Exploiting sensorimotor stochasticity for learning control of variable impedance actuators[C]//Proceedings of the 2010 10th IEEE-RAS International Conference on Humanoid Robots. Nashville, TN, USA, 2010:536-541.
[12] BUCHLI J, STULP F, THEODOROU E, et al. Learning variable impedance control[J]. International journal of robotics research, 2011, 30(7):820-833.
[13] WINTER F, SAVERIANO M, LEE D. The role of coupling terms in variable impedance policies learning[C]//Proceedings of International Workshop on Human-Friendly Robotics. 2016.
[14] HOGAN N. Impedance control:an approach to manipulation. Part I-Theory. Part II-Implementation. Part III-Applications[J]. Journal of dynamic systems, measurement, and control, 1985, 107(1):1-24.
[15] DEISENROTH M P, NEUMANN G, PETERS J. A survey on policy search for robotics[J]. Foundations and trends in robotics, 2013, 2(1-2):1-142.
[16] DEISENROTH M P, RASMUSSEN C E. PILCO:a model-based and data-efficient approach to policy search[C]//Proceedings of the 28th International Conference on International Conference on Machine Learning. Bellevue, WA, USA, 2011:465-472.
[17] DEISENROTH M P, FOX D, RASMUSSEN C E. Gaussian processes for data-efficient learning in robotics and control[J]. IEEE transactions on pattern analysis and machine intelligence, 2015, 37(2):408-423.
[18] RASMUSSEN C E, WILLIAMS C K I. Gaussian processes for machine learning (adaptive computation and machine learning)[M]. Cambridge:MIT Press, 2006:69-106.

Similar Articles:

[1] WU Jian-rong, WANG Li-quan, WANG Cai-dong, et al. A novel method for robotic precision design[J]. Journal of Harbin Engineering University, 2010, (10):0.
[2] CHEN Zhaopeng, JIN Minghe, FAN Shaowei, et al. The task-orientated control system of a dexterous robot hand with a multi-fingered spatial coordinating impedance control[J]. Journal of Harbin Engineering University, 2012, (04):476. [doi:10.3969/j.issn.1006-7043.201105063]
[3] MA Hongwen, WANG Liquan, ZHAO Peng, et al. Research of dynamic model and stability of a series elastic actuator[J]. Journal of Harbin Engineering University, 2012, (11):1410. [doi:10.3969/j.issn.1006-7043.201109012]
[4] HUO Guanglei, ZHAO Lijun, LI Ruifeng, et al. An indoor environmental multi-feature identification method based on hypothesis testing[J]. Journal of Harbin Engineering University, 2015, (03):348. [doi:10.3969/j.issn.1006-7043.201310040]

Memo:
Received: 2018-03-07.
Foundation item: National Natural Science Foundation of China (U1530119).
Author profiles: LI Chao, male, Ph.D. candidate; ZHANG Zhi, male, associate professor and master's supervisor.
Corresponding author: ZHANG Zhi, E-mail: zhangzhi1981@hrbeu.edu.cn
Last Update: 2019-01-30