现代制造工程 (Modern Manufacturing Engineering) ›› 2025, Vol. 532 ›› Issue (1): 33-41. doi: 10.16731/j.cnki.1671-3133.2025.01.004

• Robotics •

Unstructured terrain motion control for quadruped robot based on improved TD3*

XIE Zijian1,2, QIN Jianjun1,2, CAO Yu1,2

  1. School of Mechanical-Electronic and Vehicle Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China;
  2. Beijing Engineering Research Center of Monitoring for Construction Safety, Beijing 100044, China
  • Received: 2024-02-06  Online: 2025-01-18  Published: 2025-02-10
  • Corresponding author: QIN Jianjun, Ph.D., professor; research interests: robotics, intelligent design methods, engineering education. E-mail: 13639777766@163.com; qinjianjun@bucea.edu.cn
  • About the first author: XIE Zijian, master's degree candidate; research interests: robot control, reinforcement learning.
  • Funding:
    * Fundamental Research Funds for Beijing Municipal Universities (X20060); Research Fund of Beijing Engineering Research Center of Monitoring for Construction Safety (BJC2020K012); Graduate Innovation Project of Beijing University of Civil Engineering and Architecture (PG2024139); Open Fund of the Key Laboratory of Vehicle Advanced Manufacturing, Measuring and Control Technology (Beijing Jiaotong University), Ministry of Education



Abstract: The motion control of quadruped robots in unstructured terrain relies heavily on complex dynamics models and controller designs, and using Deep Reinforcement Learning (DRL) to design quadruped robot controllers has become a trend. To address slow convergence, susceptibility to local optima, and high computational resource consumption during DRL training, an improved algorithm called Memory-integrated Twin Delayed Deep Deterministic policy gradient (M-TD3) was proposed. First, models of the quadruped robot and the unstructured terrains were established. Then, the convergence behavior and learning efficiency of the M-TD3 algorithm were analyzed. Finally, to verify controller performance, motion control simulations were compared across various terrains and a prototype was built and tested. Simulation results show that, compared with the traditional TD3 algorithm, M-TD3 converges faster, is more efficient, and significantly improves motion control performance. Prototype tests show that the controller designed with the improved TD3 algorithm enables the quadruped robot to move and clear obstacles effectively in unstructured terrain.

Key words: quadruped robot, unstructured terrains, Deep Reinforcement Learning(DRL), TD3 algorithm
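As background to the abstract, the standard TD3 mechanisms that M-TD3 builds on can be sketched in NumPy: twin critics take the minimum target Q-value, target-policy smoothing adds clipped Gaussian noise, and the actor is updated with a delay. This is a minimal illustration only; the paper's memory component (the "M" in M-TD3) and its network architectures are not reproduced here, and all function names and default parameters below are assumptions, not the authors' implementation.

```python
import numpy as np

def td3_target(reward, next_q1, next_q2, gamma=0.99, done=False):
    """Clipped double-Q target: r + gamma * min(Q1', Q2') for non-terminal steps.

    Using the minimum of the twin critics counters Q-value overestimation.
    """
    min_q = np.minimum(next_q1, next_q2)
    return reward + (0.0 if done else gamma) * min_q

def smoothed_target_action(mu, noise_std=0.2, noise_clip=0.5, act_limit=1.0, rng=None):
    """Target-policy smoothing: perturb the target action with clipped Gaussian noise,
    then clip the result back into the valid action range."""
    rng = rng if rng is not None else np.random.default_rng(0)
    noise = np.clip(rng.normal(0.0, noise_std, size=np.shape(mu)), -noise_clip, noise_clip)
    return np.clip(mu + noise, -act_limit, act_limit)

def should_update_actor(step, policy_delay=2):
    """Delayed policy update: the actor and target networks update only every
    `policy_delay` critic updates."""
    return step % policy_delay == 0
```

The sketch covers only the target computation and update schedule; in a full agent these pieces sit inside a training loop with replay-buffer sampling and gradient steps on the critic and actor networks.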

