Modern Manufacturing Engineering, 2025, Vol. 534, Issue (3): 19-30. doi: 10.16731/j.cnki.1671-3133.2025.03.003

• Advanced Manufacturing System Management and Operation •

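The PPO algorithm based on convolutional pyramid network to solve job-shop scheduling problem*

XU Shuai, LI Yanwu, XIE Hui, NIU Xiaowei

  1. College of Electronic & Information Engineering, Chongqing Three Gorges University, Chongqing 404020, China
  • Received: 2024-04-07 Published: 2025-03-28
  • Corresponding author: LI Yanwu, Ph.D., senior engineer; main research interests: intelligent optimization algorithms and shop scheduling. E-mail: liyanwu2022@sina.com
  • About the authors: XU Shuai, master's student; main research interests: job-shop scheduling problems and deep reinforcement learning. XIE Hui, M.S., professor; main research interests: computer intelligence and optimization, computer control technology. NIU Xiaowei, M.S., associate professor; main research interests: intelligent signal processing. E-mail: liangyve0702@163.com
  • Supported by:
    *General Program of the National Natural Science Foundation of China (12175194); Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202301216, KJQN202001224)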

Abstract: The job-shop scheduling problem (JSSP) is a classic NP-hard combinatorial optimization problem, and the quality of a schedule directly affects the operating efficiency of a manufacturing system. To obtain better scheduling policies, a Deep Reinforcement Learning (DRL) scheduling method based on Proximal Policy Optimization (PPO) and a Convolutional Neural Network (CNN) is proposed, with minimization of the maximum completion time (makespan) as the optimization objective. A three-channel state representation is designed, 16 heuristic scheduling rules are selected as the action space, and the reward function is formulated to be equivalent to minimizing the total idle time of the machines. So that the trained scheduling policy can handle instances of different sizes, Spatial Pyramid Pooling (SPP) is used in the CNN to convert feature matrices of different dimensions into fixed-length feature vectors. Computational experiments are conducted on 42 JSSP instances from the public OR-Library. The simulation results show that the proposed algorithm outperforms single heuristic scheduling rules and a genetic algorithm, obtains better results than existing DRL algorithms on most instances, and achieves the smallest average completion time.
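The link the abstract draws between makespan and machine idle time can be illustrated with a small simulation. If idle time is counted on every machine up to the makespan, then for m machines with total processing time P the total idle time equals m·C_max − P, so minimizing one minimizes the other. The sketch below is only an illustration under that assumption: the abstract does not give the paper's exact reward definition or list its 16 rules, so the two rules used here (SPT and MWKR, both textbook dispatching rules), the helper names, and the toy instance are all hypothetical.

```python
def spt(candidates, jobs, next_op):
    """Shortest Processing Time: pick the job whose next operation is shortest."""
    return min(candidates, key=lambda j: jobs[j][next_op[j]][1])


def mwkr(candidates, jobs, next_op):
    """Most Work Remaining: pick the job with the most unprocessed time left."""
    return max(candidates, key=lambda j: sum(d for _, d in jobs[j][next_op[j]:]))


def simulate(jobs, rule):
    """Build a schedule by applying one dispatching rule at every decision step.

    jobs[j] is the route of job j: a list of (machine, duration) pairs.
    Returns (makespan, total_idle), with idle time counted up to the makespan.
    """
    n_machines = 1 + max(m for route in jobs for m, _ in route)
    next_op = [0] * len(jobs)          # index of each job's next operation
    job_free = [0] * len(jobs)         # time at which each job is available
    machine_free = [0] * n_machines    # time at which each machine is available
    busy = [0] * n_machines            # accumulated processing time per machine
    while True:
        candidates = [j for j in range(len(jobs)) if next_op[j] < len(jobs[j])]
        if not candidates:
            break
        j = rule(candidates, jobs, next_op)   # the "action": which job goes next
        m, d = jobs[j][next_op[j]]
        start = max(job_free[j], machine_free[m])
        job_free[j] = machine_free[m] = start + d
        busy[m] += d
        next_op[j] += 1
    makespan = max(machine_free)
    total_idle = sum(makespan - b for b in busy)
    return makespan, total_idle


# Toy 2-job, 2-machine instance (made up for illustration).
jobs = [[(0, 3), (1, 2)], [(1, 2), (0, 4)]]
total_work = sum(d for route in jobs for _, d in route)
results = {rule.__name__: simulate(jobs, rule) for rule in (spt, mwkr)}
for name, (makespan, idle) in results.items():
    # The last column checks the identity idle = m * C_max - P with m = 2.
    print(name, makespan, idle, idle == 2 * makespan - total_work)
```

In the method described in the abstract, the agent would choose among the rules at each decision step rather than applying a single rule throughout, as this sketch does.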

Key words: Deep Reinforcement Learning (DRL), job-shop scheduling problem, Convolutional Neural Network (CNN), Proximal Policy Optimization (PPO), Spatial Pyramid Pooling (SPP)
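How Spatial Pyramid Pooling turns feature matrices of different dimensions into a fixed-length vector can be sketched in plain Python. This is a minimal illustration, not the paper's network: the pyramid levels (1, 2, 4), the use of max pooling, the bin-boundary formula, and the toy matrices standing in for CNN feature maps are all assumptions, since the abstract does not specify them.

```python
def adaptive_max_pool(fmap, bins):
    """Max-pool one 2-D feature map (a list of rows) into a bins x bins grid.

    Bin i covers rows floor(i*h/bins) .. ceil((i+1)*h/bins) - 1, so every bin
    is non-empty even when the map has fewer rows than bins.
    """
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        r0 = (i * h) // bins
        r1 = ((i + 1) * h + bins - 1) // bins      # integer ceiling
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = ((j + 1) * w + bins - 1) // bins  # integer ceiling
            pooled.append(max(fmap[r][c]
                              for r in range(r0, r1)
                              for c in range(c0, c1)))
    return pooled


def spatial_pyramid_pool(channels, levels=(1, 2, 4)):
    """Concatenate pooled grids of every level for every channel.

    Output length is len(channels) * sum(b * b for b in levels),
    independent of the height and width of the input maps.
    """
    vec = []
    for fmap in channels:
        for bins in levels:
            vec.extend(adaptive_max_pool(fmap, bins))
    return vec


def make_maps(n_jobs, n_machines, n_channels=3):
    """Hypothetical stand-in for three-channel feature maps of one instance."""
    return [[[float(c + r * n_machines + k) for k in range(n_machines)]
             for r in range(n_jobs)]
            for c in range(n_channels)]


small = spatial_pyramid_pool(make_maps(6, 6))     # e.g. a 6 x 6 instance
large = spatial_pyramid_pool(make_maps(20, 10))   # e.g. a 20 x 10 instance
print(len(small), len(large))
```

Because both vectors have the same length, the same fully connected layers could follow the pooling step regardless of instance size, which is the property the abstract relies on for handling instances of different scales.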
