The PPO algorithm based on convolutional pyramid network to solve job-shop scheduling problem*

Modern Manufacturing Engineering, 2025, Vol. 534, Issue (3): 19-30. doi: 10.16731/j.cnki.1671-3133.2025.03.003

• Advanced Manufacturing System Management and Operation •

XU Shuai, LI Yanwu, XIE Hui, NIU Xiaowei

College of Electronic & Information Engineering, Chongqing Three Gorges University, Chongqing 404020, China

Received: 2024-04-07 Published: 2025-03-28
Corresponding author: LI Yanwu, Ph.D., senior engineer. Research interests: intelligent optimization algorithms and shop scheduling. E-mail: liyanwu2022@sina.com
About the authors: XU Shuai, master's student. Research interests: job-shop scheduling problems and deep reinforcement learning. XIE Hui, M.S., professor. Research interests: computer intelligence and optimization, and computer control technology. NIU Xiaowei, M.S., associate professor. Research interests: intelligent signal processing. E-mail: liangyve0702@163.com
Funding:
*National Natural Science Foundation of China, General Program (12175194); Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202301216, KJQN202001224)

Abstract: The job-shop scheduling problem (JSSP) is a classic NP-hard combinatorial optimization problem, and the quality of a schedule directly affects the operating efficiency of a manufacturing system. To obtain better scheduling policies, a deep reinforcement learning (DRL) scheduling method based on proximal policy optimization (PPO) and a convolutional neural network (CNN) is proposed, with minimization of the maximum completion time (makespan) as the objective. A three-channel state representation is designed, 16 heuristic scheduling rules are selected as the action space, and the reward function is made equivalent to minimizing the total idle time of the machines. So that the trained policy can handle scheduling instances of different sizes, spatial pyramid pooling (SPP) is used in the CNN to convert feature matrices of different dimensions into fixed-length feature vectors. Computational experiments are conducted on 42 JSSP instances from the public OR-Library. The simulation results show that the proposed algorithm outperforms single heuristic scheduling rules and genetic algorithms, obtains better results than existing DRL algorithms on most instances, and achieves the smallest average completion time.
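To make the link between dispatching rules, makespan, and machine idle time concrete, the following is a minimal sketch and not the authors' implementation. The 3-job, 3-machine instance, the helper names `dispatch`, `spt`, and `total_idle_time`, and the choice to measure idle time on each machine over the window from time 0 to the makespan are all assumptions made for illustration. Under that definition, total idle time equals (number of machines × makespan) minus the fixed total processing time, so lowering one lowers the other.

```python
# Hypothetical 3-job x 3-machine instance (not from the paper):
# each job is an ordered list of (machine, duration) operations.
JOBS = [
    [(0, 3), (1, 2), (2, 2)],
    [(0, 2), (2, 1), (1, 4)],
    [(1, 4), (2, 3), (0, 1)],
]

def dispatch(jobs, rule):
    """Build a schedule by repeatedly letting `rule` pick the next job to advance."""
    n_machines = 1 + max(m for job in jobs for m, _ in job)
    next_op = [0] * len(jobs)          # index of each job's next unscheduled operation
    job_ready = [0] * len(jobs)        # time at which each job's previous operation ends
    machine_ready = [0] * n_machines   # time at which each machine becomes free
    while True:
        candidates = [j for j in range(len(jobs)) if next_op[j] < len(jobs[j])]
        if not candidates:
            break
        j = rule(candidates, jobs, next_op)
        machine, duration = jobs[j][next_op[j]]
        start = max(job_ready[j], machine_ready[machine])
        job_ready[j] = machine_ready[machine] = start + duration
        next_op[j] += 1
    return max(machine_ready)          # makespan of the resulting schedule

def spt(candidates, jobs, next_op):
    """Shortest processing time: pick the job whose next operation is shortest."""
    return min(candidates, key=lambda j: jobs[j][next_op[j]][1])

def total_idle_time(jobs, makespan):
    """Idle time summed over machines, each measured on the window [0, makespan]."""
    n_machines = 1 + max(m for job in jobs for m, _ in job)
    total_work = sum(d for job in jobs for _, d in job)
    return n_machines * makespan - total_work

makespan = dispatch(JOBS, spt)
print(makespan, total_idle_time(JOBS, makespan))
```

In a DRL setting of the kind the abstract describes, the agent would choose which rule to pass to a step of such a simulator at each decision point; here a single fixed rule is used for the whole schedule to keep the sketch short.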

Key words: Deep Reinforcement Learning (DRL), job-shop scheduling problem, Convolutional Neural Network (CNN), Proximal Policy Optimization (PPO), Spatial Pyramid Pooling (SPP)
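The way spatial pyramid pooling yields a fixed-length vector from inputs of different sizes can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the network from the paper: the pyramid levels (1, 2, 4), the use of max pooling, the function names `adaptive_max_pool` and `spp`, and pooling raw three-channel matrices directly (rather than convolutional feature maps) are all choices made here for brevity.

```python
def adaptive_max_pool(matrix, k):
    """Max-pool a 2-D list into a k x k grid, whatever the input size."""
    rows, cols = len(matrix), len(matrix[0])
    pooled = []
    for i in range(k):
        # Bin i covers rows [floor(i*rows/k), ceil((i+1)*rows/k)); never empty.
        r0, r1 = (i * rows) // k, ((i + 1) * rows + k - 1) // k
        for j in range(k):
            c0, c1 = (j * cols) // k, ((j + 1) * cols + k - 1) // k
            pooled.append(max(matrix[r][c]
                              for r in range(r0, r1)
                              for c in range(c0, c1)))
    return pooled

def spp(channels, levels=(1, 2, 4)):
    """Concatenate the pooled outputs of every channel at every pyramid level."""
    features = []
    for k in levels:
        for channel in channels:
            features.extend(adaptive_max_pool(channel, k))
    return features

# Two hypothetical inputs of different size (6 x 6 and 10 x 5), three channels each.
small = [[[float(r + c) for c in range(6)] for r in range(6)] for _ in range(3)]
large = [[[float(r * c) for c in range(5)] for r in range(10)] for _ in range(3)]

print(len(spp(small)), len(spp(large)))  # prints "63 63"
```

Each channel contributes 1 + 4 + 16 = 21 values regardless of its height and width, so both inputs map to 3 × 21 = 63 features, which is what lets a single fully connected head serve instances of different scale.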

CLC number: TP301.6

XU Shuai, LI Yanwu, XIE Hui, NIU Xiaowei. The PPO algorithm based on convolutional pyramid network to solve job-shop scheduling problem[J]. Modern Manufacturing Engineering, 2025, 534(3): 19-30.

