Application of deep reinforcement learning in automatic IMRT planning for rectal cancer
《中国医学物理学杂志》(Chinese Journal of Medical Physics) [ISSN: 1005-202X / CN: 44-1351/R]
- Issue: 2022, No. 1
- Page: 1-8
- Research Field: Medical Radiation Physics
- Publishing date:
Info
- Title: Application of deep reinforcement learning in automatic IMRT planning for rectal cancer
- Author(s): WANG Hanlin; LIU Jiacheng; WANG Qingying; YUE Haizhen; DU Yi; ZHANG Yibao; WANG Ruoxi; WU Hao
- Affiliation: Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education)/Department of Radiotherapy, Peking University Cancer Hospital & Institute, Beijing 100142, China
- Keywords: rectal cancer; automatic optimization; deep reinforcement learning; Eclipse scripting application programming interface; optimization adjustment policy network
- PACS: R318; R811.1
- DOI: 10.3969/j.issn.1005-202X.2022.01.001
- Abstract: Objective The optimization of an intensity-modulated radiotherapy (IMRT) plan is often time-consuming, and plan quality depends on the planner's experience and the available planning time. An unsupervised automatic IMRT optimization procedure that simulates human operations throughout the optimization process is discussed and implemented. Methods Based on the deep reinforcement learning (DRL) framework, an optimization adjustment policy network (OAPN) was proposed to automate treatment planning optimization. The Eclipse scripting application programming interface (ESAPI) of the Varian Eclipse 15.6 TPS was used to realize the interaction between OAPN and the TPS. Taking the dose-volume histogram (DVH) as input, OAPN learned the adjustment strategy for the objective parameters in the TPS through reinforcement learning, so as to gradually improve plan quality and obtain high-quality plans. A total of 18 rectal cancer cases that had completed treatment were selected from the clinical database; 5 cases were used for OAPN training, and the remaining 13 were used to evaluate the feasibility and effectiveness of the trained OAPN. Finally, a third-party scoring tool was used to evaluate plan quality. Results The average score of the 13 test plans with uniform initial optimization objective parameters (OOPs) was 45.53±4.58 (upper limit 110). After the OOPs were adjusted by OAPN, the average plan score was 88.67±6.74. Conclusion OAPN can interact with the TPS through ESAPI and form an action-value strategy through DRL. After training, OAPN can efficiently adjust OOPs and obtain a high-quality plan.
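The Methods summary above outlines a closed loop: DVH features from the current plan are fed to a policy network, the network chooses an adjustment to the optimization objective parameters (OOPs), the TPS re-optimizes, and the resulting plan quality drives the learning signal. The abstract does not specify the exact DRL algorithm or network architecture, and the real system communicates with Varian Eclipse 15.6 through ESAPI (a C# API). The Python/PyTorch sketch below is therefore only a hypothetical illustration of such a loop: the REINFORCE-style update, the `TPSEnvironment` stub, `N_DVH_BINS`, `N_STRUCTURES`, and the reward definition are all assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a DVH-driven policy network that adjusts optimization
# objective parameters (OOPs). In the paper, the interaction with the TPS is
# realized through ESAPI for Varian Eclipse 15.6; here a stub environment
# stands in for that interface so the example stays self-contained.
import torch
import torch.nn as nn

N_DVH_BINS = 100       # DVH sampled at 100 dose bins per structure (assumption)
N_STRUCTURES = 4       # e.g. PTV plus three organs at risk (assumption)
N_ACTIONS = 2 * N_STRUCTURES  # raise or lower the objective for each structure


class PolicyNetwork(nn.Module):
    """Maps flattened DVH curves to a distribution over OOP adjustments."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_DVH_BINS * N_STRUCTURES, 256),
            nn.ReLU(),
            nn.Linear(256, N_ACTIONS),
        )

    def forward(self, dvh):
        return torch.distributions.Categorical(logits=self.net(dvh))


class TPSEnvironment:
    """Placeholder for the treatment planning system.
    A real counterpart would apply the chosen OOP adjustment via ESAPI,
    re-optimize the plan, and return the new DVH together with a
    plan-quality score used as the reward."""

    def reset(self):
        return torch.rand(N_DVH_BINS * N_STRUCTURES)  # dummy initial DVH

    def step(self, action):
        next_dvh = torch.rand(N_DVH_BINS * N_STRUCTURES)  # dummy re-optimized DVH
        reward = torch.rand(1).item()                      # dummy score improvement
        return next_dvh, reward


# REINFORCE-style training loop: one illustrative episode per parameter update.
policy, env = PolicyNetwork(), TPSEnvironment()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
for episode in range(5):
    dvh, log_probs, rewards = env.reset(), [], []
    for _ in range(10):                     # 10 OOP adjustments per episode
        dist = policy(dvh)
        action = dist.sample()
        dvh, reward = env.step(action.item())
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)
    episode_return = sum(rewards)           # undiscounted return for brevity
    loss = -episode_return * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the workflow described by the abstract, the reward would come from re-optimizing the plan in Eclipse and rescoring it with the third-party tool; random tensors stand in for both steps here so that the sketch runs on its own.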
Last Update: 2022-01-17