Multi-dimensional synergy and label-oriented cross-attention for chest digital report generation - Chinese Journal of Medical Physics

Article Info

Title:: Multi-dimensional synergy and label-oriented cross-attention for chest digital report generation

Article ID:: 1005-202X(2026)04-0538-09

Author(s):: Zhang Huawei (张华卫); Zhou Jinming (周瑾明); Jiang Zhanjun (蒋占军); Lian Jing (廉敬); Niu Benjie (牛本杰); School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China

Keywords:: report generation; multi-dimensional synergy; label set; cross-attention; multi-layer aggregation

CLC number:: R318; TP391

DOI:: 10.3969/j.issn.1005-202X.2026.04.018

Document code:: A

Abstract:: A chest X-ray report generation network, MSLANet, that integrates multi-dimensional synergy and label-oriented cross-attention is proposed to address two limitations of existing radiology report generation models: difficulty representing fine-grained anatomical features and difficulty aligning fine-grained features across modalities. First, a multi-dimensional synergy attention module is introduced after the visual feature extraction network; it fuses visual features along multiple dimensions to improve the recognition of anatomical textures. Second, a label-oriented cross-attention module is placed before the encoder; it combines a disease label set with a cross-attention mechanism to achieve fine-grained alignment between image-report pairs. Finally, a multi-layer aggregation module is added at the report generation stage to make full use of the representations produced by different encoder layers, improving the accuracy and level of detail of the generated reports. The approach is evaluated in comparative experiments on two public datasets, IU X-Ray and MIMIC-CXR. On IU X-Ray, it achieves BLEU-1, BLEU-2, and BLEU-3 scores of 0.505, 0.345, and 0.251, respectively, outperforming most existing methods for the same task. On MIMIC-CXR, it improves on the RMAP model by 0.5%, 0.2%, 0.2%, and 0.5% in BLEU-2, BLEU-3, METEOR, and ROUGE-L, respectively, supporting the effectiveness of the proposed model.
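
The abstract does not describe the internals of the label-oriented cross-attention module, so the following is only a minimal sketch of generic scaled dot-product cross-attention, not the MSLANet implementation. It assumes that image-region features act as queries and disease-label embeddings act as keys and values, and it omits the learned projection matrices a real model would use. The function names, the toy vectors, and the query/key assignment are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    queries: one vector per image region (hypothetical visual features)
    keys, values: one vector per disease label (hypothetical label embeddings)
    Returns one label-informed vector per region and the attention weights.
    """
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        # Similarity of this region to every label, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        attn = softmax(scores)
        weights.append(attn)
        # Weighted sum of the label value vectors, one output dimension at a time.
        outputs.append([
            sum(a * v[d] for a, v in zip(attn, values))
            for d in range(len(values[0]))
        ])
    return outputs, weights


# Toy inputs (made up): 2 image regions, 3 labels, feature dimension 2.
region_features = [[1.0, 0.0], [0.0, 1.0]]
label_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attn_weights = cross_attention(
    region_features, label_embeddings, label_embeddings
)
```

Each row of `attn_weights` sums to 1 and indicates how strongly a region attends to each label. In a trained model these weights would come from learned query, key, and value projections rather than raw dot products.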

Chinese Journal of Medical Physics [ISSN: 1005-202X / CN: 44-1351/R]
