[1] ZHANG Huawei, ZHOU Jinming, JIANG Zhanjun, et al. Multi-dimensional synergy and label-oriented cross-attention for chest digital report generation[J]. Chinese Journal of Medical Physics, 2026, 43(4): 538-546. [doi:10.3969/j.issn.1005-202X.2026.04.018]
Multi-dimensional synergy and label-oriented cross-attention for chest digital report generation
Chinese Journal of Medical Physics [ISSN: 1005-202X / CN: 44-1351/R]
- Volume: 43
- Issue: No. 4, 2026
- Pages: 538-546
- Section: Medical Artificial Intelligence
- Publication date: 2026-04-28
Article Info
- Title: Multi-dimensional synergy and label-oriented cross-attention for chest digital report generation
- Article ID: 1005-202X(2026)04-0538-09
- Author(s): ZHANG Huawei (张华卫); ZHOU Jinming (周瑾明); JIANG Zhanjun (蒋占军); LIAN Jing (廉敬); NIU Benjie (牛本杰)
- Affiliation: School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
- Keywords: report generation; multi-dimensional synergy; label set; cross-attention; multi-layer aggregation
- CLC number: R318; TP391
- DOI: 10.3969/j.issn.1005-202X.2026.04.018
- Document code: A
- Abstract: A chest X-ray report generation network (MSLANet) integrating multi-dimensional synergy and label-oriented cross-attention is proposed to address the limitations of existing radiology report generation models in representing fine-grained anatomical features and in aligning fine-grained features across modalities. First, a multi-dimensional synergy attention module is introduced after the visual feature extraction network to improve the recognition of anatomical textures by fusing visual features along multiple dimensions. Next, a label-oriented cross-attention module is placed before the encoder; it combines a disease label set with a cross-attention mechanism to achieve fine-grained alignment between image-report pairs. Finally, a multi-layer aggregation module is added to the report generation stage to make full use of the encoded representations from different encoder layers, thereby improving the accuracy and level of detail of the generated reports. To validate the approach, comparative experiments are conducted on two public datasets, IU X-Ray and MIMIC-CXR. On IU X-Ray, the proposed approach achieves BLEU-1, BLEU-2, and BLEU-3 scores of 0.505, 0.345, and 0.251, respectively, improving on most existing methods for the same task. On MIMIC-CXR, compared with the RMAP model, it improves BLEU-2, BLEU-3, METEOR, and ROUGE-L by 0.5%, 0.2%, 0.2%, and 0.5%, respectively, verifying the effectiveness of the model.
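The abstract reports BLEU-1 to BLEU-3 scores. For readers unfamiliar with the metric, the following is a minimal sketch of cumulative sentence-level BLEU-N in plain Python: clipped n-gram precision, uniform weights, a brevity penalty, and no smoothing. It is illustrative only and is not the evaluation code used in the paper, which is not given here; published scores are normally computed at corpus level with a standard toolkit. The function names and the example sentences are assumptions made for this illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n):
    """Cumulative sentence-level BLEU-N with uniform weights and no smoothing."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    c = len(candidate)
    # Brevity penalty uses the reference length closest to the candidate length.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Hypothetical generated sentence and reference sentence (not from any dataset).
generated = "the heart size is normal".split()
reference = "the heart is normal in size".split()

print(round(bleu(generated, [reference], 1), 3))  # 0.819
print(round(bleu(generated, [reference], 2), 3))  # 0.579
```

In this example every generated unigram appears in the reference, so BLEU-1 is reduced only by the brevity penalty, while BLEU-2 also reflects that just two of the four generated bigrams match.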
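Memo
- Received: 2025-11-26
- Funding: National Natural Science Foundation of China (62061023); Gansu Province Outstanding Youth Fund (21JR7RA345)
- About the author: ZHANG Huawei, master's degree, senior engineer; research interests: electronics and communications, natural language processing. E-mail: zhanghuawei@mail.lzjtu.cn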
Last Update: 2026-04-29