Evaluating generic and domain-specific large visual models for T staging of esophageal cancer using CT: a study of zero-shot performance and the impact of prompt engineering

Chinese Journal of Medical Physics (《中国医学物理学杂志》) [ISSN:1005-202X/CN:44-1351/R]

Issue:
2025, Issue 11
Page:
1532-1540
Research Field:
Medical artificial intelligence
Publishing date:

Info

Title:
Evaluating generic and domain-specific large visual models for T staging of esophageal cancer using CT: a study of zero-shot performance and the impact of prompt engineering
Author(s):
ZHU Dabing1, GAO Wei2, LIN Yanghao1, LAI Wuhao3, LIANG Zhichao3, ZENG Xianyi2, DENG Xikai2, AN Jun3,4
1. Department of Radiology, Yuedong Hospital, the Third Affiliated Hospital of Sun Yat-sen University, Meizhou 514700, China
2. China Unicom Digital Intelligent Medical Technology Co., Ltd., Guangzhou 511457, China
3. Department of Cardiothoracic Surgery, Yuedong Hospital, the Third Affiliated Hospital of Sun Yat-sen University, Meizhou 514700, China
4. Department of Cardiothoracic Surgery, the Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510630, China
Keywords:
esophageal cancer; large vision model; computed tomography; staging; prompt engineering; domain-specific model; zero-shot learning; diagnostic accuracy
CLC number:
R318;R735.1
DOI:
10.3969/j.issn.1005-202X.2025.11.019
Abstract:
Background: Accurate T staging is critical for planning esophageal cancer therapy, but CT-based assessment has significant limitations. Large vision models (LVMs) hold promise, yet their zero-shot clinical diagnostic capability, without fine-tuning, remains unvalidated. Methods: A retrospective analysis was conducted on chest CT images from 98 esophageal cancer patients and 50 normal controls. With radiologist consensus as the gold standard, the zero-shot T-staging performance of three LVMs (GPT-5, Gemini, and MedGemma) was evaluated using prompts of varying complexity. Results: GPT-5 exhibited the highest accuracy and stability. The models showed significant biases: Gemini tended to over-stage, whereas MedGemma tended to under-stage. All models had difficulty identifying early-stage tumors, but structured prompts improved diagnostic performance for mid-to-late-stage lesions. Conclusion: LVMs have potential for zero-shot T staging, but their performance depends heavily on model choice and prompt design. The generic model GPT-5 shows superior zero-shot generalization. However, current model performance is not yet clinically viable, especially for early diagnosis. Future work should focus on fine-tuning with high-quality clinical data and developing standardized prompt frameworks.
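
The evaluation described in the abstract (accuracy against a reference standard, plus a tendency to over-stage or under-stage) can be illustrated with a minimal Python sketch. It is not the authors' code. The stage labels, the use of "T0" for normal controls, the function name staging_summary, and the toy label lists are all assumptions made for illustration and are not data from the study.

```python
from collections import Counter

# Ordinal encoding of T stages. Using "T0" for a normal control is an
# assumption of this sketch, not something stated in the abstract.
STAGES = ["T0", "T1", "T2", "T3", "T4"]
RANK = {stage: i for i, stage in enumerate(STAGES)}


def staging_summary(reference, predicted):
    """Return accuracy and over-/under-staging rates for paired stage labels.

    A prediction is over-staged when it is higher than the reference stage
    and under-staged when it is lower.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("reference and predicted must be non-empty and equal in length")

    counts = Counter()
    for ref, pred in zip(reference, predicted):
        diff = RANK[pred] - RANK[ref]
        if diff == 0:
            counts["correct"] += 1
        elif diff > 0:
            counts["over"] += 1
        else:
            counts["under"] += 1

    n = len(reference)
    return {
        "accuracy": counts["correct"] / n,
        "over_staged": counts["over"] / n,
        "under_staged": counts["under"] / n,
    }


# Toy labels, made up for illustration only.
reference = ["T0", "T1", "T2", "T3", "T4"]
predicted = ["T0", "T2", "T2", "T2", "T4"]
summary = staging_summary(reference, predicted)
print(summary)
```

Comparing the over-staged and under-staged rates for each model is one simple way to express the kind of directional bias the abstract reports for Gemini and MedGemma.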

Memo

Last Update: 2025-12-01