Coronary heart disease prediction based on improved Borderline-Smote-GBDT
Chinese Journal of Medical Physics [ISSN: 1005-202X / CN: 44-1351/R]
- Issue:
- 2023, No. 10
- Page:
- 1278-1284
- Research Field:
- Medical Signal Processing and Medical Instruments
- Author(s):
- LI Ruiping¹; ZHU Junjie²
- ¹School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo 454003, China; ²Henan Key Laboratory of Intelligent Detection and Control of Coal Mine Equipment, Jiaozuo 454003, China
- Keywords:
- coronary heart disease; Borderline-Smote; gradient boosting decision tree
- CLC number:
- R318; TP391
- DOI:
- 10.3969/j.issn.1005-202X.2023.10.015
- Abstract:
- An improved Borderline-Smote oversampling algorithm based on Euclidean distance is proposed to address sample imbalance. First, each minority-class sample is categorized according to Euclidean distance. Then, for each minority-class sample on the boundary, its k nearest neighbors are used to determine a straight line; based on the neighbors lying on the same side of this line, samples that are actually noise but were misidentified as boundary samples are detected and removed. Finally, the categories of the remaining minority-class samples are re-determined, and new samples are synthesized by oversampling the minority-class samples on the boundary and those in dense non-boundary regions. The improved Borderline-Smote oversampling is applied to feature datasets extracted from isomagnetic field maps and two-dimensional current density maps. Compared with the Borderline-Smote-GBDT model, the improved Borderline-Smote-GBDT model for coronary heart disease prediction increases accuracy, precision, recall and AUC by 8.4%, 2.9%, 9.1% and 4.6%, respectively. In a comparison with logistic regression, random forest, k-nearest neighbors and extremely randomized trees, GBDT performs best, and the improved Borderline-Smote-GBDT model achieves an accuracy, recall, precision and AUC of 91.7%, 91.7%, 81.8% and 87.1%, respectively, which verifies the feasibility of the model.
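The abstract builds on the standard Borderline-SMOTE steps: grading minority samples by their Euclidean-distance neighborhoods, then synthesizing new samples from those on the class boundary. The Python sketch below illustrates only that baseline (the Borderline-SMOTE1 rule), not the authors' improved algorithm: the line-based noise check, the re-categorization and the oversampling of dense non-boundary samples are not reproduced, since the abstract does not specify them. The function names, the neighborhood sizes `m` and `k`, and the toy data are illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def k_nearest(i: int, X: Sequence[Point], k: int, pool: Sequence[int]) -> List[int]:
    """Indices of the k points in `pool` (excluding i itself) closest to X[i]."""
    others = [j for j in pool if j != i]
    others.sort(key=lambda j: euclidean(X[i], X[j]))
    return others[:k]


def categorize_minority(X, y, minority_label, m: int = 5) -> Dict[int, str]:
    """Hypothetical helper: tag each minority sample by how many of its m nearest
    neighbors (searched over the whole data set) are majority samples:
    all m -> 'noise', at least m/2 -> 'boundary', fewer -> 'safe'."""
    everyone = range(len(X))
    categories: Dict[int, str] = {}
    for i in everyone:
        if y[i] != minority_label:
            continue
        neighbors = k_nearest(i, X, m, everyone)
        n_major = sum(1 for j in neighbors if y[j] != minority_label)
        if n_major == m:
            categories[i] = "noise"
        elif 2 * n_major >= m:
            categories[i] = "boundary"
        else:
            categories[i] = "safe"
    return categories


def borderline_smote(X, y, minority_label, n_new: int, m: int = 5, k: int = 5,
                     seed: int = 0) -> List[Point]:
    """Baseline Borderline-SMOTE sketch (not the paper's improved version):
    each synthetic point lies on the segment between a randomly chosen boundary
    sample and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)
    categories = categorize_minority(X, y, minority_label, m)
    boundary = [i for i, c in categories.items() if c == "boundary"]
    minority = [i for i in range(len(X)) if y[i] == minority_label]
    if not boundary or len(minority) < 2:
        return []  # nothing to oversample
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.choice(boundary)
        j = rng.choice(k_nearest(i, X, k, minority))
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
    return synthetic


if __name__ == "__main__":
    # Toy data on a line: majority class 0 on the left, minority class 1 on the
    # right, plus one minority point (index 5) inside the majority region.
    X = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0),
         (1.6, 0.0), (4.7, 0.0), (5.5, 0.0), (6.6, 0.0), (7.8, 0.0)]
    y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(categorize_minority(X, y, minority_label=1, m=3))
    # {5: 'noise', 6: 'boundary', 7: 'safe', 8: 'safe', 9: 'safe'}
    print(borderline_smote(X, y, minority_label=1, n_new=3, m=3, k=2))
```

With `m=3`, the minority point at index 5 has only majority neighbors and is flagged as noise, while index 6, next to the majority cluster, is the only boundary sample, so every synthetic point falls between it and its minority neighbors.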
Last Update: 2023-10-27