Coronary heart disease prediction based on improved Borderline-Smote-GBDT

Chinese Journal of Medical Physics [ISSN: 1005-202X / CN: 44-1351/R]

Issue:
2023, No. 10
Page:
1278-1284
Research Field:
Medical Signal Processing and Medical Instruments
Author(s):
LI Ruiping¹, ZHU Junjie²
1. School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo 454003, China
2. Henan Key Laboratory of Intelligent Detection and Control of Coal Mine Equipment, Jiaozuo 454003, China
Keywords:
coronary heart disease; Borderline-Smote; gradient boosting decision tree
CLC number:
R318; TP391
DOI:
10.3969/j.issn.1005-202X.2023.10.015
Abstract:
An improved Borderline-Smote oversampling algorithm based on Euclidean distance is proposed to address sample imbalance. First, minority-class samples are categorized according to Euclidean distance. Next, the k nearest neighbors of each minority-class sample on the boundary are used to determine a straight line, and the same-side (ipsilateral) neighbors are used to identify noise points that were misrecognized as boundary samples; these noise points are then removed. Finally, the remaining minority-class samples are re-categorized, and new samples are synthesized by oversampling both the boundary minority-class samples and those in dense non-boundary regions. The improved Borderline-Smote oversampling is applied to feature datasets extracted from the isomagnetic field map and the two-dimensional current density map. Compared with the Borderline-Smote-GBDT model, the improved Borderline-Smote-GBDT model for coronary heart disease prediction raises accuracy, precision, recall and AUC by 8.4%, 2.9%, 9.1% and 4.6%, respectively. In a comparison with logistic regression, random forest, k-nearest neighbors and extremely randomized trees, GBDT performs best, and the improved Borderline-Smote-GBDT model achieves an accuracy, recall, precision and AUC of 91.7%, 91.7%, 81.8% and 87.1%, respectively, which verifies the feasibility of the model.
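The method above builds on the standard Borderline-SMOTE procedure, which first tags minority samples by their Euclidean nearest neighbours and then synthesizes new points only around boundary ("DANGER") samples. The following is a minimal plain-Python sketch of that baseline, assuming the common borderline-1 rules (m nearest neighbours among all samples for tagging, k nearest minority neighbours for interpolation). It does not implement the authors' improvements (the straight line determined from the k nearest neighbours, the ipsilateral-neighbour noise check, or oversampling of dense non-boundary samples), because the abstract does not specify them. The function names `categorize_minority` and `synthesize_borderline` and the toy data are hypothetical.

```python
import math
import random


def categorize_minority(X, y, minority_label, m=5):
    """Tag each minority sample as SAFE, DANGER (borderline) or NOISE.

    Baseline Borderline-SMOTE rule (not the paper's improved variant): take
    the m nearest neighbours of a minority sample among ALL samples
    (Euclidean distance) and count how many belong to the majority class.
    """
    tags = {}
    for i, xi in enumerate(X):
        if y[i] != minority_label:
            continue
        # (distance, index) pairs, nearest first; ties are broken by index
        nearest = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )[:m]
        n_major = sum(1 for _, j in nearest if y[j] != minority_label)
        if n_major == m:
            tags[i] = "NOISE"   # every neighbour is a majority sample
        elif n_major >= m / 2:
            tags[i] = "DANGER"  # borderline sample, oversampled below
        else:
            tags[i] = "SAFE"    # interior sample, not oversampled here
    return tags


def synthesize_borderline(X, y, minority_label, tags, k=5,
                          n_per_sample=1, seed=0):
    """Interpolate new points between each DANGER sample and minority neighbours."""
    rng = random.Random(seed)
    minority = [j for j, label in enumerate(y) if label == minority_label]
    synthetic = []
    for i, tag in tags.items():
        if tag != "DANGER":
            continue
        # k nearest minority-class neighbours of the borderline sample
        neighbours = sorted(
            (math.dist(X[i], X[j]), j) for j in minority if j != i
        )[:k]
        if not neighbours:
            continue
        for _ in range(n_per_sample):
            _, j = rng.choice(neighbours)
            gap = rng.random()  # interpolation factor in [0, 1)
            synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Hypothetical toy 2-D data: class 0 is the majority, class 1 the minority.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2],
     [0.5, 0.5], [5, 5], [5, 6], [6, 5], [3, 3.2]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

tags = categorize_minority(X, y, minority_label=1, m=3)
print(tags)  # {5: 'NOISE', 6: 'SAFE', 7: 'SAFE', 8: 'SAFE', 9: 'DANGER'}
new_points = synthesize_borderline(X, y, 1, tags, k=3, n_per_sample=4)
```

In the toy data, the minority point at (0.5, 0.5) is tagged NOISE because all three of its nearest neighbours are majority samples, and only the point at (3, 3.2) is oversampled, along segments towards the minority cluster. An improved variant in the spirit of the abstract would add a noise re-check between the two calls and also pass dense non-boundary samples to the synthesis step.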

Last Update: 2023-10-27