[1] XU Shutan, WANG Junhao, CHEN Ming. Sequence-based prediction of protein-GDP binding site[J]. Chinese Journal of Medical Physics, 2022, 39(11): 1425-1430. [doi:10.3969/j.issn.1005-202X.2022.11.017]

Sequence-based prediction of protein-GDP binding site

Chinese Journal of Medical Physics [ISSN: 1005-202X / CN: 44-1351/R]

Volume:
39
Issue:
No. 11, 2022
Pages:
1425-1430
Column:
Other (laser medicine, etc.)
Publication date:
2022-11-25

Article Info

Title:
Sequence-based prediction of protein-GDP binding site
Article ID:
1005-202X(2022)11-1425-06
Author(s):
XU Shutan 1,2, WANG Junhao 1,2, CHEN Ming 1,2
1. College of Information Technology, Shanghai Ocean University, Shanghai 201306, China; 2. Key Laboratory of Fisheries Information, Ministry of Agriculture and Rural Affairs of the People's Republic of China, Shanghai 201306, China
Keywords:
protein-GDP binding site; position-specific scoring matrix; under-sampling; sliding window; support vector machine
Classification number:
R318;Q811.4
DOI:
10.3969/j.issn.1005-202X.2022.11.017
Document code:
A
Abstract:
Predicting protein-GDP (guanosine diphosphate) binding sites is important for protein function annotation and new drug discovery. To improve prediction accuracy, a sequence-based method for predicting protein-GDP binding sites is proposed. The method uses a position-specific iterative algorithm for multiple sequence alignment to obtain a position-specific scoring matrix (PSSM), selects a feature vector for each residue in the protein sequence with a mirror-residue variable sliding window, addresses the imbalance between positive and negative samples in the data set with CNMW (Clustering NearMiss-2 Weighted) under-sampling, and finally makes predictions with a support vector machine. Experimental results show that, compared with traditional methods, the proposed method achieves a significantly higher Matthews correlation coefficient, indicating its effectiveness and feasibility.
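The feature-extraction and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the mirroring rule, so the code assumes that window positions falling outside the sequence are reflected about the terminal residue. The function names (`mirror_index`, `window_features`, `matthews_cc`), the `window_size` parameter, and the toy two-column matrix are hypothetical; a real PSSM has 20 columns per residue. The CNMW under-sampling and support vector machine steps are omitted.

```python
import math


def mirror_index(i, n):
    """Reflect an out-of-range index back into [0, n-1].

    Assumed mirroring rule (not taken from the paper): positions beyond
    either terminus are reflected about the terminal residue.
    """
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i = abs(i) % period
    return i if i < n else period - i


def window_features(pssm, center, window_size):
    """Concatenate the PSSM rows inside an odd-sized window around `center`."""
    half = window_size // 2
    n = len(pssm)
    features = []
    for offset in range(-half, half + 1):
        features.extend(pssm[mirror_index(center + offset, n)])
    return features


def matthews_cc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = binding site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy 3-residue matrix with 2 score columns per residue (illustrative only).
toy_pssm = [[0, 1], [2, 3], [4, 5]]
first_residue = window_features(toy_pssm, center=0, window_size=3)
last_residue = window_features(toy_pssm, center=2, window_size=3)
score = matthews_cc([1, 1, 0, 0], [1, 0, 0, 0])
```

Each residue yields a vector of `window_size` times the number of PSSM columns, so terminal residues get vectors of the same length as interior ones. The Matthews correlation coefficient is a common choice when binding residues are far rarer than non-binding ones, because it accounts for all four cells of the confusion matrix.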

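Memo

[Received] 2022-05-10
[Funding] Shanghai Ocean University Research Special Project (A2-2006-21-200327; A2-2006-21-200208)
[About the authors] XU Shutan, postdoctoral researcher, associate professor, research interests: machine learning, bioinformatics, E-mail: stxu@shou.edu.cn; WANG Junhao, master's student, research interests: machine learning, bioinformatics, E-mail: w18621978045@163.com (XU Shutan and WANG Junhao are co-first authors)
[Corresponding author] CHEN Ming, PhD, professor, research interests: machine learning, bioinformatics, blockchain technology, E-mail: mchen@shou.edu.cn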
Last Update: 2022-11-25