Prediction of ATP-Protein Binding Sites (Degree Thesis, Engineering)


Classification number:
School code: 10128
UDC:
Student ID: 20091071

Master's Degree Thesis

Category: Full-time master's student
Title: Prediction of ATP-Protein Binding Sites
English title: Identification of ATP binding residues of a protein
Graduate student: Bao Wenrong (包文荣)
Discipline: Physical Electronics
Supervisor: Prof. Zhao Judong (赵巨东)
June 2013

Statement of Originality

I declare that the thesis submitted here is the result of research I carried out under the guidance of my supervisor. Except where cited in the text, the thesis contains no research results published or written by others, and no material used to obtain a degree or certificate from Inner Mongolia University of Technology or any other educational institution. Every contribution made to this research by colleagues who worked with me has been clearly stated and acknowledged in the thesis.

Signature of thesis author:          Signature of supervisor:
Date:                                Date:

Authorization for Use of Thesis Copyright

The author of this thesis fully understands the university's regulations on retaining and using degree theses, namely: Inner Mongolia University of Technology has the right to retain all or part of the thesis and to submit copies and disks of it to the relevant national institutions and departments; the thesis may be entered into relevant databases for retrieval, and may be preserved and compiled by photocopying, reduced-size printing, or other means of reproduction. To protect the intellectual property of the university and the supervisor, after graduation the author must obtain the consent of the supervisor from the period of study at Inner Mongolia University of Technology before using the main content or research results of this thesis in an academic paper, and Inner Mongolia University of Technology must be named as the copyright holder before the paper is submitted or published.

This thesis is confidential, and this authorization applies after declassification in ____ year(s).
This thesis is not confidential.
(Please tick the appropriate box above.)

Signature of thesis author:          Signature of supervisor:
Date:                                Date:

Inner Mongolia University of Technology Master's Thesis

Abstract

As the study of proteins has deepened, the binding of proteins to small molecules or ligands has been found to be ubiquitous. In particular, the binding of proteins to energy molecules occurs widely across life processes, so it is necessary to study the characteristics and rules of protein-ligand binding. Starting from the amino acid sequences of ATP-binding proteins, this thesis uses a dataset compiled by earlier researchers, analyzes it statistically to extract sequence feature information, selects appropriate feature parameters, and applies different classification algorithms to identify ATP binding sites, obtaining good prediction results. The thesis is divided into three parts.

The first part introduces the dataset and analyzes its features. The dataset consists of 168 non-redundant amino acid sequences of ATP-binding proteins, compiled and used by Dr. G. P. S. Raghava. In this dataset, ATP binding sites are marked with lowercase letters. Because the goal is to predict specific positions, each sequence has to be cut into fragments. Site prediction is a binary classification problem. Taking fragments of length 11 as an example, fragments whose central position is a binding site form the positive set, and all other fragments form the negative set. Ten different fragment lengths from 5 to 23 were used. For the feature analysis, the amino acid composition of the fragments, the amino acid composition at the binding sites, the adjacent dipeptide composition, and the physicochemical properties of the amino acids were analyzed statistically and compared.

The second part presents the theoretical methods: the increment of diversity (ID) and the support vector machine (SVM). The advantage of the ID method is that it requires no training, and its most important issue is the choice of parameters. The SVM method requires training, but it is well suited to classifying small samples and classifies them well.

The third part predicts the positions at which ATP binds to proteins. First, the ID method was applied with the amino acid composition and the adjacent dipeptide composition of the fragments as parameters; the results show that the prediction accuracy of the ID method is low. Second, the SVM method was applied with the same parameters; the results show that its accuracy is higher than that of the ID method. On this basis, the amino acid composition of the fragments was reduced in dimension using the increment of diversity, and the 20 amino acids were regrouped into 6 classes. The ID value of the reduced amino acid composition and the ID value of the reduced adjacent dipeptide composition were then used together as feature parameters for SVM prediction, which gave better prediction accuracy.

Keywords: adenosine triphosphate (ATP); binding sites; increment of diversity; support vector machine

Contents

Chapter 1 Introduction
1.1 Research background
1.2 Research status in China and abroad
1.3 Organization of the thesis
1.4 ATP-related concepts
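The fragment construction described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's own code: the function name `extract_fragments` and the toy sequence are hypothetical, windows that would run past either end of the sequence are simply skipped (the abstract does not say how termini are handled), and the ten lengths are assumed to be the odd values 5, 7, ..., 23, since a fragment needs a central residue.

```python
from typing import List, Tuple


def extract_fragments(seq: str, length: int = 11) -> List[Tuple[str, int]]:
    """Cut a sequence into overlapping fragments of the given odd length.

    Binding residues are assumed to be marked with lowercase letters, as in
    the dataset described in the abstract. A fragment is labeled 1 (positive)
    if its central residue is lowercase, otherwise 0 (negative).
    """
    if length % 2 == 0:
        raise ValueError("length must be odd so that a central residue exists")
    half = length // 2
    fragments = []
    # Assumption: only windows that fit entirely inside the sequence are kept.
    for center in range(half, len(seq) - half):
        window = seq[center - half:center + half + 1]
        label = 1 if seq[center].islower() else 0
        # Upper-case the window so the features do not see the annotation.
        fragments.append((window.upper(), label))
    return fragments


# Assumed reading of "10 different lengths from 5 to 23": the odd lengths.
FRAGMENT_LENGTHS = list(range(5, 24, 2))

if __name__ == "__main__":
    toy_sequence = "ACDEFgHIKLM"  # made-up sequence with one marked site
    for fragment, label in extract_fragments(toy_sequence, length=5):
        print(fragment, label)
```

Upper-casing each window before feature extraction keeps the binding-site annotation out of the features, so the label is the only place where the lowercase marking is used.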

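The increment of diversity (ID) step mentioned in the abstract can be illustrated with a small sketch. It assumes the definition commonly given in the ID literature, D(X) = N ln N - sum_i n_i ln n_i for a count vector X with total N, and ID(X, S) = D(X + S) - D(X) - D(S); the thesis's exact formulation is not visible in this excerpt. The helper names and the toy fragments are hypothetical, and the decision rule shown (assign the class with the smaller ID) is one common usage rather than a confirmed detail of the thesis.

```python
import math
from collections import Counter
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(fragments: Sequence[str]) -> List[int]:
    """Count each of the 20 amino acids over a collection of fragments."""
    counts = Counter("".join(fragments))
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]


def diversity(counts: Sequence[int]) -> float:
    """D(X) = N ln N - sum(n_i ln n_i), with the convention 0 ln 0 = 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x: Sequence[int], s: Sequence[int]) -> float:
    """ID(X, S) = D(X + S) - D(X) - D(S); smaller means more similar."""
    merged = [a + b for a, b in zip(x, s)]
    return diversity(merged) - diversity(x) - diversity(s)


def classify(fragment: str, pos_counts: Sequence[int], neg_counts: Sequence[int]) -> int:
    """Assign the class whose reference composition gives the smaller ID."""
    x = composition([fragment])
    id_pos = increment_of_diversity(x, pos_counts)
    id_neg = increment_of_diversity(x, neg_counts)
    return 1 if id_pos < id_neg else 0


if __name__ == "__main__":
    # Made-up reference sets, for illustration only.
    positives = composition(["GGGKT", "GSGKT"])
    negatives = composition(["LLLAV", "AVLLA"])
    print(classify("GGGKS", positives, negatives))
```

The same functions apply unchanged to a 400-dimensional adjacent dipeptide count vector or to counts over a reduced 6-class alphabet. In the final setup the abstract describes, ID values are passed to an SVM as features instead of being compared directly.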