Improved LogitBoost Classifier Based Prediction of GPCR-G-Protein Coupling with Self-Adaptive Immune Algorithm

Quan Gu
College of Information Sciences and Technology, Donghua University, Shanghai 201620, China

Yong-Sheng Ding*
College of Information Sciences and Technology; Engineering Research Center of Digitized Textile & Fashion Technology, Ministry of Education, Donghua University, Shanghai 201620, China

Abstract

G-protein coupled receptors (GPCRs) constitute the largest group of membrane receptors and are of great pharmacological interest. Signal transduction within cells is triggered when a wide range of native ligands interact with and activate GPCRs. Most of these responses are mediated through the interaction of GPCRs with their coupling GTP-binding proteins (G-proteins). Given the explosive growth of biological sequence databases, software algorithms that can predict properties of GPCRs are important. In this paper, we develop an intensive exploratory approach to predict the coupling preference of GPCRs to heterotrimeric G-proteins. An integrated recognition method that combines a Self-Adaptive Immune Algorithm with a LogitBoost classifier is applied to the prediction. The results indicate that the proposed method may become a useful tool for GPCR-G-protein coupling prediction, or play a complementary role to existing methods in the relevant areas. The method predicts the coupling preferences of GPCRs to three kinds of G-protein
subclasses, Gs, Gi/o and Gq/11, but not G12/13, for which too few examples are available.

Keywords- G-Protein coupled receptors, G-Proteins, Self-Adaptive Immune Algorithm, Improved LogitBoost classifier

I. INTRODUCTION

G-protein-coupled receptors (GPCRs), with nearly 450 genes identified to date, constitute the largest family of cell surface receptors and play a central role in cellular signaling pathways. Signal transduction within cells is triggered when a wide range of native ligands interact with and activate GPCRs [1]. Most of these responses are mediated through the interaction of GPCRs with their coupling GTP-binding proteins (G-proteins) [2]. Figure 1 shows the interaction between a GPCR and G-proteins. From the viewpoint of drug design, it is important to screen a drug for its ability to effectively control the activation of a specific G-protein by monitoring the stimulation by different ligands [3]. However, it is quite difficult to develop such a high-throughput experimental system; hence, it is essential to develop a bioinformatics method to predict GPCR-G-protein interaction when both the GPCR sequence and the ligand information are given.

Figure 1. GPCR-G-protein coupling

Previous computational methods for predicting GPCR coupling specificity to G-protein families have been applied to selected intracellular regions of GPCR sequences. Conventional sequence search methods such as BLAST (GrDB) and FASTA [4] lack high accuracy because the function-similarity relationship is unclear in the case of GPCRs. In recent years, pattern recognition methods such as the Bayes model [5], SVM [3] and HMM [6] have been applied to the prediction. However, the results are not encouraging because of the limitations of the classification engines. As shown by prior research,
the LogitBoost classifier [7], an integrated method, has yielded strong performance in prediction research across many bioinformatics areas. In this paper, we develop an improved LogitBoost classifier (ILC) based on the Boosting algorithm and intelligent computation. A Self-Adaptive Immune Algorithm is developed for optimizing the parameters of the classifier. The results achieved indicate that the proposed method may become a useful tool for GPCR-G-protein interaction prediction, or play a complementary role to existing methods in the relevant areas.

This work was supported in part by the National Nature Science Foundation of China (No. 60975059, 60775052), Specialized Research Fund for the Doctoral Program of Higher Education from Ministry of Education of China (No. 20090075110002), Project of the Shanghai Committee of Science and Technology (No. 09JC1400900, 08JC1400100), Shanghai Talent Developing Foundation (No. 001), Specialized Foundation for Excellent Talent from Shanghai, and the Open Fund from the Key Laboratory of MICCAI of Shanghai (06dz22103).
978-1-4244-4713-8/10/$25.00 ©2010 IEEE

II. MATERIALS AND METHOD

A. Dataset

According to a commonly used classification scheme, most GPCRs are divided into three main classes. As demonstrated in previous works [2], [3], Classes B and C can be directly assigned to a G-protein type, whereas Class A may couple with many different kinds of G-proteins. The G12/13 type is not considered in our research because too few examples are available. In our
paper, we chose the dataset from GRIFFIN as the training dataset. The redundancy of these sequences was evaluated by analyzing the clusters formed as the sequence-similarity threshold was decreased from 100% to 30% using the BLAST-CLUST software package. On the basis of this analysis, 132 sequences are used in this work without a redundancy-elimination step [3]. In addition, a 479-entry dataset of GPCR-G-protein binding from the PRED-COUPLE online system [6] is used for validating our method.

B. Characteristic quantities

In this paper, we extract structural characteristics comprehensively for developing the method. The characteristics are drawn from the ligands, extracellular loops, intracellular loops and transmembrane domains of GPCRs. The feature quantities are used as the training input of our classifier. To calculate the above parameters, the boundaries of the transmembrane helix and loop regions of GPCR sequences
were determined from multiple alignments of known Class A families using CLUSTAL W [8]. The feature quantities are listed in Table I.

TABLE I. FEATURE QUANTITIES OF DATASET

No.  Feature selected
1    Averaged hydrophobicity of TMH1
2    Averaged hydrophobicity of TMH2
3    Averaged hydrophobicity of TMH3
4    Averaged hydrophobicity of TMH4
5    Averaged hydrophobicity of TMH5
6    Averaged hydrophobicity of TMH6
7    Averaged hydrophobicity of TMH7
8    Length between TM1 and TM2
9    Length between TM2 and TM3
10   Length between TM3 and TM4
11   Length between TM4 and TM5
12   Length between TM5 and TM6
13   Length between TM6 and TM7
14   Length of N-terminal loop
15   Length of C-terminal loop

TMH: transmembrane helix.

C. Self-adaptive immune algorithm

The artificial immune system is a computational intelligence paradigm inspired by the biological immune system and has been applied successfully to a variety of optimization problems [9]. In this paper, a self-adaptive immune algorithm is developed to optimize the parameters of the LogitBoost classifier. The algorithm can be described as follows:

1. Randomly generate a population POP of size POPSIZE according to the encoding mode.
2. Compute the affinity of each antibody and rank the antibodies, then take
out the N superior antibodies. Set t = 1. The algorithm uses a Prüfer tree structure for the multicast tree; each antibody in the population represents a Prüfer multicast tree. The affinity function controls the convergence process. Here, the objective function of the model is converted into the affinity, defined as

$$\mathrm{Aff}(t) = \frac{1}{1 + C(t)} \qquad (1)$$

where $C(t)$ is the information-transfer cost function.
3. Perform clonal selection and hypermutation on the superior antibodies to generate the cloned population. The offspring and the parents together, with total size POPSIZE, are ranked according to their affinity values, and the superior POPSIZE/2 antibodies are taken to perform the clonal selection operator. The selection probability $P(i)$ of antibody $i$ is defined as

$$P(i) = \frac{\mathrm{Aff}(i)}{\sum_{t \in POP} \mathrm{Aff}(t)} \qquad (2)$$

In (2), POP is the set of all antibodies taking part in the selection operation. The size of the cloning population is defined as cpopsize, with cpopsize = m × POPSIZE, where m is the population coefficient. In this paper, we choose m = 2. The cloning coefficient $C(i)$ of antibody $i$ is defined as

$$C(i) = P(i) \times cpopsize \qquad (3)$$

4. Compute the crossover probability and execute the crossover operation on cloned antibodies of different parents.
5. Compute the mutation probability and execute bit mutation on the antibody. The probabilities of crossover and mutation, $P_c$ and $P_m$, have large effects on the performance of immune algorithms. $P_c$ and $P_m$ are therefore designed to adapt themselves as in (4) and (5):

$$P_c = P_{ci}\, e^{-rt/t_{\max}} \qquad (4)$$

$$P_m = P_{mi}\, e^{-rt/t_{\max}} \qquad (5)$$

where $P_{ci}$ and $P_{mi}$ are the initial values of $P_c$ and $P_m$, set to 0.8 and 0.1 respectively in this paper, and the parameter $r$ is chosen as 2 (with $t$ the current generation and $t_{\max}$ its maximum). Both probabilities thus decay from their initial values as $t$ approaches $t_{\max}$.
6. Set t = t + 1. Introduce immune elimination to keep the population size stable, and immune memory to preserve the superior antibodies.
7. If the termination conditions are satisfied, go to Step 8; otherwise go to Step 2.
8. Output the result.

D. Improved LogitBoost classifier

LogitBoost is one of the boosting algorithms developed in recent years [7]. Boosting was originally proposed to combine several weak classifiers to improve
the classification performance, and it has been used to solve various recognition problems. AdaBoost is a practical boosting algorithm validated in our prior work [10]. However, it suffers from over-fitting when dealing with very noisy data. To cope with this situation, LogitBoost is adopted to reduce training errors linearly and hence yield better generalization [11]. In this paper, an improved LogitBoost classifier (ILC), which combines the Self-Adaptive Immune Algorithm with LogitBoost, is adopted as the classification engine. The steps are as follows:

1. Input the dataset $S = \{(x_1, y_1), \ldots, (x_N, y_N)\}$, where $x_i \in X$ and $y_i \in Y = \{-1, +1\}$. Input the number of iterations $T$.
2. Initialise the weights $w_i = 1/N$ $(i = 1, \ldots, N)$; initialise the committee function $F(x)$ and the probabilities $P(x)$ as follows:

$$F(x) = 0 \qquad (6)$$

$$P(x) = P(y = 1 \mid x) = 1/2 \qquad (7)$$

3. Repeat for $t = 1, \ldots, T$: compute the weights and working responses with the following equations:

$$w_i = P(x_i)\,\bigl(1 - P(x_i)\bigr) \qquad (8)$$

$$z_i = \frac{y_i^* - P(x_i)}{w_i}, \quad \text{where } y_i^* = (y_i + 1)/2 \qquad (9)$$

4. Fit the function $f_t(x)$ using the weights $w_i$. In our study we use the self-adaptive immune algorithm instead of the regression decision method of the primary LogitBoost classifier. The Self-Adaptive Immune method is adopted to optimize the parameter
$w_i$ for fitting the data $(x_1, z_1), \ldots, (x_N, z_N)$.
5. Update $F(x)$ and $P(x)$ with:

$$F(x) \leftarrow F(x) + \tfrac{1}{2} f_t(x) \qquad (10)$$

$$P(x) = \frac{e^{F(x)}}{e^{F(x)} + e^{-F(x)}} \qquad (11)$$

6. Output the final classifier

$$L(x) = \operatorname{sign}\bigl(F(x)\bigr) \qquad (12)$$

Since this is a multi-class problem, the LogitBoost classifiers output a vector of classification probabilities:

$$P(x) = \bigl(P_1(x), P_2(x), \ldots, P_k(x)\bigr) \qquad (13)$$

The testing data are predicted to belong to the class with the highest probability:

$$C(x) = \arg\max_i P_i(x) \qquad (14)$$

III. RESULTS AND DISCUSSION

In the classification system, the classification result is evaluated by accuracy, sensitivity and specificity. The formulas are as follows:

$$Q_i = \frac{TP(i)}{N} \qquad (15)$$

$$SE_i = \frac{TP(i)}{TP(i) + FN(i)} \qquad (16)$$

$$SP_i = \frac{TN(i)}{FP(i) + TN(i)} \qquad (17)$$

We use $Q_i$, $SE_i$ and $SP_i$ to denote the accuracy, sensitivity and specificity of the prediction for the i-th class, respectively. Here, TP is the number of actual GPCR-G-protein couplings predicted as the same coupling, FP is the number of actual non-couplings predicted as couplings, TN is the number of actual non-couplings predicted as non-couplings, and FN is the number of actual couplings predicted as non-couplings. To demonstrate the principle we put forward, the ILC is used
on the GRIFFIN dataset [3]. We use Yabuki's SVM method for comparison on the same dataset. Correct classification rates were obtained for the three main G-protein coupling groups in a five-fold cross-validation procedure. The training set was randomly divided into five equally balanced sets. We then trained the model using the sequences in four of the sets, while the remaining set was used for testing. This procedure was repeated five times. As shown in Table II, the accuracy, sensitivity and specificity of ILC are at least as high as those of the SVM for every coupling type when the datasets and the homology are the same. This demonstrates the feasibility and superiority of ILC in prediction performance.

TABLE II. RESULT COMPARISON ON GRIFFIN DATASET

             Gi/o type            Gq/11 type           Gs type
Algorithm    Q1     SE1   SP1     Q2     SE2   SP2     Q3     SE3   SP3
SVM          0.853  0.77  0.78    0.834  0.68  0.73    0.923  0.86  0.90
ILC          0.910  0.78  0.84    0.853  0.70  0.81    0.923  0.86  0.90

As is well known, the jackknife test is also regarded as a rigorous and reliable testing method [7], [12]. Following this principle, we also use the jackknife test to verify our results.

TABLE III. RESULT COMPARISON OF LOGITBOOST AND OUR METHOD

Test method                      Algorithm    Q1     Q2     Q3     Total
Five-fold cross-validation test  LogitBoost   0.920  0.856  0.758  0.902
                                 ILC          0.938  0.876  0.934  0.935
Jackknife test                   LogitBoost   0.870  0.895  0.910  0.879
                                 ILC          0.914  0.876  0.884  0.905

TABLE IV. PREDICTION RESULT COMPARISON ON VALIDATING DATASET

Test method                      Algorithm    Q1     Q2     Q3     Total
Five-fold cross-validation test  SVM          0.898  0.895  0.804  0.886
                                 PHMM         0.914  0.882  0.934  0.910
                                 ILC          0.938  0.876  0.934  0.935
Jackknife test                   SVM          0.855  0.856  0.758  0.851
                                 PHMM         0.870  0.856  0.858  0.855
                                 ILC          0.914  0.876  0.884  0.905

From Table III, we find that the overall prediction accuracy of ILC reaches 93.5% and 90.5% under the five-fold cross-validation and jackknife tests respectively, which is higher than that of the LogitBoost method when the datasets and the homology of the GPCR sequences are the same. We attribute this to the Self-Adaptive Immune Algorithm performing better than the regression decision method in optimizing the weights of the LogitBoost classifier. As shown in Table IV, the PRED-COUPLE dataset [6] is used as our validating dataset. Representative works based on SVM [3] and PHMM [6] are tested on the same dataset. Our algorithm achieves the best overall accuracy under both testing methods, which we attribute to the strength of the Boosting algorithm in pattern recognition. Hence, it will
be of higher practical value to use the Improved LogitBoost classifier to deal with the GPCR-G-protein coupling issue and other problems in bioinformatics.

IV. CONCLUSION

Prediction of GPCR-G-protein coupling is a very important and challenging problem for pharmacological design. Over the last decades many attempts, with varying degrees of success and novelty, have been made to propose such prediction methods. Building on past work, this study uses a more objective dataset to exclude the interference of sequence similarity. For better performance, a novel integrated algorithm is developed to predict GPCR-G-protein coupling based on the LogitBoost classifier and the Self-Adaptive Immune Algorithm, in which the weights of the classifier are optimized by the Self-Adaptive Immune Algorithm we developed. The overall accuracies on the training and validating datasets are higher than those of reported works on the same datasets, respectively. The test results indicate that our Improved LogitBoost algorithm is well suited to the prediction of GPCR-G-protein coupling and achieves a better classification effect. We conclude that the Improved LogitBoost classifier is of higher practical value for bioinformatics classification problems.

REFERENCES

[1] M. Filizola and H. Weinstein, "The study of G-protein coupled receptor oligomerization with computational modeling and bioinformatics," FEBS Journal, vol. 272, pp. 2926-2938, 2005.
[2] Y. Huang et al., "Classifying G-protein coupled receptors with bagging classification tree," Comput. Biol. Chem., vol. 28, pp. 275-280, 2004.
[3] S. Möller, J. Vilo and M.D. Croning, "Prediction of the coupling specificity of G protein coupled receptors to their G proteins," Bioinformatics, vol. 17, pp. 174-181, 2001.
[4] Y. Yabuki, T. Muramatsu, T. Hirokawa, H. Mukai and M. Suwa, "A database for G proteins and their interaction with GPCRs," BMC Bioinformatics, vol. 5, pp. 208-216, 2004.
[5] J. Cao et al., "A naive Bayes model to predict coupling between seven transmembrane domain receptors and G-proteins," Bioinformatics, vol. 19, pp. 234-240, 2003.
[6] A. Elefsinioti, P.G. Bagos, I.C. Spyropoulos and S.J. Hamodrakas, "A method for the prediction of GPCRs coupling specificity to G-proteins using refined profile Hidden Markov Models," BMC Bioinformatics, vol. 6, pp. 104-115, 2005.
[7] Y.D. Cai, K.Y. Feng, W.C. Lu and K.C. Chou, "Using LogitBoost classifier to predict protein structural classes," Journal of Theoretical Biology, vol. 238, pp. 172-176, 2006.
[8] J.D. Thompson, D.G. Higgins and T.J. Gibson, "CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice," Nucleic Acids Res., vol. 22, pp. 4673-4680, 1994.
[9] Z.H. Hu and Y.S. Ding, "Immune co-evolutionary algorithm based partition balancing optimization for tobacco distribution system," Journal of Theoretical Biology, vol. 238, pp. 172-176, 2006.
[10] Q. Gu and Y.S. Ding, "Prediction of G-Protein-Coupled Receptor Classes in Low Homology Using Chou's Pseudo Amino Acid Composition with Approximate Entropy and Hydrophobicity Patterns," Protein & Peptide Letters, in press.
[11] J. Friedman, T. Hastie and R. Tibshirani, "Additive logistic regression: a statistical view of boosting," Ann. Stat., pp. 337-407, 2000.
[12] Q. Gu, Y.S. Ding, X.Y. Jiang and T.L. Zhang, "Prediction of subcellular location apoptosis proteins with ensemble classifier and feature selection," Amino Acids, in press.
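
As an illustration of the quantities used by the self-adaptive immune algorithm of Section II.C, the affinity, selection probability, cloning coefficient and adaptive crossover and mutation probabilities might be computed as in the Python sketch below. It is only a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the exponent of the adaptive rates is taken to be negative so that the rates remain valid probabilities, t_max is taken to be the maximum number of generations, and rounding the cloning coefficient to an integer is an assumption made here.

```python
import math
from typing import List, Tuple

# Initial crossover and mutation probabilities and decay parameter r, as stated in the text
P_CI, P_MI, R = 0.8, 0.1, 2.0


def affinity(cost: float) -> float:
    """Affinity of an antibody: 1 / (1 + cost), so a lower cost gives a higher affinity."""
    return 1.0 / (1.0 + cost)


def selection_probabilities(costs: List[float]) -> List[float]:
    """Selection probability of each antibody: its share of the total affinity."""
    aff = [affinity(c) for c in costs]
    total = sum(aff)
    return [a / total for a in aff]


def clone_counts(costs: List[float], popsize: int, m: int = 2) -> List[int]:
    """Cloning coefficient P(i) * cpopsize with cpopsize = m * POPSIZE.

    Rounding to an integer number of clones is an assumption of this sketch.
    """
    cpopsize = m * popsize
    return [round(p * cpopsize) for p in selection_probabilities(costs)]


def adaptive_rates(t: int, t_max: int) -> Tuple[float, float]:
    """Crossover and mutation probabilities decaying exponentially with generation t."""
    decay = math.exp(-R * t / t_max)
    return P_CI * decay, P_MI * decay
```

With these settings the rates start at 0.8 and 0.1 and shrink by a factor of e^2 by the final generation, so the search explores widely at first and becomes more conservative later.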
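
The binary LogitBoost iteration of Section II.D (initialisation, weights and working responses, fit, update and sign output) can be sketched in plain Python as follows. This is a hedged illustration rather than the ILC itself: the fitting step here uses an ordinary weighted least-squares regression stump as a stand-in, whereas the paper replaces that step with the self-adaptive immune algorithm, which is not reproduced. The helper names, the floor on the weights and the clipping of the working response to [-4, 4] are assumptions added for numerical stability, and the sketch assumes at least one feature takes more than one value.

```python
import math


def fit_stump(X, z, w):
    """Weighted least-squares regression stump (stand-in for the fitting step)."""
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so that both sides of the split are non-empty.
        for thr in sorted({row[j] for row in X})[:-1]:
            sides = [row[j] <= thr for row in X]
            means = {}
            for side in (True, False):
                sw = sum(wi for wi, s in zip(w, sides) if s == side)
                swz = sum(wi * zi for wi, zi, s in zip(w, z, sides) if s == side)
                means[side] = swz / sw
            err = sum(wi * (zi - means[s]) ** 2 for wi, zi, s in zip(w, z, sides))
            if best is None or err < best[0]:
                best = (err, j, thr, means[True], means[False])
    _, j, thr, left, right = best
    return lambda x: left if x[j] <= thr else right


def logitboost(X, y, T=20):
    """Binary LogitBoost for labels in {-1, +1}; returns a function x -> F(x)."""
    y_star = [(yi + 1) / 2 for yi in y]          # recode labels to {0, 1}
    F = [0.0] * len(y)                           # committee function starts at 0
    p = [0.5] * len(y)                           # initial probability 1/2
    learners = []
    for _ in range(T):
        # Weights p(1 - p), floored to avoid division by zero (assumption)
        w = [max(pi * (1 - pi), 1e-10) for pi in p]
        # Working response (y* - p) / w, clipped to [-4, 4] (assumption)
        z = [max(-4.0, min(4.0, (ys - pi) / wi)) for ys, pi, wi in zip(y_star, p, w)]
        f = fit_stump(X, z, w)
        learners.append(f)
        F = [Fi + 0.5 * f(x) for Fi, x in zip(F, X)]          # F <- F + f/2
        # e^F / (e^F + e^-F), written in the equivalent form 1 / (1 + e^(-2F))
        p = [1.0 / (1.0 + math.exp(-2.0 * Fi)) for Fi in F]
    return lambda x: sum(0.5 * f(x) for f in learners)


def classify(F, x):
    """Sign of the committee function; a value of exactly 0 is mapped to +1 here."""
    return 1 if F(x) >= 0 else -1
```

For the three coupling types, one such binary model per class (one-vs-rest) with the class of highest probability selected would be one possible reading of the multi-class step; the text does not specify the scheme, so that choice is left open here.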
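
The per-class sensitivity and specificity of Section III can be illustrated with one-vs-rest counts, as in the sketch below. The function name and the example labels are hypothetical, and the accuracy Q is deliberately left out because the text does not define its denominator N. The sketch also assumes that each class occurs at least once among the true labels and that at least one sample belongs to another class, otherwise a division by zero occurs.

```python
def sensitivity_specificity(y_true, y_pred, cls):
    """One-vs-rest sensitivity and specificity for the class `cls`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)   # coupling predicted as coupling
    fn = sum(t == cls and p != cls for t, p in pairs)   # coupling predicted as non-coupling
    fp = sum(t != cls and p == cls for t, p in pairs)   # non-coupling predicted as coupling
    tn = sum(t != cls and p != cls for t, p in pairs)   # non-coupling predicted as non-coupling
    return tp / (tp + fn), tn / (fp + tn)


# Toy example with made-up labels, for illustration only
y_true = ["Gs", "Gs", "Gi/o", "Gq/11"]
y_pred = ["Gs", "Gi/o", "Gi/o", "Gq/11"]
se_gs, sp_gs = sensitivity_specificity(y_true, y_pred, "Gs")
```

In the toy example one of the two Gs receptors is missed and no other receptor is wrongly called Gs, so the Gs sensitivity is 0.5 and the Gs specificity is 1.0.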
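
The two testing procedures used in Section III differ only in how the data are partitioned, which the following sketch makes explicit. It is an illustration under assumptions: the generator name and the fixed seed are hypothetical, and the split is a plain random partition rather than a class-balanced one, so it only approximates the "equally balanced sets" described in the text. The jackknife test corresponds to leave-one-out, that is, as many folds as there are sequences.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for a random k-fold partition of n samples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # k folds of near-equal size
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


five_fold = list(k_fold_indices(132, 5))           # 132 training sequences, five folds
jackknife = list(k_fold_indices(132, 132))         # leave-one-out
```

Each sequence is used for testing exactly once in either scheme; the jackknife simply trains 132 models instead of five.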
