VAMA: a versatile web-based tool for variability analysis in multiply-aligned amino acid sequences

VAMA: Variability Analysis of Multiple Alignments

Aditi Gupta, Aridaman Pandit and Somdatta Sinha
Mathematical Modeling and Computational Biology Group, Centre for Cellular and Molecular Biology (CSIR), Hyderabad 500007, India
Email: sinha@ccmb.res.in

Abstract: Quantifying residue variability at each column in a multiple sequence alignment of amino acids helps to indicate their similarities, and highlights the significance of each position from the perspective of structure, function, and evolution. It is becoming increasingly clear that the groups of amino acids that allow conserved replacement vary with the position of the residue in the protein. Most multiple alignment algorithms cater to general users and hence do not address this specific feature. A tool for scoring variability in multiply-aligned amino acid sequences that allows different conservation groups is highly desirable. VAMA (Variability Analysis of Multiple Alignments) is a simple yet versatile program that calculates and plots residue variability in a given set of aligned sequences based on known conservation groups specific for different functionally important regions of a protein, and also allows user-defined groups for new discoveries. VAMA is available at http://203.200.217.184/VAMA/Overview.html

Keywords: Multiple Sequence Alignment, Variability Analysis, Residue Conservation Groups

I. INTRODUCTION

Alignment of amino acid sequences is widely applied to identify conservation of residues [1]. Variability is a measure of the extent of variation of amino acids at a position in a multiple sequence alignment. Functionally important residues are known to exhibit higher conservation, or lower variability. Multiple sequence alignment (MSA) tools align sequences based on some predetermined classification of amino acids, depending on their physicochemical properties [2]. Recent studies have shown that the same group of amino acids may not always be useful, as sequence conservation classifications vary with the structure and function of the protein. For example, different classifications have been shown to exist for residues involved in ligand binding [3], in protein-protein interaction interfaces [4], in maintaining structure [5], and in determining protein functional specificity [6]-[8]. The classification by Mirny and Shakhnovich (MS) was intended for protein structure cores [6]; Williamson's (W) was tailored to transporter proteins [8]; and the Guharoy and Chakrabarti (GC) classification was meant for the study of interfacial amino acids [4]. This suggests that different substitution classifications should be applied in MSA depending on the structure and function of the proteins.

Programmes available for studying conservation or variability in multiple alignments based on different scoring methods are rather complex [9], and none of them considers all the above-mentioned position-specific features in proteins [10]-[15]. A tool for scoring variability in multiply-aligned amino acid sequences that is simple but addresses this specific feature of allowing different conservation groups is highly desirable.

We have developed a web-based program (VAMA) that quantifies the variability of the residues at each aligned position in an MSA using a simple symbol diversity score [16]. Here, all the different amino acid residues present in a particular column of the aligned sequences are considered, and the score, v, is calculated from the sum over the frequencies of the residues as

$$v = \frac{1 - \sum_{i} \left( n_i / N \right)^{2}}{1 - 1/N} \qquad (1)$$

where $n_i$ is the frequency of each residue, and $N$ is the total number of residues at this position. The sum is over all the different types of amino acids present in the column. For complete conservation of amino acids at a particular column in the MSA (i.e., $n_i = N$ for a single residue type), the score is v = 0, indicating no variability; for no conservation or total diversity (every $n_i = 1$), the score is v = 1. Since v varies between 0 and 1, the score is normalized, which is useful for comparing different MSAs. Because of the generic nature of scoring in VAMA, which is not based on any stereochemical property of the amino acids, v can be calculated from the relative frequencies of amino acids under any given classification. Here, along with the basic scoring methodology adopted by the commonly used MSA programme CLUSTAL [17], several options are provided for function-specific classifications as mentioned earlier.
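The scoring step can be sketched in a few lines of Python. This is a minimal illustration, assuming the symbol diversity score has the normalized form v = (1 - sum_i (n_i/N)^2) / (1 - 1/N), which gives 0 for a fully conserved column and 1 for a fully diverse one. The function name `variability_score` and the treatment of columns with fewer than two residues are assumptions made for this example and are not taken from VAMA.

```python
from collections import Counter


def variability_score(column):
    """Symbol diversity score v for one gap-free alignment column.

    v = (1 - sum_i (n_i / N)**2) / (1 - 1 / N)
    where n_i is the count of residue type i and N the column height.
    """
    total = len(column)
    if total < 2:
        # Assumption: with fewer than two residues the score is undefined,
        # so this sketch reports no variability.
        return 0.0
    counts = Counter(column)
    sum_sq_freq = sum((n / total) ** 2 for n in counts.values())
    return (1.0 - sum_sq_freq) / (1.0 - 1.0 / total)


if __name__ == "__main__":
    # One string per alignment column, one character per sequence.
    for col in ("AAAA", "AACC", "ACDE"):
        print(col, round(variability_score(col), 3))
```

A fully conserved column such as "AAAA" scores 0, a column with four different residues scores 1, and a half-and-half column such as "AACC" falls in between at 2/3.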

Thus, VAMA provides the user with the flexibility to quantify variability using these different classifications as required.

II. FEATURES OF VAMA

Fig. 1 shows the VAMA interface. Variability analysis can be done in VAMA in three ways: Basic, Group Based, and Reference Sequence Based. The input for VAMA can be multiple alignment files in CLUSTAL and FASTA formats. These files can be pasted into the VAMA work window, or uploaded using the “Browse” button.

Figure 1. VAMA interface.

978-1-4244-4713-8/10/$25.00 ©2010 IEEE

VAMA also addresses the common problem of “gaps” in the alignment. The user can define a Gap cutoff for including residue positions that have gaps in the alignment. For columns with more gaps than the Gap cutoff, variability is not calculated and a blank space is displayed in the output. For columns in which the number of gaps is less than or equal to the Gap cutoff, the value of N is reduced by the number of gaps at that position. These positions are indicated by a # in the output.

VAMA calculates the variability score for each column in the alignment based on the conservation groups chosen by the user. The output file consists of the following parts: (i) the variability score at each aligned position; (ii) statistics of the variability data, giving the mean, standard deviation and range; and (iii) a plot of variability values versus alignment positions. The data can be saved in both text and EXCEL formats for further analysis.

The variability analysis in VAMA can be done in the following three ways.

A. Basic Variability Analysis

Here all amino acids are considered to be in separate classes. Hence, no substitutions are taken into account, and any non-identity contributes to the variability.

B. Group Based Variability Analysis

Conservative substitutions can be accounted for by classifying amino acids according to their physicochemical properties and positional attributes. Variability is assigned 0 for a particular position if all the amino acids at that position belong to a single group in the chosen classification.
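The gap cutoff and the group-based rule described above can be combined in a short Python sketch. Everything named here is hypothetical: `column_score`, `basic_score` and `EXAMPLE_GROUPS` are illustrative, and the groups are not the MS, GC, W or CLUSTAL classifications. The sketch also assumes that a column whose residues span more than one group falls back to the basic score, which the text does not specify. The basic score is written as (N^2 - sum_i n_i^2) / (N (N - 1)), a rearrangement that is 0 for a fully conserved column and 1 for a fully diverse one.

```python
from collections import Counter

GAP = "-"

# Illustrative groups only; NOT the MS, GC, W or CLUSTAL groups used by VAMA.
EXAMPLE_GROUPS = ("ILVM", "FYW", "KRH", "DE", "ST")


def basic_score(residues):
    """Symbol diversity score for a gap-free list of residues."""
    total = len(residues)
    if total < 2:
        return 0.0  # assumption: undefined case treated as no variability
    sum_sq = sum(n * n for n in Counter(residues).values())
    return (total * total - sum_sq) / (total * (total - 1))


def column_score(column, groups=None, gap_cutoff=0):
    """Return (score, flag); score is None when the column is skipped."""
    gaps = column.count(GAP)
    if gaps > gap_cutoff:
        return None, ""            # more gaps than the cutoff: left blank
    residues = [r for r in column if r != GAP]
    flag = "#" if gaps else ""     # N was reduced by the number of gaps
    if groups and any(set(residues) <= set(g) for g in groups):
        return 0.0, flag           # all residues fall within one group
    # Assumption: mixed-group columns use the basic score.
    return basic_score(residues), flag


if __name__ == "__main__":
    print(column_score("A-CD"))                         # skipped at cutoff 0
    print(column_score("A-CD", gap_cutoff=1))           # scored and flagged
    print(column_score("ILVM", groups=EXAMPLE_GROUPS))  # one group only
```

Returning the flag next to the score keeps the gap-adjusted positions distinguishable from ordinary ones, in the same way the # marker does in the VAMA output.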

VAMA provides the following group based analysis options:

i. Default Classification: This classification is the same as the one used in CLUSTAL [17]. Amino acids are classified into strong and weak groups based on their physicochemical properties and the Gonnet Pam250 matrix [18].

ii. MS, GC, and W Classifications: Several different classifications have been proposed depending upon the functional constraints that apply. The MS classification is applicable to protein structure cores [6], the W classification to transporter proteins [8], and the GC classification to interface amino acids [4]. The classifications are described in the User Guide.

iii. User Defined Classification: Various other classification schemes have been proposed [7]. VAMA offers users the option to use any of these or their own classification.

C. Reference Sequence Based Variability Analysis

The reference sequence is the sequence with respect to which the residues are numbered. Here x = 0 in the variability plot corresponds to the first residue of the reference sequence. This is useful when the 3-dimensional structure of the reference sequence is available, as it helps in analyzing the results for position-specific changes in the other amino acid sequences in the alignment. The calculation is first done based upon the different classifications, and then the residues are numbered according to the Reference Sequence given by the user. By this method the user can assess the alignment against the Reference Sequence.

An example of analysis using VAMA is shown in Fig. 2. A set of 25 amino acid sequences representing the Rossmann fold [19] was extracted from the Protein Data Bank [20]. CLUSTAL-aligned sequences were pasted as input to VAMA. We used the MS classification to calculate the variability, and compared it with the Default (CLUSTAL) classification. Fig. 2A is the variability plot for the MS and Default classifications, showing lower scores for the MS classification than for the Default classification. In Fig. 2A, a Gap cutoff of 0 is used to calculate the scores. Thus, for positions with 1 or more gaps, variability is not calculated. Since the Rossmann fold is an example of a protein structure core, the results with MS are more reliable. Fig. 2B also shows that the MS classification gives the lowest statistics when compared to the others. Here we show only the first subset of positions (92-100) for which the variability scores were calculated. Clearly, the minimum variability score given by the MS classification supports the applicability of this tool. Importantly, in the case of a protein lacking a well-defined function, VAMA allows calculation of the variability score using the given classifications to identify functionally and structurally important residues based on their comparative scores. This feature, where several different classifications can be used to calculate the variability in an MSA, is unique to VAMA.

Figure 2. Analysis of VAMA output for 25 amino acid sequences of the Rossmann fold. (A) Variability plot comparing MS and Default classification scores. (B) Variability values of residues 92 to 100, using different classifications, along with mean and standard deviation (SD).

VAMA includes several useful features for the user. The “User Guide” explains all features clearly with examples. The “Related Links” page provides links to other MSA tools (e.g. CLUSTALW, T-Coffee, CINEMA, etc.) and to useful websites such as NCBI, PDB, Swiss-Prot, and KEGG. A “Search” button allows searching within VAMA and the World Wide Web.

III. DISCUSSION

VAMA is a simple, user-friendly, yet versatile tool for calculating the variability in multiply aligned protein sequences. VAMA supports both basic and group based analysis, by considering the physicochemical properties of amino acids as well as their differential usage for different topological determinants in the protein. It quantifies variability in the sequences based on amino acid classification groups that depend on these factors. The Gap cutoff feature acts as an additional tool for scoring the sequences. The statistics computed on the variability data (mean, standard deviation and range) can also be helpful in further analysis. VAMA is thus a useful function-specific variability analysis tool that allows comparative analysis, a feature lacking in other similar tools.

ACKNOWLEDGMENT

SS thanks the Department of Biotechnology, India for financial support. AG thanks the Indian Academy of Sciences for a summer fellowship to work at the CCMB. AP thanks the Council of Scientific and Industrial Research (CSIR) for a fellowship.

REFERENCES

[1] T. F. Smith and M. S. Waterman, “Identification of Common Molecular Subsequences.” J. Mol. Biol., vol. 147, pp. 195-197, 1981.
[2] D. J. Lipman, S. F. Altschul, and J. D. Kececioglu, “A tool for multiple sequence alignment.” Proc. Natl. Acad. Sci. USA, vol. 86, pp. 4412-4415, 1989.
[3] T. J. Magliery and L. Regan, “Sequence variation in ligand binding sites in proteins.” BMC Bioinformatics, vol. 6, p. 240, 2005.
[4] M. Guharoy and P. Chakrabarti, “Conservation and relative importance of residues across protein-protein interfaces.” Proc. Natl. Acad. Sci. USA, vol. 102, pp. 15447-15452, 2005.
[5] O. Schueler-Furman and D. Baker, “Conserved residue clustering and protein structure prediction.” Proteins: Structure, Function and Bioinformatics, vol. 52, pp. 225-235, 2003.
[6] L. A. Mirny and E. I. Shakhnovich, “Evolutionary conservation of the folding nucleus.” J. Mol. Biol., vol. 308, pp. 123-129, 2001.
[7] W. R. Taylor, “The classification of amino acid conservation.” J. Theor. Biol., vol. 119, pp. 205-218, 1986.
[8] R. M. Williamson, “Information theory analysis of the relationship between primary sequence structure and ligand recognition among a class of facilitated transporters.” J. Theor. Biol., vol. 174, pp. 179-188, 1995.
[9] W. S. J. Valdar, “Scoring residue conservation.” Proteins: Structure, Function and Bioinformatics, vol. 48, pp. 227-241, 2002.
[10] J. A. Capra and M. Singh, “Predicting functionally important residues from sequence conservation.” Bioinformatics, vol. 23, pp. 1875-1882, 2007.
[11] M. Clamp, J. Cuff, S. M. Searle, and G. J. Barton, “The Jalview Java Alignment Editor.” Bioinformatics, vol. 20, pp. 426-427, 2004.
[12] C. D. Livingstone and G. J. Barton, “Protein Sequence Alignments: A Strategy for the Hierarchical Analysis of Residue Conservation.” Comp. Appl. Biosci., vol. 9, pp. 745-756, 1993.
[13] D. J. Parry-Smith and T. K. Attwood, “SOMAP: a novel interactive approach to multiple protein sequences alignment.” Comp. Appl. Biosci., vol. 7, pp. 233-235, 1991.
[14] D. J. Parry-Smith, A. W. Payne, A. D. Michie, and T. K. Attwood, “CINEMA - a novel Colour INteractive Editor for Multiple Alignments.” Gene, vol. 221, pp. GC57-63, 1998.
[15] M. R. Southern and A. P. Lewis, “JavaShade: multiple sequence alignment box-and-shading on the World Wide Web.” Bioinformatics, vol. 14, pp. 821-822, 1998.
[16] I. P. Crawford, “Evolution of a Biosynthetic Pathway: The Tryptophan Paradigm.” Annu. Rev. Microbiol., vol. 43, pp. 567-600, 1989.
[17] F. Jeanmougin, J. D. Thompson, M. Gouy, D. G. Higgins, and T. J. Gibson, “Multiple sequence alignment with Clustal X.” Trends Biochem. Sci., vol. 23, pp. 403-405, 1998.
[18] G. Gonnet, M. A. Cohen, and S. A. Benner, “Exhaustive matching of the entire protein sequence database.” Science, vol. 256, pp. 1443-1445, 1992.
[19] J. E. Donald, I. A. Hubner, V. M. Rotemberg, E. I. Shakhnovich, and L. A. Mirny, “CoC: a database of universally conserved residues in protein folds.” Bioinformatics, vol. 21, pp. 2539-2540, 2005.
[20] PDB: http://www.rcsb.org
