Automating Ontological Function Annotation: Towards a Common Methodological Framework

Cliff A Joslyn*, Judith D Cohn, Karin M Verspoor, and Susan M Mniszewski
Los Alamos National Laboratory, Los Alamos, NM, USA
* To whom correspondence should be addressed: MS B265 LANL, Los Alamos, NM 87545 USA, joslyn@lanl.gov

ABSTRACT
Motivation: Our work on ontology categorization for functional annotation has motivated a focus on an overall methodological framework for ontological function annotation (OFA). We draw on our experiences to discuss test set selection, annotation mappings, evaluation metrics, and structural ontology measures for general OFA.

1 INTRODUCTION
A new paradigm for functional protein annotation is the use of automated knowledge discovery algorithms that map sequence, structure, literature, and/or pathway information about proteins whose functions are unknown into a functional ontology, typically (a portion of) the Gene Ontology (GO, GO Consortium 2000; http://www.geneontology.org). For example, our own work (Verspoor et al. 2004, 2005) involves analyzing collections of GO nodes (e.g. annotations of a protein's BLAST neighborhood) using the POSet Ontology Categorizer (POSOC, Joslyn et al. 2004; http://www.c3.lanl.gov/joslyn/posoc.html) to produce new annotations. Both in executing this work and in examining similar efforts (e.g. Pal and Eisenberg 2005, Martin et al. 2004), we have uncovered a variety of methodological issues which we believe would be valuable for the community to focus on. Here we first explicate our sense of a generic architecture for automated ontological functional annotation (OFA) into the GO, and then discuss specific methodological issues which are generic to OFA, illustrated by our own experience.

2 GENERIC AUTOMATED OFA
A simple formulation for protein function annotation into the GO assumes a collection of genes or proteins X and a set of GO nodes (perhaps for a particular branch) P. Then in the most general sense, annotation is a function F: X → 2^P assigning each protein x ∈ X a collection of GO nodes F(x) ⊆ P. So while a known protein x may have a known set of annotations F(x), a new protein y may not have any known annotations, and instead we wish to build some method G returning a predicted set of GO nodes G(y) ⊆ P. Typically, we have information about y such as sequence, structure, interactions, pathways, or literature citations, and to build G we exploit knowledge of the proteins "near" y in that space which have known functions. In a testing situation, we take a known protein x and compare its known annotations F(x) against its predicted annotations G(x). Thus to measure the accuracy of our prediction G, we need to compare two different sets of GO nodes, F(x) and G(x), against each other over the set of known proteins X.

3 METHODOLOGICAL ISSUES
We now briefly survey the methodological issues we will explicate completely in the presentation and full paper.

3.1 Protein Test Sets
First we select one or more gold standard test sets X of proteins with trusted annotations in the GO. While any such test set should be shared within the community, requirements for a gold standard will nonetheless vary among research groups. POSOC currently needs a test set containing both sequence and structure data, and so we use Swiss-Prot protein sequences with existing PDB structures (http://www.rcsb.org/pdb). Other groups have used a variety of test sets: for example, Pal and Eisenberg (2005) use a set of protein sequences from the FSSP structure library (http://www.chem.admu.edu.ph/nina/rosby/fssp.htm) to evaluate their ProKnow system, and Martin et al. (2004) use sequence data from seven complete genomes to test GOtcha. A further consideration is the use of non-redundant test data, sampled to avoid over-representation of any part of the test space. For example, the non-redundant Astral subsets of SCOP domains are designed to cover the variation in SCOP structure space while ensuring that no two SCOP domains in a particular subset have a sequence identity greater than a specified cutoff value (e.g. 95% or 40%) (Chandonia et al. 2004). We propose development of a non-redundant test set covering GO function space.

3.2 Annotation Mappings
The value of any gold standard is very much tied to the accuracy of its known annotations F. POSOC uses the GOA (http://www.ebi.ac.uk/GOA) UniProt (http://www.ebi.ac.uk/uniprot/index.html) annotation set for protein sequences, and it could be useful for this set, or other annotations for other data types, to be regularized as a community standard to provide a means of comparing various studies, including studies attempting to create better annotation sets. Extension to include the source of annotations for a particular type of data and a common ranking for the evidence codes included in GO annotation files (e.g. IC = inferred by curator, IEA = inferred from el
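
As a rough illustration of the testing situation described in Section 2, the sketch below compares known annotations F(x) with predicted annotations G(x) over a small set of proteins. It rests on assumptions not made in the text: the metrics are plain set-overlap precision and recall, averaged per protein, and GO nodes are treated as flat labels with no use of the ontology's structure. The protein identifiers and their annotations are toy data, and the function names are hypothetical.

```python
from typing import Dict, Set, Tuple

# An annotation mapping sends each protein identifier to a set of GO node IDs.
Annotations = Dict[str, Set[str]]


def precision_recall(known: Set[str], predicted: Set[str]) -> Tuple[float, float]:
    """Flat set-overlap precision and recall for one protein.

    Assumption: GO nodes are compared as plain labels; ancestors and
    descendants in the ontology earn no partial credit.
    """
    overlap = len(known & predicted)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(known) if known else 0.0
    return precision, recall


def evaluate(F: Annotations, G: Annotations) -> Tuple[float, float]:
    """Average precision and recall of G against F over all proteins in F."""
    if not F:
        raise ValueError("the test set of known proteins is empty")
    # A protein with no prediction in G is scored against the empty set.
    scores = [precision_recall(F[x], G.get(x, set())) for x in F]
    n = len(scores)
    mean_precision = sum(p for p, _ in scores) / n
    mean_recall = sum(r for _, r in scores) / n
    return mean_precision, mean_recall


# Toy data: "x1" and "x2" are made-up protein identifiers.
F = {"x1": {"GO:0003677", "GO:0005524"}, "x2": {"GO:0004672"}}
G = {"x1": {"GO:0003677"}, "x2": {"GO:0004672", "GO:0005524"}}

mean_precision, mean_recall = evaluate(F, G)
print(f"precision={mean_precision:.2f} recall={mean_recall:.2f}")
```

A flat comparison of this kind penalizes a prediction that lands on a parent or child of the correct GO node as heavily as an unrelated one, which is one reason structure-aware measures are of interest for OFA.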
