Clustering Temporal Gene Expression Data

Uploaded by: M****1 | Document ID: 569735549 | Upload date: 2024-07-30 | Format: PPT | Pages: 20 | Size: 500.50 KB


Temporal Probabilistic Concepts from Heterogeneous Data Sequences

Sally McClean, Bryan Scotney, Fiona Palmer
School of Information & Software Engineering, University of Ulster

Background: Gene Expression

Scientists have now sequenced the entire human genome, approximately 30,000 genes. Each of these genes, when active, results in the production of a protein, and proteins have a variety of functions. In order to understand the function of the genes and the related proteins, scientists are interested in determining where and when the genes are active.

[Figure: the steps involved in producing a protein from a gene: Gene (DNA), RNA, Protein.]

Background: Gene Expression Results

The DNA microarray is a microscope slide which enables scientists to determine the activity, or expression, of genes. Scientists place on each of the microarray spots an extract of the cells along with an extract from a reference sample. The more RNA produced, the more active the gene (green for the sample and red for the reference). Fluorescence of the spot is then measured to give the expression of the gene compared to the reference.

Background: The Gene Expression Data Set

The gene expression data set analysed describes the expression of 112 genes in the rat cervical spinal cord over 9 time points through the development of the rat from embryo to adult. Only specific genes were analysed: those considered important in the development of the central nervous system in the rat.

Time points:
E11, E13, E15, E18, E21   Embryo: days since conception
P0, P7, P14               Postnatal: days since birth
A                         Adult

[Figure: the temporal nature of the gene expression data.]

Clustering: Mutual Information

Clustering is usually based on a distance metric, in this case mutual information. Before clustering, the continuous gene expressions were discretised by partitioning the expression into 3 equal sized bins.

Gene Expression Sequences for Cluster 3 (columns are time points):

Gene      E11  E13  E15  E18  E21  P0   P7   P14  A
nAChRa2   0    0    0    1    2    2    2    2    1
mAChR2    0    0    0    2    2    2    2    2    1
mAChR3    0    0    0    1    2    2    1    1    0
nAChRa3   0    0    2    1    2    2    1    1    0
EGFR      0    1    0    1    2    2    2    2    0
NFL       0    1    1    1    1    2    2    1    0
nAChRa7   1    0    0    2    2    2    2    2    1
MK2       2    2    2    1    1    1    1    1    0
PDGFR     2    2    2    0    0    0    0    0    1

The Clusters

In this paper we mainly use data from cluster 2.

The Process

The steps used to learn the temporal semantics of sequences:

1. Cluster
2. Learn mappings
3. Map sequences
4. Learn local temporal concepts
5. Learn temporal probabilistic concepts

[Figure labels: Set of Sequences; Clusters & Mappings; Homogenised sequences; Characterisation of the cluster.]

An Example: The Problem

Gene Expression Data:

Sequence 1   0 0 1 1 1 1 1 1 0
Sequence 2   0 0 0 1 1 1 1 2 2
Sequence 3   0 0 0 1 1 1 1 2 2
Sequence 4   0 0 0 2 2 1 1 1 1

The sequences are heterogeneous in the sense that they represent different attributes. The codes (0, 1, or 2) should be regarded as symbolic. We re-label to emphasise this.

Relabelled Gene Expression Data:

Sequence 1   A A B B B B B B A
Sequence 2   C C C D D D D E E
Sequence 3   F F F H H H H G G
Sequence 4   I I I K K J J J J

Mappings: Schema Mappings

The schema mappings are between each sequence (local ontologies) and the hidden variable (L, M, N). We represent the underlying concept (global ontology) via a temporal probabilistic concept model.

Sequence 1:  A -> L, B -> M, A -> N
Sequence 2:  C -> L, D -> M, E -> N
Sequence 3:  F -> L, G -> N, H -> M
Sequence 4:  I -> L, J -> N, K -> M

[Figure: correspondence graph for the cluster containing sequences 1 and 2, with columns Sequence 1, Sequence 2 and Hidden Concept.]

Mapped sequences:

Sequence 1   L/N L/N M   M   M   M   M   M   L/N
Sequence 2   L   L   L   M   M   M   M   N   N
Sequence 3   L   L   L   M   M   M   M   N   N
Sequence 4   L   L   L   M   M   N   N   N   N

Mapping: The Mapping Algorithm

Choose one of the sequences whose number of symbols is maximal (S*, say); these symbols act as a proxy for values of the global ontology. For each remaining sequence Si, of length L, determine the mapping of the rth value of Si onto one of the values of the global ontology so as to maximise the number of co-occurrences. Repeat for each r and i. In the ith sequence, the value r is then mapped to a set of values (a partial value) if the maximising value is not unique.

Concept Learning: Concept Definitions

The concepts we are concerned with may be thought of as symbolic objects which are described in terms of discrete-valued features, e.g. the features expression level and function, with respective domains {low, medium, high} and {growth, control}. An example concept is C1 = {expression level = high; function = growth}.

Probabilistic concepts have been used to extend the definition of a concept to uncertain situations, where we must associate a probability with the values of each feature vector, e.g. C3 = {expression level = high: 0.8, expression level = medium: 0.2, function = growth: 1.0}.

Concept Learning: Local & Temporal Probabilistic Concepts

A local probabilistic concept (LPC) is defined on a time interval. For example, in time interval S = [t1, t2] we have a local probabilistic concept C4 = {Time = S, expression level = high: 0.8, expression level = medium: 0.2}, i.e. during time interval S there is a high expression level with probability 0.8 and a medium expression level with probability 0.2.

A temporal probabilistic concept (TPC) is defined in terms of a time attribute with domain T = {t1, ..., tk} and discrete-valued features Xj, where Xj has domain Dj = {vj1, ...}.

Concept Learning: Learning Local Probabilistic Concepts

The algorithm for learning LPCs takes account of the fact that the schema mappings may map a local value in the local ontology onto a set of global values (partial values). We use the EM algorithm to learn LPCs with values that are expressed as local probabilistic concepts.

Sequence 1   C   C   D   D   D   D   D   D   C
Sequence 2   C   C   C   D   D   D   D   E   E
Sequence 3   C   C   C   D   D   D   D   E   E
Sequence 4   C   C   C   D   D   D/E D/E D/E D/E

Then, for example, using only the data at the eighth time point (column 9 of Table 5), we obtain: [equation missing in source]. Iteration yields the solution: [values missing in source].

Concept Learning: Learning Temporal Probabilistic Concepts

Once we have learned the local probabilistic concepts, the next task is to learn the TPC. This is carried out using temporal clustering, via log-likelihood ratios and chi-squared tests.

In the mapped sequences shown under Learning Local Probabilistic Concepts, the values for the first two time points (columns) are identical, so the distance d12 is zero and we combine LPC1 and LPC2 to form LPC12. We must now decide whether LPC12 should be combined with LPC3 or whether LPC3 is part of a new LPC. The distance between LPC12 and LPC3 is 1.193. Since this value is inside the chi-squared threshold, we decide to combine LPC12 and LPC3, and so on.

Cluster 2

[Figures: Rat Gene Sequence Data; Mapped Rat Data; The LPCs and TPC.]

There are 4 LPCs, represented by the 4 colours. These clusters are then characterised by the local probabilistic concepts:

E11, E13:                (0.961, 0, 0.039)
E15:                     (0.28, 0, 0.72)
E18, E21, P0, P14, A:    (0, 0, 1)
P7:                      (0, 0.154, 0.846)

Conclusion

We have described a methodology for describing and learning temporal concepts from heterogeneous sequences that have the same underlying temporal pattern. The data are heterogeneous with respect to classification schemes. However, because the sequences relate to the same underlying concept, the mappings between values may be learned. On the basis of these mappings we use statistical learning methods to describe the local probabilistic concepts. A temporal probabilistic concept that describes the underlying pattern is then learned. This concept may be matched with known genetic processes and pathways.

Further Work

For the moment we have not considered performance issues, since the problem we have identified is both novel and complex. Our focus, therefore, has been on defining terminology and providing a preliminary methodology. In addition to addressing such performance issues, future work will also investigate the related problem of associating clusters with explanatory data. For example, our gene expression sequences could be related to the growth process.
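
Illustrative sketch: discretisation and mutual information

The clustering step discretises each expression profile into 3 bins and compares genes by mutual information. The slides do not say whether "equal sized" means equal width or equal frequency, nor how mutual information is turned into a clustering distance, so the sketch below assumes equal-width bins and stops at the mutual information itself. The function names `discretise` and `mutual_information` are hypothetical.

```python
from collections import Counter
from math import log2


def discretise(values, n_bins=3):
    """Map continuous values to bin indices 0..n_bins-1.

    Assumption: bins have equal width over the observed range.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value falls exactly on the upper edge, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length symbol sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # (c/n) * log2( (c/n) / ((px/n) * (py/n)) ) simplifies to the form below.
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


# Two discretised profiles copied from the Cluster 3 table.
nachra2 = [0, 0, 0, 1, 2, 2, 2, 2, 1]
machr2 = [0, 0, 0, 2, 2, 2, 2, 2, 1]
print(mutual_information(nachra2, machr2))
```

A sequence compared with itself gives its own entropy, and a sequence compared with a constant gives zero, which is a quick sanity check on the implementation.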

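
Illustrative sketch: the mapping algorithm

The mapping algorithm picks a reference sequence S* with the most distinct symbols and maps every local symbol to the reference symbol it co-occurs with most often, keeping a set of symbols when there is a tie. The sketch below is one minimal reading of that description; the function names are hypothetical, ties between candidate reference sequences are broken by taking the first one (an assumption), and the reference sequence's own symbols (C, D, E here) stand in for the global values.

```python
from collections import Counter


def learn_mapping(reference, sequence):
    """Map each local symbol to the reference symbol(s) it co-occurs with most.

    Returns a dict from local symbol to a frozenset of reference symbols;
    a set with more than one member is a partial value.
    """
    cooccurrence = {}
    for ref_sym, loc_sym in zip(reference, sequence):
        cooccurrence.setdefault(loc_sym, Counter())[ref_sym] += 1
    mapping = {}
    for loc_sym, counts in cooccurrence.items():
        best = max(counts.values())
        mapping[loc_sym] = frozenset(s for s, c in counts.items() if c == best)
    return mapping


def homogenise(sequences):
    """Choose S* and rewrite every sequence in terms of its symbols."""
    # max() returns the first sequence with the maximal number of symbols.
    reference = max(sequences, key=lambda s: len(set(s)))
    mapped = []
    for seq in sequences:
        mapping = learn_mapping(reference, seq)
        mapped.append(["/".join(sorted(mapping[sym])) for sym in seq])
    return reference, mapped


relabelled = ["AABBBBBBA", "CCCDDDDEE", "FFFHHHHGG", "IIIKKJJJJ"]
reference, mapped = homogenise(relabelled)
for row in mapped:
    print(" ".join(row))
```

On the four relabelled example sequences this picks Sequence 2 as S* and yields the same mapped rows as the table in the Learning Local Probabilistic Concepts slide, including the partial value D/E in Sequence 4.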

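
Illustrative sketch: EM for a local probabilistic concept with partial values

The LPC learning step uses EM because some mapped observations are sets of global values rather than single values. The slides' own update equation did not survive extraction, so the sketch below uses a standard EM update for a multinomial with set-valued observations, which is an assumption about the authors' method: each observation spreads one unit of count over its candidate values in proportion to the current probabilities. The name `em_partial_values` is hypothetical.

```python
def em_partial_values(observations, domain, n_iter=200, tol=1e-12):
    """Estimate value probabilities from set-valued (partial) observations.

    observations: iterable of sets of candidate values, one per sequence.
    domain: all possible global values.
    """
    observations = list(observations)
    n = len(observations)
    probs = {v: 1.0 / len(domain) for v in domain}  # uniform start
    for _ in range(n_iter):
        # E-step: share each observation among its candidate values.
        expected = {v: 0.0 for v in domain}
        for obs in observations:
            total = sum(probs[v] for v in obs)
            for v in obs:
                expected[v] += probs[v] / total
        # M-step: normalise the expected counts.
        new_probs = {v: expected[v] / n for v in domain}
        change = max(abs(new_probs[v] - probs[v]) for v in domain)
        probs = new_probs
        if change < tol:
            break
    return probs


# Eighth time point of the mapped example sequences: D, E, E and D/E.
eighth = [{"D"}, {"E"}, {"E"}, {"D", "E"}]
print(em_partial_values(eighth, domain="CDE"))
```

Under this update the estimates for the eighth time point settle close to (C, D, E) = (0, 1/3, 2/3), which is also the maximiser of the likelihood p_D * p_E^2 * (p_D + p_E). The source's own solution is not legible, so this value is not confirmed by the slides.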

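
Illustrative sketch: log-likelihood ratio distance for temporal clustering

Temporal clustering merges adjacent time points when a log-likelihood ratio distance falls inside a chi-squared threshold. The sketch below computes the log-likelihood ratio between two sets of value counts against their pooled distribution. Applied to the first two columns of the mapped example (eight C) and the third column (three C, one D), it gives about 1.193, matching the distance quoted for LPC12 and LPC3, which suggests the quoted distance omits the usual factor of 2. The merge rule, which compares twice the distance with the 5% critical value for 2 degrees of freedom (5.991), is an assumption; the slides do not state the significance level or degrees of freedom. Function names are hypothetical.

```python
from math import log

# Assumption: 5% critical value of chi-squared with 2 degrees of freedom.
CHI2_95_DF2 = 5.991


def llr_distance(counts_a, counts_b):
    """Log-likelihood ratio between two count dicts and their pooled model.

    Computes the sum of observed * ln(observed / expected) over both groups,
    where expected counts come from the pooled value distribution.
    """
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    distance = 0.0
    for value in set(counts_a) | set(counts_b):
        pooled = (counts_a.get(value, 0) + counts_b.get(value, 0)) / (n_a + n_b)
        for counts, n in ((counts_a, n_a), (counts_b, n_b)):
            observed = counts.get(value, 0)
            if observed > 0:  # zero counts contribute nothing
                distance += observed * log(observed / (n * pooled))
    return distance


def should_merge(counts_a, counts_b, critical=CHI2_95_DF2):
    """Merge when 2 * LLR is below the chi-squared critical value."""
    return 2.0 * llr_distance(counts_a, counts_b) < critical


lpc12 = {"C": 8}          # time points 1 and 2 of the mapped example
lpc3 = {"C": 3, "D": 1}   # time point 3
print(round(llr_distance(lpc12, lpc3), 3), should_merge(lpc12, lpc3))
```

Counts may be fractional, so expected counts produced by an EM step for partial values can be passed in directly.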
