Introduction to Gene Chip (Microarray) Data Analysis

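What is a gene chip?
A gene chip (DNA microarray) is a revolutionary technology for large-scale gene expression studies. A high-density array of DNA fragments is attached in a defined order to a solid surface such as a glass slide, either by high-speed robotic spotting or by in situ synthesis. Fluorescently labelled DNA is then hybridized to the array through complementary base pairing, so that the expression of a very large number of genes can be measured and monitored in a single experiment.
Background: genome sequencing projects for many species have produced an exponentially growing amount of DNA sequence.
Advantages: high throughput, large scale, highly parallel, fast and efficient, highly automated.

Recommended materials
Biochip Analysis / M. Schena (USA)
Microarray Gene Expression Data Analysis: A Beginner's Guide / Helen C., et al.
Biochip Technology and Practice / (UK author), Science Press
R: http://www.r-project.org
Bioconductor: http://www.bioconductor.org/

Outline
Microarray experiments
  Oligonucleotide chips (single-channel)
  cDNA chips (two-channel)
Data processing
Finding differentially expressed genes
Clustering
Finding significant biological processes

Steps of a microarray experiment
Collect samples -> extract mRNA -> reverse-transcribe to cDNA and label -> hybridize -> laser scanning -> data analysis

STEP 1: Collect Samples.
Samples can come from a variety of organisms. We'll use two samples: cancerous human skin tissue and healthy human skin tissue.

STEP 2: Isolate mRNA.
Extract the RNA from the samples, using either a column or a solvent such as phenol-chloroform. After isolating total RNA, separate the mRNA from the rRNA and tRNA. Because mRNA has a poly-A tail, a column containing beads with poly-T (oligo-dT) tails will capture it. Then rinse with an elution buffer to release the mRNA from the beads. The buffer (typically low in salt) destabilizes the A:T base pairing between the poly-A tails and the oligo-dT.

STEP 3: Create Labelled DNA.
Add a labelling mix to the RNA. The mix contains poly-T (oligo-dT) primers, reverse transcriptase (RT, to make cDNA) and fluorescently dyed nucleotides. We add cyanine 3 (Cy3, shown as green) to the healthy sample and cyanine 5 (Cy5, shown as red) to the cancerous sample. The primer and RT bind to the mRNA first. The dyed nucleotides are then incorporated as the complementary DNA strand is synthesized.

STEP 4: Hybridization.
Apply the labelled cDNA to a microarray slide. When comparing two samples, apply both to the same slide. The single-stranded labelled cDNA hybridizes to the complementary DNA already spotted on the slide.

STEP 5: Microarray Scanner.
The scanner has a laser, a camera and a computer. The laser excites the fluorescent dyes on the hybridized cDNA, and the camera records the images produced as the laser scans the slide. The computer displays the results immediately and stores the data.

STEP 6: Analyze the Data.
GREEN: the healthy sample hybridized more than the diseased sample.
RED: the diseased (cancerous) sample hybridized more than the healthy sample.
YELLOW: both samples hybridized equally to the target DNA.
BLACK: neither sample hybridized to the target DNA.
Comparing gene expression between the two samples tells us more about the genomics of a disease.

Two types of chips: oligonucleotide (single-channel) and cDNA (two-channel).
[Animation: principle of a cDNA chip experiment]

Gene chip data analysis

Data structure (expression value matrix)
Terms used:
Treatment/Experiment: fluorescence intensity of the experimental sample
Control: fluorescence intensity of the control sample
Ratio: treatment/control
Expression value: log2(Ratio)
Data display: heatmap

Data preprocessing
Correct the dye-specific bias between the two fluorescence wavelengths
Separate background from foreground
Remove unreliable spots
Estimate (impute) missing spots
Transform the data to suit downstream analysis
Normalize between samples so that samples are comparable
Standardize between genes so that genes are comparable
Choose gene identifiers
[Figures: dye bias between the two wavelengths; chip quality; location and scale normalization]

Finding differentially expressed genes
Genes whose expression changes markedly between tissues are the key genes behind changes in cell state. They are the main objects of microarray analysis.

Finding differentially expressed genes: the fold-change method
Set a fold-change threshold (usually 2-fold), or a fixed proportion of genes to call differential (e.g. the top 5%: the 2.5% most up-regulated and the 2.5% most down-regulated).
Procedure:
1. Compute the fold change of every gene.
2. Rank the genes by fold change.
3. Call genes that are up-regulated or down-regulated at least two-fold differentially expressed.
When to use it: experiments with very few replicates.

Finding differentially expressed genes: hypothesis testing
First state a null hypothesis. Then compute the p-value: how likely data at least as extreme as those observed would be if the null hypothesis were true.
Null hypothesis H0: the gene is not differentially expressed.
[Figure: Population 1 vs Population 2 under H0 and under Ha]

Volcano plot
Y axis (significance) uses -log10(p). p = 0.1 (1 decimal place) plots at 1, and p = 0.01 (2 decimal places) plots at 2.
X axis (effect) uses log2(fold change). An effect that has doubled is 2^1, a two-fold change (log2 = 1). An effect that has halved is 2^-1 = 0.5 (log2 = -1).
Fold change: the ratio used to compare a gene's expression values between two conditions.
[Figure: volcano plot]
Advanced tool: SAM (Significance Analysis of Microarrays)

Clustering analysis
Eisen et al. (1998), PNAS, 95(25): 14863-14868

What is clustering?
Clustering organizes or classifies elements according to how similar they are to one another, following some rule (elements -> similarity -> hierarchical clustering / classification).

Clustering in biology: phylogenetic trees, built from the similarity of homologous protein sequences in different species.

Clustering in microarray data analysis:
Genes can be co-expressed.
Co-expressed genes may share similar biological functions.
The function of a gene can be inferred from genes with similar expression profiles.
Clustering gives better visualization.

In clustering, each gene is treated as a vector, and elements are grouped by the distances between them.
Data structure (expression value matrix, log2 ratio):
Gene 1: (0.5, -1.8, 0.8, 1.2)
Gene 2: (-0.2, 1.2, -0.4, 0.1)

Clustering: defining distance
Euclidean distance
Similarity-based distance: Pearson correlation coefficient
Which distance should we choose?

Clustering: k-means
Partition all points into k non-overlapping clusters, so that genes within a cluster are highly similar and genes in different clusters are not.
K-means algorithm:
1. Choose a suitable number of clusters k.
2. Initialize k cluster centers μ1, ..., μk. Either pick k points from the data, or randomly split the data into k groups and use each group's center.
3. Compute the similarity of every data point to every center, and assign each point to the cluster of its most similar center.
4. When all points have been assigned, recompute the center of each cluster and use it as the new center.
5. Repeat steps 3 and 4 against the new centers.
6. Stop when the cluster centers no longer change.
Drawbacks: the result depends on the initial centers and may not be the global optimum, and k must be known in advance. Cluster quality can be assessed by comparing within-cluster distances with between-cluster distances, which lets k be chosen exploratorily.
How many clusters?
Visualizing k-means results: plot 2-D and 3-D data directly. For data with more than two dimensions, plot the first two principal components (PCs).

K-means exercise
5 points: 0, 2, 4, 5, 7
2 clusters, initial centers (3.5, 5.5) or (1, 2.1)
3 clusters, initial centers (1.5, 4.5, 5.1) or (1, 2.1, 2.5)

Clustering: hierarchical clustering
The result is nested, which suits analyses that look at classification detail at different levels. Smaller clusters are nested inside larger ones, forming a layered hierarchy. Hierarchical clustering also reorders the data.
Distance between clusters:
Single linkage clustering: minimum distance between members of the two clusters
Complete linkage clustering: greatest distance between members of the two clusters
Average linkage clustering (UPGMA): average distance between members of the two clusters
Centroid or median linkage
Comparison of linkage methods: single, average and complete linkage join clusters by the minimum, average and maximum pairwise distance, respectively.
[Figure: how to classify, clusters A and B]

Hierarchical clustering exercise
5 points: 0, 2, 4, 5, 7, using average linkage as the distance between clusters.

Software: Cluster + Maple Tree

Finding significant biological processes: gene set analysis
Definition of a gene set: a collection of genes that share similar biological properties, for example:
Gene Ontology terms, pathways, chromosome loci, regulatory motifs, cancer-related genes, tissue-specific genes, network modules, clusters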
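The expression value defined on the data-structure slide (log2 of treatment/control) and the colour reading in STEP 6 can be sketched in a few lines of NumPy. This is a minimal illustration, not the deck's own code. The intensities and the cutoffs `fold` and `min_intensity` are hypothetical, chosen only so that red, green, yellow and black correspond to the STEP 6 descriptions.

```python
import numpy as np

def log2_ratio(treatment, control):
    """Expression value = log2(treatment / control), element-wise.

    Rows are genes, columns are arrays (the expression value matrix)."""
    treatment = np.asarray(treatment, dtype=float)
    control = np.asarray(control, dtype=float)
    return np.log2(treatment / control)


def spot_colour(cy5, cy3, fold=2.0, min_intensity=100.0):
    """Read one spot the way STEP 6 does.

    Cy5 (red) = cancerous sample, Cy3 (green) = healthy sample.
    `fold` and `min_intensity` are hypothetical cutoffs for illustration."""
    if cy5 < min_intensity and cy3 < min_intensity:
        return "black"            # neither sample hybridized
    if cy3 <= 0:
        return "red"              # avoid dividing by zero
    ratio = cy5 / cy3
    if ratio >= fold:
        return "red"              # cancerous sample hybridized more
    if ratio <= 1.0 / fold:
        return "green"            # healthy sample hybridized more
    return "yellow"               # roughly equal hybridization


# Hypothetical intensities: 3 genes x 2 arrays
treatment = [[800, 200], [150, 600], [400, 400]]
control = [[200, 200], [600, 150], [400, 100]]
M = log2_ratio(treatment, control)   # [[2, 0], [-2, 2], [0, 2]]
```

A log2 ratio of +1 or -1 corresponds to a two-fold change. This symmetry around 0 is the reason the deck uses log2(Ratio) rather than the raw ratio.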
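The preprocessing slide lists normalization between samples, location and scale normalization, and standardization between genes, but it does not give formulas. The sketch below uses one common choice, assumed here rather than taken from the deck. Each array's log2 ratios are median-centred (location) and divided by their median absolute deviation (scale), and each gene is z-scored across arrays. The function names are hypothetical.

```python
import numpy as np

def normalize_arrays(M):
    """Between-array location/scale normalization (an assumed method).

    For each array (column): subtract the median log2 ratio (location),
    then divide by the median absolute deviation, MAD (scale)."""
    M = np.asarray(M, dtype=float)
    med = np.median(M, axis=0)
    mad = np.median(np.abs(M - med), axis=0)   # assumes MAD > 0 in every array
    return (M - med) / mad


def standardize_genes(M):
    """Between-gene standardization: z-score each gene (row) across arrays."""
    M = np.asarray(M, dtype=float)
    mean = M.mean(axis=1, keepdims=True)
    sd = M.std(axis=1, ddof=1, keepdims=True)
    return (M - mean) / sd


# Two hypothetical arrays whose log2 ratios differ only in location and scale
arrays = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
N = normalize_arrays(arrays)          # both columns become [-2, -1, 0, 1, 2]
Z = standardize_genes([[1.0, 2.0, 3.0], [2.0, 4.0, 9.0]])
```

The median and MAD are used instead of the mean and standard deviation because they are less affected by the few strongly changed genes on each array.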
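The fold-change method above can be sketched with both of its selection rules: a 2-fold threshold, or a fixed proportion split evenly between up- and down-regulated genes. The input is assumed to be one log2 ratio per gene, for example averaged over the few replicates available. The function names and the example values are hypothetical.

```python
import numpy as np

def fold_change_degs(log2_ratios, threshold=2.0):
    """Threshold rule: up if ratio >= threshold, down if ratio <= 1/threshold."""
    lr = np.asarray(log2_ratios, dtype=float)
    cutoff = np.log2(threshold)          # 2-fold -> |log2 ratio| >= 1
    up = np.flatnonzero(lr >= cutoff)
    down = np.flatnonzero(lr <= -cutoff)
    return up, down


def top_fraction_degs(log2_ratios, fraction=0.05):
    """Proportion rule: the fraction/2 most up-regulated and the fraction/2
    most down-regulated genes (top 5% -> 2.5% up + 2.5% down)."""
    lr = np.asarray(log2_ratios, dtype=float)
    n_each = int(round(len(lr) * fraction / 2))
    order = np.argsort(lr)               # rank genes by fold change (ascending)
    down = order[:n_each]                # most negative log2 ratios
    up = order[::-1][:n_each]            # most positive log2 ratios
    return up, down


lr = [1.5, -0.2, 0.9, -1.1, 0.1]         # hypothetical per-gene log2 ratios
up, down = fold_change_degs(lr)          # up = [0], down = [3]
```

Working on the log2 scale turns "up or down two-fold" into a single symmetric cutoff of |log2 ratio| >= 1.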
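The hypothesis-testing and volcano plot slides can be connected in code. The deck does not name a specific test, so a per-gene Welch two-sample t-test from SciPy is assumed here. Its p-value feeds the volcano plot's Y axis as -log10(p), and the difference of mean log2 values gives the X axis (log2 fold change). The replicate values are hypothetical.

```python
import numpy as np
from scipy import stats

def volcano_coordinates(group_a, group_b):
    """Per-gene Welch t-test plus volcano plot coordinates (assumed method).

    group_a, group_b: (genes x replicates) arrays of log2 expression values.
    Returns (x, y, p): x = log2 fold change, y = -log10(p)."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    x = a.mean(axis=1) - b.mean(axis=1)     # difference of logs = log of ratio
    # H0: the gene is not differentially expressed (equal means)
    p = stats.ttest_ind(a, b, axis=1, equal_var=False).pvalue
    y = -np.log10(p)                        # p = 0.01 plots at 2
    return x, y, p


# Hypothetical data: 2 genes x 3 replicates per group
group_a = [[8.0, 8.2, 7.9], [5.0, 5.3, 4.8]]
group_b = [[6.0, 6.1, 5.9], [5.1, 4.9, 5.2]]
x, y, p = volcano_coordinates(group_a, group_b)
```

Gene 0 lands in the upper right of the plot, with a large effect and a small p. Gene 1 sits near the origin.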
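The distance slide can be made concrete with the deck's own Gene 1 and Gene 2 vectors. The sketch computes the Euclidean distance and the Pearson correlation coefficient. It turns the correlation into a distance as 1 - r, which is one common convention and an assumption here, since the slides give no formula.

```python
import numpy as np

gene1 = np.array([0.5, -1.8, 0.8, 1.2])   # log2 ratios from the slide
gene2 = np.array([-0.2, 1.2, -0.4, 0.1])

def euclidean(u, v):
    """Straight-line distance between two expression profiles."""
    return float(np.linalg.norm(u - v))

def pearson(u, v):
    """Pearson correlation coefficient of two profiles."""
    return float(np.corrcoef(u, v)[0, 1])

def pearson_distance(u, v):
    """1 - r: 0 for perfectly correlated profiles, 2 for anti-correlated ones."""
    return 1.0 - pearson(u, v)

d_euc = euclidean(gene1, gene2)   # sqrt(12.14), about 3.48
r = pearson(gene1, gene2)         # about -0.90: the profiles have opposite shapes
```

This illustrates the question "which distance?". Pearson ignores scale, so `pearson(gene1, 2 * gene1)` is 1.0 even though the Euclidean distance between those two vectors is not zero. Use it when the shape of the profile matters more than its magnitude.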
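The k-means steps above, run on the k-means exercise (points 0, 2, 4, 5, 7), can be sketched as a plain Lloyd iteration from given initial centers. Similarity is taken to be Euclidean distance. `kmeans` and `within_ss` are hypothetical helper names, and ties are broken toward the lower-indexed center, which affects the exercise at points 4 and 5.

```python
import numpy as np

def kmeans(points, init_centers, max_iter=100):
    """K-means from the given initial centers.

    points: (n,) or (n, d); init_centers: k centers. Returns (labels, centers)."""
    X = np.asarray(points, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                        # 1-D data: n points in 1 dimension
    centers = np.asarray(init_centers, dtype=float).reshape(-1, X.shape[1])
    for _ in range(max_iter):
        # Distance from every point to every center, shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)     # ties go to the lower-indexed center
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers): # centers no longer change: stop
            break
        centers = new_centers
    return labels, centers


def within_ss(points, labels, centers):
    """Within-cluster sum of squares, used to compare clusterings."""
    X = np.asarray(points, dtype=float).reshape(len(labels), -1)
    return float(((X - centers[labels]) ** 2).sum())


pts = [0, 2, 4, 5, 7]
lab_a, cen_a = kmeans(pts, [3.5, 5.5])        # {0,2,4} {5,7}, centers 2 and 6
lab_b, cen_b = kmeans(pts, [1, 2.1])          # {0,2} {4,5,7}, centers 1 and 16/3
lab_c, cen_c = kmeans(pts, [1.5, 4.5, 5.1])   # {0,2} {4,5} {7}
lab_d, cen_d = kmeans(pts, [1, 2.1, 2.5])     # {0} {2} {4,5,7}
```

The two 2-cluster runs end in different partitions, with within-cluster sums of squares of 10 and 20/3. This shows the drawback noted above: the result depends on the initial centers and need not be the global optimum.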
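For the visualization slide, "plot the first two principal components" can be sketched with NumPy's SVD. The function name `top_two_pcs` and the random 50-gene, 6-array matrix are hypothetical. In practice, the k-means labels would be used to colour the resulting scatter plot.

```python
import numpy as np

def top_two_pcs(X):
    """Project genes (rows) onto the first two principal components.

    X: (genes x conditions) matrix, e.g. log2 ratios."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # center each condition
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                      # (genes x 2) scores for plotting


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))                  # hypothetical 50 genes x 6 arrays
scores = top_two_pcs(X)
```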
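The hierarchical clustering exercise (points 0, 2, 4, 5, 7 with average linkage) can be checked with a naive agglomerative sketch. The linkage functions follow the comparison slide: single, average and complete join clusters by the min, mean and max pairwise distance. The function name `agglomerative` is hypothetical, and the loop is quadratic, so it is meant for small examples only.

```python
from itertools import combinations

def agglomerative(points, linkage="average"):
    """Naive agglomerative clustering of 1-D points.

    Returns merges as (cluster_a, cluster_b, height), in merge order."""
    # Cluster-to-cluster distance computed from all pairwise member distances
    agg = {"single": min,
           "complete": max,
           "average": lambda ds: sum(ds) / len(ds)}[linkage]
    clusters = [(p,) for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = agg([abs(a - b) for a in clusters[i] for b in clusters[j]])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


merges = agglomerative([0, 2, 4, 5, 7], "average")
# {4,5} at 1, {0,2} at 2, {4,5,7} at 2.5, everything at 26/6 (about 4.33)
```

With `"complete"` the same points merge at heights 1, 2, 3 and 7. The clusters form in the same order, but the distances between them grow.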
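The gene set slides define gene sets but do not say how a set is judged "significant". One widely used approach, assumed here rather than taken from the deck, is an over-representation test. It asks whether more differentially expressed genes fall into a set than chance would predict, using a one-sided hypergeometric p-value. The function name and the counts are hypothetical.

```python
from math import comb

def enrichment_p(total_genes, set_size, n_selected, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    total_genes: genes on the chip (N); set_size: genes in the gene set (K);
    n_selected: differentially expressed genes (n); overlap: DE genes in the set (k)."""
    upper = min(set_size, n_selected)
    hits = sum(comb(set_size, i) * comb(total_genes - set_size, n_selected - i)
               for i in range(overlap, upper + 1))
    return hits / comb(total_genes, n_selected)


# Hypothetical: 20 genes on the chip, 5 in the set, 4 DE genes, 3 of them in the set
p = enrichment_p(20, 5, 4, 3)   # 155 / 4845, about 0.032
```

Many gene sets (GO terms, pathways, ...) are usually tested at once, so these p-values normally need a multiple-testing correction before any set is called significant.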
