hg19 (GRCh37) and hg38 (GRCh38): Comparison of Data Differences

Uploaded by: 简****9 | Document ID: 95473742 | Uploaded: 2019-08-19 | Format: PPT | Pages: 29 | Size: 2.18 MB

hg19 (GRCh37) vs. hg38 (GRCh38)
Human Genome Reference Comparison
Zuotian Tatum
Department of Human Genetics, Leiden University Medical Center

Timeline
GRCh37:
- First release: Feb 27, 2009
- Latest patch: Jun 28, 2013 (p13)
GRCh38:
- First release: Dec 24, 2013
- Latest patch: Oct 14, 2014 (p1)
Source: http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/data/

Content
GRCh37.p13:
- Total bases: 3.23 billion (2.99 billion without N)
- N50: 46 million
- Number of alternative loci: 9
- Non-nuclear genome: No
GRCh38.p2:
- Total bases: 3.21 billion (3.05 billion without N)
- N50: 67 million
- Number of alternative loci: 261
- Non-nuclear genome: Yes
Source: http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/data/

UCSC tracks for GRCh38
- UCSC RefSeq available since April 2014.
- Ensembl regulatory build available since September 2014.
- dbSNP 141 available since October 2014.
- ENCODE and FANTOM5 track hubs are still not available (Nov 2014).

New in GRCh38 release
Three new sequence files, in addition to the standard assembly files:
- GCA_000001405.15_GRCh38_top-level.fna.gz
- GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
- GCA_000001405.15_GRCh38_full_analysis_set.fna.gz
The analysis set files are created to avoid false mapping in NGS alignment pipelines.

GCA_000001405.15_GRCh38_top-level.fna.gz
All the top-level objects in the full assembly:
- Chromosomes
- Unlocalized scaffolds
- Unplaced scaffolds
- Alternate locus scaffolds
- Mitochondrial genome
The sequence identifiers are International Sequence Database Collaboration (INSDC) accession.versions and the definition lines are GenBank style. No sequences have been hard-masked.

GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
- Chromosomes from the GRCh38 Primary Assembly unit.
  Note: the two PAR regions on chrY have been hard-masked with Ns. The chromosome Y sequence provided therefore has the same coordinates as the GenBank sequence but is not identical to it. Similarly, duplicate copies of centromeric arrays and WGS on chromosomes 5, 14, 19, 21 & 22 have been hard-masked with Ns.
- Mitochondrial genome from the GRCh38 non-nuclear assembly unit.
- Unlocalized scaffolds from the GRCh38 Primary Assembly unit.
- Unplaced scaffolds from the GRCh38 Primary Assembly unit.
- Epstein-Barr virus (EBV) sequence.
  Note: the EBV sequence is not part of the genome assembly but is included in the analysis set as a sink for alignment of reads that are often present in sequencing samples.

GCA_000001405.15_GRCh38_full_analysis_set.fna.gz
= GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz + alt-scaffolds from the GRCh38 ALT_REF_LOCI_* assembly units

Alt-loci add complexity to RNASeq quantification
[Figure: Ideogram of GRCh38.p2]

RNASeq quantification
- Fragments (reads) per kilobase per million (FPKM/RPKM) values are used to quantify gene expression
- Unique mapping only
  Analysis tools do not distinguish allelic duplication from paralogous duplication
- Non-overlapping gene regions

To understand the effect of alt-loci on RNASeq quantification
Compare alignment of the chromosome 6 MHC region between:
- hg19 full set with 7 alt-loci
- hg38 analysis set without alt-loci
Sequence content is largely unchanged between hg19 and hg38.

Mapping/alignment for RNASeq
- hg19: with alt loci
- hg38: without alt loci

Effect of alt loci in RNASeq alignments
[Figure panels: Gene RPKM (hg38); Distribution of RPKM difference]

Major Histocompatibility complex region on chromosome 6
[Figure: HLA-A, hg19 full set chr6, sample D1]
[Figure: HLA-A, hg19 full set chr6 vs. hg38 analysis set]
[Figure: HLA-C, hg19 full set, samples D1, D2, D3]
[Figure: HLA-DRA, hg19 full set, samples D1, D2, D3]

Major Histocompatibility complex region on chromosome 6
MHC Class III:
- 700 kb stretch, 60 genes
- The most gene-dense region of the human genome
- 14% coding
- 72% transcribed
- Highly conserved
- Only a few have a clearly defined and proven function
[Figure: TNF, hg19 full set chr6, samples D1.control and D1.treated]

Highly variant immune regions retiled
LILRA3 moved to alt-loci in hg38
- hg19: LILRB2 LILRA3 LILRA5
- hg38: LILRB2 LILRA5

Phantom LILRA3
[Figure: LILRA3 in hg19; labels: Intergenic, LILRB3, LILRA4, LILRB5]

Gene length calculation
- We need gene length to calculate RPKM.
- If the alignment uses alt loci: RPKM would be artificially lowered for alt-loci genes.
- If the alignment does not use alt loci: remove alt-loci annotations from the official set.

Need more comprehensive approach to genome variation
- Assembly model is neither haploid nor diploid
- Analysis tools
  - penalize reads mapping to more than one location
  - do not distinguish allelic duplication from paralogous duplication
- A graph structure is a natural way to represent a population-based genome assembly

Conclusions
- RPKM values are highly correlated between hg19 and hg38.
- Analysis set is preferred for expression analysis.
- Additional analysis may be performed to use the alt-loci separately.
- Annotations for hg38 are still lacking and need contributions from the community.
- Improve modeling of genome variability in population.

Questions?
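
The "Gene length calculation" slide states that RPKM is artificially lowered for alt-loci genes when the alignment includes alt loci. The sketch below illustrates that effect with the standard RPKM definition (reads per kilobase of gene per million mapped reads). All numbers are hypothetical and chosen only for illustration; the assumption that 75% of a gene's reads become multi-mapped, and that library size stays constant, is a simplification and not a result from the slides.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical values, not taken from the slides.
GENE_LENGTH_BP = 2_000
TOTAL_MAPPED_READS = 20_000_000
READS_FROM_GENE = 400

# Reference without alt loci: every read from the gene maps uniquely.
rpkm_no_alt = rpkm(READS_FROM_GENE, GENE_LENGTH_BP, TOTAL_MAPPED_READS)

# Reference with alt loci: ASSUMPTION that 75% of the reads also match an
# alt copy of the gene, so a unique-mapping-only pipeline discards them.
# Library size is held constant to keep the comparison simple.
uniquely_mapped = READS_FROM_GENE // 4
rpkm_with_alt = rpkm(uniquely_mapped, GENE_LENGTH_BP, TOTAL_MAPPED_READS)

print(f"RPKM without alt loci: {rpkm_no_alt}")
print(f"RPKM with alt loci:    {rpkm_with_alt}")
```

The gene's true expression is the same in both cases; only the number of reads that survive the unique-mapping filter changes, which is why the slides prefer the analysis set without alt loci for expression analysis.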

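The slides describe the full analysis set as the no-alt analysis set plus the alt scaffolds. As a rough sketch of that relationship, the code below removes alt scaffolds from a FASTA stream by name. The sequence names and bases in the toy input are hypothetical, and the `_alt` suffix used to recognise alt scaffolds is an assumption about the naming scheme that should be checked against the headers of the actual file. The sketch does not reproduce the hard-masking of PAR regions or centromeric arrays described in the slides.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the identifier, dropping any description text.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def is_alt(name):
    # ASSUMPTION: alt scaffolds are recognisable by an "_alt" name suffix.
    return name.endswith("_alt")


def drop_alt(records):
    """Keep every record that is not an alt scaffold."""
    return [(name, seq) for name, seq in records if not is_alt(name)]


# Toy input with hypothetical names and sequences.
toy_fasta = """>chr6
ACGTACGT
ACGT
>chr6_TOY0001v1_alt
ACGTTCGT
>chrM
GATTACA
>chrEBV
TTGACA
"""

records = list(read_fasta(io.StringIO(toy_fasta)))
kept = drop_alt(records)
print([name for name, _ in kept])
```

In this toy input the primary chromosome, the mitochondrial genome and the EBV decoy are kept, while the alt scaffold is dropped.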