Lanzhou University Bioinformatics Courseware: 6 - Genome Assembly - Wang Mingcheng

Uploaded by: wox****ang  Document ID: 157234216  Upload date: 2020-12-21  Format: PPTX  Pages: 19  Size: 20.09MB


Genome Assembly
Wang Mingcheng, 2015.10.29

I. Genome survey

K-mer: a contiguous nucleic acid sequence of length K bp. Suppose every K-mer in the genome is unique; a genome of size G then yields about G different K-mers. When a read of length L is generated, the probability that a given K-mer is sequenced is (L-K+1)/G. Because L/G is very small and the number of reads n_r is very large, the K-mer depth obeys a Poisson distribution. With d_k the expected K-mer depth and n_k the total number of K-mers sequenced:

d_k = (L-K+1)/G * n_r
n_k = (L-K+1) * n_r

Dividing n_k by d_k gives G = n_k/d_k.

Quality control and filtering

The following reads are filtered out:
- Reads in which N makes up more than 10% of the length.
- Reads from short insert-size libraries with more than 65% of bases at quality ≤ 7, and reads from large insert-size libraries with more than 80% of bases at quality ≤ 7.
- Paired-end reads whose read 1 and read 2 are both completely identical to those of another pair (and are thus considered products of PCR duplication).

Error correction is performed before assembly.

II. SOAPdenovo algorithm

SOAPdenovo was developed to assemble large genomes, such as the human genome; it also works well for small genomes such as bacteria. It includes five major steps:
- De Bruijn graph construction
- Graph simplification to obtain contigs
- Mapping paired-end reads to contigs
- Scaffold construction
- Gap filling with paired-end reads

Sequence assembly refers to aligning and merging fragments of a much longer DNA sequence in order to reconstruct the original sequence.

Overlap gives a contig: Ge + en + no + om + mi + ic + cs gives Genomics.
Pair-end gives a scaffold: a read pair with ends "nom" (in Genome) and "sem" (in assembly) links the two contigs into Genome*assembly.

1. De Bruijn graph construction

Reads: AGATCTTGTTATT, GTTATTGATCTCC

Slide a window along each read to extract K-mers, storing the links between neighboring K-mers. If a K-mer already exists, merge its links with those of the first occurrence.

De Bruijn graph (figure), with the 15 distinct 5-mers as nodes: AGATC, GATCT, ATCTT, TCTTG, CTTGT, TTGTT, TGTTA, GTTAT, TTATT, TATTG, ATTGA, TTGAT, TGATC, ATCTC, TCTCC

2. Graph simplification

Contigs: GATCTTGTTATTGATCT, GATCTCC, AGATCT
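The construction step described above can be sketched in a few lines of Python. This is a minimal illustration, not SOAPdenovo's implementation: the function name `build_de_bruijn` is hypothetical, K = 5 is inferred from the length of the K-mers on the slide, and reverse complements, sequencing errors and K-mer coverage are all ignored.

```python
def build_de_bruijn(reads, k):
    """Map each K-mer to the set of K-mers that directly follow it in a read.

    Simplified sketch: forward strand only, no error handling, no coverage.
    """
    graph = {}
    for read in reads:
        # Slide a window of width k along the read.
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for kmer in kmers:
            # An already existing K-mer keeps its node; links are merged below.
            graph.setdefault(kmer, set())
        for left, right in zip(kmers, kmers[1:]):
            graph[left].add(right)
    return graph


reads = ["AGATCTTGTTATT", "GTTATTGATCTCC"]
graph = build_de_bruijn(reads, k=5)

print(len(graph))              # number of distinct K-mers
print(sorted(graph["GATCT"]))  # successors of the repeated K-mer
```

For the two example reads this gives 15 distinct K-mers, and GATCT ends up with two successors (ATCTC and ATCTT). That branch is what the graph simplification step has to deal with.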

With the -R parameter set, repeats are resolved using the reads and the contigs merge into one:

Contig: AGATCTTGTTATTGATCTCC
Read1: AGATCTTGTTATT
Read2: GTTATTGATCTCC

Simplified graph (figure): AGATC, GATCT, ATCTTGTTATTGATC, ATCTCC, with edges numbered 1 to 4.

3. Paired-end mapping to contigs

4. Scaffold construction

Note: for mate-pair (=2Kb), the order is exactly the opposite. A reliable link is built between two contigs when the number of supporting paired-end/mate-pair reads exceeds the set threshold. The gap size is estimated from the insert size of each read pair.

5. Gap closure

Get the reads located in the gap, then do a local assembly.
(1) Close the gap using paired-end information (one end maps to the contig, the other end falls in the gap).
(2) Do a local assembly using the reads that fall in the gap to obtain a sequence that connects the edges of the two contigs.
Note: gap closure here also means extending contigs.

Schematic overview (figure)

III. Evaluation of assembly results

Length: contig (scaffold) N50 size, N90 size, total length, coverage ratio of the genome.
Accuracy: coverage of gene sequences, compared to EST or transcriptome sequences; comparison with a gold standard (such as BAC/fosmid).

Remaining slides (figures): Evaluation of gene region coverage; Comparison with gold standard; Comparative genomic analysis; Accuracy of gene structures.

Thank you for listening!
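The N50 and N90 sizes listed under the length criteria can be computed as in the sketch below. It assumes the usual definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least 50% or 90% of the total length). The function name `nx_size` is hypothetical, and the input is the three unmerged contigs from the graph simplification example, used purely for illustration.

```python
def nx_size(lengths, fraction):
    """Return the Nx size: walk the lengths from longest to shortest and
    report the length at which the running sum reaches fraction * total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("lengths must be non-empty")


contigs = ["GATCTTGTTATTGATCT", "GATCTCC", "AGATCT"]
lengths = [len(c) for c in contigs]  # 17, 7 and 6 bp

total_length = sum(lengths)
n50 = nx_size(lengths, 0.5)
n90 = nx_size(lengths, 0.9)
print(total_length, n50, n90)
```

Here the total length is 30 bp, the longest contig alone already covers half of it (N50 = 17), and all three contigs are needed to reach 90% (N90 = 6).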
