lecture-13 (Penn State next-generation sequencing data analysis course).pdf

2013 - BMMB 597D: Analyzing Next Generation Sequencing Data
Week 7, Lecture 13
István Albert
Biochemistry and Molecular Biology and Bioinformatics Consulting Center
Penn State

Paired end sequencing
More information: connect reads that belong to the original fragment.
Nomenclature: paired-end and mated-pairs are different technologies.
The technology is vendor specific, with quirks and tacit assumptions.

Paired end (PE) sequencing (most common)
[Figure: a DNA fragment showing the forward strand, the reverse strand, the sequencing direction on each strand, and the insert size]
Sequences both ends of the same DNA fragment.
We end up with two reads that are known to have come from the different strands of the same DNA fragment.
Insert sizes: 200-600 bp.

Paired end (PE) sequencing: short fragments, long reads
[Figure: the same DNA fragment, with the two reads overlapping]
The two reads overlap, which allows read merging/stitching.

Mated-pair (MP) sequencing
Mated pair insert sizes: 2000-5000 bp long (may change as new protocols are developed).
[Figure: SOLiD Mate-Pair protocol; the F3 and R3 reads come from the same strand]

Dealing with paired data
Make sure to understand which parts of the DNA fragments have been sequenced.
Consult your sequencing operator for details on the library preparation.
When in doubt you can operate in single end mode, then visualize the results (covered in later lectures).
Verify how the pairs are located relative to one another (sanity check).
Consult vendor materials: they are comprehensive but will also contain a lot of details that are not relevant.

More strategies
Just about all aligners can deal with standard paired end (PE) sequencing data.
A few can deal with mate-pair (MP) data and its variations: see Novoalign, and check vendor recommended tools.
Finally, you may turn the pairs into standard PE by reverse complementing the proper reads.

Competing representations
SE: single end reads. PE: paired end reads.
Paired end reads come in either:
  two files with the exact same number of lines and IDs, where a pair is present on the "same line" of each file
  a single file where pairs are consecutive records (interleaved)

The read order is now also essential
Regardless of representation, one now needs to ensure that the order of reads will keep matching.
Read removal needs to take place on both files, or on both lines if the file is interleaved.

Quick PE checklist
How are my pairs oriented?
How is the data formatted?
  Are the reads in the same file (interleaved)?
  Are the reads in separate files?
  What is the naming convention?
What is the expected insert (fragment) size and its distribution (minimum, maximum insert sizes)?

Summary: paired end vs mated pairs
Paired ends are supported by some technologies where it is possible to sequence from both ends of a clone.
Mate pairs involve making circular fragments using a linker sequence, fragmenting them around the linker, and then sequencing the result.
The distance between mate pairs is much longer (2-5 kb), while paired-end fragments are rarely more than 500 bp apart.
The technologies keep evolving, even within a year: make sure to ask the facility managers questions!

Install Trimmomatic
It is a great tool to deal with paired end reads.
It lacks some options that cutadapt has.
But it has options that cutadapt does not directly support.

Install Flash
Flash (Fast Length Adjustment of SHort reads) stitches reads together.
Use stitching if:
1. a short library size causes most reads to overlap significantly, and
2. genomic rearrangements are not a focus of the study.

Shell scripts
Collect multiple commands into a single program.
Run the same commands again or on other data.
Document the steps and describe the thought process.

Add commands to a file.
Parameterize with variables.
Take parameters from the command line.
Error catching: "Dead programs tell no lies".
Bash has lots of features. We will slowly introduce some features along the way.

Homework 13
Use the dataset lect12.tar.gz; it contains a paired end read dataset.
Present a shell script that takes the paired read files from the command line and produces output that:
1. has the polyA tail and adapters removed while keeping the reads in paired order,
2. is stitched together into a single file,
3. includes two fastqc reports for the original files and a fastqc report for the stitched and unstitched files.
Have one line explanations in the shell script that describe what each step does.
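
The requirement that read removal happens on both files, so that reads stay in paired order, can be illustrated with a small sketch. It is written in Python rather than as the shell script the homework asks for, and it is not how Trimmomatic or cutadapt work internally. The function names (`read_fastq`, `pair_id`, `filter_pairs`) are hypothetical, and the sketch assumes four-line FASTQ records whose mates share a read name, optionally ending in `/1` and `/2`.

```python
import io
from itertools import islice, zip_longest


def read_fastq(handle):
    """Yield (header, sequence, quality) from a FASTQ stream with 4 lines per record."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, _plus, qual = record
        yield header, seq, qual


def pair_id(header):
    """Read name without '@' and without a trailing /1 or /2 (assumed naming convention)."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def filter_pairs(handle1, handle2, keep):
    """Yield (read1, read2) only when BOTH mates pass keep().

    Raises ValueError if the two files are out of sync, so a broken
    pairing is detected instead of silently propagated.
    """
    for r1, r2 in zip_longest(read_fastq(handle1), read_fastq(handle2)):
        if r1 is None or r2 is None:
            raise ValueError("files contain different numbers of reads")
        if pair_id(r1[0]) != pair_id(r2[0]):
            raise ValueError(f"pair order broken: {r1[0]} vs {r2[0]}")
        if keep(r1) and keep(r2):
            yield r1, r2


if __name__ == "__main__":
    # Tiny in-memory example: the first mate of pair r2 is too short,
    # so the whole pair is dropped from both outputs.
    fwd = io.StringIO("@r1/1\nACGTACGT\n+\nIIIIIIII\n@r2/1\nAC\n+\nII\n")
    rev = io.StringIO("@r1/2\nTTGCAAGG\n+\nIIIIIIII\n@r2/2\nGGCCTTAA\n+\nIIIIIIII\n")
    for r1, r2 in filter_pairs(fwd, rev, lambda read: len(read[1]) >= 5):
        print(r1[0], r2[0])
```

Dropping a pair whenever either mate fails is only one possible policy. Some tools instead write the surviving mate to a separate file of unpaired reads.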

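
The earlier suggestion of turning mate pairs into standard PE by reverse complementing the proper reads can be sketched in a few lines of Python. Which mate needs flipping depends on the library protocol, so that choice is left to the caller. The helper names `reverse_complement` and `flip_read` are hypothetical.

```python
# Translation table mapping each base to its complement; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def flip_read(header, seq, qual):
    """Reverse complement the bases and reverse the quality string.

    The qualities are reversed (not complemented) so that each score
    stays attached to the base it was measured for.
    """
    return header, reverse_complement(seq), qual[::-1]


if __name__ == "__main__":
    print(flip_read("@r1", "AACGT", "ABCDE"))
```

After flipping, it is worth repeating the sanity check from the lecture: align a sample and verify how the pairs are located relative to one another.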