DNA sequencing methods

Uploaded by: Lu** | Document ID: 568555145 | Uploaded: 2024-07-25 | Format: PPT | Pages: 89 | Size: 3.12 MB

DNA sequencing: methods
I. Brief history of sequencing
II. Sanger dideoxy method for sequencing
III. Sequencing large pieces of DNA
IV. The “$1,000 genome”
On WebCT: “The $1000 genome”, a review of new sequencing techniques by George Church

Why sequence DNA?
All genes available for an organism to use
- a very important tool for biologists
Not just sequence of genes, but also positioning of genes and sequences of regulatory regions
New recombinant DNA constructs must be sequenced to verify construction or positions of mutations
Etc.

History of DNA sequencing (MC chapter 12)

Methods of sequencing
A. Sanger dideoxy (primer extension/chain-termination) method: most popular protocol for sequencing, very adaptable, scalable to large sequencing projects
B. Maxam-Gilbert chemical cleavage method: DNA is labelled and then chemically cleaved in a sequence-dependent manner. This method is not easily scaled and is rather tedious
C. Pyrosequencing: measuring chain extension by pyrophosphate monitoring

For dideoxy sequencing you need:
1) Single-stranded DNA template
2) A primer for DNA synthesis
3) DNA polymerase
4) Deoxynucleoside triphosphates (dNTPs) and dideoxynucleoside triphosphates (ddNTPs)

Primers for DNA sequencing
Oligonucleotide primers can be synthesized by phosphoramidite chemistry - usually designed manually and then purchased
Sequence of the oligo must be complementary to the DNA flanking the sequenced region
Oligos are usually 15-30 nucleotides in length

DNA templates for sequencing:
Single-stranded DNA isolated from recombinant M13 bacteriophage containing DNA of interest
Double-stranded DNA that has been denatured
Non-denatured double-stranded DNA (cycle sequencing)

One way of obtaining single-stranded DNA from a double-stranded source: magnets

Reagents for sequencing: DNA polymerases
Should be highly processive, and incorporate ddNTPs efficiently
Should lack exonuclease activity
Thermostability required for “cycle sequencing”

Sanger dideoxy sequencing: basic method
a) Anneal the primer to the single-stranded DNA template (figure: template and primer with 5' and 3' ends marked)
b) Extend the primer with DNA polymerase in the presence of all four dNTPs, with a limited amount of a dideoxy NTP (ddNTP) (figure: direction of DNA polymerase travel)
DNA polymerase incorporates ddNTP in a template-dependent manner, but it works best if the DNA pol lacks 3' to 5' exonuclease (proofreading) activity
With ddATP in the reaction: anywhere there's a T in the template strand, occasionally a ddA will be added to the growing strand

How to visualize DNA fragments?
Radioactivity
- Radiolabelled primers (kinase with 32P)
- Radiolabelled dNTPs (alpha 35S or 32P)
Fluorescence
- ddNTPs chemically synthesized to contain fluors
- Each ddNTP fluoresces at a different wavelength, allowing identification

Analysis of sequencing products:
Polyacrylamide gel electrophoresis - good resolution of fragments differing by a single nucleotide
Slab gels: as previously described
Capillary gels: require only a tiny amount of sample to be loaded, run much faster than slab gels, best for high-throughput sequencing

DNA sequencing gels: old school
Analyze sequencing products by gel electrophoresis, autoradiography
Different ddNTP used in separate reactions
Radioactively labelled primer or dNTP in sequencing reaction

Cycle sequencing: denaturation occurs during temperature cycles
94 °C: DNA denatures
45 °C: primer anneals
60-72 °C: thermostable DNA pol extends primer
Repeat 25-35 times
Advantages: don't need a lot of template DNA
Disadvantages: DNA pol may incorporate ddNTPs poorly

Animation of cycle sequencing: see (link not preserved)
Click on: “manipulation”, “techniques”, “sorting and sequencing”

An automated sequencer
The output

Current trends in sequencing:
It is rare for labs to do their own sequencing:
- costly, perishable reagents
- time consuming
- success rate varies
Instead most labs send out for sequencing:
- You prepare the DNA (usually plasmid, M13, or PCR product) and supply the primer; a company or university sequencing center does the rest
- The sequence is recorded by an automated sequencer as an “electropherogram”

BREAK UP THE GENOME, PUT IT BACK TOGETHER
(figure labels: 160 kbp, 1 kbp)
Assemble sequences by matching overlaps: BAC sequence
BAC overlaps give genome sequence

Sequencing large pieces of DNA: the “shotgun” method
Break DNA into small pieces (sizes of around 1000 base pairs are preferable)
Clone pieces of DNA into M13
Sequence enough M13 clones to ensure complete coverage (e.g. sequencing a 3 million base pair genome would require 5x to 10x 3 million base pairs, i.e. 15 to 30 million base pairs of sequence, to have a reliable representation of the genome)
Assemble genome through overlap analysis using computer algorithms, also “polish” sequences using mapping information from individual clones, characterized genes, and genetic markers
This process is assisted by robotics

Sequencing done by TIGR (Maryland) and The Sanger Institute (Cambridge, UK)
“Here we report an analysis of the genome sequence of P. falciparum clone 3D7, including descriptions of chromosome structure, gene content, functional classification of proteins, metabolism and transport, and other features of parasite biology.”

Sequencing strategy
A whole chromosome shotgun sequencing strategy was used to determine the genome sequence of P. falciparum clone 3D7. This approach was taken because a whole genome shotgun strategy was not feasible or cost-effective with the technology that was available at the beginning of the project. Also, high-quality large insert libraries of (A + T)-rich P. falciparum DNA have never been constructed in Escherichia coli, which ruled out a clone-by-clone sequencing strategy. The chromosomes were separated on pulsed field gels, and chromosomal DNA was extracted. The shotgun
sequences were assembled into contiguous DNA sequences (contigs), in some cases with low coverage shotgun sequences of yeast artificial chromosome (YAC) clones to assist in the ordering of contigs for closure. Sequence tagged sites (STSs) [10], microsatellite markers [11,12] and HAPPY mapping [7] were also used to place and orient contigs during the gap closure process. The high (A + T) content of the genome made gap closure extremely difficult [7-9]. Chromosomes 1-5, 9 and 12 were closed, whereas chromosomes 6-8, 10, 11, 13 and 14 contained 3-37 gaps (most less than 2.5 kb) per chromosome at the beginning of
genome annotation. Efforts to close the remaining gaps are continuing.

Methods: Sequencing, gap closure and annotation
The techniques used at each of the three participating centres for sequencing, closure and annotation are described in the accompanying Letters [7-9]. To ensure that each centre's annotation procedures produced roughly equivalent results, the Wellcome Trust Sanger Institute (Sanger) and the Institute for Genomic Research (TIGR) annotated the same 100-kb segment of chromosome 14. The number of genes predicted in this sequence by the two centres was 22 and 23; the discrepancy being due to the merging of two single genes by one centre. Of the 74 exons predicted by the two centres, 50 (68%) were identical, 9 (12%) overlapped, 6 (8%) overlapped and shared one boundary, and the remainder were predicted by one centre but not the other. Thus 88% of the exons predicted by the two centres in the 100-kb fragment were identical or overlapped.

The $1000 genome
Venter Foundation (2003): the first group to produce a technology capable of a $1000 human genome will win $500,000
X Prize Foundation: no, $5-20 million
National Institutes of Health (2004): $70 million grant program
to reach the $1000 genome

Previous sequencing techniques: one DNA molecule at a time
Needed: many DNA molecules at a time - arrays
One of these: “pyrosequencing”
Cut a genome to DNA fragments 300-500 bases long
Immobilize single strands on a very small plastic bead (one piece of DNA per bead)
Amplify the DNA on each bead to cover each bead to boost the signal
Separate each bead on a plate with up to 1.6 million wells

Sequence by DNA polymerase-dependent chain extension, one base at a time in the presence of a reporter (luciferase)
Luciferase is an enzyme that will emit a photon of light in response to the pyrophosphate (PPi) released upon nucleotide addition by DNA polymerase (the PPi is first converted to ATP by ATP sulfurylase, using APS)
Flashes of light and their intensity are recorded

Extension with individual dNTPs gives a readout
The readout is recorded by a detector that measures position of light flashes and intensity of light flashes

APS = Adenosine phosphosulfate
25 million bases in about 4 hours

Height of peak indicates the number of dNTPs added
This sequence: TTTGGGGTTGCAGTT
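
The pyrosequencing readout described above (one dNTP offered at a time, peak height proportional to the number of bases added) can be sketched in a few lines. This is only an illustration: the cyclic flow order "TACG", the function names and the noisy example heights are assumptions, not details from the slides.

```python
from itertools import cycle


def simulate_flowgram(synthesized, flow_order="TACG", max_flows=400):
    """Return [(base, peak_height), ...] for a strand extended one dNTP flow at a time.

    Each flow offers a single dNTP; the peak height is the number of
    consecutive positions that incorporate that base (0 if none).
    """
    flows = []
    pos = 0
    for base in cycle(flow_order):
        if pos >= len(synthesized) or len(flows) >= max_flows:
            break
        run = 0
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        flows.append((base, run))
    return flows


def decode_flowgram(flows):
    """Rebuild the sequence: each peak contributes round(height) copies of its base."""
    return "".join(base * round(height) for base, height in flows)


if __name__ == "__main__":
    flows = simulate_flowgram("TTTGGGGTTGCAGTT")
    print(flows)
    print(decode_flowgram(flows))
    # Hypothetical noisy peak heights are rounded to whole bases.
    print(decode_flowgram([("T", 2.9), ("A", 0.1), ("G", 4.2)]))
```

Rounding the peak height is the simplest possible base caller; long runs of one base are where a real readout becomes hard to call, because the difference between, say, 7 and 8 incorporations is a small fraction of the peak.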

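The quoted throughput (25 million bases in about 4 hours) can be combined with the shotgun rule of thumb from earlier in this section (5x to 10x coverage of a 3 million base pair genome) in a quick back-of-envelope calculation. The sketch below is a rough illustration: the 500-base read length is a hypothetical value, and the "fraction unsequenced" figure assumes idealised random sampling of the genome (a Poisson approximation), which real libraries only approximate.

```python
import math


def shotgun_plan(genome_bp, coverage, read_len_bp,
                 bases_per_run=25_000_000, hours_per_run=4):
    """Back-of-envelope numbers for a shotgun project at a given coverage."""
    total_bases = genome_bp * coverage
    return {
        "total_bases": total_bases,
        # Number of reads needed if every read has the same (assumed) length.
        "reads": math.ceil(total_bases / read_len_bp),
        # Poisson approximation: chance that a given base is never sequenced.
        "fraction_unsequenced": math.exp(-coverage),
        # Instrument time at the throughput quoted on the slide.
        "run_hours": total_bases * hours_per_run / bases_per_run,
    }


if __name__ == "__main__":
    for cov in (5, 10):
        print(cov, shotgun_plan(3_000_000, cov, read_len_bp=500))
```

Under these assumptions, going from 5x to 10x doubles the sequencing effort but shrinks the expected unsequenced fraction by more than a hundredfold, which is one way to read the "5x to 10x" recommendation.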
Introduction to bioinformatics
1) Making biological sense of DNA sequences
2) Online databases: a brief survey
3) Database in depth: NCBI
4) What is BLAST?
5) Using BLAST for sequence analysis
6) “Biology workbench”, etc.

There's plenty of DNA to make sense of (2006)

Making sense of genome sequences:
1) Genes
a) Protein-coding
Where are the open reading frames?
What are the ORFs most similar to? (What is the function/structure/evolution history?)
b) RNA
2) Non-genes
a) Regulation: promoters and factor-binding sites
b) Transactions: replication, repair, and segregation, DNA packaging (nucleosomes)

Sequence output
Computer calls (raw data):
GNNTNNTGTGNCGGATACAATTCCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACATATGCACCACCACCACCACCACCCCATGGGTATGAATAAGCAAAAGGTTTGTCCTGCTTGTGAATCTGCGGAACTTATTTATGATCCAGAAAGGGGGGAAATAGTCTGTGCCAAGTGCGGTTATGTAATAGAAGAGAACATAATTGATATGGGTCCTAAGTGGCGTGCTTTTGATGCTTCTCAAAGGGAACGCAGGTCTAGAACTGGTGCACCAGAAAGTATTCTTCTTCATGACAAGGGGCTTTCAACTGCAATTGGAATTGACAGATCGCTTTCCGGATTAATGAGAGAGAAGATGTACCGTTTGAGGAAGTGGCANTCCANATTANGAGTTAGTGATGCAGCANANAGGAACCTAGCTTTTGCCCTAAGTGAGTTGGATAGAATTNCTGCTCAGTTAAAACTTCCNNGACATGTAGAGGAAGAAGCTGCAANGCTGNACANAGANGCAGNGNGANAGGGACTTATTNGANGCAGATCTATTGAGAGCGTTATGGCGGCANGTGTTTACCCTGCTTGTAGGTTATTAAAAGNTCCCGGGACTCTGGATGAGATTGCTGATATTGCTAGAGC

atgttgtatttgtctgaagaaaataaatccgtatccactccttgccctcctgataagattatctttgatgcagagaggggggagtacatttgctctgaaactggagaagttttagaagataaaattatagatcaagggccagagtggagggccttcacgccagaggagaaagaaaagagaagcagagttggagggcctttaaacaatactattcacgataggggtttatccactcttatagactggaaagataaggatgctatgggaagaactttagaccctaagagaagacttgaggcattgagatggagaaagtggcaaattaga
What does this sequence do?
Could it encode a protein?

Looking for ORFs (Open Reading Frames) using “DNA Strider”

ORF map
1) Where are the potential starts (ATG) and stops (TAA, TAG, TGA)?
2) Which reading frame is correct?
(figure legend: markers for ATG and for stop codons)
Reading frame #1 appears to encode a protein

Cautions in ORF identification
Not all genes initiate with ATG, particularly in certain microbes (archaea)
What is the shortest possible length of a real ORF? 50 amino acids? 25 amino acids? Cut-off is somewhat arbitrary.
In eukaryotes, ORFs can be difficult to identify because of introns
Are there other sequences surrounding the ORF that indicate it might be functional?
- promoter sequences for RNA polymerase binding
- Shine-Dalgarno sequences for ribosome binding?

What is the function of the sequenced gene?
Classical methods:
- mutate gene, characterize phenotype for clues to function (genetics)
- purify protein product, characterize in vitro (biochemistry)
Comparison to previously characterized genes:
- gene sequences that have high sequence similarity usually have similar functions
- if your gene has been previously characterized (using classical methods) by someone else, you want to know right away! (avoid duplication of labor)

NCBI
NCBI home page - go to (link not preserved) for the following pages
PubMed: search tool for literature - search by author, subject, title words, etc.
All databases: “a retrieval system for searching several linked databases”
BLAST: Basic Local Alignment Search Tool
OMIM: Online Mendelian Inheritance in Man
Books: many online textbooks available
Tax Browser: a taxonomic organization of organisms and their genomes
Structure: clearinghouse for solved molecular structures

What does BLAST do?
1) Searches chosen sequence database and identifies sequences with similarity to test sequence
2) Ranks similar sequences by degree of homology (E value)
3) Illustrates alignment between test sequence and similar sequences

Alignment of sequences:
The principle: two homologous sequences derived from the same ancestral sequence will have at least some identical (similar) amino acid residues
Fraction of identical amino acids is called “percent identity”
Similar amino acids: some amino acids have similar physical/chemical properties, and are more likely to substitute for each other - these give specific similarity scores in alignments
Gaps in similar/homologous sequences are rare, and are given penalty scores

Homology of proteins
Homology: similarity of biological structure, physiology, development, and evolution, based on genetic inheritance
Homologous proteins: statistically similar sequence, therefore similar functions (often, but not always)
Alignment of TFB and TFIIB sequences

High sequence similarity correlates with functional similarity
20-40% identity: fold can be predicted by similarity but precise function cannot be predicted (the 40% rule)
(figure panels: enzymes, non-enzymes)

Programs available for BLAST searches
Protein sequence (this is the best option)
blastp - compares an amino acid query sequence against a protein sequence database
tblastn - compares a protein query sequence against a nucleotide sequence database translated in all reading frames
DNA sequence
blastn - compares a nucleotide query sequence against a nucleotide sequence database
blastx - compares a nucleotide query sequence translated in all reading frames against a protein sequence database
tblastx - compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.

BLAST considers all possible combinations of matches, mismatches and gaps in any given alignment
Gives the “best” (highest scoring) alignment of sequences
Three scores
1) percent identity
2) similarity score
3) E-value - expected number of hits with this much similarity arising by chance (lower number, higher probability of evolutionary homology, higher probability of similar function)

What is the E-value?
The E value represents the chance that the similarity is random and therefore insignificant. Essentially, the E value describes the random background
noise that exists for matches between sequences. For example, an E value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size one might expect to see 1 match with a similar score simply by chance.
You can change the Expect value threshold on most main BLAST search pages. When the Expect value is increased from the default value of 10, a larger list with more low-scoring hits can be reported.

E values (continued)
From the BLAST tutorial:
Although hits with E values much higher than 0.1 are unlikely to reflect true sequence relatives, it is useful to examine hits with lower significance (E values between 0.1 and 10) for short regions of similarity. In the absence of longer similarities, these short regions may allow the tentative assignment of biochemical activities to the ORF in question. The significance of any such regions must be assessed on a case by case basis.

Relationship between E-value and function
(figure panels: single domain proteins, multi-domain proteins)
E value greater than 10^-10: similar structure but possibly different functions

Computer calls (raw data): the same raw read as on the earlier “Sequence output” slide
What does this sequence do? Cue up BLAST.

Find the open reading frame(s)
Translate it:
MKCPYCKSRDLVYDRQHGEVFCKKCGSILATNLVDSELSRKTKTNDIPRYTKRIGEFTREKIYRLRKWQKKISSERNLVLAMSELRRLSGMLKLPKYVEEEAAYLYREAAKRGLTRRIPIETTVAACIYATCRLFKVPRTLNEIASYSKTEKKEIMKAFRVIVRNLNLTPKMLLARPTDYVDKFADELELSERVRRRTVDILRRANEEGITSGKNPLSLVAAALYIASLLEGERRSQKEIARVTGVSEMTVRNRYKELA

BLAST against (go to genomes page):
- Microbial genomes
- environmental sequences (genomes)
Results:
1) Distribution of hits: query sequence and positions in sequence that gave alignments
2) Sequences producing significant alignments
   1) Accession number (this takes you to the sequence that yielded the hit: gene or contig)
   2) Name of sequence (sometimes identifies the gene)
   3) Similarity score
   4) E-value
3) Alignments arranged by E value, with links to gene reports

Two problems with BLAST
1) Homology? The function is only inferred (NOT known)
2) Large percentages of coding proteins cannot be assigned function based on homology

For a current list of databases and bioinformatics tools see: Nucleic Acids Research annual bioinformatics issue (comes out every January).
List of all the databases described, by category: (link not preserved)
Guide to NCBI: see WebCT

Bioinformatics: making sense of biological sequence
New DNA sequences are analyzed for ORFs (Open Reading Frames: protein)
Any DNA or protein sequence can then be compared to all other sequences in databases, and similar sequences identified
There is much more - a great diversity of programs and databases are available

Massively parallel measurements of gene expression: microarrays
Defining the “transcriptome”
The northern blot revisited
Detecting expression of many genes: arrays
A typical array experiment
What to do with all this data?
Brown and Botstein (1999) “Exploring the new world of the genome with DNA microarrays” Nature Genetics 21, p. 33-37.

DNA: genome (we have this)
RNA: “transcriptome” (we want these)
protein: “proteome” (we want these)

The value of DNA microarrays for studying gene expression
1) Study all transcripts at same time
2) Transcript abundance usually correlates with level of gene expression - much gene control is at level of transcription
3) Changes in transcription patterns
often occur as a response to changing environment - this can be detected with a microarray

Detection of mRNA transcripts
Northern blot - immobilize mRNA on membrane, detect specific sequence by hybridization with one labelled probe - requires a separate blotting for each probe
DNA microarray - immobilize many probes (thousands) in an ordered array, hybridize (base pair) with labelled mRNA or cDNA

Generating an array of probes
Identify open reading frames (ORFs)
1) PCR each ORF (several for each ORF), attach (spot) each PCR product to a solid support in a specific order (pioneered by Pat Brown's lab, Stanford)
2) Chemically synthesize ORF-specific oligonucleotide probes directly on microchip (Affymetrix)

(DeRisi Lab at UCSF)
The chip defines the genes you are measuring
The hybridization represents the measurement
The RNA comes from the cells and conditions you are interested in

A print head for generating arrays of probes
Print head travels from DNA probe source (microtiter plate) to solid support (treated glass slide)
Small amount of DNA probe is put on a specific spot at a specific location
Each spot (DNA probe sequence) has a specific “address”
(figure labels: print head, printing needles)

A yeast array experiment
Cells: vegetative, sporulating
Isolate mRNA
Prepare fluorescently labelled cDNA with two different-colored fluors
Hybridize
Read-out

Example microarray data
Green: mRNA more abundant in vegetative cells
Red: mRNA more abundant in sporulating cells
Yellow: equivalent mRNA abundance in vegetative and sporulating cells

What to do with all that data?
Overarching patterns may become apparent
1) Organize data by hierarchical clustering, profiling to find patterns
2) Display data graphically to allow assimilation/comprehension

All yeast cell cycle-regulated genes
(figure labels: low mRNA levels, high mRNA levels; cell synchronization method; phase in which gene is expressed)

MIAME: The Minimum Information About a Microarray Experiment
(#6 helps correct for variations in the quantity of starting RNA, and for variable labelling and detection efficiencies)
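
The green/red/yellow reading of the two-colour array, together with the point that normalization corrects for unequal starting RNA and labelling efficiency, can be sketched as a log-ratio calculation. Everything specific here is an assumption for illustration: the gene names and intensities are made up, the 2-fold cut-off is arbitrary, and median centring is only one simple normalization (it assumes most genes do not change between the two samples).

```python
import math
from statistics import median


def classify_spots(spots, fold=2.0):
    """Classify two-colour array spots as 'red', 'green' or 'yellow'.

    spots maps a gene name to (green_intensity, red_intensity), both > 0,
    where green labels vegetative cDNA and red labels sporulating cDNA.
    """
    raw = {gene: math.log2(red / green) for gene, (green, red) in spots.items()}
    centre = median(raw.values())   # global offset between the two channels
    cutoff = math.log2(fold)
    calls = {}
    for gene, value in raw.items():
        log_ratio = value - centre  # median-centred log2(red/green)
        if log_ratio >= cutoff:
            colour = "red"          # more abundant in sporulating cells
        elif log_ratio <= -cutoff:
            colour = "green"        # more abundant in vegetative cells
        else:
            colour = "yellow"       # roughly equal abundance
        calls[gene] = (round(log_ratio, 2), colour)
    return calls


if __name__ == "__main__":
    # Hypothetical intensities, not data from the slides.
    example = {
        "geneA": (1000, 1000),
        "geneB": (400, 1600),
        "geneC": (1600, 400),
        "geneD": (800, 880),
        "geneE": (500, 450),
    }
    for gene, call in classify_spots(example).items():
        print(gene, call)
```

Working in log2 ratios makes a 4-fold increase and a 4-fold decrease symmetric (+2 and -2), which is also the form usually fed into hierarchical clustering.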

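Array design above starts from identified ORFs, and the bioinformatics slides describe the search itself: potential starts (ATG), stops (TAA, TAG, TGA), and a choice of reading frame. A minimal sketch of that search follows. It is not how “DNA Strider” works internally; the function names and the default 25-codon cut-off are assumptions, and the cautions from the slides still apply (non-ATG starts and introns are ignored).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=25):
    """Return (frame, start, end, n_codons) for ATG...stop ORFs on the given strand.

    frame is 1, 2 or 3; start/end are 0-based, end-exclusive and include the
    stop codon; n_codons counts ATG through the last sense codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:     # the cut-off is somewhat arbitrary
                    orfs.append((frame + 1, start, i + 3, n_codons))
                start = None
    return orfs


def reverse_complement(dna):
    """Reverse complement, so the other strand's three frames can be scanned too."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 3 + "TAA" + "G"   # made-up test sequence
    print(find_orfs(toy, min_codons=3))
    print(find_orfs(reverse_complement(toy), min_codons=3))
```

Scanning both the sequence and its reverse complement covers all six reading frames; lowering `min_codons` finds more candidates at the cost of more spurious ones.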
Analysis of the proteome: “proteomics”
Which proteins are present and when?
What are the proteins doing?
What interacts with what?
- Protein-DNA interactions (chromatin immunoprecipitation)
- Protein-protein interactions
Functions of proteins?
Phizicky et al. (2003) “Protein analysis on a proteomic scale” Nature 422, p. 208-215

Which proteins are expressed?
Classical method
Detect presence of a specific protein
Using antibodies or specific assay
Measure changes in protein levels with changing environment, in different tissues
Very labor intensive, expensive to scale up to proteome

Massively parallel detection and identification of proteins
2D gel electrophoresis
- Separate proteins in a given organism or tissue type by migration in gel electrophoresis
- Identify protein (cut out of gel, sequence or mass-spec)
- Pattern of spots like a barcode for high-throughput studies
Mass spectrometry
- Separate individual proteins from cell by charge and mass, individual proteins can be identified (but need genome sequence information for this)
Microarrays: isolate things that bind proteins

2D gel electrophoresis
1) Separate proteins on the basis of isoelectric point
This technique is usually done on a long, narrow gel
(figure labels: 4, 10)

2D gel electrophoresis
Lay gel containing isoelectrically focused protein on SDS-PAGE gel, separate on the basis of size
E. coli protein profile
From Swiss-Prot database (link not preserved)

Mass spectrometry for identifying proteins in a mixture
From J.R. Yates (1998) “Mass spectrometry and the age of the proteome” J Mass Spec. 33, p. 1-19
Liquid chromatography and tandem mass spectrometry
Software for processing data

Defining protein function
Classical methods:
Define activity of protein, develop an assay for activity
Biochemistry: use assay to purify protein from cell, characterize structure/function of protein in vitro
Genetics: obtain mutants with change in activity, characterize phenotype of mutant, obtain suppressors to identify genes that interact with protein of interest
Time intensive, expensive

Protein activity at the proteome level
Protein-DNA interactions: identifying binding sites for DNA-binding proteins: regulation of gene expression
Massively parallel screens for activity - protein arrays

“Chromatin immunoprecipitation” (ChIP)
1) Grow cells, add formaldehyde to cross-link everything to everything (including DNA to protein)
2) Lyse cells, break up DNA by shearing
3) Retrieve protein of interest (and the DNA it is bound to) using specific antibody to that protein (immunoprecipitation)
4) Determine presence of DNA by quantitative PCR
V. Orlando (2000) TIBS 25, p. 99

Massively parallel ChIP
PCR, label with fluorescent dyes

Protein arrays for function
Proteins immobilized, usually by virtue of a tag sequence (6 x His tag, biotin, etc.)
Probe all proteins at once for a specific activity

Example of a protein microarray
Proteins fused to GST with 6 x histidine tags, immobilized on Ni2+ matrix
Anti-GST tells how much protein is immobilized on surface
Specific assays identify proteins with specific activities - calmodulin binding, phosphoinositide binding
