5 Open Problems in Bioinformatics

Pedigrees from Genomes
Comparative Genomics of Alternative Splicing
Viral Annotation
Evolving Turing Patterns
Protein Structure Evolution

Three Processes

Recombination
Choosing Parents
The Mutational Process
Figure labels: Pedigree process; Coalescent Recombination process; Sequence/Individual Boundary. From Yun Song.

From genomes to pedigrees

Probability of Data given a pedigree.
Elston-Stewart (1971), Temporal Peeling Algorithm: condition on parental states (Mother, Father). Recombination and mutation are Markovian.
Lander-Green (1987), Genotype Scanning Algorithm: condition on paternal/maternal inheritance (Mother, Father). Recombination and mutation are Markovian.
Comment: Obvious parallel to the Wiuf-Hein99 reformulation of Hudson's 1983 algorithm.

Benevolent Mutation and Recombination Process

Genomes with r and m/r tending to infinity (r: recombination rate, m: mutation rate).
Counting mutations within a small interval would reveal the length of the path connecting the two segments.
Siblings are readily revealed, since they will have segments with 2m density of mutations.
The distribution of path lengths is readily observable between two sequences.
All embedded phylogenies are observable.

From Phylogenies to Pedigrees

Mike's counterexample, linkage and individuals (figure labels: grandparents, Individual 1, Individual 2).
Different Pedigrees, Same Phylogenies.
Gluing Phylogenies together: ?
Sibling sequences come from different parents.
A recombinant's parents are sister sequences.

Comparative Genomics of Alternative Splicing

From Transcripts to the AS-Graph

How well is the AS-graph known as a function of the number of transcripts?
Given a family and distribution of transcripts, can they be explained by an AS-graph with probabilities at donor sites, or do we need probabilities for (donor, acceptor) pairs, or possibly even more complicated situations?
Is sampling transcripts good enough to distinguish these situations?

Mini-project: reliability of AS-detection.

Choose Idealized AS-Graph (Genome): choose donor and acceptor sites in random pairs. For each possible splice pair, assign a probability for choosing it. This should define a probability for all transcripts.
Generate a set of transcripts.
Reconstruct AS-Graph.
Key questions: How many transcripts must be sampled to detect AS? How well will the AS-Graph be recovered?

Optimal DAG (directed acyclic graph) under restrictions

Finding a set of annotations: find a set of paths maximizing the sum of scores.
The score of the minimal path must be above a threshold.
Two paths must differ significantly: for an enclosed area, the maximal height must be d higher than the boundary defining it.
Height(i,j) = di,j + di,j

Do known AS genes have more CTO structure than non-AS genes?
Does the AS correspond to the CTO structure?
Is the CTO structure evolutionarily conserved?

Phylogenetically related ASGs

Is the ASG conserved? What is conserved? How does selection at each position depend on splicing status?

Virus Annotation

Classes of Gene Structures
Retroviridae Arrangements
Papovaviridae Arrangement
Diarrhoea Causing Arrangements
Illustrating the 3 main classes of gene structures: Unidirectional, Convergent and Divergent.
http://www.tulane.edu/dmsander/WWW/335/Diarrhoea.html
http://www.tulane.edu/dmsander/WWW/335/Papovaviruses.html
http://www.tulane.edu/dmsander/WWW/335/Retroviruses.html

The Problems of Viral Annotation

HMM gene structure generator (McCauley)
Gene Structure Evolution (de Groot)
Alignment (Caldeira, Lunter, Rocco)
Recombination (Lyngsø, Song)
Multiple constraints: RNA secondary structure, gene conservation, binding/transcriptional instructional sites.

Our 8 State HMM, which allows for Unidirectional overlapping gene structures

HMM States:
Non-coding
Coding RF1
Coding RF2
Coding RF3
Coding RF1,2
Coding RF1,3
Coding RF2,3
Coding RF1,2,3

Combining Levels of Selection.

Protein-Protein:
Hein & Støvlbæk, 1995: Codon Nucleotide Independence Heuristic
Jensen & Pedersen, 2001: Contagious Dependence
Assume multiplicativity: f_{A,B} = f_A * f_B
Protein-RNA: Doublets, Singlet, Contagious Dependence

Table illustrating the performance benefit in Sensitivity we obtain utilizing a Phylogenetic HMM. We extend the HMM model to include evolutionary information from 13 aligned HIV2 sequences.

http://www.ncbi.nlm.nih.gov/Genbank/
http://www.ncbi.nlm.nih.gov/genomes/VIRUSES/viruses.html
GenBank: Centralized resource for publicly available viral sequence data.
Entrez Genomes currently contains 2120 Reference Sequences for 1510 viral genomes and 36 Reference Sequences for viroids.
Within microbial genomes, one third of annotated genes contain some degree of overlap, and one third of these are either Convergent or Divergent.
Properties of overlapping genes are conserved across microbial genomes. Genome Res. 2004 Nov;14(11):2268-72.
Krakauer, D.C
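
The mini-project on the reliability of AS-detection (choose an idealized AS-graph, generate transcripts, reconstruct the graph) could be prototyped roughly as follows. This is only a sketch under assumptions the slides do not state: splice pairs are scanned left to right, each usable pair is chosen independently with its assigned probability, and "reconstruction" simply means collecting the splice pairs seen in the sample. All function names and parameter values are hypothetical.

```python
import random
from collections import Counter


def make_as_graph(genome_length, n_pairs, rng):
    """Idealized AS-graph: random (donor, acceptor) pairs, each with a use probability."""
    graph = {}
    while len(graph) < n_pairs:
        donor, acceptor = sorted(rng.sample(range(genome_length), 2))
        graph[(donor, acceptor)] = rng.uniform(0.05, 0.95)  # assumed range
    return graph


def sample_transcript(graph, rng):
    """Scan splice pairs left to right; a used pair skips everything up to its acceptor."""
    position = 0
    used = []
    for (donor, acceptor), prob in sorted(graph.items()):
        if donor >= position and rng.random() < prob:
            used.append((donor, acceptor))
            position = acceptor
    return tuple(used)


def reconstruct(transcripts):
    """Observed splice pairs with the fraction of sampled transcripts using each."""
    counts = Counter(pair for t in transcripts for pair in t)
    return {pair: c / len(transcripts) for pair, c in counts.items()}


def recovery_curve(graph, sample_sizes, rng):
    """Fraction of true splice pairs recovered from growing prefixes of one sample."""
    transcripts = [sample_transcript(graph, rng) for _ in range(max(sample_sizes))]
    curve = []
    for n in sample_sizes:
        recovered = reconstruct(transcripts[:n])
        curve.append((n, len(recovered) / len(graph)))
    return curve


if __name__ == "__main__":
    rng = random.Random(1)
    graph = make_as_graph(genome_length=10_000, n_pairs=12, rng=rng)
    for n, fraction in recovery_curve(graph, [10, 100, 1000], rng):
        print(f"{n:5d} transcripts: {fraction:.2f} of splice pairs recovered")
```

Evaluating prefixes of a single sample keeps the recovered fraction non-decreasing in the number of transcripts, which makes the "how many transcripts must be sampled" question easy to read off. Splice pairs that are rarely reachable, because an overlapping upstream pair is usually chosen first, are the ones that need the largest samples.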
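
The eight states listed for the HMM correspond to the subsets of the three reading frames RF1, RF2 and RF3 that are coding at a position (2^3 = 8, with the empty subset being Non-coding). A minimal sketch that enumerates them in the order given in the state list; the function names are hypothetical and nothing here reflects how the actual HMM was implemented:

```python
from itertools import combinations

READING_FRAMES = (1, 2, 3)


def hmm_states(frames=READING_FRAMES):
    """All subsets of reading frames, ordered by size and then lexicographically."""
    states = []
    for size in range(len(frames) + 1):
        states.extend(combinations(frames, size))
    return states


def state_name(state):
    """Label a state the way the state list does, e.g. (1, 3) -> 'Coding RF1,3'."""
    if not state:
        return "Non-coding"
    return "Coding RF" + ",".join(str(frame) for frame in state)


if __name__ == "__main__":
    for state in hmm_states():
        print(state_name(state))
```

Representing each state as a tuple of coding frames makes the overlap explicit: a state with more than one frame is a position where genes in different reading frames overlap.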