Application of Stochastic Grammars in RNA Secondary Structure Prediction

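Classification No.:          Security level:
UDC:          Serial No.:

中南大学 CENTRAL SOUTH UNIVERSITY
Master's Thesis

Thesis title: Research on the Application of Stochastic Grammars in RNA Secondary Structure Prediction
Discipline: Computer Application Technology
Author: Tang Sixin (唐四薪)
Supervisors and professional titles: Prof. Li Yibing (李义兵), Prof. He Hongbo (何红波)
April 2006

Declaration of Originality

I declare that this thesis presents the research I carried out, and the results I obtained, under the guidance of my supervisors. To the best of my knowledge, except where specifically marked and acknowledged in the text, the thesis contains no research results previously published or written by others, and no material used to obtain a degree or certificate from Central South University or any other institution. The contributions made to this research by the colleagues who worked with me have been clearly stated in the thesis.

Author's signature:          Date:

Authorization for Use of the Thesis

I understand the regulations of Central South University on the retention and use of theses, namely: the university has the right to retain the thesis and to allow it to be consulted and borrowed; the university may publish all or part of the thesis and may preserve it by photocopying, reduced-size printing, or other means; and the university may submit the thesis as required by the relevant national or Hunan provincial authorities.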

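Author's signature:          Supervisor's signature:          Date:

Abstract

RNA is an important class of biological macromolecules, and the study of RNA secondary structure is a frontier topic in computational molecular biology. An RNA strand is a sequence of four bases (A, C, G, U). Its secondary structure is the stem-loop structure that arises when the single strand folds back on itself, so that base-paired regions alternate with single-stranded regions. When base pairs in the strand cross one another, they form a pseudoknot. The function of an RNA is closely related to its secondary structure.

This thesis uses stochastic grammars to predict RNA secondary structure. The stochastic grammar approach treats an RNA sequence as a sentence governed by a set of grammar rules and uses these rules to analyze the base-pairing relationships in the sequence, that is, its semantics, and so obtains the secondary structure of the sequence. Because the method relies on prior knowledge drawn from existing sequences, it requires a certain number of related sample sequences, and these sequences must share some consistent secondary structure and some common basic structural units. A probabilistic model can then exploit the statistical information on conserved secondary structure carried by the samples, which gives the predictions high accuracy. By extending stochastic context-free grammars, the method can also take pseudoknots into account, which makes the predictions more realistic; the pure minimum free energy method cannot predict pseudoknots.

The thesis proposes a two-pass grammatical analysis method for prediction. RNA secondary structure is first preprocessed with a lemma (词条) method that divides the RNA sequence into lemma structures; a stochastic grammar model then uses this lemma structure information to identify the various RNA secondary structures.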

Keywords: stochastic grammar, RNA, secondary structure, structure prediction, pseudoknot

ABSTRACT

RNA is an important biological macromolecule, and predicting RNA secondary structure is an active topic in computational molecular biology. RNA is a single strand made of the ribonucleotides A (adenine), C (cytosine), G (guanine), and U (uracil). An RNA sequence folds back onto itself to form double-stranded regions of base pairs alternating with unpaired single-stranded regions; this stem-loop arrangement is called the RNA secondary structure. When base pairs in the sequence cross one another, they form a pseudoknot. The secondary structure of an RNA plays an important role in the molecular mechanisms underlying its function.

In this thesis we adopt a method based on stochastic grammars to predict RNA secondary structure. A stochastic grammar for RNA modeling regards RNA sequences as sentences that obey a set of syntax rules and analyzes the base-pairing relationships in a sequence through these rules, which yields the secondary structure corresponding to the sequence. Because the method is based on prior knowledge of sequences, it first requires a set of related sequences that share some consistent structural units. A probabilistic model can then make use of the statistical information in their consensus secondary structures, so the prediction results are considerably more precise. In addition, by extending stochastic context-free grammars, the method can take pseudoknots into account, so the results are more realistic; by contrast, the free-energy minimization method cannot handle pseudoknots.

This dissertation proposes a prediction method based on two passes of grammatical analysis: the first pass divides the sequence into lemma structures, and the second applies a stochastic grammar to identify the distinct secondary structure units.
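The abstract defines a pseudoknot as base pairs that cross one another. As a small illustration of that definition only (not the thesis's prediction method), the sketch below checks a structure for crossing pairs. It assumes the structure is written in dot-bracket notation, with `()` and `[]` as two bracket families; that notation and the function names `parse_dot_bracket`, `crossing_pairs`, and `has_pseudoknot` are assumptions introduced here and do not come from the thesis.

```python
from itertools import combinations


def parse_dot_bracket(structure):
    """Return base pairs (i, j), i < j, from a dot-bracket string.

    '()' and '[]' are matched independently, so pairs from the two
    families may cross each other.
    """
    closers = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}
    pairs = []
    for j, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(j)          # remember an opening position
        elif ch in closers:
            i = stacks[closers[ch]].pop()  # match the most recent opener
            pairs.append((i, j))
    return sorted(pairs)


def crossing_pairs(pairs):
    """Return all couples of base pairs (i, j), (k, l) with i < k < j < l."""
    return [(p, q) for p, q in combinations(sorted(pairs), 2)
            if p[0] < q[0] < p[1] < q[1]]


def has_pseudoknot(pairs):
    """True if at least two base pairs cross."""
    return bool(crossing_pairs(pairs))


stem_loop = parse_dot_bracket("((((...))))")       # nested pairs only
knotted = parse_dot_bracket("((..[[..))..]]")      # two interleaved stems
print(has_pseudoknot(stem_loop), has_pseudoknot(knotted))
```

Nested pairs such as those of a plain stem-loop can be generated by a context-free grammar, whereas the interleaved stems in the second example are the kind of crossing dependency that the abstract says requires extending the stochastic context-free grammar.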
KEYWORDS: Stochastic Grammar, RNA, Secondary Structure, Structure Prediction, Pseudoknots

Contents

Chapter 1 Introduction
  1.1 Research field and objectives of this work
  1.2 Characteristics of RNA secondary structure and significance of its study
    1.2.1 Characteristics of RNA secondary structure
    1.2.2 Significance of RNA secondary structure research
  1.3 State of research in China and abroad
    1.3.1 Minimum free energy methods
    1.3.2 Stochastic grammar methods
    1.3.3 Criteria for evaluating prediction algorithms
  1.4 Main work of this thesis
Chapter 2 RNA Secondary Structure and Its Mathematical Model
  2.1 Levels of RNA structure
  2.2 Definition of RNA secondary structure
  2.3 Definition of substructures
  2.4 Definition of pseudoknots
Chapter 3 The Grammar of Biological Sequence Information
  3.1 Introduction
  3.2 Formal grammars and stochastic grammars
    3.2.1 Definition of formal grammars
    3.2.2 Ambiguity and parsing
    3.2.3 Dependencies and automata
    3.2.4 Modeling RNA stem-loop structures with a CFG
    3.2.5 Definition of stochastic grammars
    3.2.6 Relationship between stochastic grammars and HMMs
  3.3 Parallel communicating grammar models: modeling pseudoknots
    3.3.1 Parallel communicating grammar models
    3.3.2 Modeling pseudoknots with parallel communicating grammars
    3.3.3 Stochastic model of parallel communicating grammars
Chapter 4 Lemmas of RNA Secondary Structure
  4.1 Definition of lemmas of RNA secondary structure
    4.1.1 Lemma types of RNA ...
