编译原理-词法分析

上传人:宝路 文档编号:6914615 上传时间:2017-08-09 格式:PPT 页数:71 大小:2.54MB
返回 下载 相关 举报
编译原理-词法分析_第1页
第1页 / 共71页
编译原理-词法分析_第2页
第2页 / 共71页
编译原理-词法分析_第3页
第3页 / 共71页
编译原理-词法分析_第4页
第4页 / 共71页
编译原理-词法分析_第5页
第5页 / 共71页
点击查看更多>>
资源描述

《编译原理-词法分析》由会员分享,可在线阅读,更多相关《编译原理-词法分析(71页珍藏版)》请在金锄头文库上搜索。

1、Chapter 2ScanningInstructor Jianhui YueSoftware College SCU,Introduction,The scanning, or lexical analysis phase, reads the source program as a stream of characters and divides them into tokens.This is a special case of pattern matchingNeed to study methods of pattern specification and recognitionRe

2、gular expressionsFinite automataThe scanner must operate as efficiently as possible.,Introduction (cont ),Scanner,Parser,Semantic Analyzer,Source Code Optimizer,Code Generator,Target Code Optimizer,Source Code,Target Code,Tokens,Syntax Tree,Annotated Tree,Intermediate code,Target Code,Literal Table,

3、Symbol Table,Error Handler,Tokens,Logical units that the scanner generates.Example: aindex = 4 + 2; a identifier left bracket index identifier right bracket = assignment 4 number + plus sign 2 number,Tokens (cont),The tokens fall into three categoriesReserved words: if , then, etc.Special symbols: +

4、, -, ;, etc. Numbers and identifiers: 3.14, myName, myGrade, etc.Literals: “Hello, world!”, a, b, etc.Tokens are usually defined as an enumerated typetypedef enum IF, THEN, ELSE, PLUS, MINUS, NUM, ID, TokenType;,Token Attributes,The string of characters represented by a token is called its string va

5、lue or its lexeme.Token must be distinguished from the string of characters it represent.Some tokens have one lexemeReserved words and special charactersPotentially, a token can represent infinitely many lexemes.ID token represents all identifiers.Any value associated with a token is called an attri

6、bute.Lexeme, numerical value for a NUM token.,Token Record,Token attributes are collected into a single structured data type.,typedef struct TokenType tokenval; char * stringval; int numval; TokenRecord;,Regular Expressions,Regular expressions represent patterns.A set of strings that matches a regul

7、ar expression r is called the language generated by r and written as L(r).The set of characters or symbols that are available is called the alphabet. The alphabet is denoted by .A regular expression contains symbols of , but they indicate a pattern.Pattern will be written in boldfaceA regular expres

8、sion may contain metacharacters or metasymbols,Basic Regular Expressions,These are just the single characters from the alphabet, which match themselves.r = a, L(a)=a denotes an empty string: L()= matches no string: L( ) = .,Token Function,The scanner returns the next token from the input on demand v

9、ia a function TokenType getToken( );The function computes additional attributes as well.The string of input characters is not passed as a parameter, but is kept in a buffer provided by the system facilities.,Regular Expression Operations,Choice among Alternatives,Let r and s be regular expressions.r

10、|s is a regular expression that matches any string that matched either by r or by s.L(r|s) = L(r) U L(s)Examples: a|b matches either a or b. L(a|b) = a,b.a| matches either a or . L(a|) = a, .L(a|b|c|d) = a, b, c, d,Concatenation,Examples:ab matches the only string ab.(a|b)c matches either ac or bc.G

11、iven two sets of string S1 and S2, the concatenated set of strings S1S2 is the set of string of S1 appended by all the strings of S2.Example: S1=aa, b and S2=a, bbS1S2=aaa, aabb, ba, bbbL(rs) = L(r)L(s), where r and s are two regular expressions.,Repetition,Also called (Kleene) closure.r* matches an

12、y finite concatenation of strings, each of which matches r.Example: a* matches , a, aa, aaa, aaaa, L(r*) = L(r)*Example: L(a|bc)*) = L(a|bc)* = a, bb* =, a, bb, aa, abb, bbbb, aaa, aabb, abba, abbbb, bbabb, bbbba, bbbbbb,Precedence of Operations,Repetition Concatenation AlternativeExample: a|cb* is

13、interpreted as a|(c(b*)Parentheses are used to indicate the precedence as usual.,Highest,Lowest,Names for Regular Expressions,Often,it is helpful to use regular definitions of names.Example: digit=0|1|2|9The name becomes a metasymbol and should be distinguished from the string (digit in the example)

14、.We can use names in regular expressions like digit digit*,Example 2.1, = a, b, cFind a regular expression for the strings of that contain exactly one b.(a|c)*b(a|c)*b does not have to be a center of a matched string.,Example 2.2, = a, b, cFind a regular expression for the strings of that contain at

15、 most one b.We can use a solution to the previous example and add another case matching the strings with no b. (a|c)*|(a|c)*b(a|c)* A|B A: no b B : Just only one b Another solution: (a|c)*(b|)(a|c)* (Its structure is similar to 2.1),Example 2.3, = a, bConsider a set of strings consisting of a single b surrounded by the same number of as.S = b, aba, aabaa, aaabaaa, Regular expression a*ba* does not work.This set of strings cannot be described by a regular expression.This can be proved using a famous theorem called the pumping lemma.,

展开阅读全文
相关资源
相关搜索

当前位置:首页 > 高等教育 > 大学课件

电脑版 |金锄头文库版权所有
经营许可证:蜀ICP备13022795号 | 川公网安备 51140202000112号