Chapter 4: Strings and Regular Expressions (Overview)


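The earliest character encoding was ASCII, the American Standard Code for Information Interchange, which covers only the 10 digits, the 26 uppercase and 26 lowercase English letters, and some other symbols. ASCII is a 7-bit code that defines 128 characters; each character is stored in one 8-bit byte, and single-byte extensions of ASCII can encode at most 256 characters. As information technology spread, every country's script needed an encoding; common ones include UTF-8, GB2312, GBK, and CP936. Using a different encoding means that the same character may be written to a file as different bytes.

UTF-8 is the internationally used encoding. It is variable-length: English takes 8 bits, that is 1 byte (compatible with ASCII), Chinese characters typically take 24 bits, that is 3 bytes, and other scripts take between 2 and 4 bytes. UTF-8 can encode every character in Unicode.

GB2312 is a Chinese encoding defined by China; it uses 1 byte for English and 2 bytes for Chinese. GBK is an extension of GB2312. CP936 is the code page Microsoft built on GBK. GB2312, GBK, and CP936 all use 2 bytes per Chinese character, whereas UTF-8 uses 3. Unicode is the basis for converting between encodings.

On Windows (Simplified Chinese systems, under Python 2), a string typed at the keyboard through input() is GBK-encoded by default, whereas the encoding of the string literals in a Python source file is declared with a coding comment, for example:

#coding=utf-8
#coding:GBK
#-*-coding:utf-8 -*-

Python 2.7.8:

>>> s1 = '中国'
>>> s1
'\xd6\xd0\xb9\xfa'
>>> len(s1)
4
>>> s2 = s1.decode('GBK')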

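>>> s2
u'\u4e2d\u56fd'
>>> len(s2)
2
>>> s3 = s2.encode('UTF-8')
>>> s3
'\xe4\xb8\xad\xe5\x9b\xbd'
>>> len(s3)
6
>>> print s1, s2, s3
中国 中国 中国

Python 3.4.2:

>>> s = '中国山东烟台'
>>> len(s)
6
>>> s = 'SDIBT'
>>> len(s)
5
>>> s = '中国山东烟台SDIBT'
>>> len(s)
11

4.1 Strings

In Python, strings are a sequence type. Besides the operations common to all sequences (including slicing), they support their own string methods. Strings are immutable sequences.

String interning: when the same short string is assigned to several variables, only one copy is kept in memory and all the variables share it. This is an implementation detail of the interpreter, and long strings are not interned.

To test whether a variable s is a string in Python 2, use isinstance(s, basestring). Before Python 3 there were two string types, str and unicode, both derived from basestring. Python 3 merged them into a single str type, so the test becomes isinstance(s, str). In Python 3, source files are UTF-8 by default and Chinese text is fully supported. A Python 3 str keeps its encode() method but no longer has decode(); decode() belongs to bytes.

4.1.1 String Formatting

Common format characters used in the examples that follow: %o (octal), %x (hexadecimal), %e (scientific notation), %s (string), and %d (decimal integer).

>>> x = 1235
>>> so = "%o" % x
>>> so
'2323'
>>> sh = "%x" % x
>>> sh
'4d3'
>>> se = "%e" % x
>>> se
'1.235000e+03'
>>> chr(ord("3") + 1)
'4'
>>> "%s" % 65
'65'
>>> "%s" % 65333
'65333'
>>> "%d" % "555"
Traceback (most recent call last):
  File "...", line 1, in <module>
    "%d" % "555"
TypeError: %d format: a number is required, not str

Formatting with the format() method:

print "The number {0:,} in hex is: {0:#x}, the number {1} in oct is {1:#o}".format(5555, 55)
print "The number {1:,} in hex is: {1:#x}, the number {0} in oct is {0:#o}".format(5555, 55)
print "my name is {name}, my age is {age}, and my QQ is {qq}".format(name="Dong Fuguo", age=37, qq="306467355")
position = (5, 8, 13)
print "X:{0[0]};Y:{0[1]};Z:{0[2]}".format(position)
weather = [("Monday", "rain"), ("Tuesday", "sunny"), ("Wednesday", "sunny"), ("Thursday", "rain"), ("Friday", "Cloudy")]
formatter = "Weather of {0[0]} is {0[1]}".format
for item in map(formatter, weather):
    print item

4.1.2 Common String Methods

find(), rfind(), index(), rindex(), count()

find() and rfind() return the position of the first and last occurrence of one string within a given range of another (the whole string by default), or -1 if it does not occur. index() and rindex() return the same positions but raise an exception if the substring does not occur. count() returns the number of times one string occurs in another.

>>> s = "apple,peach,banana,peach,pear"
>>> s.find("peach")
6
>>> s.find("peach", 7)
19
>>> s.find("peach", 7, 20)
-1
>>> s.rfind('p')
25
>>> s.index('p')
1
>>> s.index('pe')
6
>>> s.index('pear')
25
>>> s.index('ppp')
Traceback (most recent call last):
  File "...", line 1, in <module>
    s.index('ppp')
ValueError: substring not found
>>> s.count('p')
5
>>> s.count('pp')
1
>>> s.count('ppp')
0

split(), rsplit(), partition(), rpartition()

split() and rsplit() split a string into several strings at a given separator, working from the left and right ends respectively, and return the pieces as a list. partition() and rpartition() split the string at a given separator string into 3 parts: the text before the separator, the separator itself, and the text after it. If the separator does not occur in the string, partition() returns the original string followed by two empty strings, and rpartition() returns two empty strings followed by the original string.

>>> s = "apple,peach,banana,pear"
>>> li = s.split(",")
>>> li
['apple', 'peach', 'banana', 'pear']
>>> s.partition(',')
('apple', ',', 'peach,banana,pear')
>>> s.rpartition(',')
('apple,peach,banana', ',', 'pear')
>>> s.rpartition('banana')
('apple,peach,', 'banana', ',pear')
>>> s = "2014-10-31"
>>> t = s.split("-")
>>> print t
['2014', '10', '31']
>>> print map(int, t)
[2014, 10, 31]

If no separator is given to split() or rsplit(), any run of whitespace (spaces, newlines, tabs, and so on) is treated as a separator, and the list of resulting pieces is returned.

>>> s = 'hello world \n\n My name is Dong'
>>> s.split()
['hello', 'world', 'My', 'name', 'is', 'Dong']
>>> s = '\n\nhello world \n\n\n My name is Dong'
>>> s.split()
['hello', 'world', 'My', 'name', 'is', 'Dong']
>>> s = '\n\nhello\t\t world \n\n\n My name\t is Dong'
>>> s.split()
['hello', 'world', 'My', 'name', 'is', 'Dong']

split() and rsplit() also accept a maximum number of splits, for example:

>>> s = '\n\nhello\t\t world \n\n\n My name is Dong'
>>> s.split(None, 1)
['hello', 'world \n\n\n My name is Dong']
>>> s.rsplit(None, 1)
['\n\nhello\t\t world \n\n\n My name is', 'Dong']
>>> s.split(None, 2)
['hello', 'world', 'My name is Dong']
>>> s.rsplit(None, 2)
['\n\nhello\t\t world \n\n\n My name', 'is', 'Dong']
>>> s.split(None, 5)
['hello', 'world', 'My', 'name', 'is', 'Dong']
>>> s.split(None, 6)
['hello', 'world', 'My', 'name', 'is', 'Dong']

String joining: join()

Example:

>>> li = ["apple", "peach", "banana", "pear"]
>>> sep = ","
>>> s = sep.join(li)
>>> s
'apple,peach,banana,pear'

Using + to concatenate strings is not recommended; prefer the join() method (see CompareJoinAndPlusForStringConnection.py).

lower(), upper(), capitalize(), title(), swapcase()

These methods convert a string to lowercase, convert it to uppercase, capitalize the first letter of the string, capitalize the first letter of every word, and swap upper and lower case, respectively.

>>> s = "What is Your Name?"
>>> s2 = s.lower()
>>> s2
'what is your name?'
>>> s.upper()
'WHAT IS YOUR NAME?'
>>> s2.capitalize()
'What is your name?'
>>> s.title()
'What Is Your Name?'
>>> s.swapcase()
'wHAT IS yOUR nAME?'

Find and replace: replace()

>>> s = "中国，中国"
>>> print s
中国，中国
>>> s2 = s.replace("中国", "中华人民共和国")
>>> print s2
中华人民共和国，中华人民共和国

The maketrans function builds a mapping table, and the translate method converts a string according to that table.

>>> import string
>>> table = string.maketrans("abcdef123", "uvwxyz@#$")
>>> s = "Python is a greate programming language. I like it!"
>>> s.translate(table)
'Python is u gryuty progrumming lunguugy. I liky it!'
>>> s.translate(table, "gtm")   # the second argument lists the characters to delete
'Pyhon [remainder of output truncated in the source]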
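The translation-table example above is written for Python 2, where maketrans lives in the string module. As a minimal sketch of how the same idea might look in Python 3, the block below uses the built-in str.maketrans(). It reuses the sample sentence from the example; the replacement string "uvwxyz@#$" is an assumption, chosen so that both arguments have the same length, which maketrans requires. The variable names are illustrative only.

```python
# Python 3 sketch of the translation-table example (the chapter's session is Python 2).
# Assumption: the replacement string is "uvwxyz@#$", so both arguments have 9 characters.
table = str.maketrans("abcdef123", "uvwxyz@#$")
s = "Python is a greate programming language. I like it!"

# Every 'a' becomes 'u', every 'e' becomes 'y', and so on; unmapped characters are kept.
mapped = s.translate(table)
print(mapped)

# In Python 3 the characters to delete are passed to maketrans() as a third
# argument, rather than to translate() as in Python 2.
table_with_deletions = str.maketrans("abcdef123", "uvwxyz@#$", "gtm")
mapped_and_deleted = s.translate(table_with_deletions)
print(mapped_and_deleted)
```

The first call reproduces the mapped sentence shown in the Python 2 session, and the second result begins with "Pyhon", matching the start of the truncated output there.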
