Python Strings and Text

Notes from the Python Cookbook.

Matching strings with shell wildcards

You want to match text strings using the wildcard patterns common in Unix shells (such as *.py or Dat[0-9]*.csv).

>>> from fnmatch import fnmatch, fnmatchcase
>>> fnmatch('foo.txt', '*.txt')
True
>>> fnmatch('foo.txt', '?oo.txt')
True
>>> fnmatch('Dat45.csv', 'Dat[0-9]*')
True
>>> names = ['Dat1.csv', 'Dat2.csv', 'config.ini', 'foo.py']
>>> [name for name in names if fnmatch(name, 'Dat*.csv')]
['Dat1.csv', 'Dat2.csv']

fnmatch() matches patterns using the case-sensitivity rules of the underlying operating system, and these rules differ between systems. For example:
>>> # On OS X (Mac)
>>> fnmatch('foo.txt', '*.TXT')
False
>>> # On Windows
>>> fnmatch('foo.txt', '*.TXT')
True
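When the result must not depend on the operating system, fnmatchcase() (imported above but not yet used) compares names exactly as written. The sketch below shows that, plus one possible way to ignore case on every platform. The helper name fnmatch_nocase and the extra entry 'DAT3.CSV' are assumptions added for illustration, not part of the original notes.

```python
from fnmatch import fnmatchcase

names = ['Dat1.csv', 'Dat2.csv', 'config.ini', 'foo.py', 'DAT3.CSV']

# fnmatchcase() is always case-sensitive, whatever the platform.
exact = [name for name in names if fnmatchcase(name, 'Dat*.csv')]
print(exact)


def fnmatch_nocase(name, pattern):
    """Hypothetical helper: case-insensitive wildcard match on any OS."""
    # Lowercasing both sides makes the comparison ignore case.
    return fnmatchcase(name.lower(), pattern.lower())


loose = [name for name in names if fnmatch_nocase(name, 'Dat*.csv')]
print(loose)
```

Lowercasing the pattern also lowercases any character ranges inside it, which is usually what a case-insensitive match wants.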

Matching and searching strings

If the pattern you want to match is a literal string, the basic string methods are usually all you need:
>>> text = 'yeah, but no, but yeah, but no, but yeah'
>>> # Exact match
>>> text == 'yeah'
False
>>> # Match at start or end
>>> text.startswith('yeah')
True
>>> text.endswith('no')
False
>>> # Search for the location of the first occurrence
>>> text.find('no')
10
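A few related string methods cover most other literal searches. This is a small sketch using the same sample text; the variable names are chosen here for illustration.

```python
text = 'yeah, but no, but yeah, but no, but yeah'

# Membership test when the position does not matter.
has_no = 'no' in text

# find() returns -1 rather than raising when nothing is found.
missing = text.find('maybe')

# rfind() searches from the right; count() counts non-overlapping hits.
last_no = text.rfind('no')
yeah_count = text.count('yeah')

# startswith() and endswith() accept a tuple of alternatives.
starts = text.startswith(('yeah', 'no'))

print(has_no, missing, last_no, yeah_count, starts)
```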

For more complicated matching, use the regular expression module re:
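>>> text1 = '11/27/2012'
>>> import re
>>> re.match(r'\d+/\d+/\d+', text1)
<re.Match object; span=(0, 10), match='11/27/2012'>
>>> # Compile the pattern first if it will be reused
>>> datepat = re.compile(r'\d+/\d+/\d+')
>>> datepat.match(text1)
<re.Match object; span=(0, 10), match='11/27/2012'>
>>> text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> datepat.findall(text)
['11/27/2012', '3/13/2013']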
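To pull the individual fields out of a match, capture groups can be added to the pattern. The sketch below is one way to do that. Note that the parentheses are an addition here (the pattern above has none), and that match() only anchors at the start of the string, so fullmatch() is used for the strict check.

```python
import re

# Same date pattern as above, with a capture group around each field.
datepat = re.compile(r'(\d+)/(\d+)/(\d+)')

m = datepat.match('11/27/2012')
month, day, year = m.groups()
print(month, day, year)

# With groups in the pattern, findall() returns one tuple per match.
text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
dates = [(int(y), int(mo), int(d)) for mo, d, y in datepat.findall(text)]
print(dates)

# match() accepts trailing junk; fullmatch() requires the whole string.
loose = datepat.match('11/27/2012abcdef')
strict = datepat.fullmatch('11/27/2012abcdef')
print(loose is not None, strict is None)
```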

Searching and replacing strings

For simple literal patterns, use the str.replace() method directly. For example:
>>> text = 'yeah, but no, but yeah, but no, but yeah'
>>> text.replace('yeah', 'yep')
'yep, but no, but yep, but no, but yep'

For more complicated patterns, use the sub() function in the re module. To illustrate, suppose you want to rewrite dates of the form 11/27/2012 as 2012-11-27. For example:
>>> text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> import re
>>> re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)
'Today is 2012-11-27. PyCon starts 2013-3-13.'
>>> # If you will apply the same pattern repeatedly, compile it first for better performance
>>> datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
>>> datepat.sub(r'\3-\1-\2', text)
'Today is 2012-11-27. PyCon starts 2013-3-13.'
>>> # For more complicated substitutions, pass a replacement callback function instead
>>> from calendar import month_abbr
>>> def change_date(m):
...     mon_name = month_abbr[int(m.group(1))]
...     return '{} {} {}'.format(m.group(2), mon_name, m.group(3))
...
>>> datepat.sub(change_date, text)
'Today is 27 Nov 2012. PyCon starts 13 Mar 2013.'

If you also want to know how many substitutions were made, in addition to getting the result, use re.subn() instead. For example:
>>> newtext, n = datepat.subn(r'\3-\1-\2', text)
>>> newtext
'Today is 2012-11-27. PyCon starts 2013-3-13.'
>>> n
2
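The backreference template leaves single-digit months and days unpadded (2013-3-13). A callback can zero-pad them. The following is a sketch only: the named groups and the function name to_iso are choices made here for readability, not something the original notes use.

```python
import re

# Named groups make the callback easier to read than numeric indexes.
datepat = re.compile(r'(?P<month>\d+)/(?P<day>\d+)/(?P<year>\d+)')


def to_iso(m):
    """Hypothetical callback: rewrite month/day/year as zero-padded year-month-day."""
    return '{}-{:02d}-{:02d}'.format(
        m.group('year'), int(m.group('month')), int(m.group('day')))


text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'

# subn() accepts a callback just like sub() and also reports the count.
newtext, n = datepat.subn(to_iso, text)
print(newtext)
print(n)
```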

To search for and replace text while ignoring case, pass the re.IGNORECASE flag:
>>> text = 'UPPER PYTHON, lower python, Mixed Python'
>>> re.findall('python', text, flags=re.IGNORECASE)
['PYTHON', 'python', 'Python']
>>> re.sub('python', 'snake', text, flags=re.IGNORECASE)
'UPPER snake, lower snake, Mixed snake'
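In the last result the replacement does not follow the case of the text it replaces. One way to handle that is a callback that inspects each match; the helper below (named matchcase here) is a sketch of that idea and handles only the all-upper, all-lower, and capitalized cases.

```python
import re


def matchcase(word):
    """Return a callback that replaces a match with `word`, copying the match's case."""
    def replace(m):
        found = m.group()
        if found.isupper():
            return word.upper()
        if found.islower():
            return word.lower()
        if found[0].isupper():
            return word.capitalize()
        return word
    return replace


text = 'UPPER PYTHON, lower python, Mixed Python'
result = re.sub('python', matchcase('snake'), text, flags=re.IGNORECASE)
print(result)
```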

Shortest-match (non-greedy) patterns
>>> # * is greedy and matches as much as possible
>>> str_pat = re.compile(r'\"(.*)\"')
>>> text1 = 'Computer says "no."'
>>> str_pat.findall(text1)
['no.']
>>> text2 = 'Computer says "no." Phone says "yes."'
>>> str_pat.findall(text2)
['no." Phone says "yes.']
>>> # Adding ? after * makes it non-greedy, so it matches as little as possible
>>> str_pat = re.compile(r'\"(.*?)\"')
>>> str_pat.findall(text2)
['no.', 'yes.']
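An alternative to the non-greedy quantifier is a negated character class, which simply refuses to cross a closing quote. The sketch below compares the two. The multi-line sample text3 is made up for illustration and shows that `.` does not match a newline unless re.DOTALL is given.

```python
import re

text2 = 'Computer says "no." Phone says "yes."'

# [^"]* can never run past a quote, so greediness is harmless here.
negated = re.findall(r'"([^"]*)"', text2)
print(negated)

# Hypothetical sample with a newline inside the quotes.
text3 = 'He said "line one\nline two"'

# '.' stops at the newline, so the non-greedy pattern finds nothing.
plain = re.findall(r'"(.*?)"', text3)

# re.DOTALL lets '.' match newlines; the negated class needs no flag.
dotall = re.findall(r'"(.*?)"', text3, flags=re.DOTALL)
negated_multiline = re.findall(r'"([^"]*)"', text3)
print(plain, dotall, negated_multiline)
```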

Aligning strings
>>> # Use the string methods ljust(), rjust() and center()
>>> text = 'Hello World'
>>> text.ljust(20)
'Hello World         '
>>> text.rjust(20)
'         Hello World'
>>> text.center(20)
'    Hello World     '
>>> text.rjust(20,'=')
'=========Hello World'
>>> text.center(20,'*')
'****Hello World*****'
>>> # The format() function can also align strings easily.
>>> # Use the <, > or ^ character followed by the desired width.
>>> format(text, '>20')
'         Hello World'
>>> format(text, '<20')
'Hello World         '
>>> format(text, '^20')
'    Hello World     '
>>> format(text, '=>20s')
'=========Hello World'
>>> format(text, '*^20s')
'****Hello World*****'
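The same alignment codes work inside str.format() templates and apply to numbers as well as strings, which makes them handy for simple columns. A minimal sketch follows; the sample rows and column widths are invented for illustration.

```python
# Hypothetical sample data: (name, price) pairs.
rows = [('apple', 1.5), ('kiwi', 12.25), ('watermelon', 3.0)]

# Left-align names in 12 columns, right-align prices in 8 columns.
lines = ['{:<12}{:>8}'.format('item', 'price')]
for name, price in rows:
    lines.append('{:<12}{:>8.2f}'.format(name, price))

table = '\n'.join(lines)
print(table)

# Alignment can be combined with a numeric format in format() too.
centered = format(1.2345, '^10.2f')
print(repr(centered))
```

Unlike ljust() and its siblings, format() is not limited to strings, so it is usually the more general choice.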

Formatting strings to a fixed column width
>>> s = "Look into my eyes, look into my eyes, the eyes, the eyes, \
... the eyes, not around the eyes, don't look around the eyes, \
... look into my eyes, you're under."
>>> import textwrap
>>> print(textwrap.fill(s, 70))
Look into my eyes, look into my eyes, the eyes, the eyes, the eyes,
not around the eyes, don't look around the eyes, look into my eyes,
you're under.
>>> print(textwrap.fill(s, 40))
Look into my eyes, look into my eyes,
the eyes, the eyes, the eyes, not around
the eyes, don't look around the eyes,
look into my eyes, you're under.
>>> print(textwrap.fill(s, 40, initial_indent=' '))
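textwrap.fill() also takes subsequent_indent for hanging indents, and the width can be taken from the current terminal. The sketch below shows both; the helper name wrap_to_terminal and the 80-column fallback are assumptions made for this example.

```python
import shutil
import textwrap

s = ("Look into my eyes, look into my eyes, the eyes, the eyes, "
     "the eyes, not around the eyes, don't look around the eyes, "
     "look into my eyes, you're under.")

# Hanging indent: every line after the first starts with four spaces.
hanging = textwrap.fill(s, 40, subsequent_indent='    ')
print(hanging)


def wrap_to_terminal(text, fallback_width=80):
    """Hypothetical helper: wrap text to the current terminal width."""
    # get_terminal_size() uses the fallback when no terminal is attached.
    columns = shutil.get_terminal_size(fallback=(fallback_width, 24)).columns
    return textwrap.fill(text, width=columns)


print(wrap_to_terminal(s))
```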
