Regular Expressions in Python

Author: yifei / Created: June 22, 2018, 5 p.m. / Modified: June 22, 2018, 5 p.m.


re.sub(pattern, repl, string)

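Note: in the replacement string, prefer \g<number> over \number for backreferences. \g<1>0 is unambiguous, while \10 is read as a reference to group 10.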
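A minimal sketch of the two backreference styles, assuming Python 3; the sample strings and variable names are made up for illustration.

```python
import re

# Swap "key=value" into "value=key" using numbered group references.
pattern = r"(\w+)=(\w+)"
swapped = re.sub(pattern, r"\g<2>=\g<1>", "name=yifei")

# Append a literal "0" after group 1. With \g<1> the reference ends at ">",
# so the following digit is plain text. Writing r"\10" here would instead
# be parsed as a reference to group 10.
padded = re.sub(r"(\d+)", r"\g<1>0", "price 42")

print(swapped)  # yifei=name
print(padded)   # price 420
```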

re.match(pattern, string)

match returns a match object or None. It always tries to match from the beginning of the string, but it does not require the match to reach the end. The match object has the methods group and groups.
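A short sketch of this behavior, assuming Python 3; the input strings are illustrative. The re.fullmatch call is not part of the notes above and is shown only as one way to also check the end of the string.

```python
import re

# The pattern matches at the start; the trailing " and more" is ignored.
m = re.match(r"(\d+)-(\d+)", "2018-06 and more")
whole = m.group(0)   # the entire matched text
first = m.group(1)   # the first parenthesized group
both = m.groups()    # a tuple of all groups

# The digits occur later in the string, but match only tries position 0.
miss = re.match(r"\d+", "year 2018")

# fullmatch (Python 3.4+) additionally requires the match to reach the end.
partial = re.fullmatch(r"\d+", "2018x")

print(whole, first, both)  # 2018-06 2018 ('2018', '06')
print(miss, partial)       # None None
```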

re.findall(pattern, string)

If the pattern has no group, findall returns a list of whole matches. If it has one group, it returns a list of the strings captured by that group. If it has more than one group, it returns a list of tuples containing all groups.
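The three cases can be sketched as follows, assuming Python 3 and a made-up input string.

```python
import re

text = "a=1, b=2"

# No group: each item is the whole match.
no_group = re.findall(r"\w=\d", text)

# One group: each item is the text captured by that group.
one_group = re.findall(r"\w=(\d)", text)

# Two groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"(\w)=(\d)", text)

print(no_group)    # ['a=1', 'b=2']
print(one_group)   # ['1', '2']
print(two_groups)  # [('a', '1'), ('b', '2')]
```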


Pass flags with the flags=re.XXX keyword argument, for example flags=re.IGNORECASE.
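A small sketch of passing flags by keyword, assuming Python 3. Combining flags with | is an addition that the notes above do not mention; the sample text is illustrative.

```python
import re

# Case-insensitive search using the flags keyword argument.
hits = re.findall(r"python", "Python and PYTHON", flags=re.IGNORECASE)

# Several flags can be combined with the bitwise OR operator.
# MULTILINE makes ^ match at the start of every line.
combined = re.findall(
    r"^py\w+", "Python\npypi", flags=re.IGNORECASE | re.MULTILINE
)

print(hits)      # ['Python', 'PYTHON']
print(combined)  # ['Python', 'pypi']
```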

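Unicode: \w matches Chinese characters only if all three conditions hold: re.UNICODE is set, the pattern is unicode, and the string is unicode. (This applies to Python 2; in Python 3, str patterns are Unicode-aware by default.)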

str.isalpha returns True for all alphabetic characters, including Chinese ones.

When matching Chinese characters, keep everything unicode and set the re.UNICODE flag: use a unicode pattern, a unicode string, and a unicode replacement, decoding byte strings (for example from 'utf-8') first.
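A sketch of the Unicode behavior, written for Python 3 as an assumption, since the notes above describe Python 2's unicode type. In Python 3 a str pattern is Unicode-aware without any flag, so re.ASCII is used here only to show what the non-Unicode behavior looks like. The sample text is illustrative.

```python
import re

# Bytes read from a file or socket are decoded to str before matching.
raw = "hello 世界 123".encode("utf-8")
text = raw.decode("utf-8")

# With a str pattern, \w covers Chinese characters as well.
words = re.findall(r"\w+", text)

# Restricting \w to ASCII drops the Chinese word.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

# isalpha is True for Chinese characters too.
alpha = "世界".isalpha()

print(words)        # ['hello', '世界', '123']
print(ascii_words)  # ['hello', '123']
print(alpha)        # True
```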


all about groups

A look-behind assertion requires a fixed-width pattern.
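A minimal sketch of the fixed-width restriction, assuming Python 3 and the standard re module; the currency strings are made up for illustration.

```python
import re

# "USD " is always four characters wide, so this look-behind is accepted.
amounts = re.findall(r"(?<=USD )\d+", "USD 100, EUR 200, USD 300")

# \s+ can match any number of characters, so the look-behind has no
# fixed width and the pattern is rejected at compile time.
variable_width_rejected = False
try:
    re.compile(r"(?<=USD\s+)\d+")
except re.error:
    variable_width_rejected = True

print(amounts)                  # ['100', '300']
print(variable_width_rejected)  # True
```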

If you have any questions, send an email to kongyifei (at) to discuss.