re.sub(pattern, repl, string)
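Notes: in repl, backreferencing is better with \g<1> (or \g<name>) than with \1, because \1 followed by a digit is ambiguous (\10 means group 10).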
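A minimal sketch of why \g is safer in the replacement string. The sample strings and variable names are made up for illustration.

```python
import re

# \g<1> stays unambiguous when a literal digit follows the backreference;
# r"\10" would be read as a reference to group 10.
padded = re.sub(r"(\d+)", r"\g<1>0", "a1 b22")

# Named groups can be referenced as \g<name>.
swapped = re.sub(
    r"(?P<first>\w+) (?P<second>\w+)",
    r"\g<second> \g<first>",
    "hello world",
)

print(padded)   # a10 b220
print(swapped)  # world hello
```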
re.match returns a match object or None. It always tries to match from the beginning of the string, but does not require the match to reach the end. The match object has the methods group and groups.
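A small sketch of the match behaviour described above. The input strings are illustrative only.

```python
import re

# The pattern matches at position 0; the trailing text is ignored.
m = re.match(r"(\d+)-(\d+)", "10-20 and more")
whole = m.group(0)   # the entire match
first = m.group(1)   # the first group
both = m.groups()    # all groups as a tuple

# No match at position 0, so the result is None even though digits occur later.
no_match = re.match(r"\d+", "abc 123")

print(whole, first, both, no_match)  # 10-20 10 ('10', '20') None
```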
re.findall: if there is no group, it returns a list of whole matches; if there is one group, a list of strings for that group; if there is more than one group, a list of tuples of all groups.
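The three return shapes can be sketched with re.findall on one sample string (the string is an assumption chosen for illustration):

```python
import re

text = "x=1, y=22"

no_group = re.findall(r"\w=\d+", text)        # list of whole matches
one_group = re.findall(r"\w=(\d+)", text)     # list of strings for the single group
two_groups = re.findall(r"(\w)=(\d+)", text)  # list of tuples, one item per group

print(no_group)    # ['x=1', 'y=22']
print(one_group)   # ['1', '22']
print(two_groups)  # [('x', '1'), ('y', '22')]
```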
Pass flags with flags=re.XXX, e.g. flags=re.IGNORECASE.
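A short sketch of passing flags. Combining two flags with | is shown as an extra; the sample text is made up.

```python
import re

# Case-insensitive search via the flags keyword.
hits = re.findall(r"python", "Python PYTHON python", flags=re.IGNORECASE)

# Several flags can be combined with the | operator.
line_starts = re.findall(r"^py\w*", "Python\npypi", flags=re.IGNORECASE | re.MULTILINE)

print(hits)         # ['Python', 'PYTHON', 'python']
print(line_starts)  # ['Python', 'pypi']
```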
Unicode: \w will match Chinese chars only if re.UNICODE is set, the pattern is unicode, and the string is unicode. (This is the Python 2 behaviour; in Python 3, str patterns are Unicode-aware by default.)
isalpha on a unicode string returns True for any all-letter string, including Chinese characters.
When matching Chinese characters, keep everything unicode and set the re.UNICODE flag: use a unicode pattern, a unicode string, and a unicode replacement, and read files with io.open(encoding='utf-8').
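A minimal sketch of the Unicode notes, assuming Python 3, where every str is already unicode and re.UNICODE is the default for str patterns (it is passed explicitly here only to mirror the notes). In Python 2 the literals would need a u prefix. The sample text is illustrative.

```python
import re

text = "abc 中文 123"

# With Unicode matching, \w covers Chinese characters as well as ASCII.
words = re.findall(r"\w+", text, flags=re.UNICODE)

# re.ASCII restricts \w to [a-zA-Z0-9_], so the Chinese run is skipped.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

# isalpha is True for a string made up only of letters, Chinese included.
is_letters = "中文".isalpha()

print(words)        # ['abc', '中文', '123']
print(ascii_words)  # ['abc', '123']
print(is_letters)   # True
```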
all about groups
lookbehind, (?<=...) and (?<!...), requires a fixed-width pattern
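A sketch of the fixed-width rule for lookbehind in Python's re module. The price string is a made-up example.

```python
import re

# Fixed-width lookbehind: digits that directly follow a dollar sign.
prices = re.findall(r"(?<=\$)\d+", "cost $15, tax $3, qty 7")

# A variable-width lookbehind such as (?<=\$+) is rejected at compile time.
try:
    re.compile(r"(?<=\$+)\d+")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(prices)             # ['15', '3']
print(variable_width_ok)  # False
```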