Xah said:
Thanks. Is it true that any Unicode character can also be used
literally inside a regex?
e.g.
re.search(ur' +',mystring,re.U)
I tested this case, and apparently I can.
Yes. In fact, it doesn't matter to re.search whether you write
u"\u2003" or type the EM SPACE character (U+2003) literally between
the quotes. Either way you get a Unicode object containing U+2003,
which is then processed by SRE.
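A minimal sketch of this point in Python 3. The thread uses Python 2's ur'' prefix, which Python 3 rejects; in Python 3 every str is Unicode and re.UNICODE is already the default for str patterns. The sample text is illustrative only. Three different spellings of U+2003 yield the same one-character string, so the regex engine receives an identical pattern each time:

```python
import re

# Three spellings of the same code point, U+2003 EM SPACE.
escaped = "\u2003"          # escape sequence in the source code
by_name = "\N{EM SPACE}"    # named escape
built = chr(0x2003)         # built at run time from the ordinal

# All three are the same one-character string.
assert escaped == by_name == built
assert ord(escaped) == 0x2003

text = "word\u2003\u2003word"

# Each pattern is just a str holding U+2003 followed by "+",
# so SRE sees exactly the same thing in every case.
spans = [re.search(p + "+", text).span() for p in (escaped, by_name, built)]
print(spans)  # [(4, 6), (4, 6), (4, 6)]
```

Typing the character literally inside the quotes would give the same result; the escapes are used here only so the example survives copy and paste.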
But is it true that any
Unicode character can be embedded in a regex literally? Does this apply
to the esoteric ones as well, such as other non-printing characters and
combining forms?
Yes. To SRE, only the Unicode ordinal values matter. For a match,
each character in the string must have the same ordinal value as the
corresponding character in the expression. No interpretation of the
character is performed, except for the few characters that have
markup meaning in regular expressions (e.g. $, \ and [).
Regards,
Martin