Fuzzyman
Hello all,
Can someone confirm that regular expressions compiled from ASCII
strings will always (and safely) yield unicode values when matched
against unicode strings?
I've tested it and it works, but can someone confirm that this is
consistent and safe? (No lurking encode errors: I assume only a
decode is done, in which case is it safe on a system whose default
encoding is not ASCII-compatible? OTOH, it would seem to me that
such a system would break *everything*.)
# Python 2; assumes the literal below is stored as cp1252 bytes
import re
r = re.compile('(.*)=(.*)')  # pattern compiled from a plain ASCII string
s = '£££=£££'.decode('cp1252')  # yields a unicode string that can't be encoded as ASCII
c = r.match(s)
c.groups()  # yields two unicode strings: (u'\xa3\xa3\xa3', u'\xa3\xa3\xa3')
print c.groups()[0].encode('cp1252')  # which encode safely back to cp1252
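The same test can be written with explicit assertions instead of inspecting the output by eye. This is only a sketch. The names pattern, subject and text_type are made up for illustration, and the subject is built from escapes so that the source file encoding plays no part. It is written to run unchanged on Python 2.6+ and Python 3.3+, and it shows what happens in this one case; it does not prove the general guarantee being asked about.

```python
import re

# Pattern compiled from a plain ASCII string literal.
pattern = re.compile('(.*)=(.*)')

# Subject built with escapes, so no source-encoding declaration is needed.
subject = u'\xa3\xa3\xa3=\xa3\xa3\xa3'

# The text type: unicode on Python 2, str on Python 3.
text_type = type(u'')

match = pattern.match(subject)
groups = match.groups()

# Every captured group should come back as text, not as a byte string.
assert all(isinstance(g, text_type) for g in groups)

# Encoding a group to cp1252 should give back the original pound-sign bytes.
encoded = groups[0].encode('cp1252')
assert encoded == b'\xa3\xa3\xa3'
```

If either assertion fails, the interpreter raises AssertionError at that line, which is easier to spot than a subtly different repr.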
All the best,
Fuzzyman
http://www.voidspace.org.uk/python/index.shtml