Reversing backslashed escape sequences

  • Thread starter Steven D'Aprano

Steven D'Aprano

I have a byte string which is an escape sequence: it starts with a
backslash, followed by either a single character or a hex or octal escape
sequence. E.g. something like one of these in Python 2.5:

'\\n'
'\\xFF'
'\\023'

If s is such a string, what is the right way to un-escape it to a
single-character byte string?

I could decode them to unicode first, then encode to ASCII:
'\n'

but this fails for non-ASCII bytes:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128)
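
The traceback points at the encode step, not the decode: the escape sequence decodes fine, but the resulting character has code point 255, which ASCII (0 to 127) cannot represent. A minimal sketch of that failure follows. It is written in Python 3 syntax and assumes the commands missing above used the unicode-escape codec, which the thread suggests but does not show.

```python
# Assumption: the elided commands decoded with 'unicode_escape'.
escaped = b'\\xFF'                       # four bytes: backslash, x, F, F
text = escaped.decode('unicode_escape')  # a single character
print(len(escaped), len(text), ord(text))

reason = None
try:
    text.encode('ascii')                 # ASCII only covers code points 0-127
except UnicodeEncodeError as exc:
    reason = exc.reason
print('encode failed:', reason)
```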
 

Chris Rebert

Python 2.6.5 (r265:79063, May 25 2010, 18:21:57)
'\xff'

Cheers,
Chris
 

Mark Tolonen

Use 'string-escape':
>>> s = ['\\n', '\\xff', '\\023']
>>> for n in s: n.decode('string-escape')
...
'\n'
'\xff'
'\x13'
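
For anyone reading this on Python 3, where the 'string-escape' codec is no longer available, one possible substitute is to decode with unicode_escape and then encode with Latin-1, which maps code points 0 to 255 straight onto byte values. The sketch below is only that, a sketch: the helper name unescape_bytes is made up for illustration, and it assumes the input is a bytes object holding a single \n-style, \xHH or octal escape like the ones in this thread.

```python
def unescape_bytes(escaped):
    r"""Turn an escaped byte string such as b'\\xff' into the byte it names."""
    # unicode_escape interprets the backslash sequence and returns a str.
    text = escaped.decode('unicode_escape')
    # latin-1 maps code points 0-255 one-to-one onto byte values 0-255,
    # so this step succeeds where encoding to ASCII would fail for 128-255.
    return text.encode('latin-1')


for s in [b'\\n', b'\\xff', b'\\023']:
    print(repr(unescape_bytes(s)))
```

This is a different route from the string-escape codec itself, so it is worth checking against any unusual input before relying on it.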

-Mark
 

Steven D'Aprano

Chris Rebert said:
Python 2.6.5 (r265:79063, May 25 2010, 18:21:57)
'\xff'

I knew unicode-escape, obviously, and then I tried just 'escape', but
never thought of 'string_escape'.

Thanks for the quick answer.
 
