Translating unicode data

CaptainMcCrank · Mar 23, 2009

Hi list,

I'm struggling with a problem analyzing large amounts of unicode data
in an http wireshark capture.
I've solved the problem with the interpreter, but I'm not sure how to
do this in an automated fashion.

I'd like to grab a line from a text file & translate the unicode
sections of it to ascii. So, for example
I'd like to take
"\u003cb\u003eMar 17\u003c/b\u003e"

and turn it into

"<b>Mar 17</b>"

I can handle this from the interpreter as follows:

But I don't know what I need to do to automate this! The data that is
in the quotes from line 2 will have to come from a variable. I am
unable to figure out how to do this using a variable rather than a
literal string.

Please help!

Peter Otten · Mar 23, 2009

CaptainMcCrank said:
I'm struggling with a problem analyzing large amounts of unicode data
in an http wireshark capture.
I've solved the problem with the interpreter, but I'm not sure how to
do this in an automated fashion.

I'd like to grab a line from a text file & translate the unicode
sections of it to ascii. So, for example
I'd like to take
"\u003cb\u003eMar 17\u003c/b\u003e"

and turn it into

"<b>Mar 17</b>"

I can handle this from the interpreter as follows:

But I don't know what I need to do to automate this! The data that is
in the quotes from line 2 will have to come from a variable. I am
unable to figure out how to do this using a variable rather than a
literal string.

If wireshark uses the same escape codes as python you can use str.decode()
or open the file with codecs.open():
>>> s
'\\u003cb\\u003eMar 17\\u003c/b\\u003e'
>>> s.decode("unicode-escape")
u'<b>Mar 17</b>'

Peter
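
For anyone reading this on Python 3, where str no longer has a decode() method, the str.decode() suggestion might be adapted roughly as in the sketch below. The name decode_escapes is made up for illustration, and the sketch assumes the captured text is plain ASCII apart from the \uXXXX escapes.

```python
def decode_escapes(raw):
    # Translate literal \uXXXX sequences in raw into the characters they name.
    # Assumption: raw is ASCII-only apart from the escapes themselves.
    # Python 3 str has no .decode(), so go through bytes first.
    return raw.encode("ascii").decode("unicode_escape")


# The value comes from a variable, not from a literal typed at the prompt.
line = r"\u003cb\u003eMar 17\u003c/b\u003e"
print(decode_escapes(line))  # <b>Mar 17</b>
```

The raw string here only stands in for data read from the capture; any variable holding the same characters would be handled the same way.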

CaptainMcCrank · Mar 24, 2009

If wireshark uses the same escape codes as python you can use str.decode()
or open the file with codecs.open():

>>> s
'\\u003cb\\u003eMar 17\\u003c/b\\u003e'
>>> s.decode("unicode-escape")
u'<b>Mar 17</b>'

Peter

This is a workable solution! Thank you Peter!
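
To automate this over a whole text file, one possible approach in Python 3 is sketched below. Instead of codecs.open(), it reads the file in binary mode and decodes each line on its own. The names decode_lines and decode_file are hypothetical, and any non-escape bytes are assumed to be ASCII (the unicode_escape codec would otherwise treat them as Latin-1).

```python
def decode_lines(byte_lines):
    # Yield each line with its \uXXXX escapes translated and the newline removed.
    for raw in byte_lines:
        yield raw.rstrip(b"\r\n").decode("unicode_escape")


def decode_file(path):
    # Binary mode keeps the backslashes exactly as they appear in the file.
    with open(path, "rb") as handle:
        return list(decode_lines(handle))
```

Calling decode_file() with the path of the exported capture text would return one decoded string per line.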

John Machin · Mar 25, 2009

You really need to say what version of Python you are working with,
show the code you tried, and the results you got.

Always very good advice, not often taken

Using Python 3.1, I get:
>>> "\u003cb\u003eMar 17\u003c/b\u003e" == '<b>Mar 17</b>'
True

Using Python 2.1.3 I get:
1

But so what? AFAICT from the OP's description and his joyous response
to Peter's suggestion, what he has (in 3.0 syntax) is not
"\u003cb\u003e etc"
it's
b"\u003cb\u003e etc"

HTH,
John
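
The difference between the two forms can be seen in a short Python 3 sketch. The variable names are illustrative only, and rb"..." is used in place of b"..." purely so that the literal compiles without an invalid-escape warning on newer interpreters; the bytes it holds are the same.

```python
# A str literal: the parser itself translates the \u escapes.
parsed = "\u003cb\u003eMar 17\u003c/b\u003e"

# A bytes value holding the backslashes literally, as data read from a file would.
raw = rb"\u003cb\u003eMar 17\u003c/b\u003e"

print(len(parsed))  # 13
print(len(raw))     # 33
print(raw.decode("unicode_escape") == parsed)  # True
```

Only the second form needs an explicit decode step, which is why the comparison at the prompt says little about data coming from a file.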
