Translating unicode data

Discussion in 'Python' started by CaptainMcCrank, Mar 23, 2009.

  1. Hi list,

    I'm struggling with a problem analyzing large amounts of unicode data
    in an http wireshark capture.
    I've solved the problem with the interpreter, but I'm not sure how to
    do this in an automated fashion.

    I'd like to grab a line from a text file & translate the unicode
    sections of it to ascii. So, for example
    I'd like to take
    "\u003cb\u003eMar 17\u003c/b\u003e"

    and turn it into

    "<b>Mar 17</b>"

    I can handle this from the interpreter as follows:

    >>> import unicodedata
    >>> mystring = u"\u003cb\u003eMar 17\u003c/b\u003e"
    >>> print mystring
    <b>Mar 17</b>
    >>>


    But I don't know what I need to do to automate this! The data that is
    in the quotes from line 2 will have to come from a variable. I am
    unable to figure out how to do this using a variable rather than a
    literal string.

    Please help!
    CaptainMcCrank, Mar 23, 2009
    #1

  2. Peter Otten

    CaptainMcCrank wrote:

    > I'm struggling with a problem analyzing large amounts of unicode data
    > in an http wireshark capture.
    > I've solved the problem with the interpreter, but I'm not sure how to
    > do this in an automated fashion.
    >
    > I'd like to grab a line from a text file & translate the unicode
    > sections of it to ascii. So, for example
    > I'd like to take
    > "\u003cb\u003eMar 17\u003c/b\u003e"
    >
    > and turn it into
    >
    > "<b>Mar 17</b>"
    >
    > I can handle this from the interpreter as follows:
    >
    > >>> import unicodedata
    > >>> mystring = u"\u003cb\u003eMar 17\u003c/b\u003e"
    > >>> print mystring
    > <b>Mar 17</b>
    > >>>
    >
    > But I don't know what I need to do to automate this! The data that is
    > in the quotes from line 2 will have to come from a variable. I am
    > unable to figure out how to do this using a variable rather than a
    > literal string.


    If wireshark uses the same escape codes as python you can use str.decode()
    or open the file with codecs.open():

    >>> s = "\u003cb\u003eMar 17\u003c/b\u003e"
    >>> s
    '\\u003cb\\u003eMar 17\\u003c/b\\u003e'
    >>> s.decode("unicode-escape")
    u'<b>Mar 17</b>'

    >>> open("tmp.txt", "w").write(s)
    >>> import codecs
    >>> f = codecs.open("tmp.txt", "r", encoding="unicode-escape")
    >>> f.read()
    u'<b>Mar 17</b>'

    Peter
    Peter Otten, Mar 23, 2009
    #2
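
The session above is Python 2, where byte strings have a decode() method. For readers on Python 3, a rough equivalent might look like the sketch below. The helper name decode_escapes and the stand-in file contents are illustrative only, and the sketch assumes the captured text is plain ASCII apart from the \uXXXX escapes.

```python
import io


def decode_escapes(raw: str) -> str:
    """Replace literal backslash-u sequences in raw with the characters they name."""
    # Assumption: raw is ASCII-only apart from the escapes; encode() raises otherwise.
    return raw.encode("ascii").decode("unicode_escape")


# The backslashes are doubled so the variable holds what a file would hold:
# a real backslash followed by "u003c", not an already-decoded "<".
line = "\\u003cb\\u003eMar 17\\u003c/b\\u003e"
print(decode_escapes(line))  # <b>Mar 17</b>

# Reading from a file works the same way, one line at a time.
# io.StringIO stands in for an opened capture file here.
capture = io.StringIO(line + "\n" + "plain text\n")
decoded = [decode_escapes(row.rstrip("\n")) for row in capture]
print(decoded)
```

Like the original suggestion, this interprets every Python-style escape in the line (\n, \t and so on), not only \uXXXX, so it is only appropriate if the capture really uses Python's escape conventions.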

  3. On Mar 23, 4:16 pm, Peter Otten wrote:
    > If wireshark uses the same escape codes as python you can use str.decode()
    > or open the file with codecs.open():
    > [...]

    This is a workable solution! Thank you Peter!
    CaptainMcCrank, Mar 24, 2009
    #3
  4. John Machin

    On Mar 24, 10:30 am, Scott David Daniels wrote:
    > CaptainMcCrank wrote:
    > > I'd like to take
    > > "\u003cb\u003eMar 17\u003c/b\u003e"
    > > and turn it into
    > > "<b>Mar 17</b>"
    > > [...]
    >
    > You really need to say what version of Python you are working with,
    > show the code you tried, and the results you got.


    Always very good advice, not often taken :)

    > Using Python 3.1, I get:
    >      >>> "\u003cb\u003eMar 17\u003c/b\u003e" == '<b>Mar 17</b>'
    >      True


    Using Python 2.1.3 I get:
    >>> "\u003cb\u003eMar 17\u003c/b\u003e" == '<b>Mar 17</b>'
    0
    >>> u"\u003cb\u003eMar 17\u003c/b\u003e" == u'<b>Mar 17</b>'
    1

    But so what? AFAICT from the OP's description and his joyous response
    to Peter's suggestion, what he has (in 3.0 syntax) is not
    "\u003cb\u003e etc"
    it's
    b"\u003cb\u003e etc"

    HTH,
    John
    John Machin, Mar 25, 2009
    #4
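
The distinction drawn above can be checked directly in Python 3. This is a minimal sketch with illustrative variable names; the bytes literal is written with doubled backslashes, which spells the same bytes as b"\u003c..." while avoiding the invalid-escape warning that newer interpreters emit for that form.

```python
# A str literal: the compiler resolves the \u escapes, leaving 3 characters.
as_literal = "\u003cb\u003e"

# What a capture file actually holds: a backslash, "u", and four hex digits
# per escape, 13 bytes in total. Each doubled backslash is one real backslash.
as_read = b"\\u003cb\\u003e"

print(as_literal)                      # <b>
print(len(as_literal), len(as_read))   # 3 13

# Decoding the raw bytes is the step the literal got for free at compile time.
print(as_read.decode("unicode_escape") == as_literal)  # True
```

So data arriving in a variable needs an explicit decode, while the same text typed as a string literal does not.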
