How to read strings containing escape characters from a file and use them as escape sequences?

Discussion in 'Python' started by slomo, Dec 1, 2007.

  1. slomo

    slomo Guest

    How to read strings containing escape characters from a file and use
    them as escape sequences?

    For example, a file 'unicodes.txt' has the contents:

    \u0050\u0079\u0074\u0068\u006f\u006e

    Now,

    >>> file = open('unicodes.txt')
    >>> line = file.readline()
    >>> line
    '\\u0050\\u0079\\u0074\\u0068\\u006f\\u006e\n'
    >>> print line
    \u0050\u0079\u0074\u0068\u006f\u006e

    But I want to get a string:

    "\u0050\u0079\u0074\u0068\u006f\u006e"

    How do you make it?
    slomo, Dec 1, 2007
    #1

  2. Duncan Booth

    Duncan Booth Guest

    Re: How to read strings containing escape characters from a file and use them as escape sequences?

    slomo <> wrote:

    > >>> print line
    > \u0050\u0079\u0074\u0068\u006f\u006e
    >
    > But I want to get a string:
    >
    > "\u0050\u0079\u0074\u0068\u006f\u006e"
    >
    > How do you make it?


    line.decode('unicode-escape')
    Duncan Booth, Dec 1, 2007
    #2
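A note for readers on Python 3: `line.decode('unicode-escape')` is Python 2 syntax, where `line` is a byte string. In Python 3, `str` has no `decode` method, so one possible equivalent is to go through bytes first. The sketch below assumes the text read from the file is pure ASCII; the helper name `decode_escapes` is made up for illustration.

```python
def decode_escapes(text):
    # Assumption: `text` is pure ASCII. Non-ASCII input would raise
    # UnicodeEncodeError at the encode step.
    # The unicode_escape codec then interprets backslash escapes
    # (\uXXXX, \n, \t, ...) the way a Python string literal would.
    return text.encode('ascii').decode('unicode_escape')


# Same content as readline() returns in the question: literal backslashes
# followed by a real newline.
line = r'\u0050\u0079\u0074\u0068\u006f\u006e' + '\n'
print(repr(decode_escapes(line)))
```

Because the codec interprets every backslash escape, not only `\u`, a literal backslash elsewhere in the input would also be treated as the start of an escape.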

  3. slomo

    slomo Guest

    Re: How to read strings containing escape characters from a file and use them as escape sequences?

    WOW! Great! Thanks, Duncan.


    On Dec 2, 12:33 am, Duncan Booth <> wrote:
    > slomo <> wrote:
    > > >>> print line
    > > \u0050\u0079\u0074\u0068\u006f\u006e
    > >
    > > But I want to get a string:
    > >
    > > "\u0050\u0079\u0074\u0068\u006f\u006e"
    > >
    > > How do you make it?
    >
    > line.decode('unicode-escape')
    slomo, Dec 1, 2007
    #3
  4. John Machin

    John Machin Guest

    Re: How to read strings containing escape characters from a file and use them as escape sequences?

    On Dec 2, 2:33 am, Duncan Booth <> wrote:
    > slomo <> wrote:
    > > >>> print line
    > > \u0050\u0079\u0074\u0068\u006f\u006e
    > >
    > > But I want to get a string:
    > >
    > > "\u0050\u0079\u0074\u0068\u006f\u006e"
    > >
    > > How do you make it?
    >
    > line.decode('unicode-escape')


    Amazing what you can find in obscure corners of the obscure docs! BTW,
    how many folks know what "bijective" means ?

    Hmmm ... the encode is documented as "Produce a string that is
    suitable as Unicode literal in Python source code", but it *isn't*
    suitable. A Unicode literal is u'blah', this gives just blah. Worse,
    it leaves the caller to nut out how to escape apostrophes and quotes:

    >>> test = u'Python\'\'\'\'\"\"\"\"\u1234\n'
    >>> print repr(test)
    u'Python\'\'\'\'""""\u1234\n'
    >>> print test.encode('unicode-escape')
    Python''''""""\u1234\n
    >>>


    Why would someone bother writing this codec when repr() does the job
    properly?

    Anyhow, here's a solution to the OP's stated problem from first
    principles using basic building blocks:

    >>> line = '\\u0050\\u0079\\u0074\\u0068\\u006f\\u006e\n'
    >>> u''.join(unichr(int(x, 16)) for x in line.split(r'\u') if x and x != '\n') + u'\n'
    u'Python\n'
    >>>
    John Machin, Dec 1, 2007
    #4
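The split-based one-liner above works when the line consists of nothing but `\uXXXX` escapes. As a rough Python 3 variant of the same first-principles idea (`chr` replaces `unichr`), a regular expression can replace only the escapes and leave any other text alone. This is a sketch, not a full escape parser: it handles four-digit `\u` escapes only, does not combine surrogate pairs, and the name `expand_u_escapes` is hypothetical.

```python
import re

# A backslash, the letter u, then exactly four hexadecimal digits.
_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')


def expand_u_escapes(text):
    # Convert each matched group of hex digits to the character with
    # that code point; everything that does not match is kept as-is.
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


line = '\\u0050\\u0079\\u0074\\u0068\\u006f\\u006e\n'
result = expand_u_escapes(line)
print(repr(result))
```

Unlike the codec approach, this leaves non-ASCII characters and other backslash sequences in the input untouched.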
  5. Bjoern Schliessmann

    Re: How to read strings containing escape characters from a file and use them as escape sequences?

    John Machin wrote:

    > Amazing what you can find in obscure corners of the obscure docs!
    > BTW, how many folks know what "bijective" means ?


    Everyone that can read and is smart enough to enter "bijective" into
    Wikipedia search.

    Regards,


    Björn

    --
    BOFH excuse #25:

    Decreasing electron flux
    Bjoern Schliessmann, Dec 2, 2007
    #5
  6. Duncan Booth

    Duncan Booth Guest

    Re: How to read strings containing escape characters from a file and use them as escape sequences?

    John Machin <> wrote:

    > Hmmm ... the encode is documented as "Produce a string that is
    > suitable as Unicode literal in Python source code", but it *isn't*
    > suitable. A Unicode literal is u'blah', this gives just blah. Worse,
    > it leaves the caller to nut out how to escape apostrophes and quotes:
    >
    > >>> test = u'Python\'\'\'\'\"\"\"\"\u1234\n'
    > >>> print repr(test)
    > u'Python\'\'\'\'""""\u1234\n'
    > >>> print test.encode('unicode-escape')
    > Python''''""""\u1234\n
    > >>>
    >
    > Why would someone bother writing this codec when repr() does the job
    > properly?
    >

    I don't know why it was written, but if it helps I can tell you why I have
    had occasion to use it: precisely because it does leave the caller to 'nut
    out how to escape apostrophes and quotes'.

    repr() does a good enough job if you just want a Python source string, but
    you can't control whether repr will escape quotes or apostrophes - if the
    string contains an apostrophe and no double-quote then the repr will
    enclose it in double-quotes, otherwise it always uses single quotes.

    >>> u'"', u'"\'', u'\'"', u'\''
    (u'"', u'"\'', u'\'"', u"'")

    If you want to force a particular quoting convention then unicode-escape
    gets you half way there and you can get the rest of the way with a couple
    of replace calls.
    Duncan Booth, Dec 2, 2007
    #6
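The "unicode-escape plus replace" idea described above could look roughly like this in Python 3. It is a sketch under assumptions, not Duncan's actual code: it forces double quotes, it needs only one `replace` call because Python 3 string literals have no `u` prefix to add, and the helper name `double_quoted_literal` is invented for this example.

```python
import ast


def double_quoted_literal(s):
    # unicode_escape escapes backslashes, control characters and
    # non-ASCII characters, but leaves both kinds of quote untouched.
    body = s.encode('unicode_escape').decode('ascii')
    # Escape the one quote character that would end the literal early,
    # then wrap the result in the chosen quotes.
    return '"' + body.replace('"', '\\"') + '"'


sample = 'Python\'"\u1234\n'
literal = double_quoted_literal(sample)
print(literal)

# Round-trip check: parsing the literal gives back the original string.
assert ast.literal_eval(literal) == sample
```

Forcing single quotes instead would mean escaping `'` rather than `"` in the `replace` call and changing the wrapping characters to match.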