Embedding a literal "\u" in a unicode raw string.

Discussion in 'Python' started by Romano Giannetti, Feb 25, 2008.

  1. Hi,

    while writing some LaTeX preprocessing code, I stumbled into this problem: (I
    have a -*- coding: utf-8 -*- line, obviously)

    s = ur"añado $\uparrow$"

    Which gave an error because the \u escape is interpreted in raw unicode strings,
    too. So I found that the only way to solve this is to write:

    s = unicode(r"añado $\uparrow$", "utf-8")

    or

    s = ur"añado $\u005cuparrow$"

    The second one is too ugly to live, while the first is at least acceptable; but
    looking around the Python 3.0 doc, I saw that the first one will fail, too.

    Am I doing something wrong here, or is there another solution for this?

    Romano
     
    Romano Giannetti, Feb 25, 2008
    #1

  2. Romano Giannetti wrote:

    > Hi,
    >
    > while writing some LaTeX preprocessing code, I stumbled into this problem:
    > (I have a -*- coding: utf-8 -*- line, obviously)
    >
    > s = ur"añado $\uparrow$"
    >
    > Which gave an error because the \u escape is interpreted in raw unicode
    > strings, too. So I found that the only way to solve this is to write:
    >
    > s = unicode(r"añado $\uparrow$", "utf-8")
    >
    > or
    >
    > s = ur"añado $\u005cuparrow$"
    >
    > The second one is too ugly to live, while the first is at least
    > acceptable; but looking around the Python 3.0 doc, I saw that the first
    > one will fail, too.
    >
    > Am I doing something wrong here or there is another solution for this?


    Why don't you get rid of the raw string? Then you only need to write

    s = u"añado $\\uparrow$"

    which is considerably easier to read than both other variants above.

    Diez
     
    Diez B. Roggisch, Feb 25, 2008
    #2
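A minimal sketch of the double-backslash suggestion above. It is written for Python 3, which is an assumption about the reader's interpreter rather than the 2008 setup in the thread: there every str is unicode, the u prefix is optional, and the ur prefix no longer exists.

```python
# -*- coding: utf-8 -*-
# Python 3 sketch: an escaped backslash in an ordinary (non-raw) literal.

# "\\" is a single backslash, so the following "u" is just a letter.
escaped = "añado $\\uparrow$"

# In Python 3 a raw literal leaves \u untouched, so it spells the same text.
raw = r"añado $\uparrow$"

print(escaped)
assert escaped == raw
assert escaped.count("\\") == 1
```

The escaped form costs one extra character per backslash, which stays readable as long as the string holds only a few LaTeX commands.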

  3. Romano Giannetti wrote:

    > Hi,
    >
    > while writing some LaTeX preprocessing code, I stumbled into this
    > problem: (I have a -*- coding: utf-8 -*- line, obviously)
    >
    > s = ur"añado $\uparrow$"
    >
    > Which gave an error because the \u escape is interpreted in raw
    > unicode strings, too. So I found that the only way to solve this is
    > to write:
    >
    > s = unicode(r"añado $\uparrow$", "utf-8")
    >
    > or
    >
    > s = ur"añado $\u005cuparrow$"
    >
    > The second one is too ugly to live, while the first is at least
    > acceptable; but looking around the Python 3.0 doc, I saw that the
    > first one will fail, too.
    >
    > Am I doing something wrong here or there is another solution for
    > this?


    I too encountered this problem, in the same situation (making
    strings that contain LaTeX commands). One possibility is to separate
    out just the bit that has the \u, and use string juxtaposition to attach
    it to the others:

    s = ur"añado " u"$\\uparrow$"

    It's not ideal, but I think it's easier to read than your solution
    #2.


    --
    --OKB (not okblacke)
    Brendan Barnwell
    "Do not follow where the path may lead. Go, instead, where there is
    no path, and leave a trail."
    --author unknown
     
    OKB (not okblacke), Feb 25, 2008
    #3
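The juxtaposition trick above relies on adjacent string literals being joined at compile time, with each piece keeping its own prefix. A small sketch in Python 3 follows; the \textbf command is an illustrative addition, not something from the thread, and in Python 3 the split is not actually required because raw strings no longer interpret \u.

```python
# -*- coding: utf-8 -*-
# Adjacent literals are concatenated by the compiler; prefixes apply per piece.

s = (
    r"\textbf{añado} "   # raw piece: \t stays as backslash + "t"
    "$\\uparrow$"        # non-raw piece: backslash escaped by hand
)

print(s)
assert s == "\\textbf{añado} $\\uparrow$"
```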
  4. Romano Giannetti

    On Feb 25, 6:03 pm, "OKB (not okblacke)" wrote:
    >
    > I too encountered this problem, in the same situation (making
    > strings that contain LaTeX commands). One possibility is to separate
    > out just the bit that has the \u, and use string juxtaposition to attach
    > it to the others:
    >
    > s = ur"añado " u"$\\uparrow$"
    >
    > It's not ideal, but I think it's easier to read than your solution
    > #2.
    >


    Yes, I think I will do something like that, although I really do not
    understand why \x5c is not interpreted in a raw string while \u005c
    is interpreted in a unicode raw string. It is, well, not elegant. Raw
    should be raw...

    Thanks anyway
     
    , Feb 25, 2008
    #4
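The asymmetry described above can still be reproduced in Python 3 through the "raw_unicode_escape" codec, which, as far as I understand, applies the same rule Python 2 applied to ur"..." literals: only \uXXXX and \UXXXXXXXX are decoded, everything else is left alone. This is a sketch under that assumption, not a statement about how the Python 2 compiler was implemented.

```python
# The bytes below stand in for the *source text* of a raw unicode literal.
source = rb"\x5c vs \u005c"

decoded = source.decode("raw_unicode_escape")

# \x5c survives as four characters, while \u005c collapses to one backslash.
assert decoded == "\\x5c vs \\"

# A LaTeX command starting with \u is not a valid escape, so decoding fails,
# which mirrors the error reported in the first post.
try:
    rb"$\uparrow$".decode("raw_unicode_escape")
except UnicodeDecodeError as exc:
    error = exc
else:
    error = None

print(repr(decoded))
print(type(error).__name__)
```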
  5. > Yes, I think I will do something like that, although... I really do
    > not understand why \x5c is not interpreted in a raw string but \u005c
    > is interpreted in a unicode raw string... is, well, not elegant. Raw
    > should be raw...


    Right. IMO, this is just a plain design mistake in the Python Unicode
    handling. Unfortunately, there was discussion about this specific issue
    in the past, and the proponent of the status quo always defended it,
    with the rationale (IIUC) that a) without that, you can't put arbitrary
    Unicode characters into a string, and b) the semantics of \u in Java and
    C are such that \u gets processed before tokenization even starts, and
    it should be the same in Python.

    Regards,
    Martin
     
    Martin v. Löwis, Feb 25, 2008
    #5
  6. rmano

    On Feb 25, 11:27 pm, "Martin v. Löwis" wrote:
    > > Raw
    > > should be raw...

    >
    > Right. IMO, this is just a plain design mistake in the Python Unicode
    > handling. Unfortunately, there was discussion about this specific issue
    > in the past, and the proponent of the status quo always defended it,
    > with the rationale (IIUC) that a) without that, you can't put arbitrary
    > Unicode characters into a string, and b) the semantics of \u in Java and
    > C is so that \u gets processed even before tokenization even starts, and
    > it should be the same in Python.


    Well, I do not know Java, but C, AFAIK, has no raw strings, so you have
    to use double backslashes there anyway. Raw strings are a handy shorthand
    when you can type the characters on your keyboard, and this asymmetry
    rather defeats that.

    Is it decided, or is it possible to lobby for a change? :)

    Thanks,
    Romano

    BTW, I think 2to3.py should warn when it finds a raw string (not unicode)
    with \u in it. I tried it and it seems to ignore the problem...
     
    rmano, Feb 25, 2008
    #6
  7. NickC

    On Feb 26, 8:45 am, rmano wrote:
    > BTW, 2to3.py should warn when a raw string (not unicode) with \u in
    > it, I think.
    > I tried it and it seems to ignore the problem...


    Python 3.0a3+ (py3k:61229, Mar 4 2008, 21:38:15)
    [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> r"\u"
    '\\u'
    >>> r"\uparrow"
    '\\uparrow'
    >>> r"\u005c"
    '\\u005c'
    >>> r"\N{REVERSE SOLIDUS}"
    '\\N{REVERSE SOLIDUS}'
    >>> "\u005c"
    '\\'
    >>> "\N{REVERSE SOLIDUS}"
    '\\'

    2to3.py may be ignoring a problem, but existing raw 8-bit string
    literals containing a '\u' aren't going to be it. If anything is going
    to have a problem with conversion to Py3k at this point, it is raw
    Unicode literals that contain a Unicode escape.
     
    NickC, Mar 4, 2008
    #7
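The case singled out above, raw Unicode literals that contain a Unicode escape, could be hunted down before conversion with a small scan. The sketch below is a hypothetical helper (find_risky_literals is not part of 2to3, and the regular expressions are my own simplification): it only looks at single-line ur"..." and ur'...' literals, ignores triple-quoted strings, and does not count backslashes in front of the escape.

```python
import re

# Hypothetical helper, not part of 2to3: matches a Python 2 raw unicode
# literal such as ur"..." or UR'...' that sits on a single line.
RAW_UNICODE_LITERAL = re.compile(r"""\b[uU][rR]("[^"\n]*"|'[^'\n]*')""")

# A \uXXXX or \UXXXXXXXX escape, which Python 2 decoded even inside ur"...".
UNICODE_ESCAPE = re.compile(r"\\(u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})")


def find_risky_literals(source):
    """Return (line number, literal) pairs whose meaning changes in Python 3."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in RAW_UNICODE_LITERAL.finditer(line):
            if UNICODE_ESCAPE.search(match.group(1)):
                hits.append((lineno, match.group(0)))
    return hits


# Sample Python 2 source, held as text so this file itself stays valid Python 3.
sample = "\n".join([
    r's = ur"a $\u005cuparrow$"',   # escape inside a raw unicode literal
    r's = r"a $\uparrow$"',         # plain raw string: not flagged
    r's = ur"no escapes here"',     # raw unicode literal without an escape
])

print(find_risky_literals(sample))
```

Only the first sample line is reported, since it is the only one whose text differs between the two interpreters.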
  8. rmano

    On Mar 4, 1:00 pm, NickC wrote:
    >
    > Python 3.0a3+ (py3k:61229, Mar 4 2008, 21:38:15)
    > [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
    > Type "help", "copyright", "credits" or "license" for more information.
    > >>> r"\u"
    > '\\u'
    > >>> r"\uparrow"
    > '\\uparrow'


    Nice to know... so it seems that the 3.0 doc was not updated. I think
    this is the correct behaviour. Thanks
     
    rmano, Mar 7, 2008
    #8
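The behaviour confirmed above can be checked as a script rather than an interactive session. This is a sketch assuming a Python 3 interpreter; it also covers the one case the session does not show, namely that a non-raw "\uparrow" is rejected at compile time, so the raw prefix (or a doubled backslash) is still needed for LaTeX commands.

```python
# Raw literal: \u is left untouched, giving backslash + "uparrow".
raw = r"\uparrow"
assert len(raw) == 8
assert raw[0] == "\\"

# Non-raw literal: \u005c is decoded to a single backslash.
assert "\u005c" == "\\"

# Non-raw "\uparrow" is an incomplete \uXXXX escape. The source is compiled
# from text here so that the failure can be caught and inspected.
try:
    compile(r's = "\uparrow"', "<example>", "exec")
except SyntaxError:
    rejected = True
else:
    rejected = False

print(raw, rejected)
```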