RE: r'\' - python parser bug?

Discussion in 'Python' started by Tim Peters, May 24, 2004.

  1. Tim Peters

    Tim Peters Guest

    [Konstantin Veretennicov]
    > ActivePython 2.3.2 Build 232
    > >>> '\\'
    > '\\'
    > >>> r'\'
    >   File "<stdin>", line 1
    >     r'\'
    >        ^
    > SyntaxError: EOL while scanning single-quoted string
    >
    > Is this a known issue?


    Yes, and a documented one: an r-string cannot end with an odd number of
    backslashes. Note that your first expression ('\\') did create a string
    with a single backslash, although the repr of that string may have fooled
    you into thinking you got two backslashes.

    >>> '\\'
    '\\'
    >>> len('\\')
    1
    >>> print '\\'
    \
    >>>
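
The odd/even rule can be checked without typing the offending literal at the prompt. A minimal sketch, written in Python 3 syntax (the session above is Python 2); `is_valid_literal` is a hypothetical helper name used only for illustration, and the exact SyntaxError message differs between Python versions:

```python
def is_valid_literal(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# '\\' is one backslash, even though its repr shows two characters.
single = '\\'
print(repr(single), len(single))

# The source texts below are r'\', r'\\' and r'\\\' respectively.
print(is_valid_literal("r'\\'"))       # odd number of trailing backslashes
print(is_valid_literal("r'\\\\'"))     # even number of trailing backslashes
print(is_valid_literal("r'\\\\\\'"))   # odd again
```

Only the middle literal compiles; the other two run into the end of the line before a closing quote is found.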


    > Should i submit a bug report to development?


    Nope: it's not a bug, and won't change.
     
    Tim Peters, May 24, 2004
    #1

  2. "Tim Peters" wrote:

    > > Should i submit a bug report to development?

    >
    > Nope: it's not a bug, and won't change.


    Ok. Does it mean i'm not encouraged to even try inventing a patch?
    It won't break anything, or will it? I agree we can live without r'\',
    but are there any reasons *against* r'\'?

    - kv
     
    Konstantin Veretennicov, May 25, 2004
    #2

  3. Tim Peters wrote:

    > Yup. Right now all tools (including Python itself) that scan over strings
    > in Python source can (and usually do) treat backslashes identically, whether
    > in loops or in regexps.


    Or in other words, the point here is that the prefix flag (u, r, whatever) doesn't
    affect how a string literal is *parsed*. When the parser sees a backslash inside
    a string literal, it always skips the next character. There's no separate grammar
    for "raw string literals".

    </F>
     
    Fredrik Lundh, May 25, 2004
    #3
  4. Fuzzyman

    Fuzzyman Guest

    "Fredrik Lundh" wrote:
    > Tim Peters wrote:
    >
    > > Yup. Right now all tools (including Python itself) that scan over strings
    > > in Python source can (and usually do) treat backslashes identically, whether
    > > in loops or in regexps.

    >
    > Or in other words, the point here is that the prefix flag (u, r, whatever) doesn't
    > affect how a string literal is *parsed*. When the parser sees a backslash inside
    > a string literal, it always skips the next character. There's no separate grammar
    > for "raw string literals".
    >
    > </F>



    Wrong, surely?

    >>> print '\\'
    \
    >>> print r'\\'
    \\
    >>> print r'c:\subdir\'
    SyntaxError: EOL while scanning single-quoted string
    >>>


    > When the parser sees a backslash inside
    > a string literal, it always skips the next character.

    In the above example the parser *only* skips the next character if it
    is at the end of the string... surely illogical. The reason given is
    effectively 'raw strings were created for regular expressions, so it
    doesn't matter if the behaviour is illogical' (and precludes other
    reasonable uses!!)..........

    Regards,


    Fuzzy
     
    Fuzzyman, May 26, 2004
    #4
  5. Duncan Booth

    Duncan Booth Guest

    Fuzzyman wrote:

    > >>> print r'c:\subdir\'
    > SyntaxError: EOL while scanning single-quoted string
    > >>>
    >
    >> When the parser sees a backslash inside
    >> a string literal, it always skips the next character.

    > In the above example the parser *only* skips the next character if it
    > is at the end of the string... surely illogical. The reason given is
    > effectively 'raw strings were created for regular expressions, so it
    > doesn't matter if the behaviour is illogical' (and precludes other
    > reasonable uses!!)..........
    >


    In a python string, backslash is an escape character which gives the next
    character(s) special meaning, so '\n' is a single newline character. If the
    escaped character isn't a known escape then the parser simply passes
    through the entire sequence. So '\s' is a two character string. In all
    cases at least one character following the backslash is parsed when the
    backslash is encountered, and this character can never form part of the
    string terminator.
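
A quick way to see these lengths is to evaluate the literals from their source text. The sketch below uses Python 3 syntax; `literal_length` is a hypothetical helper, and the warning filter is there because newer Python versions warn about unknown escapes such as '\s' in ordinary strings:

```python
import ast
import warnings


def literal_length(source):
    """Evaluate a string literal given as source text and return its length."""
    with warnings.catch_warnings():
        # Unknown escapes like \s trigger a warning in newer Pythons.
        warnings.simplefilter("ignore")
        return len(ast.literal_eval(source))


print(literal_length(r"'\n'"))    # known escape: a single newline character
print(literal_length(r"'\s'"))    # unknown escape: backslash and 's' both kept
print(literal_length(r"r'\n'"))   # raw string: backslash and 'n' both kept
```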

    Raw strings are processed in exactly the same way as normal strings, except
    that no escape sequences are recognised, however the character following
    the backslash is still prevented from terminating the string, just as it
    would in any other string. This *useful*? behaviour allows you to put
    single and double quotes into a raw string provided that they are preceded
    by a backslash.

    print r'c:\subdir\'file'
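
Note that the backslash stays in the resulting string; it only stops the quote from ending the literal. A small sketch in Python 3 syntax, with a regular expression as the case where that is what you want (the sample text is made up for illustration):

```python
import re

raw = r'c:\subdir\'file'
print(raw)        # the backslash before the quote is still there
print(len(raw))   # 15 characters, backslashes included

# In a regex the leftover backslash is harmless: \' simply matches a quote.
quote_finder = re.compile(r'[\'"]')
print(quote_finder.findall('say "hi", it\'s fine'))
```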

    Raw strings aren't intended for writing DOS pathnames; they are actually
    targeted at regular expressions, where this behaviour makes more sense.

    If you need a lot of pathnames in your program you could consider using
    forward slash as the directory separator (use os.path.normpath to convert
    to backslashes if you feel the need), or put all your paths in a separate
    configuration file where you can choose what quoting, if any, to interpret.

    Also, provided you use os.path.join to concatenate paths you never actually
    *need* to include a trailing separator:

    DIR = r'c:\subdir'
    FILE = os.path.join(DIR, 'filename')

    ducks the entire issue cleanly.
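
A sketch of those options in Python 3 syntax. It uses `ntpath`, the Windows flavour of `os.path`, so the output is the same on any platform; on Windows itself `os.path` behaves identically. The directory and file names are the ones from the example above:

```python
import ntpath

directory = r'c:\subdir'

# Joining supplies the separator, so no trailing backslash is needed.
print(ntpath.join(directory, 'filename'))

# If a trailing separator really is required, add it with a normal string.
with_sep = directory + '\\'
print(with_sep)
print(ntpath.join(with_sep, 'filename'))   # no doubled separator

# Forward slashes in source, converted to backslashes on demand.
print(ntpath.normpath('c:/subdir/filename'))
```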
     
    Duncan Booth, May 26, 2004
    #5
  6. Fuzzyman wrote:

    > Wrong, surely ?


    nope.

    > >>> print '\\'

    > \


    the parser sees the first backslash, skips the second backslash,
    sees the end quote, and passes everything between the quotes
    to the next compiler stage.

    > >>> print r'\\'

    > \\


    the parser sees the first backslash, skips the second backslash,
    sees the end quote, and passes everything between the quotes
    to the next compiler stage.

    > >>> print r'c:\subdir\'

    > SyntaxError: EOL while scanning single-quoted string


    the parser sees the first backslash, skips the "s", and moves on.
    the parser then sees the second backslash, skips the end quote,
    and stumbles upon an EOL. syntax error (grammar violation).

    > > When the parser sees a backslash inside
    > > a string literal, it always skips the next character.

    >
    > In the above example the parser *only* skips the next character if it
    > is at the end of the string... surely illogical.


    you're confusing the string literal syntax (which is what the parser deals
    with) with the contents of the resulting string object (which is created by
    a later compiler stage). read the grammar (it's in the language reference)
    and try again.
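
The walk-through above can be sketched as a toy scanner. This is only an illustration of the "skip whatever follows a backslash" rule, not Python's actual tokenizer; `scan_quoted` is a hypothetical name, the input is assumed to be a one-line literal with any r/u prefix already stripped, and the code is Python 3 syntax:

```python
def scan_quoted(source):
    """Return the text between the quotes of a one-line string literal.

    The scanner does not care whether the literal is raw: a backslash
    always makes it skip the next character, whatever that character is.
    """
    if not source or source[0] not in "'\"":
        raise ValueError("expected an opening quote")
    quote = source[0]
    i = 1
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            i += 2          # skip the backslash and the character after it
        elif ch == quote:
            return source[1:i]
        else:
            i += 1
    raise SyntaxError("EOL while scanning single-quoted string")


# Source texts: '\\'   'c:\subdir\'file'   'c:\subdir\'
for text in ("'\\\\'", "'c:\\subdir\\'file'", "'c:\\subdir\\'"):
    try:
        print(text, "->", scan_quoted(text))
    except SyntaxError as exc:
        print(text, "->", "SyntaxError:", exc)
```

The first two inputs are accepted and handed on unchanged; what the backslashes mean is decided later, depending on the prefix. The third never finds a closing quote, which is the error from the session above.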

    </F>
     
    Fredrik Lundh, May 26, 2004
    #6
  7. Fuzzyman

    Fuzzyman Guest

    Duncan Booth wrote in message news:<Xns94F55D228C7AAduncanrcpcouk@127.0.0.1>...
    > (Fuzzyman) wrote in
    > news::
    >
    > > >>> print r'c:\subdir\'
    > > SyntaxError: EOL while scanning single-quoted string
    > > >>>
    >
    > >> When the parser sees a backslash inside
    > >> a string literal, it always skips the next character.

    > > In the above example the parser *only* skips the next character if it
    > > is at the end of the string... surely illogical. The reason given is
    > > effectively 'raw strings were created for regular expressions, so it
    > > doesn't matter if the behaviour is illogical' (and precludes other
    > > reasonable uses!!)..........
    > >

    >
    > In a python string, backslash is an escape character which gives the next
    > character(s) special meaning, so '\n' is a single newline character. If the
    > escaped character isn't a known escape then the parser simply passes
    > through the entire sequence. So '\s' is a two character string. In all
    > cases at least one character following the backslash is parsed when the
    > backslash is encountered, and this character can never form part of the
    > string terminator.
    >
    > Raw strings are processed in exactly the same way as normal strings, except
    > that no escape sequences are recognised, however the character following
    > the backslash is still prevented from terminating the string, just as it
    > would in any other string. This *useful*? behaviour allows you to put
    > single and double quotes into a raw string provided that they are preceded
    > by a backslash.
    >
    > print r'c:\subdir\'file'
    >
    > Raw strings aren't intended for writing DOS pathnames; they are actually
    > targeted at regular expressions, where this behaviour makes more sense.
    >

    [snip..]

    Yeah.. that's not an annoying feature.... I mean no-one would ever
    want to use strings to hold Windows pathnames in......


    Regards,

    Fuzzy
     
    Fuzzyman, May 26, 2004
    #7
  8. Peter Hansen

    Peter Hansen Guest

    Fuzzyman wrote:

    > Duncan Booth <> wrote in message news:<Xns94F55D228C7AAduncanrcpcouk@127.0.0.1>...
    >>Raw strings aren't intended for writing DOS pathnames; they are actually
    >>targeted at regular expressions, where this behaviour makes more sense.

    >
    > Yeah.. that's not an annoying feature.... I mean no-one would ever
    > want to use strings to hold Windows pathnames in......


    So use forward slashes. They're prettier anyway, and no need for
    the r strings.

    -Peter
     
    Peter Hansen, May 26, 2004
    #8
  9. Many thanks to everyone for the enlightenment. Now I can see the reasons
    behind the "no odd number of trailing backslashes" decision.
    Maybe they deserve to be added to the FAQ section on raw strings?

    Interestingly, C# did it the other way. Trailing backslashes in
    verbatim (raw) strings are allowed, but quotes are not:

    @"\" // ok
    @"\"" // error, unterminated string literal

    For me, personally, trailing backslashes aren't as important as quotes.
    Python wins again :)

    - kv
     
    Konstantin Veretennicov, May 28, 2004
    #9
  10. Fuzzyman wrote:

    [snip]

    > Yeah.. that's not an annoying feature.... I mean no-one would ever
    > want to use strings to hold Windows pathnames in......


    So I guess I'm not the only one who tries to use a special class for
    paths as much as possible then? ;-)

    Working with pathnames as strings is painful, IMHO. Using objects makes
    it much clearer, for me anyway.

    path = Path(r'C:\documents\my\file.txt')
    if path.isfile():
        shutil.copyfile(path.get(), ....)
    print path.dir()
    other_path = path.parent() / 'subdir' / 'otherfile' + '.txt'
    ....
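
The `Path` class above is the poster's own. For comparison, later Python versions (3.4 and up) ship a similar object interface in the standard library's `pathlib` module. A sketch using `PureWindowsPath`, which only manipulates the path text and so runs on any platform; the file names are the ones from the example above:

```python
from pathlib import PureWindowsPath

path = PureWindowsPath(r'C:\documents\my\file.txt')
print(path.name)     # last component
print(path.parent)   # containing directory

# The / operator joins components, so no separator is ever typed by hand.
# The parentheses matter: pathlib paths do not support + with a string.
other_path = path.parent / 'subdir' / ('otherfile' + '.txt')
print(other_path)

# A trailing separator is dropped when the path object is built.
print(PureWindowsPath('C:\\documents\\my\\'))
```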


    Regards,

    Per Erik Stendahl
     
    Per Erik Stendahl, May 28, 2004
    #10
  11. Isaac To

    Isaac To Guest

    >>>>> "Konstantin" == Konstantin Veretennicov <> writes:

    Konstantin> @"\" // ok
    Konstantin> @"\"" // error, unterminated string literal

    Konstantin> For me, personally, trailing backslashes aren't as important
    Konstantin> as quotes. Python wins again :)

    hm...... In a Python raw string, the delimiting quote can only appear if it
    follows a \. That is not very useful, I believe, unless \ is really used as
    an escape character in the string, i.e., in a regex. Perhaps the "r" in r'\''
    should be read as "regex" rather than "raw"? =)

    Regards,
    Isaac.
     
    Isaac To, May 28, 2004
    #11
  12. On Tue, 25 May 2004 02:37:06 -0700, Konstantin Veretennicov wrote:

    > Ok. Does it mean i'm not encouraged to even try inventing a patch?
    > It won't break anything, or will it?


    It would make it impossible to insert a backslash-quote in a raw string,
    unless it is a single quote in a double-quoted string or vice versa.

    --
    __("< Marcin Kowalczyk
    \__/
    ^^ http://qrnik.knm.org.pl/~qrczak/
     
    Marcin 'Qrczak' Kowalczyk, May 29, 2004
    #12
  13. Fuzzyman

    Fuzzyman Guest

    Per Erik Stendahl wrote:
    > Fuzzyman wrote:
    >
    > [snip]
    >
    > > Yeah.. that's not an annoying feature.... I mean no-one would ever
    > > want to use strings to hold Windows pathnames in......

    >
    > So I guess I'm not the only one who tries to use a special class for
    > paths as much as possible then? ;-)
    >
    > Working with pathnames as strings is painful, IMHO. Using objects makes
    > it much clearer, for me anyway.
    >
    > path = Path(r'C:\documents\my\file.txt')
    > if path.isfile():
    >     shutil.copyfile(path.get(), ....)
    > print path.dir()
    > other_path = path.parent() / 'subdir' / 'otherfile' + '.txt'
    > ...
    >
    >
    > Regards,
    >
    > Per Erik Stendahl



    *However* - more seriously - if you create a command line tool, then
    python *wrongly* handles pathnames.

    I've written a command line tool called filestruct that compares a
    file structure to a previous state and records the changes (for
    remotely syncing directories - you only have to transfer the changes
    and then filestruct will make the changes). If you give it a windows
    path ending in \" then python interprets it *wrongly*....

    e.g.
    D:\Python Projects\directory change >>> filestruct.py compare
    "D:\Python Projects\" b:\test.txt b:\test.zip
    filestruct needs at least three arguments when run from the command
    line. See :
    filestruct.py ?

    The python interpreter assumes that the entirely valid windows path
    supplied at the command line actually contains an escaped quote..... I
    may have to write a new command line parser to correct this python
    'feature'.....

    Regards,

    Fuzzyman

    http://www.voidspace.org.uk/atlantibots/pythonutils.html
     
    Fuzzyman, Jun 7, 2004
    #13
  14. (Fuzzyman) writes:

    > *However* - more seriously - if you create a command line tool, then
    > python *wrongly* handles pathnames.
    >
    > I've written a command line tool called filestruct that compares a
    > file structure to a previous state and records the changes (for
    > remotely syncing directories - you only have to transfer the changes
    > and then filestruct will make the changes). If you give it a windows
    > path ending in \" then python interprets it *wrongly*....
    >
    > e.g.
    > D:\Python Projects\directory change >>> filestruct.py compare
    > "D:\Python Projects\" b:\test.txt b:\test.zip
    > filestruct needs at least three arguments when run from the command
    > line. See :
    > filestruct.py ?
    >
    > The python interpreter assumes that the entirely valid windows path
    > supplied at the command line actually contains an escaped quote..... I
    > may have to write a new command line parser to correct this python
    > 'feature'.....


    That's not a python bug - a C program shows the same behaviour. You're
    most probably bitten by how cmd.exe handles quoted arguments.

    Thomas
     
    Thomas Heller, Jun 7, 2004
    #14
  15. Duncan Booth

    Duncan Booth Guest

    Fuzzyman wrote:

    > I've written a command line tool called filestruct that compares a
    > file structure to a previous state and records the changes (for
    > remotely syncing directories - you only have to transfer the changes
    > and then filestruct will make the changes). If you give it a windows
    > path ending in \" then python interprets it *wrongly*....


    Not that you ever need to give a well written script a path ending in \"?
    Why not just give it the directory without the spurious path separator?

    >
    > e.g.
    > D:\Python Projects\directory change >>> filestruct.py compare
    > "D:\Python Projects\" b:\test.txt b:\test.zip
    > filestruct needs at least three arguments when run from the command
    > line. See :
    > filestruct.py ?
    >
    > The python interpreter assumes that the entirely valid windows path
    > supplied at the command line actually contains an escaped quote..... I
    > may have to write a new command line parser to correct this python
    > 'feature'.....
    >

    Python here seems to be in complete agreement with Microsoft, or at least
    with their C compiler (not entirely surprising since that's what Python
    uses). The quote is indeed escaped:

    C:\temp>type t.c
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int i;
        /* Print each argument exactly as the C runtime delivered it. */
        for (i = 0; i < argc; i++)
        {
            printf("arg %d= %s\n", i, argv[i]);
        }
        return 0;
    }

    C:\temp>t "c:\Program Files\" c:\temp
    arg 0= t
    arg 1= c:\Program Files" c:\temp

    C:\temp>
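
The output above follows from the Microsoft C runtime's documented rule for backslashes in front of a double quote: 2n backslashes plus a quote give n backslashes and the quote acts as a delimiter, while 2n+1 backslashes plus a quote give n backslashes and a literal quote. The sketch below (Python 3 syntax) is a simplified model of that rule, not the runtime's real code. `split_msvcrt_style` is a hypothetical name, only the argument part of the command line is handled, and other details of the real parser (such as doubled quotes inside a quoted argument) are left out:

```python
def split_msvcrt_style(cmdline):
    """Split arguments using a simplified form of the MS C runtime rules."""
    args, current = [], []
    in_quotes = False
    have_arg = False
    i, n = 0, len(cmdline)
    while i < n:
        ch = cmdline[i]
        if ch == "\\":
            # Measure the whole run of backslashes first.
            j = i
            while j < n and cmdline[j] == "\\":
                j += 1
            count = j - i
            if j < n and cmdline[j] == '"':
                current.append("\\" * (count // 2))
                if count % 2:
                    current.append('"')   # odd run: the quote is literal
                    j += 1
                # even run: the quote is handled as a delimiter next time round
            else:
                current.append("\\" * count)  # not before a quote: literal
            have_arg = True
            i = j
        elif ch == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif ch in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(current))
                current, have_arg = [], False
            i += 1
        else:
            current.append(ch)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(current))
    return args


print(split_msvcrt_style(r'"c:\Program Files\" c:\temp'))
print(split_msvcrt_style(r'"c:\Program Files\\" c:\temp'))
```

The first call reproduces the single merged argument shown above. Doubling the trailing backslash, as in the second call, yields the two arguments the user meant.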
     
    Duncan Booth, Jun 7, 2004
    #15
  16. Fuzzyman

    Fuzzyman Guest

    Duncan Booth wrote in message news:<Xns95016EAEE7313duncanrcpcouk@127.0.0.1>...
    > (Fuzzyman) wrote in
    > news::
    >
    > > I've written a command line tool called filestruct that compares a
    > > file structure to a previous state and records the changes (for
    > > remotely syncing directories - you only have to transfer the changes
    > > and then filestruct will make the changes). If you give it a windows
    > > path ending in \" then python interprets it *wrongly*....

    >
    > Not that you ever need to give a well written script a path ending in \"?
    > Why not just give it the directory without the spurious path separator?
    >


    It's not necessary - I just want to be able to handle it correctly if
    it happens... after all, it's a perfectly valid path.

    Hmm.. if other command line programs do the same then maybe I won't
    trip up other users... but it seems weird..........

    Regards,

    Fuzzy


    > >
    > > e.g.
    > > D:\Python Projects\directory change >>> filestruct.py compare
    > > "D:\Python Projects\" b:\test.txt b:\test.zip
    > > filestruct needs at least three arguments when run from the command
    > > line. See :
    > > filestruct.py ?
    > >
    > > The python interpreter assumes that the entirely valid windows path
    > > supplied at the command line actually contains an escaped quote..... I
    > > may have to write a new command line parser to correct this python
    > > 'feature'.....
    > >

    > Python here seems to be in complete agreement with Microsoft, or at least
    > with their C compiler (not entirely surprising since that's what Python
    > uses). The quote is indeed escaped:
    >
    > C:\temp>type t.c
    > #include <stdio.h>
    >
    > int main(int argc, char **argv)
    > {
    >     int i;
    >     for (i = 0; i < argc; i++)
    >     {
    >         printf("arg %d= %s\n", i, argv[i]);
    >     }
    >     return 0;
    > }
    >
    > C:\temp>t "c:\Program Files\" c:\temp
    > arg 0= t
    > arg 1= c:\Program Files" c:\temp
    >
    > C:\temp>
     
    Fuzzyman, Jun 7, 2004
    #16