reg exp and octal notation

Discussion in 'Python' started by Lucas Branca, Mar 5, 2004.

  1. Lucas Branca

    Lucas Branca Guest

    Could someone explain to me the difference between the results below?

    ## $cat octals.txt
    ## \006\034abc

    import re

    a= "\006\034abc"
    preg= re.compile(r'([\0-\377]*)')
    res = preg.search(a)
    print res.groups()

    loader = open('./octals.txt', 'r')
    b = loader.readline()
    preg= re.compile(r'([\0-\377]*)')
    res = preg.search(b)
    print res.groups()


    RESULTS

    ('\x06\x1cabc',)

    ('\\006\\034abc\n',)


    Many thanks
    Lucas
    Lucas Branca, Mar 5, 2004
    #1

  2. Lucas Branca

    Ruud de Jong Guest

    Lucas Branca wrote:
    > Could someone explain me the difference between the results below?
    >
    > ## $cat octals.txt
    > ## \006\034abc
    >
    > import re
    >
    > a= "\006\034abc"
    > preg= re.compile(r'([\0-\377]*)')
    > res = preg.search(a)
    > print res.groups()
    >
    > loader = open('./octals.txt', 'r')
    > b = loader.readline()


    Look at the value of b at this point, you'll see:
    >>> b
    '\\006\\034abc\n'

    In other words, the backslashes are seen as literal backslashes.
    readline() does no evaluation of the string, it just copies the
    characters.

    Regards,

    Ruud

    > preg= re.compile(r'([\0-\377]*)')
    > res = preg.search(b)
    > print res.groups()
    >
    >
    > RESULTS
    >
    > ('\x06\x1cabc',)
    >
    > ('\\006\\034abc\n',)
    >
    >
    > Many thanks
    > Lucas
    >
    >
    Ruud de Jong, Mar 5, 2004
    #2

  3. Lucas Branca

    Peter Otten Guest

    Lucas Branca wrote:

    > Could someone explain me the difference between the results below?
    >
    > ## $cat octals.txt
    > ## \006\034abc
    >
    > import re
    >
    > a= "\006\034abc"
    > preg= re.compile(r'([\0-\377]*)')
    > res = preg.search(a)
    > print res.groups()
    >
    > loader = open('./octals.txt', 'r')
    > b = loader.readline()
    > preg= re.compile(r'([\0-\377]*)')
    > res = preg.search(b)
    > print res.groups()
    >
    >
    > RESULTS
    >
    > ('\x06\x1cabc',)
    >
    > ('\\006\\034abc\n',)


    a and b are two entirely different strings. Whatever similarity there
    appears to be is an artifact of Python's treatment of escape sequences,
    which applies only in source code, not to the contents of an arbitrary file.

    Your literal string:

    >>> s = "\006\034\n"
    >>> s
    '\x06\x1c\n'

    What you read from the text file:

    >>> t = "\\006\\034\n"
    >>> t
    '\\006\\034\n'

    Maybe it helps to learn what's really inside these two strings, so let's
    have a look at the ASCII codes:

    >>> map(ord, s)
    [6, 28, 10]
    >>> map(ord, t)
    [92, 48, 48, 54, 92, 48, 51, 52, 10]

    Another example: in source code you can write the newline as

    >>> a = """
    ... """
    >>> b = "\n"
    >>> c = "\x0a"
    >>> d = "\012"
    >>> a,b,c,d
    ('\n', '\n', '\n', '\n')

    But if read from a file, \n, \x0a and \012 would just be sequences of two or
    four characters.
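
    A minimal sketch of the same point, written in Python 3 syntax (an
    assumption; the sessions in this thread are Python 2). The temporary file
    stands in for octals.txt, and the names file_text, from_file and
    from_source are illustrative only.

```python
import os
import tempfile

# Text exactly as it sits in the file: the backslashes are ordinary characters.
file_text = "\\006\\034abc\n"

# Write it to a temporary file (a stand-in for octals.txt) and read it back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write(file_text)
    path = handle.name

with open(path, "r") as loader:
    from_file = loader.readline()
os.remove(path)

# The "same" characters typed as a string literal in source code,
# where Python interprets the octal escapes.
from_source = "\006\034abc\n"

# Compare lengths and code points of the two strings.
print(len(from_source), [ord(ch) for ch in from_source])
print(len(from_file), [ord(ch) for ch in from_file])
```

    The literal comes out as 6 characters, while the line read from the file
    has 12, because each backslash and digit is stored as its own character.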

    Only when you have understood the above should you return to regular
    expressions. Your regexp always matches the whole string, i.e. it is
    redundant (and probably not what you want, but that you would need to
    explain in another post).

    [\0-\377] is just a fancy way of writing "match any character"
    * means "repeat the preceding as often as you want" (including zero times)
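
    To check this claim, here is a small sketch in Python 3 syntax (an
    assumption; the thread uses Python 2). The two test strings are taken from
    the posts above; the variable names are illustrative.

```python
import re

# The pattern from the thread: a class covering code points 0 through 0o377 (255).
preg = re.compile(r'([\0-\377]*)')

from_source = "\006\034abc"      # two control characters, then "abc"
from_file = "\\006\\034abc\n"    # literal backslashes and digits, as read from a file

for text in (from_source, from_file):
    matched = preg.search(text).group(1)
    # For input limited to code points 0-255 the group is the whole string.
    print(repr(matched), matched == text)
```

    Both comparisons come out True. Note that with Python 3 text strings the
    class stops at code point 255, so a character above that range would end
    the match; with Python 2 byte strings every character falls inside it.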

    Peter
    Peter Otten, Mar 5, 2004
    #3
  4. Lucas Branca

    Lucas Branca Guest

    -- snip --
    >> ('\x06\x1cabc',) string from source code


    >> ('\\006\\034abc\n',) same string read from file


    --snip --
    > In other words, the backslashes are seen as literal backslashes.
    > readline() does no evaluation of the string, it just copies the
    > characters


    Yeah... you are right, guys. I have mixed up two problems;
    the regular expressions are innocent.

    Ok. Let's put it this way:
    I have to read each line of a file and strip a particular string from it
    (a string containing octal notation too).

    The problem is actually file.readline(), which doesn't return
    what I expected.

    Pardon my newbie question, but is there a way to read a line xy from that file
    and obtain:

    line xy: \006\034abc

    ('\x06\x1cabc',)

    and not every single char in it, like now?
    ('\\006\\034abc\n',)

    (before I start to reinvent the wheel ....... :) )

    Thank you
    Lucas
    Lucas Branca, Mar 5, 2004
    #4
  5. Lucas Branca

    Jeff Epler Guest

    If you have a string and want to perform backslash-substitution on it,
    use Python 2.3's "string_escape" codec.

    Two examples:

    >>> s = "\\n"
    >>> s
    '\\n'
    >>> s.decode("string_escape")
    '\n'

    >>> "\x30"
    '0'
    >>> "\\x30"
    '\\x30'
    >>> "\\x30".decode("string_escape")
    '0'

    You can remove the trailing newline this way:
    if s.endswith("\n"): s = s[:-1]
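
    For readers on Python 3, where the "string_escape" codec and str.decode()
    no longer exist, a rough equivalent is to decode bytes with the
    "unicode_escape" codec. This is a sketch under that assumption, not the
    method used in this thread; raw_line stands in for a line read from the
    file opened in binary mode.

```python
# Raw bytes as they would come from octals.txt opened with mode "rb".
raw_line = b"\\006\\034abc\n"

# Drop the trailing newline first, then interpret the backslash escapes.
stripped = raw_line.rstrip(b"\n")
decoded = stripped.decode("unicode_escape")

print(repr(decoded))
print([ord(ch) for ch in decoded])
```

    One caveat: "unicode_escape" treats non-ASCII bytes as Latin-1, so it
    suits data like the example here but may not be right for files in other
    encodings.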

    Jeff
    Jeff Epler, Mar 5, 2004
    #5
  6. Lucas Branca

    Lucas Branca Guest

    Great!
    It's just what I was looking for.
    (...and I read it in "what's new" this morning ......
    .... "boing boing" with my head now ... :) )

    Thank you very much



    "Jeff Epler" <> wrote in message
    news:...
    > If you have a string and want to perform backslash-substitution on it,
    > use python2.3's "string_escape" codec.
    >
    > Two examples:
    >
    > >>> s = "\\n"
    > >>> s

    > '\\n'
    > >>> s.decode("string_escape")

    > '\n'
    >
    > >>> "\x30"

    > '0'
    > >>> "\\x30"

    > '\\x30'
    > >>> "\\x30".decode("string_escape")

    > '0'
    >
    > You can remove the trailing newline this way:
    > if s.endswith("\n"): s = s[:-1]
    >
    > Jeff
    >
    Lucas Branca, Mar 5, 2004
    #6