Raw strings as input from File?

Discussion in 'Python' started by utabintarbo, Nov 24, 2009.

  1. utabintarbo

    utabintarbo Guest

    I have a log file with full Windows paths on a line. eg:
    K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
    \somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ; 1259006416

    As I try to pull in the line and process it, python changes the "\10"
    to a "\x08". This is before I can do anything with it. Is there a way
    to specify that incoming lines (say, when using .readlines() ) should
    be treated as raw strings?

    TIA
     
    utabintarbo, Nov 24, 2009
    #1

  2. utabintarbo

    MRAB Guest

    utabintarbo wrote:
    > I have a log file with full Windows paths on a line. eg:
    > K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
    > \somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ; 1259006416
    >
    > As I try to pull in the line and process it, python changes the "\10"
    > to a "\x08". This is before I can do anything with it. Is there a way
    > to specify that incoming lines (say, when using .readlines() ) should
    > be treated as raw strings?
    >

    .readlines() doesn't change the "\10" in a file to "\x08" in the string
    it returns.

    Could you provide some code which shows your problem?
     
    MRAB, Nov 24, 2009
    #2

  3. utabintarbo wrote:
    > I have a log file with full Windows paths on a line. eg:
    > K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
    > \somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ; 1259006416
    >
    > As I try to pull in the line and process it, python changes the "\10"
    > to a "\x08".


    Python does no such thing. When Python reads bytes from a file, it
    doesn't interpret or change those bytes in any way. Either there is
    something else going on here that you're not telling us, or the file
    doesn't contain what you think it contains. Please show us the exact
    code you're using to process this file, and show us the exact contents
    of the file you're processing.
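
    A minimal sketch of that claim, assuming Python 3 and a hypothetical
    scratch file standing in for the real log: the sample line is written
    out, read back with .readlines(), and the backslash, "1" and "0" come
    back as three separate characters.

```python
import os
import tempfile

# The sample line from the first post, written as a raw literal so the
# backslashes land in the file exactly as they would in a real log.
sample = r"K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx" + "\n"

# Hypothetical scratch file; the real log's name is not given in the thread.
fd, path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "w") as handle:
    handle.write(sample)

# Read the file back the same way the original poster does.
with open(path, "r") as handle:
    lines = handle.readlines()
os.remove(path)

first = lines[0]
print(repr(first))

# The backslash followed by "10" is still there; no backspace has appeared.
has_backslash_10 = "\\10" in first
has_backspace = "\x08" in first
```

    If has_backspace ever came out true here, the backspace would have to
    be in the file itself, which is Carsten's second possibility.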

    --
    Carsten Haese
    http://informixdb.sourceforge.net
     
    Carsten Haese, Nov 24, 2009
    #3
  4. utabintarbo

    utabintarbo Guest

    On Nov 24, 3:27 pm, MRAB <> wrote:
    >
    > .readlines() doesn't change the "\10" in a file to "\x08" in the string
    > it returns.
    >
    > Could you provide some code which shows your problem?


    Here is the code block I have so far:
    for l in open(CONTENTS, 'r').readlines():
        f = os.path.splitext(os.path.split(l.split('->')[0]))[0]
        if f in os.listdir(DIR1) and os.path.isdir(os.path.join(DIR1,f)):
            shutil.rmtree(os.path.join(DIR1,f))
            if f in os.listdir(DIR2) and os.path.isdir(os.path.join(DIR2,f)):
                shutil.rmtree(os.path.join(DIR2,f))

    I am trying to find dirs with the basename of the initial path less
    the extension in both DIR1 and DIR2

    A minimally obfuscated line from the log file:
    K:\sm\SMI\des\RS\Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz->/arch_m1/
    smi/des/RS/Pat/10DJ/121.D5-30\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    11/24/2009 08:16:42 ; 1259068602

    What I get from the debugger/python shell:
    'K:\\sm\\SMI\\des\\RS\\Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz->/arch_m1/
    smi/des/RS/Pat/10DJ/121.D5-30/1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    11/24/2009 08:16:42 ; 1259068602'

    TIA
     
    utabintarbo, Nov 24, 2009
    #4
  5. utabintarbo

    Jon Clements Guest

    On Nov 24, 9:20 pm, utabintarbo <> wrote:
    > On Nov 24, 3:27 pm, MRAB <> wrote:
    >
    > > .readlines() doesn't change the "\10" in a file to "\x08" in the string
    > > it returns.
    > >
    > > Could you provide some code which shows your problem?
    >
    > Here is the code block I have so far:
    > for l in open(CONTENTS, 'r').readlines():
    >     f = os.path.splitext(os.path.split(l.split('->')[0]))[0]
    >     if f in os.listdir(DIR1) and os.path.isdir(os.path.join(DIR1,f)):
    >         shutil.rmtree(os.path.join(DIR1,f))
    >         if f in os.listdir(DIR2) and os.path.isdir(os.path.join(DIR2,f)):
    >             shutil.rmtree(os.path.join(DIR2,f))
    >
    > I am trying to find dirs with the basename of the initial path less
    > the extension in both DIR1 and DIR2
    >
    > A minimally obfuscated line from the log file:
    > K:\sm\SMI\des\RS\Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz->/arch_m1/
    > smi/des/RS/Pat/10DJ/121.D5-30\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    > 11/24/2009 08:16:42 ; 1259068602
    >
    > What I get from the debugger/python shell:
    > 'K:\\sm\\SMI\\des\\RS\\Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz->/arch_m1/
    > smi/des/RS/Pat/10DJ/121.D5-30/1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    > 11/24/2009 08:16:42 ; 1259068602'
    >
    > TIA


    jon@jon-desktop:~/pytest$ cat log.txt
    K:\sm\SMI\des\RS\Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz->/arch_m1/
    smi/des/RS/Pat/10DJ/121.D5-30\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    11/24/2009 08:16:42 ; 1259068602

    >>> log = open('/home/jon/pytest/log.txt', 'r').readlines()
    >>> log
    ['K:\\sm\\SMI\\des\\RS\\Pat\\10DJ\\121.D5-30\\1215B-B-D5-BSHOE-MM.smz->/arch_m1/\n',
     'smi/des/RS/Pat/10DJ/121.D5-30\\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;\n',
     '11/24/2009 08:16:42 ; 1259068602\n']

    See -- it's not doing anything :)

    Although, "Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz" and "Pat
    \x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz" seem to be fairly different -- are
    you sure you're posting the correct output!?

    Jon.
     
    Jon Clements, Nov 24, 2009
    #5
  6. utabintarbo

    Jon Clements Guest

    On Nov 24, 9:50 pm, Jon Clements <> wrote:
    > On Nov 24, 9:20 pm, utabintarbo <> wrote:

    [snip]
    > Although, "Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz" and "Pat
    > \x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz" seem to be fairly different -- are
    > you sure you're posting the correct output!?
    >


    Ugh... let's try that...

    Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz
    Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz

    Jon.
     
    Jon Clements, Nov 24, 2009
    #6
  7. utabintarbo

    Terry Reedy Guest

    utabintarbo wrote:
    > I have a log file with full Windows paths on a line. eg:
    > K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
    > \somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ; 1259006416
    >
    > As I try to pull in the line and process it, python changes the "\10"
    > to a "\x08".


    This should only happen if you paste the text into your .py file as a
    string literal.

    > This is before I can do anything with it. Is there a way
    > to specify that incoming lines (say, when using .readlines() ) should
    > be treated as raw strings?


    Or if you use execfile or compile and ask Python to interpret the input
    as code.

    There are no raw strings, only raw string code literals marked with an
    'r' prefix for raw processing of the quoted text.
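
    To illustrate the difference, here is a small sketch (Python 3 assumed)
    that uses a fragment of the path posted earlier in the thread. Typed as
    an ordinary literal, the fragment comes out as the same mangled text the
    original poster reported; with the 'r' prefix it is left alone.

```python
# Ordinary literal: the compiler processes \10 and \121 as octal escapes.
plain = 'Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz'

# Raw literal: the same source text, but backslashes are kept as typed.
raw = r'Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz'

print(repr(plain))
print(repr(raw))

# Both names refer to ordinary str objects; only the source notation differs.
same_type = type(plain) is type(raw)
```

    Nothing comparable happens to text that arrives from a file, because
    escape processing is a property of source code and not of strings.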
     
    Terry Reedy, Nov 24, 2009
    #7
  8. On 2009-11-25, Rhodri James <> wrote:
    > On Tue, 24 Nov 2009 21:20:25 -0000, utabintarbo <>
    > wrote:
    >
    >> On Nov 24, 3:27 pm, MRAB <> wrote:
    >>>
    >>> .readlines() doesn't change the "\10" in a file to "\x08" in the string
    >>> it returns.
    >>>
    >>> Could you provide some code which shows your problem?
    >>
    >> Here is the code block I have so far:
    >> for l in open(CONTENTS, 'r').readlines():
    >>     f = os.path.splitext(os.path.split(l.split('->')[0]))[0]
    >>     if f in os.listdir(DIR1) and os.path.isdir(os.path.join(DIR1,f)):
    >>         shutil.rmtree(os.path.join(DIR1,f))
    >>         if f in os.listdir(DIR2) and os.path.isdir(os.path.join(DIR2,f)):
    >>             shutil.rmtree(os.path.join(DIR2,f))
    >
    > Ahem. This doesn't run. os.path.split() returns a tuple, and calling
    > os.path.splitext() doesn't work. Given that replacing the entire loop
    > contents with "print l" readily disproves your assertion, I suggest you
    > cut and paste actual code if you want an answer. Otherwise we're just
    > going to keep saying "No, it doesn't", because no, it doesn't.
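
    For reference, one way the basename-less-extension step could be
    written so that it does run. This is only a sketch, not the original
    poster's code: it assumes Python 3 and uses the standard ntpath module
    so that the Windows-style path splits the same way on any platform.

```python
import ntpath

# Sample line from the first post (raw literal, so backslashes survive).
line = r"K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx"

# Everything before "->" is the Windows path.
windows_path = line.split("->")[0]

# split() returns a (head, tail) tuple, so unpack it before going further.
head, tail = ntpath.split(windows_path)

# splitext() wants a single path string, here the file name alone.
stem, ext = ntpath.splitext(tail)

print(head, tail, stem, ext)
```

    On Windows, os.path is the same module as ntpath, so os.path.split and
    os.path.splitext would behave identically there.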


    It's, um, rewarding to see my recent set of instructions being
    followed.

    >> A minimally obfuscated line from the log file:
    >> K:\sm\SMI\des\RS\Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz->/arch_m1/
    >> smi/des/RS/Pat/10DJ/121.D5-30\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    >> 11/24/2009 08:16:42 ; 1259068602
    >>
    >> What I get from the debugger/python shell:
    >> 'K:\\sm\\SMI\\des\\RS\\Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz->/arch_m1/
    >> smi/des/RS/Pat/10DJ/121.D5-30/1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    >> 11/24/2009 08:16:42 ; 1259068602'
    >
    > When you do what, exactly?


    ;)

    --
    Grant
     
    Grant Edwards, Nov 25, 2009
    #8
  9. On Tue, 24 Nov 2009 13:20:25 -0800 (PST), utabintarbo
    <> declaimed the following in
    gmane.comp.python.general:

    >
    > Here is the code block I have so far:
    > for l in open(CONTENTS, 'r').readlines():
    >     f = os.path.splitext(os.path.split(l.split('->')[0]))[0]
    >     if f in os.listdir(DIR1) and os.path.isdir(os.path.join(DIR1,f)):
    >         shutil.rmtree(os.path.join(DIR1,f))
    >         if f in os.listdir(DIR2) and os.path.isdir(os.path.join(DIR2,f)):
    >             shutil.rmtree(os.path.join(DIR2,f))
    >
    > I am trying to find dirs with the basename of the initial path less
    > the extension in both DIR1 and DIR2
    >

    And just what are DIR1 and DIR2?

    So far as I can tell, the likely position of your problem is that
    THEY are the source of the problem, and you are joining them to a
    perfectly valid item.
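
    The thread never shows how DIR1 and DIR2 are defined, so the following
    is purely a hypothetical illustration (Python 3 assumed, made-up
    directory name) of how a path constant typed into source code can go
    wrong, and three spellings that avoid the problem.

```python
import ntpath

# Hypothetical directory, not taken from the thread.
as_raw = r"K:\10xx\archive"           # raw literal
as_escaped = "K:\\10xx\\archive"      # doubled backslashes
as_joined = ntpath.join("K:\\", "10xx", "archive")  # built from components

# Ordinary literal: \10 becomes a backspace and \a becomes a BEL character.
broken = "K:\10xx\archive"

print(repr(as_raw))
print(repr(broken))
```

    Joining a correctly read file name onto a constant like "broken" would
    produce exactly the kind of mixed output seen in the debugger.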
    --
    Wulfraed Dennis Lee Bieber KD6MOG
    HTTP://wlfraed.home.netcom.com/
     
    Dennis Lee Bieber, Nov 25, 2009
    #9
  10. utabintarbo

    Jon Clements Guest

    On Nov 25, 3:31 am, Grant Edwards <> wrote:
    > On 2009-11-25, Rhodri James <> wrote:
    >
    > > On Tue, 24 Nov 2009 21:20:25 -0000, utabintarbo <>
    > > wrote:
    >
    > >> On Nov 24, 3:27 pm, MRAB <> wrote:
    >
    > >>> .readlines() doesn't change the "\10" in a file to "\x08" in the string
    > >>> it returns.
    >
    > >>> Could you provide some code which shows your problem?
    >
    > >> Here is the code block I have so far:
    > >> for l in open(CONTENTS, 'r').readlines():
    > >>     f = os.path.splitext(os.path.split(l.split('->')[0]))[0]
    > >>     if f in os.listdir(DIR1) and os.path.isdir(os.path.join(DIR1,f)):
    > >>         shutil.rmtree(os.path.join(DIR1,f))
    > >>         if f in os.listdir(DIR2) and os.path.isdir(os.path.join(DIR2,f)):
    > >>             shutil.rmtree(os.path.join(DIR2,f))
    >
    > > Ahem.  This doesn't run.  os.path.split() returns a tuple, and calling
    > > os.path.splitext() doesn't work.  Given that replacing the entire loop
    > > contents with "print l" readily disproves your assertion, I suggest you
    > > cut and paste actual code if you want an answer.  Otherwise we're just
    > > going to keep saying "No, it doesn't", because no, it doesn't.
    >
    > It's, um, rewarding to see my recent set of instructions being
    > followed.
    >
    > >> A minimally obfuscated line from the log file:
    > >> K:\sm\SMI\des\RS\Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz->/arch_m1/
    > >> smi/des/RS/Pat/10DJ/121.D5-30\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    > >> 11/24/2009 08:16:42 ; 1259068602
    >
    > >> What I get from the debugger/python shell:
    > >> 'K:\\sm\\SMI\\des\\RS\\Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz->/arch_m1/
    > >> smi/des/RS/Pat/10DJ/121.D5-30/1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
    > >> 11/24/2009 08:16:42 ; 1259068602'
    >
    > > When you do what, exactly?
    >
    > ;)
    >
    > --
    > Grant


    Can't remember if this thread counts as "Edwards' Law 5[b|c]" :)

    I'm sure I pinned it up on my wall somewhere, right next to
    http://imgs.xkcd.com/comics/tech_support_cheat_sheet.png

    Jon.
     
    Jon Clements, Nov 25, 2009
    #10
  11. utabintarbo

    rzed Guest

    utabintarbo <> wrote:

    > I have a log file with full Windows paths on a line. eg:
    > K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
    > \somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ;
    > 1259006416
    >
    > As I try to pull in the line and process it, python changes the
    > "\10" to a "\x08". This is before I can do anything with it. Is
    > there a way to specify that incoming lines (say, when using
    > .readlines() ) should be treated as raw strings?
    >
    > TIA


    Despite all the ragging you're getting, it is a pretty flakey thing
    that Python does in this context:
    (from a python shell)
    >>> x = '\1'
    >>> x
    '\x01'
    >>> x = '\10'
    >>> x
    '\x08'

    If you are pasting your string as a literal, then maybe it does the
    same. It still seems weird to me. I can accept that '\1' means x01,
    but \10 seems to be expanded to \010 and then translated from octal
    to get to x08. That's just strange. I'm sure it's documented
    somewhere, but it's not easy to search for.

    Oh, and this:
    >>> '\7'
    '\x07'
    >>> '\70'
    '8'
    ... is really odd.

    --
    rzed
     
    rzed, Dec 2, 2009
    #11
  12. utabintarbo

    Dave Angel Guest

    rzed wrote:
    > utabintarbo <> wrote:
    >
    >
    >> I have a log file with full Windows paths on a line. eg:
    >> K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
    >> \somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ;
    >> 1259006416
    >>
    >> As I try to pull in the line and process it, python changes the
    >> "\10" to a "\x08". This is before I can do anything with it. Is
    >> there a way to specify that incoming lines (say, when using
    >> .readlines() ) should be treated as raw strings?
    >>
    >> TIA
    >>

    >
    > Despite all the ragging you're getting, it is a pretty flakey thing
    >

    When the OP specified .readlines(), which does *not* behave this way, he
    probably deserved what you call "ragging." The backslash escaping is
    for string literals, which are in code, not in data files.

    In any case, there's a big difference between surprising (to you), and
    flakey.

    > that Python does in this context:
    > (from a python shell)
    > >>> x = '\1'
    > >>> x
    > '\x01'
    > >>> x = '\10'
    > >>> x
    > '\x08'
    >
    > If you are pasting your string as a literal, then maybe it does the
    > same. It still seems weird to me. I can accept that '\1' means x01,
    > but \10 seems to be expanded to \010 and then translated from octal
    > to get to x08. That's just strange. I'm sure it's documented
    > somewhere, but it's not easy to search for.
    >
    >

    Check in the help for "escape Strings". It's documented (in vers. 2.6,
    anyway) in a nice chart: a backslash followed by one to three octal
    digits is interpreted as an octal character code. I don't like it much
    either, but it's inherited from C, which has worked that way for 30+
    years.

    Online, see
    http://www.python.org/doc/2.6.4/reference/lexical_analysis.html, and
    look in section 2.4.1 for the chart.

    > Oh, and this:
    > >>> '\7'
    > '\x07'
    > >>> '\70'
    > '8'
    > ... is really odd.
    >
    >

    Octal 70 is hex 38 (or decimal 56), which is the character '8'.
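
    The arithmetic can be checked directly. The sketch below (Python 3
    assumed; octal_escape is a made-up helper name, not a library function)
    converts the digits after the backslash from base 8 and looks up the
    resulting character, which is what the compiler does for such escapes
    in a string literal.

```python
def octal_escape(digits):
    """Return the character denoted by a backslash followed by these octal digits."""
    return chr(int(digits, 8))


# The escapes that came up in this thread.
for digits in ("1", "7", "10", "70", "121"):
    ch = octal_escape(digits)
    print("\\%s -> code %d (hex %02x) -> %r" % (digits, ord(ch), ord(ch), ch))
```

    So '\10' is code 8 (backspace) and '\70' is code 56, the digit '8'.
    Neither is being "expanded" to three digits first; the escape simply
    takes as many octal digits as are present, up to three.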

    DaveA
     
    Dave Angel, Dec 2, 2009
    #12