Raw strings as input from File?

utabintarbo

I have a log file with full Windows paths on a line. eg:
K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
\somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ; 1259006416

As I try to pull in the line and process it, python changes the "\10"
to a "\x08". This is before I can do anything with it. Is there a way
to specify that incoming lines (say, when using .readlines() ) should
be treated as raw strings?

TIA
 
MRAB

.readlines() doesn't change the "\10" in a file to "\x08" in the string
it returns.

Could you provide some code which shows your problem?
 
Carsten Haese

utabintarbo said:
I have a log file with full Windows paths on a line. eg:
K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
\somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ; 1259006416

As I try to pull in the line and process it, python changes the "\10"
to a "\x08".

Python does no such thing. When Python reads bytes from a file, it
doesn't interpret or change those bytes in any way. Either there is
something else going on here that you're not telling us, or the file
doesn't contain what you think it contains. Please show us the exact
code you're using to process this file, and show us the exact contents
of the file you're processing.
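You can confirm Carsten's point directly. The minimal Python 3 sketch below writes the log line from this thread to a temporary file and reads it back with readlines(). Using the standard tempfile module is just one convenient choice. The only place escapes could be processed is the source literal, and it is written with an r prefix.

```python
import os
import tempfile

# The log line from the thread, written as a raw literal so the backslashes
# reach the file exactly as they appear here.
LOG_LINE = r"K:\sm\SMI\des\RS\Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz->/arch_m1/"


def round_trip(text):
    """Write text to a temporary file, then read it back with readlines()."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(text + "\n")
        with open(path) as fh:
            return fh.readlines()
    finally:
        os.remove(path)


lines = round_trip(LOG_LINE)
print(repr(lines[0]))
# readlines() hands back the backslashes untouched; no '\x08' appears.
assert lines[0].rstrip("\n") == LOG_LINE
assert "\x08" not in lines[0]
```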
 
utabintarbo

MRAB said:
.readlines() doesn't change the "\10" in a file to "\x08" in the string
it returns.

Could you provide some code which shows your problem?

Here is the code block I have so far:
for l in open(CONTENTS, 'r').readlines():
    f = os.path.splitext(os.path.split(l.split('->')[0]))[0]
    if f in os.listdir(DIR1) and os.path.isdir(os.path.join(DIR1,f)):
        shutil.rmtree(os.path.join(DIR1,f))
        if f in os.listdir(DIR2) and os.path.isdir(os.path.join(DIR2,f)):
            shutil.rmtree(os.path.join(DIR2,f))

I am trying to find dirs with the basename of the initial path less
the extension in both DIR1 and DIR2

A minimally obfuscated line from the log file:
K:\sm\SMI\des\RS\Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz->/arch_m1/
smi/des/RS/Pat/10DJ/121.D5-30\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
11/24/2009 08:16:42 ; 1259068602

What I get from the debugger/python shell:
'K:\\sm\\SMI\\des\\RS\\Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz->/arch_m1/
smi/des/RS/Pat/10DJ/121.D5-30/1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
11/24/2009 08:16:42 ; 1259068602'

TIA
 
Jon Clements


jon@jon-desktop:~/pytest$ cat log.txt
K:\sm\SMI\des\RS\Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz->/arch_m1/
smi/des/RS/Pat/10DJ/121.D5-30\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
11/24/2009 08:16:42 ; 1259068602
['K:\\sm\\SMI\\des\\RS\\Pat\\10DJ\\121.D5-30\\1215B-B-D5-BSHOE-MM.smz->/arch_m1/\n',
 'smi/des/RS/Pat/10DJ/121.D5-30\\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;\n',
 '11/24/2009 08:16:42 ; 1259068602\n']

See -- it's not doing anything :)

Although, "Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz" and "Pat
\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz" seem to be fairly different -- are
you sure you're posting the correct output!?

Jon.
 
Jon Clements

Although, "Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz" and "Pat
\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz" seem to be fairly different -- are
you sure you're posting the correct output!?

Ugh... let's try that...

Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz
Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz

Jon.
 
Terry Reedy

utabintarbo said:
I have a log file with full Windows paths on a line. eg:
K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
\somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ; 1259006416

As I try to pull in the line and process it, python changes the "\10"
to a "\x08".

This should only happen if you paste the text into your .py file as a
string literal.
This is before I can do anything with it. Is there a way
to specify that incoming lines (say, when using .readlines() ) should
be treated as raw strings?

Or if you use execfile or compile and ask Python to interpret the input
as code.

There are no raw strings, only raw string code literals marked with an
'r' prefix for raw processing of the quoted text.
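Both of Terry's cases can be reproduced with a fragment of the OP's own path. In this minimal Python 3 sketch, ast.literal_eval stands in for execfile/compile as the "interpret the input as code" step. The ordinary literal produces exactly the 'Pat\x08DJQ.D5-30Q5B' seen in the OP's debugger output.

```python
import ast

# The same fragment of the OP's path, typed into source code two ways.
pasted = 'Pat\10DJ\121.D5-30\1215B'    # ordinary literal: escapes ARE processed
raw = r'Pat\10DJ\121.D5-30\1215B'      # raw literal: backslashes are kept

print(repr(pasted))  # 'Pat\x08DJQ.D5-30Q5B'
print(repr(raw))     # 'Pat\\10DJ\\121.D5-30\\1215B'

# Text read from a file is plain data, but handing it back to the compiler
# (here via ast.literal_eval) runs escape processing on it again.
evaluated = ast.literal_eval("'" + raw + "'")
print(evaluated == pasted)  # True
```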
 
Grant Edwards

MRAB said:
.readlines() doesn't change the "\10" in a file to "\x08" in the string
it returns.

Could you provide some code which shows your problem?

utabintarbo said:
Here is the code block I have so far:
for l in open(CONTENTS, 'r').readlines():
    f = os.path.splitext(os.path.split(l.split('->')[0]))[0]
    if f in os.listdir(DIR1) and os.path.isdir(os.path.join(DIR1,f)):
        shutil.rmtree(os.path.join(DIR1,f))
        if f in os.listdir(DIR2) and os.path.isdir(os.path.join(DIR2,f)):
            shutil.rmtree(os.path.join(DIR2,f))

Ahem. This doesn't run. os.path.split() returns a tuple, and passing
that tuple to os.path.splitext() doesn't work. Given that replacing the
entire loop contents with "print l" readily disproves your assertion, I
suggest you cut and paste actual code if you want an answer. Otherwise
we're just going to keep saying "No, it doesn't", because no, it doesn't.

It's, um, rewarding to see my recent set of instructions being
followed.
When you do what, exactly?

;)
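Here is a corrected version of the loop, offered only as a sketch. It assumes the OP wants DIR1 and DIR2 checked independently, since the indentation in the posted code is ambiguous. It also uses ntpath, the Windows flavour of os.path, so the backslash-separated paths in the log split correctly even if the script runs on a non-Windows machine. The function names are hypothetical.

```python
import ntpath
import os
import shutil


def basename_without_ext(line):
    """Return the file name, minus extension, of the Windows path before '->'."""
    win_path = line.split("->")[0].strip()
    # ntpath splits on backslashes even when the script runs on Linux or macOS.
    name = ntpath.basename(win_path)
    return ntpath.splitext(name)[0]


def purge_matching_dirs(contents, dirs):
    """Remove <base>/<stem> for every base in dirs where that directory exists."""
    with open(contents) as fh:
        for line in fh:
            if "->" not in line:
                continue  # skip blank or unexpected lines
            stem = basename_without_ext(line)
            if not stem:
                continue  # an empty name would make the target the base dir itself
            for base in dirs:
                target = os.path.join(base, stem)
                if os.path.isdir(target):
                    shutil.rmtree(target)
```

You would call it as purge_matching_dirs(CONTENTS, [DIR1, DIR2]) with the thread's own constants. If those constants contain backslashes, write them as raw literals. The empty-stem guard matters: os.path.join(base, "") names base itself, so without the guard a malformed line could delete a whole directory.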
 
Dennis Lee Bieber

utabintarbo said:
Here is the code block I have so far:
for l in open(CONTENTS, 'r').readlines():
    f = os.path.splitext(os.path.split(l.split('->')[0]))[0]
    if f in os.listdir(DIR1) and os.path.isdir(os.path.join(DIR1,f)):
        shutil.rmtree(os.path.join(DIR1,f))
        if f in os.listdir(DIR2) and os.path.isdir(os.path.join(DIR2,f)):
            shutil.rmtree(os.path.join(DIR2,f))

I am trying to find dirs with the basename of the initial path less
the extension in both DIR1 and DIR2
And just what are DIR1 and DIR2?

So far as I can tell, the most likely explanation is that THEY are the
source of the problem, and you are joining them to a perfectly valid
item.
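If Dennis is right and the damage lives in the DIR1/DIR2 constants rather than in the data, one quick check is to look for control characters in them. The helper below is hypothetical (control_chars is not a library function) and relies on unicodedata's "Cc" category. The example constants are made up. Besides '\10', sequences such as '\t', '\n' or '\b' in a non-raw path literal show up the same way.

```python
import unicodedata


def control_chars(path):
    """Return (index, char) pairs for any control characters in path.

    A control character such as '\\x08' inside a path constant usually means
    a backslash sequence in a non-raw literal was read as an escape.
    """
    return [(i, ch) for i, ch in enumerate(path)
            if unicodedata.category(ch) == "Cc"]


# Hypothetical constants of the kind Dennis is asking about.
DIR1_BAD = "D:\10xx"   # non-raw literal: '\10' becomes '\x08'
DIR1_OK = r"D:\10xx"   # raw literal: the backslash survives

print(control_chars(DIR1_BAD))  # [(2, '\x08')]
print(control_chars(DIR1_OK))   # []
```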
 
Jon Clements


Can't remember if this thread counts as "Edwards' Law 5[b|c]" :)

I'm sure I pinned it up on my wall somewhere, right next to
http://imgs.xkcd.com/comics/tech_support_cheat_sheet.png

Jon.
 
rzed

utabintarbo said:
I have a log file with full Windows paths on a line. eg:
K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
\somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ;
1259006416

As I try to pull in the line and process it, python changes the
"\10" to a "\x08". This is before I can do anything with it. Is
there a way to specify that incoming lines (say, when using
.readlines() ) should be treated as raw strings?

TIA

Despite all the ragging you're getting, it is a pretty flakey thing
that Python does in this context:
(from a python shell)
>>> '\10'
'\x08'

If you are pasting your string as a literal, then maybe it does the
same. It still seems weird to me. I can accept that '\1' means x01,
but \10 seems to be expanded to \010 and then translated from octal
to get to x08. That's just strange. I'm sure it's documented
somewhere, but it's not easy to search for.

Oh, and this:
>>> '\70'
'8'
... is really odd.
 
Dave Angel

rzed said:
Despite all the ragging you're getting, it is a pretty flakey thing
that Python does in this context:
(from a python shell)
>>> '\10'
'\x08'

When the OP specified readline(), which does *not* behave this way, he
probably deserved what you call "ragging." The backslash escaping is
for string literals, which are in code, not in data files.

In any case, there's a big difference between surprising (to you) and
flakey.

If you are pasting your string as a literal, then maybe it does the
same. It still seems weird to me. I can accept that '\1' means x01,
but \10 seems to be expanded to \010 and then translated from octal
to get to x08. That's just strange. I'm sure it's documented
somewhere, but it's not easy to search for.
Check in the help for "escape Strings". It's documented (in vers. 2.6,
anyway) in a nice chart: a backslash followed by up to three octal
digits is interpreted as an octal character code. I don't like it much
either, but it's inherited from C, which has worked that way for 30+
years.

Online, see
http://www.python.org/doc/2.6.4/reference/lexical_analysis.html, and
look in section 2.4.1 for the chart.
Oh, and this:
>>> '\70'
'8'
... is really odd.
Octal 70 is hex 38 (or decimal 56), which is the character '8'.
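You can check Dave's arithmetic directly. This small Python 3 sketch encodes the rule as the language reference states it: an octal escape consumes at most three octal digits. That rule is also why the OP's '\1215B' came out as 'Q5B', since '\121' is 'Q' and the '5' is left as an ordinary character.

```python
# Octal escapes in ordinary (non-raw) string literals take at most three
# octal digits after the backslash; any further digit is an ordinary character.
assert "\1" == "\x01"
assert "\10" == "\x08"     # octal 10 == 8
assert "\121" == "Q"       # octal 121 == 81 == ord('Q')
assert "\1215" == "Q5"     # only three digits are consumed
assert "\70" == "8"        # octal 70 == 56 == 0x38 == ord('8')

# The same arithmetic done explicitly:
assert int("70", 8) == 56 == 0x38
assert chr(int("70", 8)) == "8"
print("all octal-escape checks passed")
```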

DaveA
 
