File Object behavior

  • Thread starter Michael Castleton

Michael Castleton

When I open a csv or txt file with:

infile = open(sys.argv[1],'rb').readlines()
or
infile = open(sys.argv[1],'rb').read()

and then look at the first few lines of the file, there is a carriage return + line feed at the end of each line: \r\n
This is fine and somewhat expected. My problem comes from then writing
infile out to a new file with:

outfile = open(sys.argv[2],'w')
outfile.writelines(infile)
outfile.close()

at which point an additional carriage return is inserted at the end of each line: \r\r\n
The same behavior occurs with outfile.write(infile). I am doing no processing between reading the input and writing to the output.
Is this expected behavior? The file.writelines() documentation says that it doesn't add line separators. Is adding a carriage return something different?
At this point I have to filter out the additional carriage return, which seems like extra and unnecessary effort.
I am using Python 2.4 on Windows XP SP2.
Can anybody help me understand this situation?

Thanks
 

Steven Bethard

Maybe because you're reading the file as binary ('rb') but writing it as
text ('w')::
'hello\r\n'

Looks like if you match your writes and reads everything works out fine.
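
A minimal sketch of that mismatch, assuming Python 3 rather than the Python 2.4 used in this thread. Passing newline="\r\n" is an assumption used here to emulate Windows text mode on any platform, and the helper name round_trip is hypothetical:

```python
import os
import tempfile


def round_trip(data, text_mode_write):
    """Write data to a temp file, then return the raw bytes on disk.

    data is bytes that already end in CR LF, as read with 'rb'.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        if text_mode_write:
            # Emulate Windows text mode: every LF written becomes CR LF.
            with open(path, "w", newline="\r\n", encoding="latin-1") as out:
                out.write(data.decode("latin-1"))
        else:
            # Binary mode writes the bytes exactly as given.
            with open(path, "wb") as out:
                out.write(data)
        with open(path, "rb") as check:
            return check.read()
    finally:
        os.remove(path)


mismatched = round_trip(b"hello\r\n", text_mode_write=True)
matched = round_trip(b"hello\r\n", text_mode_write=False)
print(repr(mismatched))  # b'hello\r\r\n'
print(repr(matched))     # b'hello\r\n'
```

Reading with 'rb' and writing with 'wb' keeps the data byte-for-byte identical, which is the "match your writes and reads" case.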

STeVe
 

7stud

Michael said:
The file.writelines() documentation says that it doesn't add line separators. Is adding a carriage return something different?

No.

Michael said:
Is this expected behavior?

According to Python in a Nutshell (p. 217), it is. On Windows, in text mode, when you write a \n to a file, the \n is converted to the system-specific newline (which is specified in os.linesep). For Windows, a newline is \r\n. Conversely, on Windows, in text mode, when you read a \r\n newline from a file, it is converted to a \n.
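
As a rough illustration of that translation in both directions, here is a sketch assuming Python 3. The newline="\r\n" argument stands in for the Windows value of os.linesep, so the result is the same on every platform; that substitution is an assumption, not something from the book:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Text-mode write: each LF in the string is stored as CR LF.
    with open(path, "w", newline="\r\n", encoding="ascii") as out:
        out.write("one\ntwo\n")

    # Binary read: shows what is really on disk.
    with open(path, "rb") as raw:
        on_disk = raw.read()

    # Text-mode read with the default newline=None: CR LF comes back as LF.
    with open(path, "r", encoding="ascii") as text:
        as_text = text.read()
finally:
    os.remove(path)

print(repr(on_disk))  # b'one\r\ntwo\r\n'
print(repr(as_text))  # 'one\ntwo\n'
```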
 

7stud

I forgot to add that when you read or write in binary mode, no
conversion takes place. So, if you read \r\n from the file, your
input will contain the \r\n; and if you write \r\n to the file, then
the file will contain \r\n.
 

Michael Castleton

Thank you to both Steve and 7stud. You were right on about the binary flag!
I thought I had tried everything...

Mike
 

Bruno Desthuilliers

Michael Castleton wrote:
When I open a csv or txt file with:

infile = open(sys.argv[1],'rb').readlines()
or
infile = open(sys.argv[1],'rb').read()

and then look at the first few lines of the file, there is a carriage return + line feed at the end of each line: \r\n

Is there any reason you open your text files in binary mode?

Unless you're using the csv module (which requires such a mode - but
then you don't care since you're not working with the raw data
yourself), you should consider opening your files in text mode. This
should solve your problem (if not, then you have a problem with
universal newlines support in your Python install).
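
For the csv case, a hedged sketch in Python 3, where the csv documentation asks for newline="" instead of the binary mode that Python 2 required. The sample rows and the helper write_rows are made up for illustration, and newline="\r\n" is again an assumption that emulates Windows text mode:

```python
import csv
import os
import tempfile


def write_rows(rows, newline):
    """Write rows with csv.writer and return the raw bytes on disk."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", newline=newline, encoding="ascii") as f:
            # csv.writer ends each row with CR LF by default.
            csv.writer(f).writerows(rows)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


rows = [["a", "1"], ["b", "2"]]  # made-up sample data
untranslated = write_rows(rows, newline="")    # what the csv docs ask for
translated = write_rows(rows, newline="\r\n")  # emulated Windows text mode
print(repr(untranslated))  # b'a,1\r\nb,2\r\n'
print(repr(translated))    # b'a,1\r\r\nb,2\r\r\n'
```

The second result shows the same doubled carriage return as in the original question, because the writer's own \r\n passes through a translating text layer.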

HTH
 

Michael Castleton

Bruno,
No particular reason in this case. It was probably a holdover from using the csv module in the past. I'm wondering, though, whether using binary mode on very large files (>100 MB) would save any processing time, since there is no conversion to the system newline.
What do you think?
Thanks.
 

Bruno Desthuilliers

I think that premature optimization is the root of all evil.

You'll have to do the processing by yourself then, and I doubt it'll be
as fast as the C-coded builtin newline processing.

Anyway, you can easily check it out by yourself - Python has timeit (for
micro-benchmarks) and a profiler.
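
One way to run that check with timeit, as a sketch only. It assumes Python 3, where text mode also decodes bytes to str, so the timing covers more than newline translation. The file contents, the file size, and the function names read_binary and read_text are arbitrary choices for illustration:

```python
import os
import tempfile
import timeit

# Build a throwaway file with CR LF line endings (the size is arbitrary).
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"some,comma,separated,values\r\n" * 50000)


def read_binary():
    # No translation: bytes come back exactly as stored.
    with open(path, "rb") as f:
        return f.read()


def read_text():
    # Default newline=None translates CR LF to LF while reading.
    with open(path, "r", encoding="latin-1") as f:
        return f.read()


try:
    binary_seconds = timeit.timeit(read_binary, number=20)
    text_seconds = timeit.timeit(read_text, number=20)
finally:
    os.remove(path)

print("binary: %.4f s, text: %.4f s" % (binary_seconds, text_seconds))
```

The numbers depend on the machine, the disk cache, and the Python version, so they are only meaningful when measured on the real files and the real processing.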
 
