append one file to another

B

b83503104

Hi,

I want to append one (huge) file to another (huge) file. The current
way I'm doing it is to do something like:

infile = open (infilename, 'r')
filestr = infile.read()
outfile = open(outfilename, 'a')
outfile.write(filestr)

I wonder if there is a more efficient way of doing this?
Thanks.
 
T

Thomas Guettler

On Tue, 12 Jul 2005 06:47:50 -0700, (e-mail address removed) wrote:
Hi,

I want to append one (huge) file to another (huge) file. The current
way I'm doing it is to do something like:

infile = open (infilename, 'r')
filestr = infile.read()
outfile = open(outfilename, 'a')
outfile.write(filestr)

I wonder if there is a more efficient way of doing this?
Thanks.

I guess (don't know), that this is faster:

for line in infile:
    outfile.write(line)

At least if this is a file with "lines".

If it is a binary file, you could read
N bytes at once: infile.read(N)
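That idea can be put together as a small chunked-copy sketch (the function name and the 64 KB chunk size are just illustrative choices, not from the thread):

```python
def append_file(infilename, outfilename, chunk=64 * 1024):
    """Append infilename's contents to outfilename, chunk bytes at a time."""
    # Binary mode, so nothing gets translated and the whole file
    # never has to fit in memory at once.
    with open(infilename, 'rb') as infile, open(outfilename, 'ab') as outfile:
        while True:
            data = infile.read(chunk)
            if not data:  # an empty read means end of file
                break
            outfile.write(data)
```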

Thomas
 
B

b83503104

Thanks for the nice suggestions!

As a side question, you mentioned opening files in binary mode, in case
the code needs to run under Windows or cross-platform. What would
happen otherwise? Is it an issue of big/little endian or some other
issue?
 
S

Steven D'Aprano

Hi,

I want to append one (huge) file to another (huge) file.

What do you call huge? What you or I think of as huge is not necessarily
huge to your computer.
The current
way I'm doing it is to do something like:

infile = open (infilename, 'r')
filestr = infile.read()
outfile = open(outfilename, 'a')
outfile.write(filestr)

I wonder if there is a more efficient way of doing this?

Why? Is it not working? Is it too slow? Does it crash your computer?

If you have any expectation that your code needs to run under Windows or
cross-platform, or that the files contain binary data, you should open
your files in binary mode:

infile = open(infilename, 'rb')
outfile = open(outfilename, 'ab')

For raw copying, you should probably use binary mode even if they just
contain text. Better safe than sorry...

Then, if you are concerned that the files really are huge, that is, as big
or bigger than the free memory your computer has, read and write them in
chunks:

data = infile.read(64) # 64 bytes at a time is a bit small...
outfile.write(data)

Instead of 64 bytes, you should pick a more realistic figure, which will
depend on how much free memory your computer has. I suppose a megabyte is
probably reasonable, but you will need to experiment to find out.

Then when you are done, close the files:

infile.close()
outfile.close()

This is not strictly necessary, but it is good practice. If your program
dies, the files may not be closed properly and you could end up losing
data.
 
D

Danny Nodal

It's been a while since I last coded in Python, so please test this
before running it on real data so you don't clobber your existing file.
Although it may not be more efficient than what you are doing now or what
has been suggested already, it sure cuts down on the typing.

open(outfilename,'a').write(open(infilename).read())
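A similarly terse variant that stays memory-safe for huge files is the standard library's shutil.copyfileobj, which copies in fixed-size chunks internally (a sketch, not from the original thread; the function name is made up):

```python
import shutil

def append_with_shutil(infilename, outfilename):
    # copyfileobj reads and writes one buffer at a time, so even a
    # huge input file never sits wholly in memory.
    with open(infilename, 'rb') as src, open(outfilename, 'ab') as dst:
        shutil.copyfileobj(src, dst)
```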

Regards.
 
S

Steven D'Aprano

Dear me, replying to myself twice in one day...

Then, if you are concerned that the files really are huge, that is, as big
or bigger than the free memory your computer has, read and write them in
chunks:

data = infile.read(64) # 64 bytes at a time is a bit small...
outfile.write(data)

Sorry, that should be in a loop:

data = "anything"
while data:
    data = infile.read(64) # data will be empty when the file is read
    outfile.write(data)
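Put together as one complete, runnable sketch (with a larger chunk size and the explicit close from the earlier advice; the function name is just illustrative):

```python
def append_chunked(infilename, outfilename, chunk=1024 * 1024):
    # Read and write one chunk at a time; read() returns an empty
    # string at end of file, which ends the loop.
    infile = open(infilename, 'rb')
    outfile = open(outfilename, 'ab')
    data = infile.read(chunk)
    while data:
        outfile.write(data)
        data = infile.read(chunk)
    infile.close()
    outfile.close()
```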
 
G

Grant Edwards

As a side question, you mentioned opening files in binary
mode, in case the code needs to run under Windows or
cross-platform. What would happen otherwise? Is it an issue
of big little endian or some other issue?

The end-of-line characters might get converted -- even if
they're not really "end-of-line" characters in the file in
question.
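The effect is easy to demonstrate in modern Python 3, where text mode applies universal-newline translation on every platform (in the Python 2 of this thread, the translation only happened on Windows):

```python
import os
import tempfile

# Write Windows-style CRLF line endings as raw bytes.
path = os.path.join(tempfile.mkdtemp(), 'crlf.txt')
with open(path, 'wb') as f:
    f.write(b'line one\r\nline two\r\n')

# Text mode translates '\r\n' to '\n' on read...
with open(path, 'r') as f:
    text = f.read()

# ...while binary mode returns the bytes exactly as stored.
with open(path, 'rb') as f:
    raw = f.read()
```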
 
S

Steven D'Aprano

Thanks for the nice suggestions!

As a side question, you mentioned opening files in binary mode, in case
the code needs to run under Windows or cross-platform. What would
happen otherwise? Is it an issue of big/little endian or some other
issue?

No, nothing to do with big and little endian issues. It is all to do with
the line delimiter, and possibly the end-of-file marker.

Windows uses '\r\n' as the line delimiter for text files. (Or is it
'\n\r'? I always forget...)

Old-style Macintosh used '\r', and (almost) everything else, including new
Macs running OS X, uses '\n'.

If you open files in text mode, there can be complications due to the
different line endings. To be perfectly frank, I only use Python under
Linux, so I don't have the foggiest idea of just what Bad Things can
happen. I know it is a big problem when using some FTP programs, which
have a tendency to destroy binary programs if you upload/download them in
text mode.

I just did some experiments here, and can't get anything bad to happen.
But whatever the problem is, my grand-pappy always told me, open the
danged file in binary mode and you can't go wrong.

*wink*

I have found some discussions here:

http://python.active-venture.com/tut/node9.html

"Windows makes a distinction between text and binary files; the
end-of-line characters in text files are automatically altered slightly
when data is read or written. This behind-the-scenes modification to file
data is fine for ASCII text files, but it'll corrupt binary data like that
in JPEGs or .EXE files. Be very careful to use binary mode when reading
and writing such files."

and here:

http://zephyrfalcon.org/labs/python_pitfalls.html

This website recommends:

"Solution: Use the correct flags -- 'r' for text mode (even on Unix), 'rb'
for binary mode."

but I've never had any problems using 'rb' for text files under Linux.

I'm also told that Windows uses ctrl-Z as the end-of-file marker, and if
it finds that character in the middle of a text file, it will assume the
file has finished and stop reading. But only in text mode, not binary. I
don't think that's a problem for Linux.
 
J

John Machin

Hi,

I want to append one (huge) file to another (huge) file. The current
way I'm doing it is to do something like:

infile = open (infilename, 'r')
filestr = infile.read()
outfile = open(outfilename, 'a')
outfile.write(filestr)

I wonder if there is a more efficient way of doing this?

Don't wonder, like the ancient philosophers; be an empiricist :)


If the files are truly huge, you run the risk of exhausting real memory
and having to swap.

Try this:
Having opened the files,

for line in infile:
    outfile.write(line)

Otherwise look at the docs for the read method and check out the "size"
argument.

General warnings: (1) If you want to be portable, consider text/binary
differences. (2) Consider what to do if the last line in <outfilename>
is not terminated.
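Point (2) can be handled with a small check before appending (a sketch assuming text-like files; the helper name is hypothetical):

```python
def ensure_trailing_newline(outfilename):
    # If the file is non-empty and its last byte is not a newline,
    # add one so the appended data starts on a fresh line.
    with open(outfilename, 'ab+') as f:
        f.seek(0, 2)          # go to end of file
        if f.tell() > 0:
            f.seek(-1, 2)     # look at the final byte
            if f.read(1) != b'\n':
                f.write(b'\n')
```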
 
